[HN Gopher] Git's list of banned C functions
___________________________________________________________________
Git's list of banned C functions
Author : muds
Score : 320 points
Date : 2021-03-04 20:33 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| moomin wrote:
| They should probably add sscanf.
| ed25519FUUU wrote:
| First thing I looked for. It looks like it _was_ used here:
|
| https://github.com/git/git/blob/master/object-file.c#L1293
|
| And currently used here (at least):
|
| https://github.com/git/git/blob/master/refs.c#L1235
| TheRealSteel wrote:
| I'm an idiot, I read the headline and thought these were banned
| from Git entirely. As in, you couldn't commit them to _any_ repo
| using Git, at all. Thought that seemed a bit harsh.
|
| Turns out you just can't use them when you contribute code to the
| Git project. That makes sense, and seems reasonable.
| [deleted]
| maxk42 wrote:
| What would be helpful is an explanation of how each function ends
| up being misused so people can learn from this.
| petters wrote:
| Git blame is helpful here. See e.g.
| https://github.com/git/git/commit/1b11b64b815db62f93a04242e4...
| jsmith45 wrote:
| View the git history for the file. Each commit that adds
| functions has a detailed explanation of what is wrong with the
| functions.
| zbendefy wrote:
| Are there some details on what's wrong with these?
| bvaldivielso wrote:
| The commit messages that added them explain the reasoning
| ufo wrote:
| I wish they would have put that on comments instead of on the
| commit messages. It's not the first time that I've seen this
| particular list of banned functions being shared online and
| every time it happens someone has to explain that the most
| interesting info is hidden in the commit messages.
| alexchamberlain wrote:
| All the string functions have buffer overrun vulnerabilities if
| not used carefully. I'm not sure about the time functions
| though.
| trilinearnz wrote:
| Very much this. I frequently write small games in C, and the
| number of times I have been bitten by baffling behaviour
| because a string somewhere was copied into an array that was
| too short, are many! Apart from that, I love the simplicity
| of the language and the stdlib, and it's definitely my
| preferred hobby programming environment.
|
| It would be good to know what the commonly-accepted
| alternatives are.
| edflsafoiewq wrote:
| The time functions are either non-reentrant, or, for the _r
| versions, have the same problem with buffer overruns.
|
| https://github.com/git/git/commit/1fbfdf556f2abc708183caca53...
|
| https://github.com/git/git/commit/91aef030152d121f6b4bc3b933...
| [deleted]
| csours wrote:
| I'm pretty sure you could google each of these with the word
| 'dangerous'
|
| For example: https://lgtm.com/rules/2154840805/
| whydoyoucare wrote:
| I am so thankful git isn't forcefully including this header in
| every C language project and that we have a choice when using
| git! :-)
| bvaldivielso wrote:
| Ah this is a very good idea. I guess you still have to make sure
| that all your translation units include this header, which isn't
| completely foolproof.
|
| Static analysis would probably be more robust, but way more
| involved.
| radus wrote:
| Best of both worlds: use static analysis to ensure the header
| is included?
| koenigdavidmj wrote:
| gcc has a -include option, so this can be done once in the
| Makefile and get the benefit everywhere (unless you're being
| clever).
| Athos_vk wrote:
| I remember visual studio having an option to force include a
| file, surely something like that would exist for other
| toolchains
| kccqzy wrote:
| You don't need fancy static analysis. You can find out whether
| the banned functions are called just by inspecting the compiled
| object file. Add it to the build step and done.
| EdSchouten wrote:
| Funnily enough, strtok() is not listed :)
| kgrimes2 wrote:
| Can a C guru provide a TL;DR of why these are bad?
| drfuchs wrote:
| It would be nice if the error messages generated would suggest
| replacement functions that they deem appropriate. I see that I'm
| not supposed to use gmtime, localtime, ctime, ctime_r, asctime,
| and asctime_r; but what do they think I _should_ use?
| dev_tty01 wrote:
| It would be even nicer if it redefined the call to a safe
| version and then generated a warning message informing the
| programmer of the substitution.
| pjc50 wrote:
| You can't do that because the semantics are different in most
| cases.
| [deleted]
| cle wrote:
| From the commit messages
|
| > The ctime_r() and asctime_r() functions are reentrant, but
| have no check that the buffer we pass in is long enough (the
| manpage says it "should have room for at least 26 bytes").
| Since this is such an easy-to-get-wrong interface, and since we
| have the much safer strftime() as well as its more convenient
| strbuf_addftime() wrapper, let's ban both of those.
|
| (https://github.com/git/git/commit/91aef030152d121f6b4bc3b933...)
|
| > The traditional gmtime(), localtime(), ctime(), and asctime()
| functions return pointers to shared storage. This means they're
| not thread-safe, and they also run the risk of somebody holding
| onto the result across multiple calls (where each call
| invalidates the previous result). All callers should be using
| their reentrant counterparts.
|
| (https://github.com/git/git/commit/1fbfdf556f2abc708183caca53...)
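|
| A minimal sketch of the combination those two commit messages point
| toward: a reentrant conversion (gmtime_r) followed by strftime(),
| which takes the buffer size. The helper name format_utc and the
| output format are assumptions for illustration, not git code; git's
| own strbuf_addftime() wrapper is not shown. gmtime_r is POSIX, not
| ISO C.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for gmtime_r() */
#endif
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: format a UTC timestamp as "YYYY-MM-DD HH:MM:SS"
 * into a caller-supplied buffer. Returns the number of bytes written
 * (excluding the NUL), or 0 if the buffer was too small or the time
 * could not be converted.
 */
size_t format_utc(time_t t, char *out, size_t outsz)
{
	struct tm tm;

	/* gmtime_r() fills our own struct instead of shared static storage. */
	if (!gmtime_r(&t, &tm))
		return 0;

	/* strftime() is told the buffer size and never writes past it. */
	return strftime(out, outsz, "%Y-%m-%d %H:%M:%S", &tm);
}
```

| Because the result lives in the caller's struct and buffer, a second
| call cannot invalidate an earlier result, and a buffer that is too
| small shows up as a return value of 0 rather than an overrun.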
| tinus_hn wrote:
| Strangely there is no mention of strtok which has a similar
| issue.
| drfuchs wrote:
| Yes, but every hapless user shouldn't have to go searching
| through a bunch of commit messages to find the suggested
| replacement. Bad UX.
| capableweb wrote:
| The UX of using this list is not by manually searching
| through the list and seeing the reason behind them. You
| include the file together with the rest of your sources and
| now you get compilation errors if you try to use them.
| Can't think of a better UX for banned functions.
|
| Discovering why the thing is banned you only have to do
| once, if you care. If you're just modifying something
| quickly and minor in Git, you might not even care why.
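|
| A rough sketch of how a header like this can turn a banned call into
| a compile error, modeled on the approach in git's banned.h: each
| banned name becomes a macro that expands to an undeclared identifier.
| Only two functions are shown, and set_label is a hypothetical caller
| added for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Any call to a banned function expands to an undeclared identifier,
 * so the build fails and the error message names the function.
 */
#define BANNED(func) sorry_##func##_is_a_banned_function

#undef strcpy
#define strcpy(x, y) BANNED(strcpy)
#undef sprintf
#define sprintf(...) BANNED(sprintf)

/*
 * Hypothetical caller: copy a label only if it fits.
 * Returns 0 on success, -1 (leaving dst untouched) if it does not fit.
 */
int set_label(char *dst, size_t dstsz, const char *label)
{
	size_t len = strlen(label);

	if (len >= dstsz)
		return -1;

	/* strcpy(dst, label);  <- uncommenting this line breaks the build */
	memcpy(dst, label, len + 1);	/* +1 copies the terminating NUL */
	return 0;
}
```

| The headers are included before the macros are defined, so the real
| declarations are parsed normally and only later uses are rejected.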
| grncdr wrote:
| It seems pretty safe to assume a developer contributing C
| code to _git itself_ would know how to use git blame (or
| the GitHub interface for it).
| orf wrote:
| Why make it harder, and why make it impossible to update
| if there are other suggested alternatives that are
| available since whenever the commit was made?
| masklinn wrote:
| > Why make it harder
|
| Because there is no way for a commit message to become
| outdated or detached from what it talks about, both of
| which are very much issues with comments.
|
| > why make it impossible to update if there are other
| suggested alternatives that are available since whenever
| the commit was made?
|
| Because that doesn't really matter.
| cma wrote:
| > Because there is no way for a commit message to become
| outdated or detached from what it talks about, both of
| which are very much issues with comments.
|
| What if they think of another reason why one of the same
| functions should be disabled?
| underwater wrote:
| Code is evergreen, whereas a git commit represents a
| change at a single point in time. It will always be
| limited by the knowledge the author had available to
| them.
|
| The commit message from 2020 with suggested alternatives
| might very well go stale. Does the author go and force a
| noop commit so they can document new best practice in a
| new commit message?
| orf wrote:
| > Because that doesn't really matter.
|
| Ok, so maybe rather than have this file we should run
| "git log | grep BANNED" and build a list of functions
| from that? Or maybe we could change all error messages to
| be "go look at the commit history to work out why this
| happened".
|
| No? Maybe putting context in source files (or better yet,
| an error message!) rather than in a side channel like the
| commit message has value when it comes to understanding
| and updating, and it won't be lost under the weight of
| future commits.
| [deleted]
| capableweb wrote:
| Your source code should describe what the program should
| do today. It should not contain all historical artifacts
| about your source code, as it'll grow too big and
| unmanageable then. Instead, use Git to store temporal
| information, data that is about change and reasoning
| behind it. Git is basically a timeline, instead of hard
| facts of today.
|
| That's why it makes sense to describe the background and
| reasoning behind a change in a Git commit, instead of
| inside your source files as comments.
| orf wrote:
| Totally agree, which is why nobody is suggesting adding
| the background and reasoning behind the change to the
| source file as a comment.
|
| They are suggesting adding a more informative error,
| which may include a subset of that background and
| reasoning. An error message that points you to the
| functions you should use instead is infinitely more
| informative than one that says "this is banned. Bye."
| jorl17 wrote:
| I find it highly backwards that documentation on "what to
| use instead of X" is in the commit message disabling X.
| One _might_ do it and might remember to do it, but IMO it
| makes absolutely no sense for this not to be documented
| properly in code, as suggested by OP.
|
| By that logic, a non-insignificant amount of (good)
| comments in code could be removed and people asked to
| "git blame the code and check out the commit that made it
| for the documentation". Of course this could be done, but
| it sounds ridiculous even typing it out.
| blitz_skull wrote:
| I disagree. Commit messages exist for the very purpose
| of adding context to your code base. If you added
| <complex_function> for something that needs context, sure
| MAYBE add a comment, but I really pray that I'm going to
| find a few paragraphs disambiguating the problem within a
| git commit. If I'm _really_ lucky, maybe I find a PR
| number or Jira ticket reference as well.
|
| If you're truly clueless as to what could be substituted
| for these commands, then you don't understand why they're
| banned. So our first step? Figure out why they're banned.
| And how would we sanely approach this? Probably by
| checking the commit message for _why that code is there
| in the first place_. That's a very safe, sane, and not-
| at-all backwards assumption. After you understand why
| it's there, a quick google search might help out if the
| commit message didn't already include information on
| alternatives.
|
| Lastly, yeah, I totally agree a large amount of GOOD
| comments should be relegated to the git commits if all
| they're doing is adding additional context around a
| complex piece of logic. Comments do not exist to edify
| a code base in any way other than context. They're too
| easy to let become stale, whereas a git commit will
| always reference exactly the code you're blaming.
|
| So, I have to really disagree that it's ridiculous or in
| any way absurd. In fact, I think a lot of code suffers
| from NOT using git as a way to extend context around a
| code base. It's SUPER easy with most development
| environments to select a block of text and blame it. It's
| so easy that it's almost always my go-to to increase my
| context of what's been happening around a particular part
| of the code base.
| barnaclejive wrote:
| So, you are tied to Git for eternity to preserve
| documentation?
|
| Might work in practice for a long time, but Git is a
| version control system, not a documentation system.
| dpedu wrote:
| For developer documentation - yes, absolutely!
| capableweb wrote:
| > By that logic, a non-insignificant amount of (good)
| comments in code could be removed and people asked to
| "git blame the code and check out the commit that made it
| for the documentation". Of course this could be done, but
| it sounds ridiculous even typing it out.
|
| Yes, exactly. You want to understand how a codebase
| changed and evolved over time? Git is your friend. If you
| want the facts of the code today? The source code is your
| friend. That's why the way the Linux and Git repositories
| store history makes sense. See also
| https://news.ycombinator.com/item?id=26348965
|
| Try navigating the Git codebase with a git-blame sidebar
| (probably VS Code has that somewhere) so you can see the
| history of the source files. If you wonder why something
| is what it is, you can checkout the commit that last
| modified it. Or go even further backwards and figure out
| in the context it was first added. If you truly want to
| understand a change, a git repository with well written
| git messages is a pleasure to understand and dig into.
| [deleted]
| chris_wot wrote:
| The commits actually do give that info. Take for instance this
| commit:
|
| https://github.com/git/git/commit/c8af66ab8ad7cd78557f0f9f5e...
|
| It actually gives examples and a lengthy explanation and
| reasoning behind the ban.
| xorcist wrote:
| Now _that's_ what a good commit message looks like!
| cesarb wrote:
| Commit messages like that are common in the Linux kernel
| project, which is where git came from (though this
| particular commit message is a bit on the longer side).
|
| It makes more sense if you think of it as an email message
| justifying why the project maintainer should accept that
| change, because that's what they were before git even
| existed. Still today, unless you're one of the Linux kernel
| subsystem maintainers, you have to convert your changes to
| emails with git-format-patch/git-send-email and send them
| to the right mailing list. Even the Linux kernel subsystem
| maintainers keep writing commits in that style out of habit
| (and because Linus will rant at them if they don't).
| mamon wrote:
| But why put that info in commit message instead of a comment
| in the file itself?
| chris_wot wrote:
| Because comments can be tedious and get out of sync with
| the repo. Why not check the git history? I wish more repos
| could be like this!
| adrianmonk wrote:
| > _Why not check the git history?_
|
| Because that is effort every person who uses the file has
| to do over and over again, whereas maintaining the file
| is effort that has to be done once by one person.
| skeletal88 wrote:
| Someone here commented to use git blame to find the
| commit that banned the functions and read the commits.
| These people making the suggestions.. must hate other
| people and their time. Also, what if someone.. for
| example runs a code formatter on the file, making git
| blame useless? Is it really so difficult to make a manual
| or explain properly in the comments about what
| replacements to use?
| chris_wot wrote:
| It sounds like you want a manual. Personal preference I
| guess. The maintainers seem to have decided to keep it in
| the history. It's not like this was ever meant for
| anything other than git itself.
| aendruk wrote:
| I really wish tooling like this was more common:
|
| https://github.com/eamodio/vscode-gitlens/tree/v11.2.1#curre...
| (screenshot)
|
| > Current Line Blame: Adds an unobtrusive, customizable,
| and themable, blame annotation at the end of the current
| line
| colordrops wrote:
| Or even in the compile error message itself.
| [deleted]
| colordrops wrote:
| Also, _why_ the functions are banned.
| lerax wrote:
| Yes, this is right. Any decent C programmer knows that these
| functions are cursed.
| Animats wrote:
| About 20 years too late. Those should have been moved to a
| "deprecated" header file decades ago.
| xvilka wrote:
| I hope one day to see it rewritten in a safer language.
| qbasic_forever wrote:
| There's a nice Go implementation of git:
| https://github.com/go-git/go-git
| sys_64738 wrote:
| scanf?
| abetusk wrote:
| The Git Mailing List Archive on lore.kernel.org (found in the
| README from the git mirror on GitHub) has more context [0] [1]
| [2]. From Jeff King on 2018-07-24:
|
|     The strncpy() function is less horrible than strcpy(), but is
|     still pretty easy to misuse because of its funny termination
|     semantics. Namely, that if it truncates it omits the NUL
|     terminator, and you must remember to add it yourself. Even
|     if you use it correctly, it's sometimes hard for a reader
|     to verify this without hunting through the code. If you're
|     thinking about using it, consider instead:
|
|     - strlcpy() if you really just need a truncated but
|       NUL-terminated string (we provide a compat version, so it's
|       always available)
|     - xsnprintf() if you're sure that what you're copying should fit
|     - strbuf or xstrfmt() if you need to handle arbitrary-length
|       heap-allocated strings
|
| I just did a search on the keywords 'banned' and 'strncpy' [3]
|
| [0]
| https://lore.kernel.org/git/20180724092828.GD3288@sigill.int...
|
| [1]
| https://lore.kernel.org/git/20190103044941.GA20047@sigill.in...
|
| [2]
| https://lore.kernel.org/git/20190102093846.6664-1-e@80x24.or...
|
| [3] https://lore.kernel.org/git/?q=banned+strncpy
| js2 wrote:
| Psst:
|
| https://github.com/git/git/commits/master/banned.h
|
| (Git development is done by emailing patches. Those patches
| include the git commit message, which we can see just by
| looking at the history of the file. Sometimes there's
| additional discussion on the ML, but the most important details
| are in the commit message because the git development team is
| very disciplined about that.)
| captainmuon wrote:
| It would be interesting to see the rationale behind these bans,
| and what the suggested alternatives are. Some are obvious, like
| `strcpy`, but I can't remember what the problem with `sprintf` or
| the time functions are.
|
| If you are doing something like `sprintf(buffer, "%f, %f", a,
| b)`, yes it is tricky to choose the size of buffer frugally, but
| if you replace that by `ftoa` and constructing the string by
| hand, you are likely to introduce more bugs.
|
| Edit: as pointed out in another post, you can do git blame to see
| the rationale for each ban, quite interesting.
| monocasa wrote:
| snprintf will always terminate the string, and won't overflow
| the buffer.
| Aanok wrote:
| The trouble with printf-family functions is their variadic
| nature. If the arguments don't match the format string, you can
| wreak all sorts of havoc.
|
| A fun exercise you can do is put a "%s" in the format string,
| omit the string argument and see what happens to the stack.
| anyfoo wrote:
| That's however relatively easy to verify programmatically,
| and indeed any recent compiler will complain about that.
|
| I'd say the usual trap is rather the size of the target
| buffer, because that requires bigger static analysis guns.
| (I'm ignoring things like "%n", because then you're playing
| with fire already.)
| Gibbon1 wrote:
| I think the big three C compilers have pragmas that you
| can tag printf/scanf with that will cause the compiler to
| verify the argument list.
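|
| For reference, in GCC and Clang the mechanism is a function attribute
| rather than a pragma. A minimal sketch, assuming one of those two
| compilers; the macro PRINTF_LIKE and the wrapper format_checked are
| hypothetical names, and other compilers simply get no checking here.

```c
#include <stdarg.h>
#include <stdio.h>

/* GCC and Clang extension; other compilers see an empty macro. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, args_idx) \
	__attribute__((format(printf, fmt_idx, args_idx)))
#else
#define PRINTF_LIKE(fmt_idx, args_idx)
#endif

/*
 * Argument 3 is the format string and the variadic arguments start at
 * position 4, so the compiler can check them against each other.
 */
int format_checked(char *buf, size_t bufsz, const char *fmt, ...)
	PRINTF_LIKE(3, 4);

/* Returns 0 on success, -1 on a formatting error or truncation. */
int format_checked(char *buf, size_t bufsz, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, bufsz, fmt, ap);
	va_end(ap);

	return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```

| With the attribute in place, a call such as
| format_checked(buf, sizeof(buf), "%s", 42) should draw a -Wformat
| warning at compile time, the same as a mismatched printf() call.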
| danaliv wrote:
| There's that, but with sprintf/vsprintf specifically, there's
| no way to keep it from storing characters past the end of
| your buffer. For example:
|
|     char buf[2];
|     sprintf(buf, "%d", n);
|
| This will happily write to buf[2] and beyond if n is negative
| or greater than 9.
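|
| The same conversion with snprintf(), as a small sketch: the size is
| passed in, and the return value reveals whether the output was cut
| short. The helper name int_to_str is hypothetical.

```c
#include <stdio.h>

/*
 * Render n in decimal. Returns 0 on success, or -1 if buf is too small
 * (in which case buf holds a truncated but NUL-terminated prefix).
 */
int int_to_str(char *buf, size_t bufsz, int n)
{
	/* snprintf() returns the length the full output would have had. */
	int len = snprintf(buf, bufsz, "%d", n);

	return (len < 0 || (size_t)len >= bufsz) ? -1 : 0;
}
```

| With a two-byte buffer, single digits succeed; 10 or -1 make the
| helper return -1 instead of writing to buf[2].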
| SloopJon wrote:
| sprintf() warnings have gotten pretty sophisticated these days.
| I discovered GCC's -Wformat-overflow the other day. It
| complained that the buffer for a date string wasn't big enough;
| e.g., sprintf(buf, "%04d-%02u-%02u", year, month, day), where
| year, month, and day are 16-bit shorts, and buf was probably
| eleven or twelve bytes.
|
| It may actually be a bug that I got the warning, because the
| range of each input was checked, and I think the compiler is
| supposed to be smart enough to remember that.
| dahfizz wrote:
| This was my reaction as well. Banning strncpy just encourages
| haphazard manual copying.
| smasher164 wrote:
| From the commit message:
|
|     If you're thinking about using it, consider instead:
|
|     - strlcpy() if you really just need a truncated but
|       NUL-terminated string (we provide a compat version, so
|       it's always available)
|     - xsnprintf() if you're sure that what you're copying should fit
|     - strbuf or xstrfmt() if you need to handle arbitrary-length
|       heap-allocated strings
| nwmcsween wrote:
| strlcpy is safer but effectively running strlen(src) every
| call is a good wtf
| azurezyq wrote:
| maybe this https://github.com/git/git/blob/master/strbuf.h ?
| ben_bai wrote:
| strlcpy is the safe way, that is used by git.
| [deleted]
| syncsynchalt wrote:
| strncpy doesn't do what you think it does (it is not
| analogous to strncat). strncpy does not terminate strings on
| overflow. In C terms, it is not actually a string function
| and shouldn't be named with `str`.
|
| snprintf or nul-plus-strncat do what you want, but snprintf
| has portability problems on overflow. Most projects I've been
| on rely on strlcpy (with a polyfill implementation where not
| available).
| asdfasgasdgasdg wrote:
| I think you're meant to use snprintf instead. It would be
| great to see documentation on the alternatives!
| sys_64738 wrote:
| getc?
| ape4 wrote:
| Just replace strcpy(a,b) with strcpyn(a,b,INT_MAX)
|
| /joke
| fatnoah wrote:
| I'm pretty sure I've seen similar logic in my life.
| attractivechaos wrote:
| I wonder how they copy strings with strcpy and strncpy both
| banned. strlcpy? But it is not conforming to major standards. Or
| just memcpy with extra code?
| dgentile wrote:
| Edited: Looks like they have safe alternatives:
|
|     - strlcpy() if you really just need a truncated but
|       NUL-terminated string (we provide a compat version, so
|       it's always available)
|     - xsnprintf() if you're sure that what you're copying should fit
|     - strbuf or xstrfmt() if you need to handle arbitrary-length
|       heap-allocated strings
| lights0123 wrote:
| https://github.com/git/git/commit/e488b7aba743d23b830d239dcc...
| Yes:
|
| > we provide a compat version, so it's always available
| [deleted]
| attractivechaos wrote:
| This gets me interested. Link [1] below shows their
| implementation of strlcpy(). This is a questionable
| implementation. With strncpy, the source string "src" may not
| be NUL terminated IIRC. The git implementation requires
| "src" to be NUL terminated. If not, an invalid read. EDIT:
| according to the strlcpy manpage [2], "src" is required to be
| NUL terminated, so strlcpy imposes more restrictions and is
| not a proper replacement of strncpy.
|
| Furthermore, imagine "src" has 1Mb characters but we only
| want to copy the first 3 chars. The git implementation would
| traverse the entire 1Mb to find the length first, but a
| proper implementation only needs to look at the first 3
| chars. So, they banned strncpy and provided a worse solution
| to that.
|
| [1]: https://github.com/git/git/blob/master/compat/strlcpy.c
|
| [2]: https://linux.die.net/man/3/strlcpy
| alcover wrote:
| Agreed. It's O(n) inefficient. I guess looping through chars
| up to `size` would perform better on average.
|
| I see this `strlcpy` recommended everywhere.
| kzrdude wrote:
| You have found the answer - strlcpy is not a replacement
| for strncpy at all (it's arguably a safer version of
| strcpy), and git people didn't invent this, it's the
| existing BSD strlcpy interface.
| attractivechaos wrote:
| Thanks for the confirmation. But my concern remains: they
| banned strncpy without a proper replacement. In addition,
| I didn't know the extra restriction of strlcpy until
| today (I have never used it before because it is not
| conforming to C99/POSIX). I might have fallen into this
| trap.
| notaplumber wrote:
| The problem is actually often the opposite: in the
| real world many treat strncpy as if it behaves like
| strlcpy. Note that strlcpy is equivalent to:
|
|     snprintf(buf, sizeof(buf), "%s", string);
|
| strlcpy is on track for future standardization in POSIX,
| for Issue 8, but even as a de facto standard, it exists
| in libc on *BSD, macOS, Android, Solaris, QNX, and even
| Linux using musl.
|
| https://www.austingroupbugs.net/view.php?id=986#c5050
|
| But you're correct in that it is not a replacement for
| strncpy because no code should be using strncpy.
| tedunangst wrote:
| Take a step back and consider strlcpy isn't supposed to be
| a drop in replacement for strncpy (a function which already
| exists).
| [deleted]
| jabl wrote:
| memccpy? Most platforms have it, and it's being added to C2X.
|
| See
| https://developers.redhat.com/blog/2019/08/12/efficient-stri...
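|
| A minimal sketch of what a memccpy()-based copy might look like. It
| reads at most dstsz bytes of the source, so a very long source is not
| scanned to its end. The helper name copy_bounded and its return
| convention are assumptions for illustration; memccpy() is POSIX
| rather than C99.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for memccpy() */
#endif
#include <string.h>

/*
 * Copy src into dst (capacity dstsz), always NUL-terminating when
 * dstsz > 0. Returns 0 if src fit, -1 if it was truncated.
 */
int copy_bounded(char *dst, size_t dstsz, const char *src)
{
	if (dstsz == 0)
		return -1;

	/* memccpy() stops right after copying the first '\0' it sees. */
	if (memccpy(dst, src, '\0', dstsz))
		return 0;

	/* No NUL within dstsz bytes: the copy was cut short, so terminate it. */
	dst[dstsz - 1] = '\0';
	return -1;
}
```

| Unlike strlcpy(), this does not report the full source length, which
| is exactly why it can stop early.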
| paultopia wrote:
| It's really wild, as a person coming from other languages who has
| written maybe ten lines of C in his life that the functions that
| seem to be massive footguns in C are, like, "format a string" or
| "get time in GMT." That's... really scary.
| Communitivity wrote:
| I remember an entire lecture about the use and abuse of sprintf
| and related functions as a means of exploit. Yeah, when you
| delve into the internals of C you find things that are
| terrifying if you are concerned about reliability, security, or
| performance. The same is true though for many languages. The
| problem is, as is often the case, the Iron Triangle: good,
| fast, cheap - pick two. Different sections of the language are
| written by developers under different constraints and
| pressures, which leads to different choices. In my experience
| every language implementation has at least one area that was
| done quickly for expediency or done poorly because no one else
| was able to (or wanted to) work on it.
| throwaway09223 wrote:
| Many of C's problems relate to string handling. These are all
| legacy functions which have been replaced with safe
| alternatives many decades ago.
|
| strcpy() was replaced with a safer strncpy() and in turn has
| been replaced with strlcpy().
|
| The list is a ban of the less safe versions, where more modern
| alternatives exist.
| Kaze404 wrote:
| Why are these functions deprecated in favor of others but not
| removed? I know in Javascript this can happen so as to not
| break older websites, but in a compiled language this
| shouldn't be a problem right?
| syncsynchalt wrote:
| There are actually very few _dangerous_ functions in C
| (gets is the only one that comes to mind). Others have
| massive caveats (strncpy) but still have their place.
| Others are just known to have certain gotchas (strcpy,
| strcat, sprintf).
|
| The reality of C is that if we deprecated every
| objectionable function in the stdlib we wouldn't have
| anything left.
| maxlybbert wrote:
| The C Standard Committee doesn't actually ship a compiler
| the way the people behind Java, Python, Lua, C#, Go, Rust,
| etc. do. The best they can do is deprecate particular
| functions and hope compiler writers and standard library
| writers follow along. But the compiler writers have vocal
| customers who insist the deprecations are overly-cautious.
| sudomakeup wrote:
| Why wouldn't it be an issue with a compiled language?
|
| It's nearly the exact same reasoning as "we're not going to
| break older websites"
| lalaithion wrote:
| The expectation of a C89 programmer is that a valid C89
| program can be compiled for any machine that has a C89
| compiler, and likewise for C95, C99, C11, and C17.
| Furthermore, it's expected that any C89 program can be
| compiled unchanged on any future version of C, and the
| standard library is part of the definition of the language,
| and therefore functions cannot be removed.
| DaiPlusPlus wrote:
| At a certain point we have to say that _it's wrong_ for
| someone to expect C89 should still be the LCD.
|
| And yes: it should all still compile, but none of that
| prohibits the compiler from issuing flashing red/yellow
| warning messages to your terminal for using footgun
| functions, preferably with uncomfortable audible
| notifications too.
|
| All of this is silly though, because even in a strict C89
| environment you can still have your own safe wrappers
| over the unsafe functions. I find that very little of
| modern programming has a hard dependency on ultramodern
| compiler features (e.g. you can theoretically build
| React/Redux using only ES3 (1998ish) if you like.
| Generics using type-erasure can be implemented with
| macros. Etc.).
|
| Also, C89 conformance doesn't mean much: you can have a
| conforming C89 system that doesn't even have a heap - nor
| a stack for autos! (IBM Z/series uses a linked-list for
| call-frames, crazy stuff!)
| pjc50 wrote:
| In a compiled language, when you remove a function it fails
| to compile. So removing them from the standard library
| _forces_ code changes - they're not usually drop in
| replacements because the semantics were wrong in the first
| place.
|
| Removing strcpy would make the Python transition look easy.
| badsectoracula wrote:
| Removing anything breaks existing source code that has been
| tested to work. After all just because something _may_ lead
| to issues it doesn't mean it will _always_ lead to issues.
|
| Also in many systems the C library is linked dynamically
| and shared among all programs so even though a program is
| compiled it still relies on the underlying system to
| provide the function.
|
| Finally i'm certain that if a C standard removes something,
| it'll be treated as the equivalent to that standard not
| existing. C programmers are already a conservative bunch
| without such changes.
| gvx wrote:
| It's not great if you're working on a new release and you
| realize you also need to change something unrelated because
| the language changed under you, especially if it's just a
| bugfix but a high-priority one, or consider the head-aches
| caused by source-only distributions suddenly breaking for
| all your new users (or existing users switching to a new
| computer or spinning up a fresh VM).
| ChrisLomont wrote:
| These still lead to lots of bugs via off by one errors on
| lengths or other buffer misuse.
| cestith wrote:
| Still, unless you're writing something that has to be very
| low-level all the way through, it's better to use a string-
| handling library than the stdlib tools for strings.
| stefan_ wrote:
| The first thing you do is _not use any strings_. You'll be
| amazed how much you can get done in languages that aren't
| so obsessively centered around stringified programming.
| cestith wrote:
| Most of the code I write has a spec of input and output
| being some form of text. Still, I tend to write that in
| languages that have safe string handling and drop into C
| only when the profiler indicates that's useful.
|
| When handling strings in C, it's useful to use the string
| functions from glib or pull in one of the specifically
| safe string handling libraries and not use any C stdlib
| functions for strings at all.
|
| There are a number of C strings libraries safer to use
| than the standard library, and many of them are simpler,
| more feature-rich, or both.
|
| * https://github.com/intel/safestringlib (MIT licensed)
| * https://github.com/rurban/safeclib (MITish)
| * https://github.com/mpedrero/safeString (MIT licensed)
| * https://github.com/antirez/sds (BSD 2-clause, and gives you
|   dynamic strings)
| * https://github.com/maxim2266/str (BSD 3-clause)
| * https://github.com/xyproto/egcc (GPL 2.0, includes GC on strings)
| * https://github.com/composer927/stringstruct (GPL 3.0)
| * https://github.com/c-factory/strings (MIT licensed)
| * https://github.com/cavaliercoder/c-stringbuilder (MIT licensed,
|   does dynamic)
|
| If one does use the C standard library directly for
| handling strings, the advisories from CERT, NASA, Github,
| and others should be welcome advice (CERT's advice, BTW,
| includes recommending a safer strings library right off).
| derefr wrote:
| Yes, sure, write Unix CLI plumbing tools without strings.
| pjc50 wrote:
| Until you want to communicate with the user, filesystem,
| or web.
| Animats wrote:
| It was a design decision of QNX that the kernel never
| uses strings. Everything the kernel handles is fixed
| length, except messages, and messages go from one user
| process to another. The kernel does not allocate space
| for them. I think they got that right.
|
| There's a QNX user process that's always present, called
| "proc", which handles pathnames and the "resource
| managers", programs which respond to path names. But
| that's in user space, and has all the tools of a user-
| space program.
| cestith wrote:
| There are absolutely things that can be written without
| string handling. Then again, there are things that can't.
| Not handling strings in the kernel probably was a good
| decision. That userland I'll bet has string handling
| though, to be useful to users.
| _kst_ wrote:
| strncpy() is not a "safer" strcpy(). It can avoid some errors
| involving writing past the end of the target array (_if_ you
| tell it the correct length for that array), but it's not a
| true string function, and it can leave the target
| unterminated and therefore not a valid string.
|
| http://the-flat-trantor-society.blogspot.com/2012/03/no-
| strn...
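A minimal sketch of the point above, in C. The helper name
copy_terminated() is hypothetical (it is not a standard function); it
only illustrates the usual workaround of writing the terminator
yourself after strncpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * copy_terminated() is a hypothetical helper, not a standard function.
 * It copies at most dstsize - 1 bytes of src into dst and always
 * NUL-terminates dst. Returns 1 if all of src fit, 0 if it was cut off.
 */
int copy_terminated(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);

    /* strncpy zero-fills the remainder when src is shorter than the
     * limit, but adds no terminator at all when src is longer. */
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0'; /* so always write one ourselves */

    return strlen(src) < dstsize;
}
```

The return value lets the caller notice truncation, which plain
strncpy() does not report.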
| rrauenza wrote:
| I never could really understand the point of strncpy()...
| we always end up wrapping to deal with writing an
| unterminated string.
|
| Was it intended for fixed length records?
| [deleted]
| tedunangst wrote:
| It is for fixed length records, which is why it also
| zeroes the remaining space.
| ironmagma wrote:
| Arguably naming it with "str" is itself a security
| vulnerability.
| tedunangst wrote:
| No argument. At best it is a "string to fixed record"
| function, hence the name, but it is not a string
| function.
| Someone wrote:
| Yes. _strncpy_ was intended for copying file names into a
| buffer that was only zero terminated when the name was
| shorter than the maximum length of a file name in Unix
| (14 bytes. See https://stackoverflow.com/a/1454071,
| https://devblogs.microsoft.com/oldnewthing/20050107-00/?p=36...)
|
| You can also use it to overwrite part of an existing
| string, but I think that's a side effect of the above.
| throwaway09223 wrote:
| In the interest of satisfying pedantry I think we can agree
| that strncpy() is _intended_ to be a safer strcpy().
|
| As you say, it does in fact obviate some errors. A value
| judgement as to which errors are more or less safe may be
| subjective, but the intent is not.
| icedchai wrote:
| This is true, and many people don't realize it. I used to
| call a wrapper function that would always set the last byte
| to 0.
| draw_down wrote:
| Now ponder how many people find that state of affairs
| acceptable but also think JS is a terrible garbage language
| that idiots like.
| kazinator wrote:
| gmtime is just not thread-safe that's all, since it returns a
| static structure; gmtime_r is not banned.
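A small sketch of the gmtime_r alternative, assuming a POSIX system.
The helper name format_utc() and the timestamp format are made up for
illustration; the relevant part is that the struct tm lives in the
caller's frame rather than in shared static storage.

```c
#define _POSIX_C_SOURCE 200809L /* asks for the gmtime_r declaration */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * format_utc() is a hypothetical helper. It writes an ISO-8601-style
 * UTC timestamp into out and returns the number of characters written
 * (excluding the terminator), or 0 if conversion failed or the buffer
 * was too small.
 */
size_t format_utc(time_t t, char *out, size_t outsize)
{
    struct tm tm; /* per-call storage, unlike gmtime's static buffer */

    assert(out != NULL);
    if (gmtime_r(&t, &tm) == NULL)
        return 0;
    return strftime(out, outsize, "%Y-%m-%dT%H:%M:%SZ", &tm);
}
```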
| syncsynchalt wrote:
| Thanks, I am now a decade out of the C game and I was
| wracking my brain on what the problem with gmtime would be.
| My best guess was dodgy is_dst portability /shrug
| cperciva wrote:
| A better way of looking at it is that functions which expose
| very simple operations were among the first ones to be placed
| into the standard library -- and consequently are the least
| well thought out.
| jchw wrote:
| Unfortunately, much of the pain with C surrounds dealing with
| strings. It's been a bit of a theme on Hacker News for the past
| few days, but it's actually a pretty good spotlight on
| something I feel is not always appreciated - strings in C are
| actually hard, and even the most safe standard functions like
| strlcpy and strlcat are still only good if truncation is a safe
| option in a given circumstance (it isn't always.)
|
| (~~Technically~~ Optionally, C11 has strcpy_s and strcat_s
| which fail explicitly on truncation. So if C11 is acceptable
| for you, that might be a reasonable option, provided you
| always handle the failure case. Apparently, though, it is not
| usually implemented outside of Microsoft CRT.)
|
| edit: Updated notes regarding C11.
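Since strcpy_s is optional and strlcpy is not in the C standard, here
is a rough sketch of "fail instead of truncate" using only standard
calls. copy_or_fail() is a hypothetical name, and its behaviour on
failure (leave an empty string, return -1) is just one possible
convention, not what Annex K specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * copy_or_fail() is a hypothetical helper. It copies src into dst only
 * if the whole string, terminator included, fits in dstsize bytes.
 * Returns 0 on success. On failure it returns -1 and leaves dst as an
 * empty string (when dstsize > 0) rather than a truncated copy.
 */
int copy_or_fail(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    if (len >= dstsize) {
        if (dstsize > 0)
            dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1); /* + 1 copies the terminator too */
    return 0;
}
```

As with strcpy_s, this is only useful if every caller actually checks
the return value.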
| masklinn wrote:
| > Technically C11 has strcpy_s and strcat_s
|
| "Theoretically" is the word you're looking for: they're part
| of the _optional_ Annex K so technically you can't rely on
| them being available in a portable program.
|
| And they're basically not implemented by anyone but microsoft
| (which created them and lobbied for their inclusion).
| jchw wrote:
| I didn't know that it was Microsoft that lobbied for them;
| that perplexes me since I thought Microsoft's version of
| them were a bit different (for example, I think C11's
| explicitly fail on overlapping inputs where Microsoft
| specifies undefined behavior) and because Microsoft didn't
| bother supporting C99 for the longest time. (Probably still
| don't, since VLA was not optional in C99, IIRC. I think
| Microsoft was right to avoid VLA, though.)
| InvOfSmallC wrote:
| I teach at university as external lecturer. Teaching strings
| in C is the hardest thing I have to do every time. The
| university decided to explain C to first-year students without
| previous experience. My feedback was to do a precourse in
| Python to let them relax a bit with programming as a concept
| and then teach C in a second course.
| kazinator wrote:
| > _I teach at university as external lecturer. Teaching
| strings in C is the hardest thing I have to do every time._
|
| But if you keep up the good work you will one day go from
|
|     extern void *lecturer;
|
| to
|
|     static const lecturer;
| ritmatter wrote:
| +1, my university's program seemed to work well with
| "program anything" (Python), "program with objects" (Java),
| "program some cool lower-level stuff" (C)
| gravypod wrote:
| Sorry to bug you since this is unrelated. I'm a huge fan of
| teaching others and I was wondering how you got to be an
| external lecturer at a college? I'd love to teach classes
| related to software engineering and data structures. Would
| you mind emailing me (in my profile) about this?
| _the_inflator wrote:
| Yep, agree. I used a lot of assembler on C64 and Amiga
| until I touched so called high level programming languages
| for the first time. For me thinking in strings was really a
| weird concept.
|
| Nowadays I find it extremely strange to think of bits and
| bytes when being confronted with strings.
| austinl wrote:
| Most of the C I wrote was while in college. I think
| understanding the question, "why are strings in C hard?" is
| a good gateway to understanding how programming languages
| and memory work generally. I agree with you though that
| teaching C as introductory is probably not the best -- our
| "Programming in C" course was taken in sophomore year.
|
| I wouldn't want to use it in my day job, but I'm glad that it
| was taught in university just to give the impression that
| string manipulation is not quite as straightforward as it's
| made to appear in other languages.
|
| The early days of Swift also reminded me of this problem -
| strings get even more challenging when you begin to deal
| with unicode characters, etc.
| orwin wrote:
| In my school, we had two days to understand the basics of
| text editors, git (add, commit, rebase, reset, push) and
| basic bash functions (ls, cd, cp, mv, diff and patch, find,
| grep...) + pipes, then a day to understand how while,
| if/else and function calls work, then a day to understand
| how pointers work, then a day to understand how malloc(),
| free() and strings work (we had to remake strlen, strcpy,
| and protect them). Two days, over the weekend, to do a
| small project to validate this.
|
| Then on the monday, it was makefiles if i remember
| correctly, then open(), read(), close() and write(). Then
| linking (and new libc functions, like strcat) . A day to
| consolidate everything, including bash and git (a new small
| project every hour for 24 hours, you could of course wait
| until the end of the day to execute each of them). And then
| some recursivity and the 8 queen problem. Then a small
| weekend project, a sudoku solver (the hard part was to work
| with people you never met before tbh).
|
| The 3rd week was more of the same: basic struct/enums
| exercises, then linked list the next day, maybe static and
| other keyword in-between. I used the Btree day to
| understand how linked lists worked (and how pointer
| incrementation and casting really worked), and i
| don't remember the last day (i was probably still on linked
| lists). Then a big, 5-day project, and either you're in, or
| you're out.
|
| I assure you, strings were not the hardest part. Not having
| any leaks was.
| PoignardAzur wrote:
| Ooh, the Epitech cursus. Nice.
|
| Also, I'd say "not having segfaults" is the hardest thing
| to get right when you're going through that.
| liuliu wrote:
| Yeah. I just avoid str manipulations in general in C and when
| I have to, fuzz it ... (but still, the perf cliff is
| definitely new to learn in the past few days).
| swlkr wrote:
| I'm partial to https://github.com/antirez/sds these days
| macjohnmcc wrote:
| strcpy is a coding challenge where I work for interviews. I
| typically ask them to write it as the standard version and
| ask them why they might not want to use it to see if they are
| aware of the risks. After that I ask them to modify the code
| to be buffer safe. And for those claiming C++ knowledge ask
| them to make it work for wchar_t as well to see if they can
| write a template. Some people really struggle with this.
| IgorPartola wrote:
| This is a lot like how in JavaScript you have footguns like the
| with statement or in Python 2 where you have Unicode issues,
| etc. I am sure we could define a new C standard that
| excludes these functions as obsolete, but the linked header
| file is a pretty sensible interim solution. C is an old
| language and it's kind of amazing that code written 30 years
| ago can still by and large be compiled by a modern compiler.
| Ever try to run 3 year old React projects using today's React?
| :)
| detaro wrote:
| Because individual libraries choosing to change quickly is
| comparable to language stability how? The relevant comparison
| would be "run a 3y old react app (or a 20 year old website
| using JS) in a modern browser or interpreter"
| _the_inflator wrote:
| Yes, and it would still run fine I guess. I think only
| eval() changed over time. APIs and so on are still the same
| except for some Netscape stuff.
| ggregoire wrote:
| > in JavaScript you have footguns like the with statement
|
| I've been coding in JS on a daily basis for more than 10
| years and today I learned there is a `with` statement in JS.
|
| https://developer.mozilla.org/en-
| US/docs/Web/JavaScript/Refe...
|
| Edit: well, seems like it's been deprecated/forbidden since
| ES5 (2009), so it makes sense I've never seen it.
| GordonS wrote:
| And me around 20 years - also never even heard of the
| `with` statement! I think to qualify as a footgun, people
| actually need to be using it in the real world.
| viklove wrote:
| It amuses me that HN hates JS so much, that even a topic
| about problems with C turns into a JS-bashing thread.
|
| Also, I just want to remind you that JS isn't just React.
| There are plenty of libraries written in C that introduce
| breaking changes over the course of 3 years. Nothing will
| stop people from finding ways to complain about JS though, I
| know. The hate-boner is very real.
| sadgrip wrote:
| I think in most cases it's probably not hate but a deep,
| deep love.
| jrimbault wrote:
| JavaScript, LISP under C disguise. No wonder it's
| "popular" on HN.
|
| Assorted musing : Rust, OCaml under C disguise.
| orwin wrote:
| I think most people on HN like Javascript, or at least its
| idea? I mean, it's a very C-like functional language,
| especially since ES6 put Js on the right road (for me at
| least)?
| lliamander wrote:
| I appreciate Javascript's LISPy qualities, but it has an
| inordinate number of footguns and a relative lack of
| standard, stable libraries. Coming from languages like Java
| and Erlang that are relatively scrupulous about such things
| is a bit jarring.
|
| I do like Typescript though, as it adds some really nice
| ergonomics.
| matheusmoreira wrote:
| Yeah, because of NUL-terminated strings. They cause so many
| problems it's not even funny. Even something simple like
| computing the length of the string is a linear time operation
| that risks overflowing the buffer. People attempted to fix
| these problems by creating variations of those functions with
| added length parameters, thereby negating nearly all benefits
| of NUL-terminated strings.
|
| Why can't we just have some nice structures instead?
|
|     struct memory {
|         size_t size;
|         unsigned char *address;
|     };
|
|     enum text_encoding {
|         TEXT_ENCODING_UTF8,
|         /* ... */
|     };
|
|     struct text {
|         enum text_encoding encoding;
|         struct memory bytes;
|     };
|
| All I/O functions should use structures like these. This alone
| would probably prevent an incredible amount of problems. Every
| high-level language implements strings like this under the
| hood. Only reason C can't do it is the enormous amount of
| legacy code already in existence...
| guerrilla wrote:
| That would be nice. You hit on the other hell with C strings:
| modern encodings where wchar_t and mb* are useless and
| replacements essentially don't exist yet with char8_t,
| char32_t etc. Then there's the locale chaotic nonsense [1]. A
| new libc starting fresh would be nice.
|
| 1. https://github.com/mpv-
| player/mpv/commit/1e70e82baa9193f6f02...
| Camillo wrote:
| Many of the problems with C descend from a common root, the
| decision to use bare pointers (memory addresses) as the basic
| way to refer to strings, arrays etc.
|
| If they had used a {pointer, size} pair instead, it would have
| avoided all of these string problems, most buffer overflows,
| even the GTA Online loading problem that was on HN recently.
| cb321 wrote:
| For what it's worth, while what @Camillo says is both true
| and important, people usually do not mention the trade offs
| involved or why that decision was attractive at the time.
|
| These days (ptr,size) is probably 16 bytes -- longer than
| almost all words in the English language (the scrabble
| SOWPODS maxes out at 15). A pointer alone is 8B. Back at the
| dawn of C in 1970, memory was 6..7 orders of magnitude more
| expensive than today..maybe more inflation adjusted. (Today,
| cache memory can be almost as precious, but I agree that the
| benefits of bounded buffers probably outweigh their costs.)
|
| 8B pointers today are considered memory-costly enough "in the
| large" that even with dozens of GiB machines common, Intel
| introduced an x32 mode to go back to 32-bit addressing aka 4B
| pointers. [1] There are obviously more pointers than just
| char* in most programs, but even so.
|
| Anyway, trade offs are just something people should bear in
| mind when opining on the "how it should be"s and "What kind
| of wacky drugs were the designers of language XYZ on?!!?".
|
| [1] https://stackoverflow.com/questions/9233306/32-bit-
| pointers-...
| Animats wrote:
| Pascal, which had sized strings, was in wide use before C.
| Many people, including Bill Atkinson, who wrote many of the
| original Macintosh applications, thought C was a step
| backwards.
|
| Pascal, to save one byte, limited strings to length 255. Bad
| decision.
| [deleted]
| SavantIdiot wrote:
| If you list the languages you use, I'd be happy to point out
| the "footguns" in each of them. For all the warts on C, there
| really is no language that can compete for what it has
| accomplished over ~50 years.
|
| Recall that during the rise of C, people were writing machine
| code on punch cards. Assembly -> Machine code has far more
| footbullets than C, it is a tradeoff between hand holding and
| tiny fast code.
|
| Wow, this blew up.
|
| To all the people popping off about how great other languages
| are, tell me: when will we see the Unreal Engine written in
| Python, or Pascal, or Algol, or Rust, or Go... the next big
| step is WebASM (or .cu), and that's way more footbullet-y than
| C. And what is the native language all of your sub-30 year old
| interpreted languages were written in? Thank you!
| eschaton wrote:
| This is a grossly inaccurate description of computing at the
| time of the rise of C. C was competing with Pascal/Modula,
| BLISS, PL/I, BCPL, and so on, not assembly on punched cards.
|
| The "C competing with assembly" meme was very specific to
| _microcomputer_ game and operating system development, not
| more general microcomputer application development, and not
| to minicomputer or mainframe development.
| JoeAltmaier wrote:
| Mainframes very quickly were outclassed by minicomputers.
| They could not respond to technology changes as fast. C
| was indeed king for decades.
| rodgerd wrote:
| There's far more critical code in the world running on COBOL
| and s3[79]0 assembler. COBOL is vastly more important than C.
| varjag wrote:
| Nearly everything around you runs code that was written in
| C, and absolutely nothing you can actually see runs COBOL
| code.
| mkipper wrote:
| _citation needed_
|
| I'm sure there's a lot of important things that rely on
| COBOL, but by most definitions of "critical", I think this
| is way off the mark.
| burnished wrote:
| COBOL is still used in many banking systems such as ATMs.
| These are 'critical' systems by most any definition of
| the word 'critical'.
| varjag wrote:
| That's a hugely broad definition of critical, enough to
| encompass most of business and finance software.
| slt2021 wrote:
| Which language is z/OS written in?
| cygx wrote:
| _Recall that during the rise of C, people were writing
| machine code on punch cards._
|
| Or Fortran, Algol, Lisp, Cobol, Basic, Pascal, ...
| samatman wrote:
| Your edit really isn't helping your case.
|
| Those of us who have always known about less dangerous
| 'system' languages (Pascal probably being the most popular)
| lament the fact that so much code got written in C instead.
|
| It wasn't inevitable. It was preventable! It just didn't
| happen that way for reasons which are largely historical.
|
| I don't work for the Rust Evangelism Strike Force, my main
| project is written in (as little) C (as possible), but I beg
| anyone who has a choice: use something else! Rust is... fine,
| Zig is promising. Ada still works!
|
| Writing out the set {Python, Pascal, Algol, Rust, Go} tempts
| me to say uncharitable things about your understanding of the
| profession, but I accept you were just being snarky so I'll
| just gesture in the direction of how $redacted that is.
| Gibbon1 wrote:
| My favorite assembly foot gun was a guy I worked with had a
| cute routine. You had a call to the routine, followed by a
| null terminated string after that. The routine would spit the
| string to the terminal. And then return to the location after
| the string.
|
| He had some bug where in one place it returned to the start
| of the string, executed it, and kept going. The end result
| just happened to be a nop. Had been like that in production
| for a couple of years.
| atoav wrote:
| Yeah there are footguns in every language. But this is not a
| boolean question about the presence of footguns, this is
| about how much one has to know to be able to handle a
| language safely.
|
| I know C/C#/Python/Rust/Javascript.
|
| After a decade of using C I am still not totally sure if I
| didn't dangle a pointer somewhere in precisely the wrong way
| to create havoc. And yeah, that means I have to get better,
| etc. But that is not the point. The point is that even with
| a lot of experience in the language you can still easily
| shoot yourself in the foot and not even notice it.
|
| Meanwhile after a month of using Rust I felt confident that I
| didn't shoot myself in the foot, because I know what the
| compiler guarantees, e.g. ownership. While in C shooting
| myself in the foot happens quite often, in Rust I would have
| to specifically find a way to shoot myself in the foot
| without the compiler yelling at me, and quite frankly I
| haven't found such a way yet.
|
| Javascript is odd, because the typesystem has quite a few
| footguns in it. This is why such things like Elm or
| Typescript exist: to avoid these footguns.
|
| I don't want to take away from the accomplishments of C, and
| I still like the language, but to claim it is equally likely
| in all languages to shoot yourself in the foot is not true.
| maerF0x0 wrote:
| Not that I don't believe there are any, but I'd love to hear
| your perspective...
|
| Go (golang)
| SavantIdiot wrote:
| Well, shit. Got me there.
| boolemancer wrote:
| defer having function scope instead of, well, scope scope.
|
| Using defer to unlock locks can lead to some fun deadlocks
| if you don't realize the issue with the scope, and it's
| completely unintuitive to someone with experience with
| other implementations of similar concepts.
| crimper wrote:
| channel programming and the races caused by closing
| channels. channels seem nice and easy until they don't.
|
| the whole var/:=/= assignment combined with the error
| handling style and the shorthand is another one
| maerF0x0 wrote:
| yeah the lack of determinism in selecting a channel can
| be tricky for causing bugs where order matters. Luckily
| in smaller cases you're likely to encounter them as
| flakey tests (eg 1/2 the time)
|
|     select {
|     case <-ch1:
|     case <-ch2:
|     }
| dpatterbee wrote:
| Only close channels when trying to tell the receiver that
| you're not sending more data. Otherwise let the garbage
| collector deal with it. Channels seem easy until they
| don't until they do again in my experience.
|
| Don't understand your second point.
| badsectoracula wrote:
| > when will we see the Unreal Engine written in...
|
| Why would a huge C++ (not C, btw) codebase with roots going
| back to the 90s be rewritten in any other language?
|
| And in fact how is the language Unreal Engine written in
| relevant to C having footguns?
| ironmagma wrote:
| Yeah, there is a culture of complacency in C probably owing to
| the enormous historical baggage of legacy code that has to be
| supported and the blurred line between stdlib and system call.
| Spivak wrote:
| I mean on Linux you're not encumbered by this because the
| syscall api is stable but in practice most GNU/Linux distros
| assume glibc. You can't correctly resolve a hostname on Linux
| without farming out to glibc -- hell even the kernel punts to
| userspace for dns names but you can technically ignore it if
| you want.
|
| On BSDs and macOS you're always SOL because the syscall api
| isn't stable and only the C wrappers are.
| dangerbird2 wrote:
| c standard library doesn't really relate directly to system
| calls (at least in modern os'es). In particular, the stdio.h
| functions are buffered by default, while their system call
| analogues are not. For unixes, system call wrappers are
| typically found in <unistd.h>, not the "official" c standard
| library
| freedomben wrote:
| I disagree completely. Devs who use C are the least
| complacent about security in my experience. The problems are
| from previous eras before they knew about many of these
| things. A ton of people in modern languages couldn't name a
| single dangerous function, though they do exist in every
| language. You'd be amazed at how many race condition vulns
| result from TOCTOU errors just in authentication, or checking
| for the existence of a file before opening it, etc.
|
| It's absolutely true that decades ago the C community was
| complacent, but it's not true now. Source: I taught secure
| coding in C/C++ in the 00s.
| IgorPartola wrote:
| What you said. Nobody is complacent. Anyone who thinks the
| Linux or OpenBSD (etc.) kernel developers take the lazy way
| out is talking about a thing they know little about. I do
| think better languages than C exist and maybe could even be
| used as a basis for new systems. But I have yet to see a
| mature OS that's as secure and as performant as these.
| Closest might be the chips I've seen that have an embedded
| Java byte code interpreter.
| ironmagma wrote:
| I agree in principle but think these security-focused C
| developers are missing the forest for the trees.
| Every developer having the responsibility of cultivating
| their own pet list of banned functions is, frankly, NOT
| the way to achieve security. Those things need to be
| enforced at the widest level possible (OS, or language)
| to have the needed effect.
| dangerbird2 wrote:
| It's not really complacency: it's that the standard library
| is intentionally minimalistic to maintain portability and
| backwards compatibility. If you want sensible string
| handling, it's usually best to use a high level utility
| library like GLib (https://developer.gnome.org/glib/stable/)
| or Apache Portable Runtime (http://apr.apache.org/), or roll
| your own safe string type (preferably not null-terminated)
| yxhuvud wrote:
| No, if you want sensible string handling, the sane choice
| is usually to choose to use a language that is not C. Not
| always, but definitely usually.
| IgorPartola wrote:
| It's not hard to have strings like you do in other
| languages in C. It is hard when you treat _char foo[]_ as
| if it was a string object like you have in JavaScript or
| Java or Python. C strings are just chunks of memory
| terminated by \0. They can still be mildly useful that
| way but if you actually want to do string operations you
| need to use a library designed for the problem (variable
| length, storing length with the object, Unicode support,
| etc.). Problem is that most people don't start with such
| a library so they end up doing the hard work themselves
| in an ad hoc manner.
|
| You can't fuck up _String("Hello ") + String("world")_
| but you can definitely fuck up _strcat(buf, "Hello ");
| strcat(buf, "world");_.
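One way the strcat() pair goes wrong is that nothing checks whether buf
is large enough (or even initialized). A minimal sketch of a bounded
alternative using snprintf() follows; join2() is a hypothetical name,
not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * join2() is a hypothetical helper. It writes a followed by b into dst.
 * Returns 0 if the whole result fit, -1 if it would not fit (dst then
 * holds a terminated, truncated prefix when dstsize > 0) or if
 * snprintf reported an error.
 */
int join2(char *dst, size_t dstsize, const char *a, const char *b)
{
    int n;

    assert(dst != NULL && a != NULL && b != NULL);
    /* snprintf returns the length the full result would have had */
    n = snprintf(dst, dstsize, "%s%s", a, b);
    if (n < 0 || (size_t)n >= dstsize)
        return -1;
    return 0;
}
```

Unlike strcat(), this does not depend on dst already holding a valid
string, and the size of the destination is always part of the call.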
| Ar-Curunir wrote:
| there's nothing inherently unportable about strings though.
| ironmagma wrote:
| Why do you need backward compatibility with a compiled
| language? Other languages like Rust and JavaScript (even)
| avoid that with a pragma tag on the source.
| hctaw wrote:
| Because not everything is recompiled from source. That's
| why stable ABIs need to exist.
| ironmagma wrote:
| Good point, thanks. Could the headers contain the
| pragmas?
| hctaw wrote:
| That assumes you have a header, which only exists at
| compile time for the developer. The running program knows
| nothing about it.
| ironmagma wrote:
| Why would a program need to know (e.g.) the details of
| what system calls or stdlib functions that a procedure it
| invokes uses? Aren't C functions pretty well separated
| from each other except for the odd signal handler and
| assuming a stable ABI? In my view most of the issues with
| C are semantics within the function blocks.
| rightbyte wrote:
| The parameters and return value are not in the object
| files.
| oleganza wrote:
| Notice that it's a giant PITA to work with any variable-length
| data, because the language lacks adequate means to abstract away
| safe fast memory access with generic types, RAII and borrow
| checkers. Compared to C, both C++ and Rust (very different
| beasts) feel like pals of JavaScript: basic operations with
| dynamic strings and arrays just work(tm).
| frob wrote:
| As someone who learned C as their first language, strings in
| every single language after that have felt like cheating.
|
| "What? You mean I can type an arbitrary string and it works? I
| don't need to worry about terminators or the amount of memory
| I've allocated? You can concatenate two strings with +?!? What
| is this magic?"
| macintux wrote:
| Yeah, every time I decide to play with C for nostalgia's
| sake, I immediately get hung up on just how painful
| everything is, especially strings.
|
| I still love C, but I'd do my best not to have to write
| anything serious with it again.
| munchbunny wrote:
| The decision to make C strings null terminated with implied
| length instead of length + blob continues to trip us up, 30+
| years later. There's a good reason the "safe" versions of those
| functions all take length parameters. But way back when this
| approach was chosen, I don't think the state of the art could
| fully predict this outcome.
|
| But also, "strings" and "time" are actually very complex
| concepts, and these functions operate on often outdated
| assumptions about those underlying abstractions.
| jrimbault wrote:
| 30+ years -> 50+ years
|
| Funny mind thing to forget to increment counters each year.
| segf4ult wrote:
| C89 was 32 years ago, so I think saying 30+ years is fair.
| lamontcg wrote:
| Some of us learned C off of the original K&R book.
| coliveira wrote:
| Null terminated strings are remnants of an era when computers
| had little memory available. So, at the time it seemed smart
| to discard the length field and use a single byte-sized
| terminator (null). If you are writing an operating system for
| a machine with little memory to spare, this seems like a good
| decision. Of course things are very different now when memory
| is not a problem and the goal is safety.
| Blikkentrekker wrote:
| > _But also, "strings" and "time" are actually very complex
| concepts, and these functions operate on often outdated
| assumptions about those underlying abstractions._
|
| Even in safer languages such as _Rust_, there are often
| questions as to why certain string operations are either
| impossible, or need to be quite complicated for a rather
| simple operation, and these are then met with responses such
| as "Did you know that the length of a string can grow from a
| capitalization operation depending on locale settings of
| environment variables?"
|
| _P.S._: In fact, I would argue that strings are not
| necessarily all that complicated, but simply that many assume
| that they are simpler than they are, and that code that
| handles them is thus written on such assumptions that the
| length of a string remain the same after capitalization, or
| that the result not be under influence of environment
| variables.
| munchbunny wrote:
| > locale settings of environment variables
|
| Also known as "why does my code that parses floats fail in
| Turkey?"
|
| Also also known as the discrepancy between a string's
| length-as-in-bytes, its length-as-in-code-points, and its
| length-as-in-how-humans-count-glyphs.
|
| Strings are hard.
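To make the bytes versus code points distinction concrete, here is a
rough C sketch. utf8_code_points() is a hypothetical helper and it
assumes its input is already valid UTF-8; it does no validation and
says nothing about how many glyphs a human would count.

```c
#include <assert.h>
#include <stddef.h>

/*
 * utf8_code_points() is a hypothetical helper. It counts code points
 * in a NUL-terminated string that is assumed to be valid UTF-8.
 */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Continuation bytes have the form 10xxxxxx; every other
         * byte starts a new code point. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For "h\xc3\xa9llo" strlen() reports 6 bytes while this reports 5 code
points. For "e\xcc\x81" (an e followed by a combining acute accent) it
reports 2 code points in 3 bytes, although a reader sees one glyph.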
| kazinator wrote:
| > _Why does my code that parses floats fail in Turkey_
|
| Because you, or someone, called
|
|     fuck_my_program();
|
| which is defined in "idiot.h" as
|
|     #define fuck_my_program() setlocale(LC_ALL, "")
|
| and the project is missing:
|
|     #define setlocale(x, y) BANNED(setlocale)
|
| Hope that helps!
| retrac wrote:
| For reasons that were never clearly articulated, the prefix
| approach was considered odd, backwards, and to have numerous
| downsides, at least where I learned C. In hindsight, I can
| only cringe at that attitude. Strings as added in later
| Pascal, about 40 years ago now, were memory safe in a way
| that C strings still are not.
| lordgroff wrote:
| Oh Pascal, why couldn't we have had you instead.
| kazinator wrote:
| Pascal strings are not inherently memory safe:
|
|     cat_pascal_strings(pascalstr *uninited_memory,
|                        pascalstr *left, pascalstr *right);
|
| how big is uninited_memory? Can left and right fit into it?
|
| You need to design language constructs around Pascal strings
| to make them actually safe. Such as, oh, make it impossible
| to have an uninitialized such object. The object has to know
| both its allocation size and the actual size of the string
| stored in it.
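A sketch of what "knows both its allocation size and its actual size"
could look like in plain C. The struct and function names here are
hypothetical, and C itself does nothing to stop a caller from building
an inconsistent struct by hand, which is the language-level gap being
described.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer type that carries its own bounds. */
struct sized_buf {
    size_t cap;  /* bytes available at data */
    size_t len;  /* bytes currently in use, always <= cap */
    char *data;
};

/*
 * Appends srclen bytes from src. Returns 0 on success, or -1 (leaving
 * dst untouched) if the bytes would not fit in the remaining capacity.
 */
int sized_buf_append(struct sized_buf *dst, const char *src, size_t srclen)
{
    assert(dst != NULL && src != NULL && dst->len <= dst->cap);

    if (srclen > dst->cap - dst->len)
        return -1;
    memcpy(dst->data + dst->len, src, srclen);
    dst->len += srclen;
    return 0;
}
```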
|
| What is unsafe is constructing new objects in an anonymous
| block of memory that knows nothing about its size.
|
| C programs run aground there not just with strings!
|
|     struct foo *ptr = malloc(sizeof ptr); // should be sizeof *ptr!!
|
|     if (ptr) {
|         ptr->name = name;
|         ptr->frobosity = fr;
|
| Oops! The wrong sizeof allocated only the size of a
| pointer: 4 or 8 bytes, typically nowadays, but the
| structure is 48 bytes wide.
|
| "struct foo" itself isn't inferior to a Pascal RECORD; the
| problem is coming from the wild and loose allocation side
| of things.
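|
| A minimal sketch of the corrected allocation, assuming a
| hypothetical layout for struct foo (the padding member and the
| foo_new name are illustrative, not from the comment above).
| Writing sizeof *ptr ties the size to whatever ptr points at:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical struct, padded so it is clearly larger than a pointer. */
struct foo {
    const char *name;
    int frobosity;
    char padding[32];
};

/* Allocate one struct foo. sizeof *ptr follows the pointee type, so the
 * request stays correct even if the type of ptr changes later. */
struct foo *foo_new(const char *name, int fr)
{
    struct foo *ptr;

    assert(name != NULL);
    ptr = malloc(sizeof *ptr);   /* not sizeof ptr */
    if (ptr) {
        ptr->name = name;
        ptr->frobosity = fr;
    }
    return ptr;                  /* NULL if the allocation failed */
}
```

| The caller still owns the result and has to free() it.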
|
| Working with strings in Pascal is relatively safe, but
| painfully limiting. It's a dead end. You can't build
| anything on top of it. Can you imagine trying to make a
| run-time for a high level language in Pascal? You need to
| be in the driver's seat regarding how strings work.
| munchbunny wrote:
| The prefix approach turns the neat "strings are just
| character arrays are just pointers" pattern into something
| a lot more clunky, because now you've got this really basic
| data type that is actually a struct and now you have to
| have an opinion on how wide the length value is and short
| strings get a lot of memory overhead in just lengths, and
| so on.
|
| In hindsight, I think the complexity is worth the safety,
| but I could see why it felt more elegant to use null-
| terminated strings at the time.
| jdlshore wrote:
| It's a classic case of moving the complexity from one
| part of the system to another. "Strings are just
| character arrays" seems simple and elegant, but in
| reality is a giant mess, because strings are not just
| character arrays, any more than dates are just an offset
| from an epoch.
|
| Human concepts are inherently messy. "Elegant" solutions
| just shove the mess down the road.
| JoeAltmaier wrote:
| Hey, languages used length,blob even when C was invented.
| HP Access BASIC used that kind.
|
| It was a limitation, because they chose a byte length (to
| save space). So strings up to 255 characters only. It was
| decades before folks were comfortable with 32-bit length
| fields. And that still limited you to 4GB strings. In the
| bad old days, memory usage was king.
| selfhoster11 wrote:
| The funny thing is that you can just use the topmost bit
| of the length to indicate that the string length is >127,
| and chain as many length bytes as you want before you
| begin the string proper (to save space). It would be
| still a better encoding than a null at the end.
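|
| A rough sketch of that chained length prefix, assuming 7
| payload bits per byte with the low group first (the
| varlen_encode and varlen_decode names are made up for
| illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write len as 7-bit groups, low group first. A set top bit means another
 * length byte follows. Returns the number of bytes written; out needs room
 * for up to 10 bytes to hold any 64-bit value. */
size_t varlen_encode(uint64_t len, unsigned char *out)
{
    size_t n = 0;

    assert(out != NULL);
    do {
        unsigned char byte = (unsigned char)(len & 0x7F);
        len >>= 7;
        if (len != 0)
            byte |= 0x80;        /* more length bytes follow */
        out[n++] = byte;
    } while (len != 0);
    return n;
}

/* Read a length prefix from in[0..avail). Returns the number of bytes
 * consumed, or 0 if the prefix is truncated or too long for 64 bits. */
size_t varlen_decode(const unsigned char *in, size_t avail, uint64_t *len)
{
    uint64_t value = 0;
    unsigned shift = 0;
    size_t n = 0;

    assert(in != NULL && len != NULL);
    while (n < avail && shift < 64) {
        unsigned char byte = in[n++];
        value |= (uint64_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            *len = value;
            return n;
        }
        shift += 7;
    }
    return 0;
}
```

| Strings shorter than 128 bytes pay one byte of overhead, the
| same as a terminating null.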
| [deleted]
| kazinator wrote:
| The reason that the safe functions take length parameters is
| that they produce a new object in uninitialized memory, a
| pointer to which is specified by the caller.
|
| It has nothing to do with null termination.
|
| And _that_ uninitialized memory is not self-describing in any
| way in the C language. Which is that way in machine language
| also.
|
| This is a problem you have to bootstrap yourself somehow if
| you are to have any higher level language.
|
| The machine just gives you a way to carve out blocks of
| memory that don't know their own type or size. C doesn't
| improve on that, but it is not the root cause of the
| situation. Without C, you still have to somehow go from that
| chaos to order.
|
| Copying two null terminated strings _into an existing null-
| terminated string_ can be perfectly safe without any size
| parameters.
|
|     void replace_str(char *dest_str,
|                      const char *src_left, const char *src_right);
|
| If dest_str is a string of 17 characters, we know we have 18
| bytes in which to catenate src_left and src_right.
|
| This is not very useful though.
|
| Now what might be a bit more useful would be if dest_str had
| two sizes: the length of string currently stored in it, and
| the size of the underlying storage. This particular operation
| would ignore the former, and use the latter. It could replace
| a string of three characters with a 27 character one.
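|
| One way the two-size idea could look, as a sketch only: the
| struct sized_str type and sized_str_replace function are
| hypothetical names, and the sources are assumed not to overlap
| the destination storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string object that knows both of its sizes. */
struct sized_str {
    char *data;   /* underlying storage */
    size_t cap;   /* bytes of storage, including room for the terminator */
    size_t len;   /* length of the string currently stored */
};

/* Replace the contents of dest with left followed by right. Only the
 * capacity matters, not the current length. Returns false and leaves dest
 * untouched if the result does not fit. */
bool sized_str_replace(struct sized_str *dest,
                       const char *left, const char *right)
{
    size_t llen, rlen;

    assert(dest != NULL && left != NULL && right != NULL);
    llen = strlen(left);
    rlen = strlen(right);
    /* Need llen + rlen + 1 <= cap, written so it cannot overflow. */
    if (llen >= dest->cap || rlen >= dest->cap - llen)
        return false;
    memcpy(dest->data, left, llen);
    memcpy(dest->data + llen, right, rlen + 1);  /* copies the terminator */
    dest->len = llen + rlen;
    return true;
}
```

| No size parameter appears in the call, because the destination
| object carries its own capacity.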
| amir734jj wrote:
| Maybe instead of just writing a banned message, it should be the
| name of alternative function to use.
| [deleted]
| 1337_d00dZ wrote:
| In compilers that implement GCC extensions (such as Clang), you
| can use the "poison" directive to achieve the same effect (but
| with a better error message):
|
| #pragma GCC poison printf sprintf fprintf
|
| [0] https://gcc.gnu.org/onlinedocs/gcc-3.2/cpp/Pragmas.html
| shadowgovt wrote:
| To its credit, it's convenient that the C pre-processor is so
| powerful that it facilitates baking a "C the good parts" concept
| directly into the compilation process.
| rcgorton wrote:
| But it isn't even April 1 yet! This is truly a BAAAD joke. So GIT
| is not implemented in C? Or C++?
| snvsn wrote:
| Previous discussion:
| https://news.ycombinator.com/item?id=20792938
| StillBored wrote:
| These functions are one of the many reasons why I tend to have a
| C with some C++ classes dialect I use in my own projects.
|
| std::string needs some tweaks, but it can mostly be treated as a
| built in and it wipes out a huge set of C string issues.
| jancsika wrote:
| I love seeing "strncpy" right after "strcpy."
|
| If someone wants some fun, try this:
|
| 1. Slurp up all the FOSS projects that extend back to 90s or
| early 2000s.
|
| 2. Filter by starting at earliest snapshot and finding
| occurrences of strcpy and friends who don't have the "n" in the
| middle.
|
| 3. For those occurrences, see which ones were "fixed" by changing
| them to strncpy and friends in a later commit somewhere.
|
| 4. See if you can isolate that part of the code that has the
| strncpy/etc. and run gcc on it. Gcc-- for certain cases (string
| literals, I think)-- can report a warning if "n" has been set to
| a value that could cause an overflow.
|
| I'm going to speculate that there was a period where C
| programmers were furiously committing a large number of errors to
| their codebases because the "n" stands for "safety."
| commandlinefan wrote:
| Ok, memcpy(dst, src, strlen(src)) it is then!
| gilbetron wrote:
| Meh, most of us understood the sharp edges of strings pretty
| well. Before, we'd check the len of strings before strcpy,
| strncpy let us do it without doing that, and just slap a 0 in
| if needed. Safe? No. Better? A bit. Do I ever want to do string
| manipulation again with C? Nope.
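|
| The "slap a 0 in" pattern, sketched under the assumption that
| silent truncation is acceptable to the caller (copy_truncated
| is an illustrative name):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, which has room for dst_size bytes. The result is
 * truncated if src is too long, but it is always terminated. */
void copy_truncated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    /* strncpy writes no terminator when src has dst_size - 1 or more
     * characters, so the last byte is set by hand. */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

| Forgetting the last line is the usual way this goes wrong, and
| the caller gets no signal that truncation happened.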
| tomjakubowski wrote:
| Understanding the sharp edges is one thing. Being able to
| avoid them in practice is another. The history of memory
| safety problems in C string handling, especially involving
| strcpy/strncpy, strongly suggests to me that they're
| unavoidable even for skilled, knowledgeable, and experienced
| C programmers.
| Luyt wrote:
| It would be great if the BANNED() macro could suggest the correct
| function to use.
| tinus_hn wrote:
| You could send a pull request, it doesn't seem too complicated
| to implement
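|
| A sketch of what such a change might look like. BANNED_USE is
| a hypothetical macro, not something in Git's header, and the
| choice of snprintf as the suggested replacement is an
| assumption made for this example. Any use of strcpy() or
| strcat() after these definitions expands to an undeclared
| identifier that names the alternative, so the compiler error
| carries the hint:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of BANNED() that names a replacement. */
#define BANNED_USE(func, alt) sorry_##func##_is_banned_use_##alt##_instead

#undef strcpy
#define strcpy(dst, src) BANNED_USE(strcpy, snprintf)
#undef strcat
#define strcat(dst, src) BANNED_USE(strcat, snprintf)

/* Bounded copy built on the suggested replacement.
 * Returns 0 on success, -1 if src did not fit and was truncated. */
int copy_name(char *dst, size_t dst_size, const char *src)
{
    int n;

    assert(dst != NULL && src != NULL && dst_size > 0);
    n = snprintf(dst, dst_size, "%s", src);
    return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

| The macros have to come after the standard headers are
| included, otherwise they would mangle the declarations in
| <string.h>.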
| lmilcin wrote:
| To respond to some of the comments.
|
| It is not that there is anything intrinsically wrong with these
| functions. You can technically use all of them and I have been
| using all of them, safely, for decades.
|
| The issue is they are huge traps to the point that in a larger
| piece of software one can say "well, it's just not worth it".
|
| You can go much, much, much further than that.
|
| In a couple of embedded projects I worked on, some of the rules were:
|
| * dynamic allocation after application has started is banned --
| any heap buffers and data structures must be allocated at the
| start of the application and after that any allocation is a
| compile time error,
|
| * any constructs that would prevent statically calculating stack
| usage were banned (for example any form of recursion except when
| exact recursion depth is ensured statically),
|
| * any locks were banned,
|
| * absolutely every data structure must have size ensured, in a
| simple way, beyond any reasonable doubt,
|
| etc.
| whatisthiseven wrote:
| It is interesting to read the rules you came up with to limit
| memory usage, and then to think of the criticisms one gets in
| Java for limiting memory usage. In Java we try to limit new as
| much as possible to prevent the GC from pausing too much, or
| inconveniently, or for too long. And basically all the rules
| you say are what we also use in Java.
|
| Except when you have these rules in Java, the ironic counter-
| point is "if you are doing this much memory control yourself,
| you should just use C or C++ or something".
|
| I'll keep your comment in mind next time I see that rebuttal.
| Thank you.
| zwieback wrote:
| The stack thing was always the big worry for me. Without a
| comprehensive static code analysis tool that's hard to do. And
| runtime stack checking adds quite a bit of overhead, especially
| if you also have to worry about running on the interrupt stack
| and possibly switching.
| xondono wrote:
| Anything enforcing MISRA has essentially (almost) no way of
| allocating memory at runtime.
| fsociety wrote:
| It's funny, I worked exclusively with MISRA at the start of
| my career. Eventually I started a job at a FAANG and received
| quizzical comments on why I implemented a memory arena.
|
| The argument was to allocate memory freely and let it pool
| memory as necessary. Fair enough, it was simpler and fit the
| standard expectation of development.
|
| The issue is that if you talk with the allocator team they
| complain of not being able to fix performance issues fast
| enough due to allocations firing off left and right in the
| middle of a request.
|
| I never realized that my view of C programming is heavily
| influenced by MISRA until your comment.
|
| I know game engine programming follows a similar, perhaps
| unspoken, convention.
| munchbunny wrote:
| The lack of runtime allocations in game engine programming
| comes from a different motivation: allocations are
| expensive, garbage collections are expensive, cache
| coherency matters, and you're chucking around a lot of very
| similar looking objects, so... object pools!
| orwin wrote:
| Yeah, the first time we coded a scroller shooting game
| with my friend (at school), we were baffled that our
| terminal-based scroller lagged more than the raycaster we
| did two weeks prior. Was it a C vs C++ thing?
|
| Turns out, creating then destroying every single
| missile/enemy was extremely costly
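|
| A minimal object pool along those lines, as a sketch: the
| struct missile layout, the POOL_CAPACITY bound and the pool_*
| function names are all assumptions for illustration. All slots
| exist up front and are recycled through a free list, so nothing
| is allocated or freed while the game runs:

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 64   /* assumed fixed upper bound on live missiles */

struct missile {
    float x, y;
    struct missile *next_free;   /* links unused slots together */
};

struct missile_pool {
    struct missile slots[POOL_CAPACITY];
    struct missile *free_list;
};

/* Put every slot on the free list. Call once at startup. */
void pool_init(struct missile_pool *p)
{
    size_t i;

    assert(p != NULL);
    p->free_list = NULL;
    for (i = 0; i < POOL_CAPACITY; i++) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Take a slot from the pool, or return NULL if all are in use. */
struct missile *pool_acquire(struct missile_pool *p)
{
    struct missile *m;

    assert(p != NULL);
    m = p->free_list;
    if (m)
        p->free_list = m->next_free;
    return m;
}

/* Hand a slot back so it can be reused. */
void pool_release(struct missile_pool *p, struct missile *m)
{
    assert(p != NULL && m != NULL);
    m->next_free = p->free_list;
    p->free_list = m;
}
```

| Acquire and release are a couple of pointer writes each, and
| the slots sit next to each other in memory.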
| hctaw wrote:
| Custom allocators are quite common, it's not an arcane
| convention. I think the rule of thumb is preallocate until
| it gets questionable in complexity, then write your custom
| allocator - and really it's only applicable to code with a
| real-time deadline (hard or soft). Otherwise the system
| allocator is going to be a lot smarter than yours once it
| leaves microbenchmarks.
| closeparen wrote:
| How often does the dynamic allocation rule lead to an ad-hoc
| allocator appearing inside the program?
|
| Also doesn't the OS lie? I thought the memory wasn't really
| physically assigned until first use.
| syncsynchalt wrote:
| In my experience dynamic allocation is banned in either (a)
| small embedded environments or (b) high scrutiny environments
| (soft realtime, safety critical, etc).
|
| In both cases the project size is small enough, or the
| scrutiny is high enough that the ad-hoc allocator doesn't
| develop. The environment is also simple enough that the
| memory cheats you're thinking of don't exist (or you can
| squash them by touching all allocated memory up front).
| at_a_remove wrote:
| I have only ever dabbled in C, just to look at other people's
| code and occasionally when I really needed speed, so I am at what
| I would call a "Pretty Pathetic" level, able to recognize that I
| am looking _at_ C.
|
| However, I look at old books on C, and then I look at this list,
| and I wonder if it would not have been helpful to, after
| mentioning that a function was banned, suggest what the
| replacement is, even as a comment.
___________________________________________________________________
(page generated 2021-03-04 23:00 UTC)