[HN Gopher] strcpy: A niche function you don't need
       ___________________________________________________________________
        
       strcpy: A niche function you don't need
        
       Author : grep_it
       Score  : 55 points
       Date   : 2021-07-30 20:06 UTC (2 hours ago)
        
 (HTM) web link (nullprogram.com)
 (TXT) w3m dump (nullprogram.com)
        
       | einpoklum wrote:
       | One-liner summary: Author suggests using just memcpy() typically,
       | strncpy() rarely/maybe, and even more rarely, or never, strcpy().
        
         | ajanuary wrote:
         | I don't think that's an accurate summary. "Use memcpy()
         | instead. Use strncpy() in limited circumstances."
        
           | icedchai wrote:
           | Doesn't memcpy have the same issue as strncpy? The
           | destination will not be null terminated if the source is too
           | long.
           | 
           | Many projects just implement a safe_strncpy wrapper that
           | always terminates the destination. Example: https://github.co
           | m/brgl/busybox/blob/master/libbb/safe_strnc...
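
A minimal sketch of the "always terminates" wrapper idea described above. This is not the BusyBox code (the link is truncated here). `copy_terminated` is a hypothetical name, and returning a truncation flag is an assumption about what a caller might want. Because the length is measured first, the copy itself can be a plain memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes),
 * truncating if necessary and always NUL-terminating when dst_size > 0.
 * Returns 1 if the whole string fit, 0 if it was truncated or dst_size is 0. */
static int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;                 /* no room even for the terminator */

    size_t len = strlen(src);
    int fits = len < dst_size;
    if (!fits)
        len = dst_size - 1;       /* leave one byte for the terminator */

    memcpy(dst, src, len);        /* length is known, so memcpy suffices */
    dst[len] = '\0';
    return fits;
}
```

Called as `copy_terminated(buf, sizeof buf, input)`, the return value lets the caller decide whether silent truncation is acceptable or should be reported as an error.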
        
         | typical182 wrote:
         | I think though it is recommending against strncpy as well...
         | 
         |  _linters and code reviewers commonly recommend alternatives
         | such as strncpy (difficult to use correctly; mismatched
         | semantics) [...] Besides their individual shortcomings, these
         | answers are incorrect. strcpy and friends are, at best,
         | incredibly niche, and the correct replacement is memcpy._
        
           | tialaramex wrote:
           | strncpy() does a specific thing really well. If it's the
           | 1970s where you are and you're writing Unix filesystem code
           | or some 1970s data processing application your program likely
           | has a use for strncpy() and it matches the semantics you need
           | exactly.
           | 
           | But it isn't intended as a solution for buffer overflow bugs
           | in your program, and so if you try to abuse it to solve that
           | problem you likely introduce _more_ problems.
           | 
           | Imagine your car's air conditioning doesn't work properly. On
           | sunny days it's really much too hot in the car. So, you buy a
           | sunroof. Says "Sun" right in the name, surely that will help
           | right? No. That's not what a sunroof is for. The sunroof
           | works fine _as a sunroof_ but that is not what you needed.
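
To make the "specific thing" concrete, here is a small sketch of the fixed-width field case. The record layout and the names `dir_record` and `set_record_name` are hypothetical, with a 14-byte name field assumed for illustration. The point is that the field is not a C string: strncpy pads short names with NULs and leaves a full-width name unterminated.

```c
#include <string.h>

#define NAME_FIELD_SIZE 14   /* assumed field width for this illustration */

/* Hypothetical fixed-width record. `name` is a field, not a C string:
 * NUL-padded when the name is short, unterminated when it fills the field. */
struct dir_record {
    unsigned short inode;
    char name[NAME_FIELD_SIZE];
};

/* strncpy copies at most NAME_FIELD_SIZE bytes from name and fills any
 * remaining bytes of the field with NULs. It never writes past the field. */
static void set_record_name(struct dir_record *rec, const char *name)
{
    strncpy(rec->name, name, sizeof rec->name);
}
```

Reading the field back requires a bounded operation (for example printf with "%.14s"), since treating `rec->name` as a NUL-terminated string would read past the field whenever the name is 14 bytes or longer.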
        
       | forrestthewoods wrote:
       | God the C runtime library is so bad. So is the C++ STL.
       | 
       | I think it's a travesty that these languages defined an API but
       | didn't provide an implementation. Hindsight is 20/20, but what a
       | nightmare!
       | 
       | It is far more rational to provide an implementation using
       | standard language features. It's not like strcpy needs to make a
       | syscall!
        
       | csnover wrote:
       | Related: Designing a Better strcpy, from last month[0]
       | 
       | [0] https://news.ycombinator.com/item?id=27537900
        
       | corndoge wrote:
       | Not sure I agree with the recommendation against strlcpy. While
       | it is technically true that if you can't replace strcpy with
       | memcpy you're using strcpy wrong, it's also true that most uses
       | of strcpy are wrong, which I think is a better point to make. The
       | stated purpose of strcpy is to copy a string, and if you're
       | copying a string your best bet is strlcpy. The article is worded
       | in such a way that you'd walk away thinking "I should always use
       | memcpy."
        
         | tptacek wrote:
         | I'm not a strlcpy fan, but I'll never understand recommending
         | against string functions because they're "nonstandard". They're
         | tiny and portable almost by definition. Vendor them in to your
         | project.
        
       | jonathrg wrote:
       | It's too bad that many of the string handling functions in the C
       | standard library are ticking time bombs. I like the approach
       | taken in e.g. git which converts problematic function calls into
       | compile errors https://github.com/git/git/blob/master/banned.h
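
A sketch of how this kind of ban can work, modeled on the technique in the linked header. The exact macro names and the list of banned functions in git may differ from what is shown here, and `copy_known_length` is a hypothetical helper included only to show that unaffected code still builds.

```c
#include <string.h>

/* Any use of a banned function expands to an undeclared identifier,
 * so compilation fails with the function's name in the error message. */
#define BANNED(func) sorry_##func##_is_a_banned_function

#undef strcpy
#define strcpy(dst, src) BANNED(strcpy)
#undef strcat
#define strcat(dst, src) BANNED(strcat)

/* Code that avoids the banned calls compiles as usual.
 * dst must have room for len + 1 bytes. */
static void copy_known_length(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
    dst[len] = '\0';
}
```

With this header included, a call such as `strcpy(buf, "x")` is rejected at compile time because `sorry_strcpy_is_a_banned_function` is not declared anywhere.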
        
         | teknopurge wrote:
         | Honestly not trolling - I disagree with the "ticking time
         | bombs" comment. If you feel that way the devs should be using
         | Rust.
         | 
         | C is a sharp knife with no handle; this is its purpose as a
         | language and tool.
         | 
         | cc: theo@openbsd.org
        
           | jonathrg wrote:
           | These are just simple API design issues in the standard
           | library, nothing to do with the language itself.
        
           | tialaramex wrote:
           | > the devs should be using Rust.
           | 
           | Yes, they should. Or several other things depending on what
           | exactly they need.
           | 
           | > C is a sharp knife with no handle; this is its purpose as
           | a language and tool.
           | 
           | Help me out here HN resident survivalists, carpenters, maybe
           | circus knife throwers. What is the "purpose" of a "sharp
           | knife with no handle" exactly? How often have you thought,
           | "Man, it'd be so much easier to gather firewood, carve
           | decorations or score a bullseye if only the blade would sink
           | into my own flesh while I was using it because it doesn't
           | have a handle" ?
           | 
           | Historically the argument was, "We're using C because
           | alternatives like Java or Python or whatever aren't fast
           | enough or capable enough". OK. But, somewhere in the last few
           | years it moved to, "We're using C because alternatives aren't
           | dangerous enough" and that's crazy.
        
             | teknopurge wrote:
             | BYOH - Bring Your Own Handle
             | 
             | The purpose is not to assume what the developer wants, but
             | provide access to all the resources (and manipulation) they
             | need.
             | 
             | If you need guardrails there are 10s of languages designed
             | for specific purposes.
        
         | Someone1234 wrote:
         | I like it too, but then you're going down the uncanny valley
         | of:
         | 
         | - The project is new, in which case you can easily/safely ban
         | functions, but then why are you starting a _new_ C project in
         | 2021?
         | 
         | - The project already exists, and now you need to refactor out
         | all the compile-time errors in order to move forward (time-
         | consuming).
         | 
         | Keep in mind the first is a real question that should be
         | answered. If your goal is to avoid undefined behavior/potential
         | security headaches, then C should be entered into after careful
         | consideration of cost/benefits. There are better alternatives
         | for _some_ projects but not others YMMV.
        
           | tick_tock_tick wrote:
           | > The project is new, in which case you can safely ban
           | functions, but then why are you starting a new C project in
           | 2021?
           | 
           | Interoperability? Tons of chips have a single C compiler
           | forked from an old version of GCC without even full C99
           | support and that's it. Good luck getting anything else
           | running on that hardware.
        
             | Someone1234 wrote:
             | That's why you should evaluate the cost/benefit, like I
             | said.
             | 
             | > If your goal is to avoid undefined behavior/potential
             | security headaches, then C should be entered into after
             | careful consideration of cost/benefits. There are better
             | alternatives for some projects but not others YMMV.
        
           | jonathrg wrote:
           | The ban can be rolled out gradually by not including banned.h
           | everywhere.
        
             | Someone1234 wrote:
             | But then you're undermining your own hard compile time
             | check, wherein specific files or areas of the project could
             | fall through the cracks and continue to use supposedly
             | "banned" functions.
             | 
             | I'd argue doing it globally but as a warning would be a
             | better alternative than doing it file by file.
        
               | jonathrg wrote:
               | A warning would be nice, is there a way to implement that
               | without changing the compiler though?
        
           | hdjjhhvvhga wrote:
           | Both cases are real. If you start a new project in C, it
           | means you have a very good reason - and hopefully a strategy
           | of dealing with strings and other problematic issues.
           | 
           | If you need to deal with an older codebase, the all-or-
           | nothing approach might not be appropriate - incremental
           | improvements might be a better option. Yes, you will have
           | more problems to deal with initially, but with time the
           | situation will get better.
        
       | baby wrote:
       | C: a niche language you don't need
        
       | legulere wrote:
       | Even better is to not null terminate strings and use pointer plus
       | length everywhere.
        
         | st_goliath wrote:
         | > Even better is to not null terminate strings and use pointer
         | plus length everywhere.
         | 
         | Yes. Except that we aren't programming in a void. Particularly
         | if you are writing C to begin with, you will have to interface
         | with decades of existing code. Some of which has interfaces
         | that crept into standards.
         | 
         | You _eventually have to_ pass a string to some function that
         | does not have a length argument and expects a null-terminated
         | string, be it to a library function or _the operating system
         | itself_ (e.g. the `open` system call). You _will_ still need to
         | keep that null-terminator around.
        
           | diegocg wrote:
           | The only reason why that happens is that C refuses to
           | standardise support for anything else.
           | 
           | Standardise support for pointer+length strings and the most
           | active parts of the ecosystem will start using it. It will
           | take a long time to get widespread but the sooner you start
           | the sooner it will happen.
           | 
           | Sure, you will have to revert to traditional strings.
           | Sometimes often. That's no big deal, there should be helper
           | functions. In D you just add .toStringz to any D string and
           | you get a C string which makes interacting with C code easy.
           | 
           | Of course none of this will happen because C is dead from an
           | evolutionary point of view. Hopefully new CS students will
           | likely not have to deal with any of this bullshit in a few
           | decades.
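
A rough sketch of what pointer+length strings with a boundary helper could look like in plain C. Nothing here is standard: `str_slice`, `slice_from_cstr`, `slice_sub`, and `slice_to_cstr` are hypothetical names, and allocating a fresh copy at the boundary is just one possible design.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pointer+length string: the bytes need not be NUL-terminated. */
struct str_slice {
    const char *data;
    size_t len;
};

/* View an existing NUL-terminated string as a slice, without copying. */
static struct str_slice slice_from_cstr(const char *s)
{
    struct str_slice out = { s, strlen(s) };
    return out;
}

/* Sub-range [start, start + len) of a slice, clamped to its bounds. */
static struct str_slice slice_sub(struct str_slice s, size_t start, size_t len)
{
    if (start > s.len)
        start = s.len;
    if (len > s.len - start)
        len = s.len - start;
    struct str_slice out = { s.data + start, len };
    return out;
}

/* Boundary helper: allocate a NUL-terminated copy for APIs that need one.
 * The caller frees the result. Returns NULL on allocation failure. */
static char *slice_to_cstr(struct str_slice s)
{
    char *out = malloc(s.len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s.data, s.len);
    out[s.len] = '\0';
    return out;
}
```

Slicing is free (no copy, no terminator to write), and the cost of NUL termination is paid only at the point where a legacy interface such as open() is called.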
        
         | topspin wrote:
         | I thought we'd all decided the Chad way was the best way to do
         | strings in C.
         | 
         | https://github.com/skullchap/chadstr
         | 
         | Maybe not though. Issue #6 is unresolved.
        
         | tialaramex wrote:
         | Right, when I was younger, I was convinced that NUL termination
         | was a reasonable strategy. Learning C in the 1990s it made
         | plenty of sense, even though I was also learning about buffer
         | overflows and underflows.
         | 
         | One of the last things that finally changed my mind was the
         | observation that the _length_ shouldn't live with the text,
         | but with the structure _describing_ the text. Some of you might
         | be laughing now, because that was obvious to you, but I
         | genuinely had gone years without considering that. I'd been
         | imagining a hack like the length of the string lives in a few
         | bytes "before" the text.
         | 
         | Once I was envisioning the mutable string as [length, pointer]
         | itself, that seemed obviously better and I was onboard with
         | abolishing NUL termination in software.
        
           | amelius wrote:
           | It might sound obvious to you now, but most functional
           | languages conceptually store strings as nil-terminated lists
           | ...
        
           | thaumasiotes wrote:
           | > I'd been imagining a hack like the length of the string
           | lives in a few bytes "before" the text.
           | 
           | That's normal, usually called a "Pascal string".
           | 
           | As I recall, the C standard makes no assumption of whether
           | strings are null-terminated or not.
        
             | moefh wrote:
             | > As I recall, the C standard makes no assumption of
             | whether strings are null-terminated or not.
             | 
             | I'm not sure what you mean by assumptions made by the C
             | standard, but it definitely says strings are null-
             | terminated:
             | 
             | > A byte with all bits set to 0, called the null character,
             | shall exist in the basic execution character set; it is
             | used to terminate a character string.
             | 
             | and
             | 
             | > A string literal need not be a string [...], because a
             | null character may be embedded in it by a \0 escape
             | sequence.
             | 
             | (the second one is noting that if a string literal contains
             | "\0", then it's not a string but _contains_ a string with
             | more stuff after it).
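
The distinction between a string literal and a string can be checked directly. In this small sketch the names are made up for illustration. sizeof sees the whole array the literal initializes, while strlen stops at the first null character.

```c
#include <string.h>

/* The array holds 8 bytes: "abc", the embedded '\0', "def", and the
 * terminator the compiler appends. As a string it ends at the first '\0'. */
static const char literal_with_nul[] = "abc\0def";

/* Size of the whole array, in bytes. */
static size_t array_bytes(void)   { return sizeof literal_with_nul; }

/* Length of the string that starts at the beginning of the array. */
static size_t string_length(void) { return strlen(literal_with_nul); }
```

Here `array_bytes()` is 8 and `string_length()` is 3. The bytes after the embedded null character form a second string starting at offset 4.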
        
         | williamvds wrote:
         | Unfortunately C is locked into null-terminated strings, given
         | that all the printf-style functions work on the assumption
         | there'll be a null terminator. C++ has std::string_view which
         | is pointer + length, but you've still got the same problem if
         | you need to call older printf-style functions.
        
           | nicoburns wrote:
           | Why do you have to use printf? You could have a string
           | library that comes with its own formatting routines.
           | There's also the option of using both a length AND a null
           | terminator.
        
             | codesections wrote:
             | > There's also the option of using both a length AND a null
             | terminator.
             | 
             | I first encountered that idea in this classic Joel on
             | Software post, which rather put me off the idea of using
             | them in production:
             | 
             | > Notice in this case you've got a string that is null
             | terminated (the compiler did that) as well as a Pascal
             | string. I used to call these fucked strings because it's
             | easier than calling them null terminated pascal strings but
             | this is a rated-G channel so you will have use the longer
             | name.
             | 
             | https://www.joelonsoftware.com/2001/12/11/back-to-basics/
        
           | AlexanderDhoore wrote:
           | Nope, printf can print strings without NULL-terminator:
           | printf("%.*s", <int>length, <char*>string);
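
A compilable version of that one-liner, as a sketch. `format_slice` is a hypothetical wrapper, and snprintf is used instead of printf only so the result can be inspected. The precision passed for `%.*s` must be an int, so the sketch assumes the length fits in one.

```c
#include <stdio.h>

/* Format len bytes starting at data into out, surrounded by brackets.
 * data does not need a NUL terminator: with a precision, %s reads at most
 * that many bytes. The precision argument must be an int, hence the cast;
 * this sketch assumes len fits in an int. Returns snprintf's result. */
static int format_slice(char *out, size_t out_size, const char *data, size_t len)
{
    return snprintf(out, out_size, "[%.*s]", (int)len, data);
}
```

One caveat: `%.*s` still stops early if it meets a null character inside the first len bytes, so it is not suitable for data with embedded NULs.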
        
             | OskarS wrote:
             | And the first argument to printf (the format), what kind of
             | string is that? And what kind of string does sprintf()
             | produce?
        
               | 10000truths wrote:
               | In practice, it is almost always a compile-time-known
               | string. gcc will warn you if it isn't, especially since
               | allowing the use of untrusted input for the format can
               | lead to vulnerabilities:
               | 
               | https://en.wikipedia.org/wiki/Uncontrolled_format_string
        
               | TazeTSchnitzel wrote:
               | sprintf tells you how many characters it has written, so
               | there's no reason you can't use it for non-null-
               | terminated strings.
        
               | comfydragon wrote:
               | williamvds's point is that the first argument to printf
               | is still itself a null-terminated string, so it's
               | basically turtles all the way down if you're using the C
               | standard library.
        
               | TazeTSchnitzel wrote:
               | Their comment talked about two things, and the sibling
               | comment addressed the other one.
        
               | tialaramex wrote:
               | Anywhere that this format is a variable, you probably
               | already screwed up. C allows that, but if I see it that's
               | getting flagged in my review.
               | 
               | So long as the format string is a literal you _needn't
               | care_ how it works.
               | 
               | Now, one of the places where C makes this nastier than it
               | needed to be is that C built-in types are silly, and so
               | any non-trivial program is using better fundamental types
               | like uint32_t (or the more succinct u32), for which the
               | built-in formatter offers no syntax. So you end up
               | writing format strings like "There are %" PRIu32 " dogs\n"
               | using macros to bring in the appropriate specifier for
               | your literal. Blergh.
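
For reference, a sketch of how the `<inttypes.h>` macros are spliced into a format string. `describe_dogs` is a hypothetical function. The macro supplies only the conversion letters, so the `%` is written by hand and adjacent string literals are concatenated by the compiler.

```c
#include <inttypes.h>
#include <stdio.h>

/* Write a message containing a uint32_t into out. PRIu32 expands to a
 * string literal (for example "u"), so the '%' is written explicitly and
 * the pieces are joined by string literal concatenation. */
static int describe_dogs(char *out, size_t out_size, uint32_t count)
{
    return snprintf(out, out_size, "There are %" PRIu32 " dogs\n", count);
}
```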
        
       | waynesonfire wrote:
       | This shit happens because people think they're clever and can
       | walk a string in a for loop.
       | 
       | It's not the fault of C or strcpy just as it's not the gun on
       | trial for taking human life.
       | 
       | You're a bad programmer and it's too painful to admit. Do everyone
       | a favor, use training wheels and stop coding in C.
        
         | opheliate wrote:
         | Are you doing okay? Seems like an unnecessarily acidic response
         | to this post, which definitely seems like it's intended for C
         | beginners.
        
           | waynesonfire wrote:
           | Lol ops.
           | 
           | If you're a beginner, tell your lead to switch to Rust.
           | Strncpy won't help you.
        
           | b33j0r wrote:
           | (not parent, guessing he was being snarky) While you make a
           | good point, I don't generally find beginner content on hacker
           | news.
           | 
           | I personally evaluated this trending post as something worth
           | my consideration and analysis. Perhaps my news PID needs a
           | tune-up!
        
       | b33j0r wrote:
       | If this is the case, why aren't the stdlib functions defined this
       | way? In all of the history of the longest-lived production
       | language family besides FORTRAN, this blog post is the first
       | voice to point out that memcpy should be the same operation as
       | strcpy for null-terminated strings? What is going on here?
       | 
       | (Rust crowd snickers as they unwrap<'jk> &mut *foo_buf)
        
         | gompertz wrote:
         | Curious as well, but when I look at the glibc source for
         | strncpy it is calling memcpy...
         | https://code.woboq.org/userspace/glibc/string/strncpy.c.html
         | 
         | It all seems to depend on the compiler vendor.
        
       | jabl wrote:
       | C2X will be adding memccpy() (note two c's in the middle, not
       | memcpy!). Overview and justification at
       | https://developers.redhat.com/blog/2019/08/12/efficient-stri...
        
         | AlexanderDhoore wrote:
         | Great naming choice. Won't be confusing at all.
        
       | TazeTSchnitzel wrote:
       | My favourite C string function is snprintf:
       | 
       | * It takes a buffer size and truncates the output to the buffer
       | size if it's too large.
       | 
       | * The buffer size includes the null terminator, so the simplest
       | pattern of snprintf(buf, sizeof(buf), ...) is correct.
       | 
       | * It always null-terminates the output for you, even if
       | truncated.
       | 
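       | * By providing NULL as the buffer argument (with a size of 0), it
       | will tell you the length it would have written, so you know the
       | buffer size you need (that length plus one for the terminator) if
       | you want to dynamically allocate.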
       | 
       | And of course, it can safely copy strings:
       | snprintf(dst_buf, sizeof(dst_buf), "%s", src_str);
       | 
       | Including non-null-terminated ones:
       | snprintf(dst_buf, sizeof(dst_buf), "%.*s", (int)src_str_len,
       | src_str_data);
       | 
       | And it's standard and portable, unlike e.g. strlcpy. It's one of
       | the best C99 additions.
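
A sketch of the patterns listed above with the return value actually checked. `copy_checked` and `format_greeting` are hypothetical names, and treating truncation as an error is an assumption that will not suit every caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copy src into dst using snprintf.
 * Returns 0 if the whole string fit, -1 if the output was truncated or
 * snprintf reported an error (negative return). */
static int copy_checked(char *dst, size_t dst_size, const char *src)
{
    int n = snprintf(dst, dst_size, "%s", src);
    if (n < 0)
        return -1;                       /* snprintf itself failed */
    if ((size_t)n >= dst_size)
        return -1;                       /* did not fit: output truncated */
    return 0;
}

/* Measure first with a NULL buffer and size 0, then allocate exactly enough.
 * The caller frees the result. Returns NULL on failure. */
static char *format_greeting(const char *name)
{
    int n = snprintf(NULL, 0, "hello, %s", name);
    if (n < 0)
        return NULL;
    char *out = malloc((size_t)n + 1);   /* +1 for the terminator */
    if (out == NULL)
        return NULL;
    snprintf(out, (size_t)n + 1, "hello, %s", name);
    return out;
}
```

The return value is the length that would have been written without truncation, not counting the terminator, which is why the fit test is `n >= dst_size` and the allocation is `n + 1`.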
        
         | guidovranken wrote:
         | snprintf works most of the time but it can fail, and people
         | almost never check the return value. For example it will always
         | fail if you attempt to print a string >= 2GB. If that happens
         | the output buffer may remain uninitialized (depends on the
         | implementation) and you're at risk for a Heartbleed-like
         | scenario.
        
         | jstimpfle wrote:
         | snprintf is underlying most logging modules I've done (logging
         | to memory / file / network / console...) - I've been thinking
         | about doing custom formatting routines but there's surprisingly
         | little need for them.
         | 
         | You probably know this, but sizeof is not a function. I prefer
         | the easier to type
         | 
         |   snprintf(buf, sizeof buf, ...);
        
       | andrewmcwatters wrote:
       | What's the standard practice these days in C to move strings with
       | lengths around? I've been out of C for at least a couple of years
       | now, but I can't imagine it's changed in that time.
        
         | jstimpfle wrote:
         | If you're asking what to do about _copying_ strings, then it's
         | either memcpy(), or rarely str[n]cpy(). strcpy when I can
         | assume the source length is safe but don't know the size of the
         | underlying buffer. strncpy when I want to check the return
         | value and maybe issue an error.
         | 
         | For passing references around, I use whatever works. Plain
         | `const char *` argument is certainly a frequent choice for
         | simple name or filepath arguments. That can even mean doing the
         | occasional strlen() when making a copy of that string. It
         | doesn't bother me at all; overall zero-terminated strings are
         | very easy to use. Can't understand why people never stop
         | bitching about it.
         | 
         | When the string is not just an opaque ID, but needs to be
         | examined more closely, it's usually more of a "slicey" or a
         | buffer-processing problem - then I'll add an `int len` to the
         | list of arguments, or to the members in a struct.
         | 
         | Very rarely I'll create a String class, but usually I don't
         | bother. It feels to me like going against the grain of the
         | language. I don't want to create my own host of string
         | processing functions that take this String as argument, when
         | it's usually simpler to operate directly on the data.
         | 
         | Something that I close to never need is the "growable" string
         | class with memory management like std::string. I have no idea
         | right now why I would need such a thing. I tend to write my
         | programs to work on fixed buffers. At most I'll create
         | dynamically sized strings, but a generic string that can grow
         | after creation isn't a frequent use case.
        
           | andrewmcwatters wrote:
           | I agree with everything you've stated. Just curious what
           | others think. Thanks for sharing.
        
         | dragontamer wrote:
         | I'd assume memcpy.
         | 
         | But a more serious answer is to use C++ strings instead, lol.
         | Writing "C-like C++" is probably more beneficial.
         | 
         | I do realize that a lot of people prefer to write in pure C
         | (ex: Linux kernel team), but more and more people are realizing
         | the benefits of C-like C++ code.
        
           | andrewmcwatters wrote:
           | Yeah, I would assume as much, too. When I last worked in C,
           | there was no de facto way to do this since so many functions
           | just worked on null termination.
        
       | barosl wrote:
       | When I used C for a serious project, I always used `snprintf(dst,
       | sizeof(dst), "%s", src);` to copy a string. It might be a little
       | bit slow, but it freed me from all the headaches of identifying
       | different string functions of C and remembering their subtle
       | differences. It also is useful for other purposes, e.g. prefixing
       | a string.
        
         | AlexanderDhoore wrote:
         | I do this as well. At some point I googled around and didn't
         | get a straight answer. So I've been using snprintf() for all my
         | string manipulation ever since. My productivity is more
         | important than a sliver of performance.
        
       ___________________________________________________________________
       (page generated 2021-07-30 23:00 UTC)