[HN Gopher] strcpy: A niche function you don't need
___________________________________________________________________
strcpy: A niche function you don't need
Author : grep_it
Score : 55 points
Date : 2021-07-30 20:06 UTC (2 hours ago)
(HTM) web link (nullprogram.com)
(TXT) w3m dump (nullprogram.com)
| einpoklum wrote:
| One-liner summary: Author suggests using just memcpy() typically,
| strncpy() rarely/maybe, and even more rarely, or never, strcpy().
| ajanuary wrote:
| I don't think that's an accurate summary. "Use memcpy()
| instead. Use strncpy() in limited circumstances."
| icedchai wrote:
| Doesn't memcpy have the same issue as strncpy? The
| destination will not be null terminated if the source is too
| long.
|
| Many projects just implement a safe_strncpy wrapper that
| always terminates the destination. Example:
| https://github.com/brgl/busybox/blob/master/libbb/safe_strnc...
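|
| A minimal sketch of such a wrapper, assuming a hypothetical helper
| named copy_terminated (this is not the busybox code, just the
| general idea of reserving the last byte for the terminator):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dst_size - 1 bytes of src and
 * always leave dst NUL-terminated. */
static char *copy_terminated(char *dst, const char *src, size_t dst_size)
{
    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return dst;               /* no room even for the terminator */
    dst[dst_size - 1] = '\0';     /* strncpy below never touches this byte */
    return strncpy(dst, src, dst_size - 1);
}
```

| Truncation is silent here; a caller that cares has to compare
| strlen(src) against dst_size itself.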
| typical182 wrote:
| I think though it is recommending against strncpy as well...
|
| _linters and code reviewers commonly recommend alternatives
| such as strncpy (difficult to use correctly; mismatched
| semantics) [...] Besides their individual shortcomings, these
| answers are incorrect. strcpy and friends are, at best,
| incredibly niche, and the correct replacement is memcpy._
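|
| One way to read that claim, as a sketch: a safe copy has to know
| the source length anyway, and once it does, memcpy is enough. The
| helper name copy_if_fits is an assumption for illustration, not
| something from the article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, terminator
 * included. Returns 0 on success, -1 if src would not fit. */
static int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;                /* caller decides how to handle it */
    memcpy(dst, src, len + 1);    /* length is known, so strcpy adds nothing */
    return 0;
}
```

| On failure dst is left untouched, so the caller can report the
| error instead of working with a truncated string.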
| tialaramex wrote:
| strncpy() does a specific thing really well. If it's the
| 1970s where you are and you're writing Unix filesystem code
| or some 1970s data processing application your program likely
| has a use for strncpy() and it matches the semantics you need
| exactly.
|
| But it isn't intended as a solution for buffer overflow bugs
| in your program, and so if you try to abuse it to solve that
| problem you likely introduce _more_ problems.
|
| Imagine your car's air conditioning doesn't work properly. On
| sunny days it's really much too hot in the car. So, you buy a
| sunroof. Says "Sun" right in the name, surely that will help
| right? No. That's not what a sunroof is for. The sunroof
| works fine _as a sunroof_ but that is not what you needed.
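|
| A sketch of the fixed-width field use case, assuming a made-up
| struct record with an 8-byte name field. It shows the two
| behaviours that surprise people: zero padding for short input and
| no terminator for long input.

```c
#include <assert.h>
#include <string.h>

#define NAME_FIELD 8  /* hypothetical fixed-width, zero-padded field */

struct record {
    char name[NAME_FIELD];  /* NOT a C string: no terminator when full */
};

static void set_name(struct record *r, const char *name)
{
    assert(r != NULL && name != NULL);
    /* Short names are padded with zero bytes; names of NAME_FIELD bytes
     * or more fill the field exactly and leave no terminator. */
    strncpy(r->name, name, sizeof r->name);
}
```

| Readers of such a field must use the field width (for example
| printf with "%.8s"), never strlen.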
| forrestthewoods wrote:
| God the C runtime library is so bad. So is the C++ STL.
|
| I think it's a travesty that these languages defined an API but
| didn't provide an implementation. Hindsight is 20/20, but what a
| nightmare!
|
| It is far more rational to provide an implementation using
| standard language features. It's not like strcpy needs to make a
| syscall!
| csnover wrote:
| Related: Designing a Better strcpy, from last month[0]
|
| [0] https://news.ycombinator.com/item?id=27537900
| corndoge wrote:
| Not sure I agree with the recommendation against strlcpy. While
| it is technically true that if you can't replace strcpy with
| memcpy you're using strcpy wrong, it's also true that most uses
| of strcpy are wrong, which I think is a better point to make. The
| stated purpose of strcpy is to copy a string, and if you're
| copying a string your best bet is strlcpy. The article is worded
| in such a way that you'd walk away thinking "I should always use
| memcpy."
| tptacek wrote:
| I'm not a strlcpy fan, but I'll never understand recommending
| against string functions because they're "nonstandard". They're
| tiny and portable almost by definition. Vendor them in to your
| project.
| jonathrg wrote:
| It's too bad that many of the string handling functions in the C
| standard library are ticking time bombs. I like the approach
| taken in e.g. git which converts problematic function calls into
| compile errors https://github.com/git/git/blob/master/banned.h
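|
| A rough sketch of the technique, written in the spirit of that
| header. The macro and helper names here are illustrative and are
| not guaranteed to match git's file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Any use of a banned function expands to an undeclared identifier,
 * so the build fails and the name in the error message says why.
 * The standard headers must be included before these macros. */
#define BANNED(func) sorry_##func##_is_a_banned_function

#undef strcpy
#define strcpy(dst, src) BANNED(strcpy)
#undef strcat
#define strcat(dst, src) BANNED(strcat)

/* Code that knows its lengths still compiles. */
static void copy_known(char *dst, size_t dst_size, const char *src, size_t len)
{
    assert(len < dst_size);
    memcpy(dst, src, len);
    dst[len] = '\0';
    /* strcpy(dst, src);   <- uncommenting this line breaks the build */
}
```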
| teknopurge wrote:
| Honestly not trolling - I disagree with the "ticking time
| bombs" comment. If you feel that way the devs should be using
| Rust.
|
| C is a sharp knife with no handle; this is its purpose as a
| language and tool.
|
| cc: theo@openbsd.org
| jonathrg wrote:
| These are just simple API design issues in the standard
| library, nothing to do with the language itself.
| tialaramex wrote:
| > the devs should be using Rust.
|
| Yes, they should. Or several other things depending on what
| exactly they need.
|
| > C is a sharp knife with no handle; this is its purpose as
| a language and tool.
|
| Help me out here HN resident survivalists, carpenters, maybe
| circus knife throwers. What is the "purpose" of a "sharp
| knife with no handle" exactly? How often have you thought,
| "Man, it'd be so much easier to gather firewood, carve
| decorations or score a bullseye if only the blade would sink
| into my own flesh while I was using it because it doesn't
| have a handle" ?
|
| Historically the argument was, "We're using C because
| alternatives like Java or Python or whatever aren't fast
| enough or capable enough". OK. But, somewhere in the last few
| years it moved to, "We're using C because alternatives aren't
| dangerous enough" and that's crazy.
| teknopurge wrote:
| BYOH - Bring Your Own Handle
|
| The purpose is not to assume what the developer wants, but
| provide access to all the resources (and manipulation) they
| need.
|
| If you need guardrails there are 10s of languages designed
| for specific purposes.
| Someone1234 wrote:
| I like it too, but then you're going down the uncanny valley
| of:
|
| - The project is new, in which case you can easily/safely ban
| functions, but then why are you starting a _new_ C project in
| 2021?
|
| - The project already exists, and now you need to refactor out
| all the compile-time errors in order to move forward (time-
| consuming).
|
| Keep in mind the first is a real question that should be
| answered. If your goal is to avoid undefined behavior/potential
| security headaches, then C should be entered into after careful
| consideration of cost/benefits. There are better alternatives
| for _some_ projects but not others YMMV.
| tick_tock_tick wrote:
| > The project is new, in which case you can safely ban
| functions, but then why are you starting a new C project in
| 2021?
|
| Interoperability? Tons of chips have a single C compiler
| forked from an old version of GCC without even full C99
| support and that's it. Good luck getting anything else
| running on that hardware.
| Someone1234 wrote:
| That's why you should evaluate the cost/benefit, like I
| said.
|
| > If your goal is to avoid undefined behavior/potential
| security headaches, then C should be entered into after
| careful consideration of cost/benefits. There are better
| alternatives for some projects but not others YMMV.
| jonathrg wrote:
| The ban can be rolled out gradually by not including banned.h
| everywhere.
| Someone1234 wrote:
| But then you're undermining your own hard compile time
| check, wherein specific files or areas of the project could
| fall through the cracks and continue to use supposedly
| "banned" functions.
|
| I'd argue doing it globally but as warning would be a
| better alternative than doing it file by file.
| jonathrg wrote:
| A warning would be nice, is there a way to implement that
| without changing the compiler though?
| hdjjhhvvhga wrote:
| Both cases are real. If you start a new project in C, it
| means you have a very good reason - and hopefully a strategy
| of dealing with strings and other problematic issues.
|
| If you need to deal with an older codebase, the all-or-
| nothing approach might not be appropriate - incremental
| improvements might be a better option. Yes, you will have
| more problems to deal with initially, but with time the
| situation will get better.
| baby wrote:
| C: a niche language you don't need
| legulere wrote:
| Even better is to not null terminate strings and use pointer plus
| length everywhere.
| st_goliath wrote:
| > Even better is to not null terminate strings and use pointer
| plus length everywhere.
|
| Yes. Except that we aren't programming in a void. Particularly
| if you are writing C to begin with, you will have to interface
| with decades of existing code. Some of which has interfaces
| that crept into standards.
|
| You _eventually have to_ pass a string to some function that
| does not have a length argument and expects a null-terminated
| string, be it to a library function or _the operating system
| itself_ (e.g. the `open` system call). You _will_ still need to
| keep that null-terminator around.
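|
| A sketch of the boundary this describes, assuming strings are kept
| as pointer plus length internally and converted only when a legacy
| interface demands a terminator. The helper name to_cstr is
| hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: make a NUL-terminated heap copy of the len bytes
 * at data so it can be handed to an API such as open() or fopen().
 * Returns NULL on allocation failure; the caller frees the result. */
static char *to_cstr(const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    char *z = malloc(len + 1);
    if (z == NULL)
        return NULL;
    if (len > 0)
        memcpy(z, data, len);
    z[len] = '\0';
    return z;
}
```

| If the data contains an embedded zero byte, the receiving API will
| see a shorter string, so a real version may want to reject that.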
| diegocg wrote:
| The only reason why that happens is that C refuses to
| standardise support for anything else.
|
| Standardise support for pointer+length strings and the most
| active parts of the ecosystem will start using it. It will
| take a long time to get widespread but the sooner you start
| the sooner it will happen.
|
| Sure, you will have to revert to traditional strings.
| Sometimes often. That's no big deal, there should be helper
| functions. In D you just add .toStringz to any D string and
| you get a C string which makes interacting with C code easy.
|
| Of course none of this will happen because C is dead from an
| evolutionary point of view. Hopefully new CS students will
| not have to deal with any of this bullshit in a few
| decades.
| topspin wrote:
| I thought we'd all decided the Chad way was the best way to do
| strings in C.
|
| https://github.com/skullchap/chadstr
|
| Maybe not though. Issue #6 is unresolved.
| tialaramex wrote:
| Right, when I was younger, I was convinced that NUL termination
| was a reasonable strategy. Learning C in the 1990s it made
| plenty of sense, even though I was also learning about buffer
| overflows and underflows.
|
| One of the last things that finally changed my mind was the
| observation that the _length_ shouldn't live with the text,
| but with the structure _describing_ the text. Some of you might
| be laughing now, because that was obvious to you, but I
| genuinely had gone years without considering that. I'd been
| imagining a hack like the length of the string lives in a few
| bytes "before" the text.
|
| Once I was envisioning the mutable string as [length, pointer]
| itself, that seemed obviously better and I was onboard with
| abolishing NUL termination in software.
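|
| A small sketch of the idea that the length belongs to the
| descriptor rather than to the bytes. The type and function names
| are hypothetical, and this version is a read-only view rather than
| a mutable string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: the length lives in the descriptor, not in the bytes. */
struct str {
    const char *ptr;
    size_t len;
};

static struct str str_from(const char *cstr)
{
    assert(cstr != NULL);
    struct str s = { cstr, strlen(cstr) };
    return s;
}

/* Sub-range [start, start + len) of s; no bytes are copied or modified. */
static struct str str_slice(struct str s, size_t start, size_t len)
{
    assert(start <= s.len && len <= s.len - start);
    struct str out = { s.ptr + start, len };
    return out;
}

static int str_eq(struct str a, struct str b)
{
    return a.len == b.len && (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}
```

| Slicing costs nothing here, whereas a NUL-terminated substring
| needs either a copy or a write into the original buffer.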
| amelius wrote:
| It might sound obvious to you now, but most functional
| languages conceptually store strings as nil-terminated lists
| ...
| thaumasiotes wrote:
| > I'd been imagining a hack like the length of the string
| lives in a few bytes "before" the text.
|
| That's normal, usually called a "Pascal string".
|
| As I recall, the C standard makes no assumption of whether
| strings are null-terminated or not.
| moefh wrote:
| > As I recall, the C standard makes no assumption of
| whether strings are null-terminated or not.
|
| I'm not sure what you mean by assumptions made by the C
| standard, but it definitely says strings are null-
| terminated:
|
| > A byte with all bits set to 0, called the null character,
| shall exist in the basic execution character set; it is
| used to terminate a character string.
|
| and
|
| > A string literal need not be a string [...], because a
| null character may be embedded in it by a \0 escape
| sequence.
|
| (the second one is noting that if a string literal contains
| "\0", then it's not a string but _contains_ a string with
| more stuff after it).
| williamvds wrote:
| Unfortunately C is locked into null-terminated strings, given
| that all the printf-style functions work on the assumption
| there'll be a null terminator. C++ has std::string_view which
| is pointer + length, but you've still got the same problem if
| you need to call older printf-style functions.
| nicoburns wrote:
| Why do you have to use printf? You could have a string
| library that comes with its own formatting routines.
| There's also the option of using both a length AND a null
| terminator.
| codesections wrote:
| > There's also the option of using both a length AND a null
| terminator.
|
| I first encountered that idea in this classic Joel on
| Software post, which rather put me off the idea of using
| them in production:
|
| > Notice in this case you've got a string that is null
| terminated (the compiler did that) as well as a Pascal
| string. I used to call these fucked strings because it's
| easier than calling them null terminated pascal strings but
| this is a rated-G channel so you will have use the longer
| name.
|
| https://www.joelonsoftware.com/2001/12/11/back-to-basics/
| AlexanderDhoore wrote:
| Nope, printf can print strings without a null terminator:
| printf("%.*s", (int)length, string);
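|
| A runnable sketch of the same trick, using snprintf so the result
| can be inspected. The helper name format_slice is an assumption;
| the cast matters because the precision argument must be an int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format len bytes of data, which need not be
 * NUL-terminated, into out. Returns snprintf's result. */
static int format_slice(char *out, size_t out_size, const char *data, size_t len)
{
    assert(len <= INT_MAX);   /* the precision argument must be an int */
    return snprintf(out, out_size, "[%.*s]", (int)len, data);
}
```

| The precision is an upper bound: output still stops early at a
| zero byte inside the data.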
| OskarS wrote:
| And the first argument to printf (the format), what kind of
| string is that? And what kind of string does sprintf()
| produce?
| 10000truths wrote:
| In practice, it is almost always a compile-time-known
| string. gcc will warn you if it isn't, especially since
| allowing the use of untrusted input for the format can
| lead to vulnerabilities:
|
| https://en.wikipedia.org/wiki/Uncontrolled_format_string
| TazeTSchnitzel wrote:
| sprintf tells you how many characters it has written, so
| there's no reason you can't use it for non-null-
| terminated strings.
| comfydragon wrote:
| williamvds's point is that the first argument to printf
| is still itself a null-terminated string, so it's
| basically turtles all the way down if you're using the C
| standard library.
| TazeTSchnitzel wrote:
| Their comment talked about two things, and the sibling
| comment addressed the other one.
| tialaramex wrote:
| Anywhere that this format is a variable, you probably
| already screwed up. C allows that, but if I see it that's
| getting flagged in my review.
|
| So long as the format string is a literal you _needn't
| care_ how it works.
|
| Now, one of the places where C makes this nastier than it
| needed to be is that C built-in types are silly, and so
| any non-trivial program is using better fundamental types
| like uint32_t (or the more succinct u32), for which the
| built-in formatter offers no syntax. So you end up
| writing format strings like "There are %"PRIu32" dogs\n"
| using macros to bring in the appropriate specifier for
| your literal. Blergh.
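|
| For reference, a compilable sketch of that pattern. The function
| name format_dogs is made up; PRIu32 comes from <inttypes.h> and
| supplies only the conversion letters, so the '%' is written by
| hand.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* PRIu32 expands to a string literal such as "u", which the compiler
 * concatenates with its neighbours. */
static int format_dogs(char *out, size_t out_size, uint32_t dogs)
{
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "There are %" PRIu32 " dogs\n", dogs);
}
```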
| waynesonfire wrote:
| This shit happens because people think they're clever and can
| walk a string in a for loop.
|
| It's not the fault of C or strcpy, just as it's not the gun on
| trial for taking human life.
|
| You're a bad programmer and it's too painful to admit. Do everyone
| a favor, use training wheels and stop coding in C.
| opheliate wrote:
| Are you doing okay? Seems like an unnecessarily acidic response
| to this post, which definitely seems like it's intended for C
| beginners.
| waynesonfire wrote:
| Lol oops.
|
| If you're a beginner, tell your lead to switch to Rust.
| strncpy won't help you.
| b33j0r wrote:
| (not parent, guessing he was being snarky) While you make a
| good point, I don't generally find beginner content on hacker
| news.
|
| I personally evaluated this trending post as something worth
| my consideration and analysis. Perhaps my news PID needs a
| tune-up!
| b33j0r wrote:
| If this is the case, why aren't the stdlib functions defined this
| way? In all of the history of the longest-lived production
| language family besides FORTRAN, this blog post is the first
| voice to point out that memcpy should be the same operation as
| strcpy for null-terminated strings? What is going on here?
|
| (Rust crowd snickers as they unwrap<'jk> &mut *foo_buf)
| gompertz wrote:
| Curious as well, but when I look at the glibc source for
| strncpy it is calling memcpy:
| https://code.woboq.org/userspace/glibc/string/strncpy.c.html
|
| It all seems to depend on the compiler vendor.
| jabl wrote:
| C2X will be adding memccpy() (note two c's in the middle, not
| memcpy!). Overview and justification at
| https://developers.redhat.com/blog/2019/08/12/efficient-stri...
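|
| A sketch of how memccpy can serve as a bounded string copy,
| assuming a hypothetical wrapper named copy_until_nul. Today the
| function is declared by POSIX, hence the feature-test macro.

```c
#define _XOPEN_SOURCE 700  /* memccpy is POSIX; some libcs hide it otherwise */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, stopping after the terminator
 * or after dst_size bytes. Returns a pointer to the terminator in dst,
 * or NULL if src was truncated (dst is still terminated). */
static char *copy_until_nul(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    char *end = memccpy(dst, src, '\0', dst_size);
    if (end == NULL) {            /* no '\0' within dst_size bytes */
        dst[dst_size - 1] = '\0';
        return NULL;
    }
    return end - 1;               /* memccpy returns one past the copied '\0' */
}
```

| Returning the end pointer lets a caller append the next piece
| without rescanning what was already copied.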
| AlexanderDhoore wrote:
| Great naming choice. Won't be confusing at all.
| TazeTSchnitzel wrote:
| My favourite C string function is snprintf:
|
| * It takes a buffer size and truncates the output to the buffer
| size if it's too large.
|
| * The buffer size includes the null terminator, so the simplest
| pattern of snprintf(buf, sizeof(buf), ...) is correct.
|
| * It always null-terminates the output for you, even if
| truncated.
|
| * By providing NULL as the buffer argument, it will tell you the
| buffer size you need if you want to dynamically allocate.
|
| And of course, it can safely copy strings:
| snprintf(dst_buf, sizeof(dst_buf), "%s", src_str);
|
| Including non-null-terminated ones:
| snprintf(dst_buf, sizeof(dst_buf), "%.*s", (int)src_str_len,
| src_str_data);
|
| And it's standard and portable, unlike e.g. strlcpy. It's one of
| the best C99 additions.
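|
| A sketch of the dynamic-allocation pattern from the fourth bullet:
| a first call with a NULL buffer and size 0 only measures, a second
| call writes. The helper name join_kv and the "key=value" format
| are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "<key>=<value>" in a freshly allocated
 * buffer. Returns NULL on formatting or allocation failure; the caller
 * frees the result. */
static char *join_kv(const char *key, const char *value)
{
    assert(key != NULL && value != NULL);
    int need = snprintf(NULL, 0, "%s=%s", key, value);  /* length only */
    if (need < 0)
        return NULL;
    char *out = malloc((size_t)need + 1);               /* + terminator */
    if (out == NULL)
        return NULL;
    snprintf(out, (size_t)need + 1, "%s=%s", key, value);
    return out;
}
```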
| guidovranken wrote:
| snprintf works most of the time but it can fail, and people
| almost never check the return value. For example it will always
| fail if you attempt to print a string >= 2GB. If that happens
| the output buffer may remain uninitialized (depends on the
| implementation) and you're at risk for a Heartbleed-like
| scenario.
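|
| A sketch of what checking the return value can look like. The
| helper name copy_checked and its return codes are assumptions; the
| point is that a negative result and a result >= the buffer size
| are handled separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 0 if src fit, 1 if it was truncated,
 * -1 if snprintf itself failed. On failure dst is set to "" so the
 * caller never reads uninitialized bytes. */
static int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    int n = snprintf(dst, dst_size, "%s", src);
    if (n < 0) {
        dst[0] = '\0';
        return -1;
    }
    return (size_t)n >= dst_size ? 1 : 0;
}
```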
| jstimpfle wrote:
| snprintf is underlying most logging modules I've done (logging
| to memory / file / network / console...) - I've been thinking
| about doing custom formatting routines but there's surprisingly
| little need for them.
|
| You probably know this, but sizeof is not a function. I prefer
| the easier to type snprintf(buf, sizeof buf, ...);
| andrewmcwatters wrote:
| What's the standard practice these days in C to move strings with
| lengths around? I've been out of C for at least a couple of years
| now, but I can't imagine it's changed in that time.
| jstimpfle wrote:
| If you're asking what to do about _copying_ strings, then it's
| either memcpy(), or rarely str[n]cpy(). strcpy when I can
| assume the source length is safe but don't know the size of the
| underlying buffer. strncpy when I want to check the return
| value and maybe issue an error.
|
| For passing references around, I use whatever works. Plain
| `const char *` argument is certainly a frequent choice for
| simple name or filepath arguments. That can even mean doing the
| occasional strlen() when making a copy of that string. It
| doesn't bother me at all; overall zero-terminated strings are
| very easy to use. Can't understand why people never stop
| bitching about it.
|
| When the string is not just an opaque ID, but needs to be
| examined more closely, it's usually more of a "slicey" or a
| buffer-processing problem - then I'll add an `int len` to the
| list of arguments, or to the members in a struct.
|
| Very rarely I'll create a String class, but usually I don't
| bother. It feels to me like going against the grain of the
| language. I don't want to create my own host of string
| processing functions that take this String as argument, when
| it's usually simpler to operate directly on the data.
|
| Something that I close to never need is the "growable" string
| class with memory management like std::string. I have no idea
| right now why I would need such a thing. I tend to write my
| programs to work on fixed buffers. At most I'll create
| dynamically sized strings, but a generic string that can grow
| after creation isn't a frequent use case.
| andrewmcwatters wrote:
| I agree with everything you've stated. Just curious what
| others think. Thanks for sharing.
| dragontamer wrote:
| I'd assume memcpy.
|
| But a more serious answer is to use C++ strings instead, lol.
| Writing "C-like C++" is probably more beneficial.
|
| I do realize that a lot of people prefer to write in pure C
| (ex: Linux kernel team), but more and more people are realizing
| the benefits of C-like C++ code.
| andrewmcwatters wrote:
| Yeah, I would assume as much, too. When I last worked in C,
| there was no de facto way to do this since so many functions
| just worked on null termination.
| barosl wrote:
| When I used C for a serious project, I always used `snprintf(dst,
| sizeof(dst), "%s", src);` to copy a string. It might be a little
| bit slow, but it freed me from all the headaches of identifying
| different string functions of C and remembering their subtle
| differences. It also is useful for other purposes, e.g. prefixing
| a string.
| AlexanderDhoore wrote:
| I do this as well. At some point I googled around and didn't
| get a straight answer. So I've been using snprintf() for all my
| string manipulation ever since. My productivity is more
| important than a sliver of performance.
___________________________________________________________________
(page generated 2021-07-30 23:00 UTC)