       [HN Gopher] Setenv Is Not Thread Safe and C Doesn't Want to Fix It
       ___________________________________________________________________
        
       Setenv Is Not Thread Safe and C Doesn't Want to Fix It
        
       Author : r4um
       Score  : 188 points
       Date   : 2023-11-20 05:19 UTC (17 hours ago)
        
 (HTM) web link (www.evanjones.ca)
 (TXT) w3m dump (www.evanjones.ca)
        
       | turtleyacht wrote:
       | _Should_ apps orchestrate a super-global lock of a foreign
       | namespace?
       | 
       | An environment variable's value, for a running process, is just
       | what it is: an initial value from outside.
       | 
       | Adding complexity around it smells like an attempt to control a
       | distributed mutex, like checking an API for real-time value
       | changes in a while loop across several instances of the same app.
       | 
       | I thought there would be alternatives to this, like pubsub,
       | Kafka, or other asynchronous event handling.
       | 
       | Imagine having to test an app for its ability to handle safe
       | read-write of OS-level state. It's definitionally bankrupt: not
       | really a unit, not easy to set up quickly, and not isolated.
        
         | xxs wrote:
         | >Imagine having to test an app for its ability to handle safe
         | read-write of OS-level state.
         | 
         | Also it should be able to handle invariants as modifying
         | multiple variables is not an atomic process, either.
        
         | aidenn0 wrote:
         | > An environment variable's value, for a running process, is
         | just what it is: an initial value from outside.
         | 
         | Then calling setenv() from anywhere except in the time between
         | a fork() and exec() call should be banned, but it's not.
         | Honestly, calling abort() if setenv is called in the presence
         | of threads would seem like a better status-quo than what we
         | have today.
        
       | Khelavaster wrote:
       | Slap a mutex on that beast!
       | 
       | This is expected behavior for setting a GLOBAL variable without a
       | lock on the memory space.
        
         | krackers wrote:
         | You can't guarantee that whatever libraries you pull in use the
         | same mutex as you though.
        
           | JaDogg wrote:
           | Who pulls in libraries in C like this?
        
             | agevag wrote:
             | I'm not sure I understand the question.
             | 
             | If you are writing, say, a GUI application, and use GTK,
             | GTK may use getenv (without your mutexes, of course) at the
             | same time you call setenv in another thread, potentially
             | crashing your entire application if you are unlucky.
        
             | Tobu wrote:
             | Who doesn't? libc itself calls getenv when getting system
             | time: https://news.ycombinator.com/item?id=38344224
             | 
             | You may have a mutex on getenv/setenv, like the Rust stdlib
             | does, but when libc doesn't look at that mutex, even on the
             | read side, you run into UB.
             | 
             | So the next step is never calling into seemingly innocent
             | libc functions in safe code (which you have to enforce on
             | your dependencies as well), implementing safe alternatives
             | to a good chunk of libc (and making sure your dependencies
             | use those), to cordon off anything that _looks_ at the
             | environment. This makes a good chunk of POSIX functionality
             | useless.
        
             | JaDogg wrote:
             | OK, thank you for the explanation. This makes me more
             | scared to bring in libraries :(
        
           | seeknotfind wrote:
           | I thought they were talking about inside the libc
           | implementation. Though, if that was done and people call
           | getenv from async contexts, it could deadlock.
        
       | pjmlp wrote:
       | WG14 has also refused for decades to provide safer string and
       | array manipulation libraries, and even Dennis Ritchie failed to
       | get fat pointers into ISO C. Why should fixing this be any
       | different?
        
         | raverbashing wrote:
         | Yeah, at this point the languages and OSs should (IMHO) be
         | distancing themselves from the craziness of the committee.
        
       | grandinj wrote:
       | The right thing to do is to create a new thread safe api and
       | implement that and then standardize it. Hard but very unsexy
       | work.
        
         | xxs wrote:
         | The right thing is not to modify global state once you have
         | started all the threads. If you need to do that, use your own
         | data structures. In the getaddrinfo case, it would need to copy
         | the entire env in a thread-safe manner and then return the
         | result. That would pretty much apply to anything that uses env.
        
         | saagarjha wrote:
         | The person who fixed this would be incredibly sexy in my book.
         | Though, that probably tells you more about my own sexiness than
         | it invalidates your comment.
        
           | xxs wrote:
           | The sexy part is more like having an extremely
           | brushed/retouched image in a magazine - it's not real.
        
           | cryptonector wrote:
           | It is fixed in Solaris/Illumos. glibc just needs to copy the
           | approach.
        
       | DangerousDoctor wrote:
       | The essential problem is that there is no thread-safe way to
       | implement this while maintaining backwards compatibility --
       | applications can alter the environment block by changing the
       | environ global pointer, applications can also alter the
       | environment block by replacing individual pointers in the environ
       | array, applications can also alter the environment block by
       | altering the strings pointed to by the individual members of the
       | environ array, applications can also alter the environment block
       | by using setenv/putenv/etc.
       | 
       | Inserting a mutex into the setenv/getenv/etc. functions is
       | pointless because applications are explicitly allowed to modify
       | the environ pointer and array directly without any locking.
        
         | forrestthewoods wrote:
         | Yup. It's a bad, broken, and unfixable API.
         | 
         | The world really needs a new C standard library that doesn't
         | suck.
        
           | eviks wrote:
           | Or transition to post-C
        
             | xxs wrote:
             | Java has System.getenv()... and it's an unmodifiable
             | Map<String, String>. The real culprit is the attempt to
             | modify it, not C.
        
           | rewmie wrote:
           | > Yup. It's a bad, broken, and unfixable API.
           | 
           | Is it, though?
           | 
           | The only argument I see is that an API can be misused. There
           | are ergonomics debates we can have about it, but a user
           | intentionally abusing an API wanting to do something that's
           | completely wrong is hardly indication the API is broken.
        
             | forrestthewoods wrote:
             | Exposing global mutable access to something and providing
             | no thread safe version counts as broken in my book. Your
             | book may differ.
             | 
             | A good API would probably only allow constant access. If
             | mutation is for some reason deemed necessary then it should
             | be through a separate set API and the return results of get
             | should be guaranteed safe.
        
               | rewmie wrote:
               | > Exposing global mutable access to something and
               | providing no thread safe version counts as broken in my
               | book. Your book may differ.
               | 
               | Please show me from which book you got the idea that env
               | variables are expected to change throughout the lifetime
               | of a process.
        
               | necovek wrote:
               | It seems they are arguing that setenv should not exist in
               | the first place: the fact it exists suggests it can and
               | should be used, and thus not be a footgun.
        
               | orwin wrote:
               | > the fact it exists suggests it can and should be used
               | 
               | I think most people argue about that. Just because it
               | exists doesn't mean it should be used imho.
               | 
               | I've used it exactly once, and that was a school exercise
               | where I had to write a Posix shell (most of a posix shell
               | actually), including built-ins. I do not see another use
               | case tbh.
        
               | jstimpfle wrote:
               | > can and should be used, and thus not be a footgun
               | 
               | It can and should be used in the cases where it makes
               | sense, with the restrictions that are documented. It's an
               | API that is fundamentally not thread-safe, you can not
               | use it "safely" (in the modern sense of using it after a
               | lobotomy, in any way that the compiler allows) in a
               | multi-threaded context.
               | 
               | There are other such APIs, and if those APIs were removed
               | it would hurt a lot of old software that is running
               | perfectly fine.
        
               | jeroenhd wrote:
               | > Please show me from which book you got the idea that
               | env variables are expected to change throughout the
               | lifetime of a process.
               | 
               | POSIX specifies two functions that alter environment
               | variables. It could've specified that env variables are
               | supposed to be mapped into a read-only memory page where
               | available to indicate that they shouldn't be altered, but
               | it didn't, and instead provided an explicit read/write
               | system.
        
               | rewmie wrote:
               | > POSIX specifies two functions that alter environment
               | variables.
               | 
               | The same POSIX spec you're citing also states in no
               | uncertain terms that setenv is not thread-safe.
               | 
               | It's pointless to quote a section of a spec to try to
               | justify failing to comply with the very same section of
               | the very same spec.
        
               | jeroenhd wrote:
               | I'm not saying it doesn't. I'm just saying the spec
               | indicates that environment variables can change during
               | runtime.
               | 
               | The spec is the problem in my opinion, you can't
               | implement it in a way that doesn't introduce footguns.
        
               | wredue wrote:
               | Your book is wrong.
               | 
               | As with most things programming related, "it depends".
               | 
               | Functional programming has really, truly done untold and
               | massive damage to the industry. Fortunately, you are free
               | to unpoison your mind. I just hope people eventually do.
        
         | jcelerier wrote:
         | > Inserting a mutex into the setenv/getenv/etc. functions is
         | pointless because applications are explicitly allowed to modify
         | the environ pointer and array directly without any locking.
         | 
         | By that logic mutexes themselves are pointless because nothing
         | ever forces you to use them. Even in memory-safe languages you
         | can still access /dev/mem and change bytes. It's still a useful
         | thing to have.
        
           | PhilipRoman wrote:
           | The difference is that modifying the environ pointer is
           | explicitly supported behaviour in the standard, poking
           | through /dev/mem is not.
           | 
           | Although I guess a middle ground solution wouldn't be too bad
           | either - most programs don't modify environ directly, so
           | POSIX could offer thread safety for the functions and make
           | multithreading through "environ" UB. This is already kind of
           | explained in the standard:
           | 
           | https://pubs.opengroup.org/onlinepubs/9699919799.2018edition.
           | ..
        
             | jcelerier wrote:
             | > The difference is that modifying the environ pointer is
             | explicitly supported behaviour in the standard
             | 
             | they just have to fix the standard. e.g. in my country they
             | manage to improve for instance the standard for electrical
             | plugs every three years, there is NO REASON posix cannot do
             | the same
        
         | atoav wrote:
         | Another problem is that it is hard to reason about security in
         | a C program if all environment variables could change at all
         | times.
         | 
         | If we are talking about the C application deciding when it
         | wants to rescan the environment that is something different,
         | but if your environment can change potentially before and after
         | you check it this opens you up for a heap of new attacks.
        
         | jeroenhd wrote:
         | I think the memory leak solution (copy over the env variables
         | to a new location in memory every time you call setenv and keep
         | the old pointers alive) will cause the fewest crashes.
         | 
         | I would personally go for the aggressive approach (release a
         | new major version of libc that detects multithreaded
         | environments and intentionally crashes out when calling
         | setenv() so people actually notice and fix their broken
         | programs) but I suspect not many people will agree with me on
         | that.
         | 
         | The API is not necessarily bad (it's just very 80s UNIX), but
         | the lack of enforcement of thread-safety causes all kinds of
         | bugs and crashes.
        
           | fch42 wrote:
           | Leaking memory is not a "solution". Ever. Maybe for a
           | commercial problem. But not for one in systems API design.
           | 
           | If you copy, provide a new interface. It's time-honoured and
           | proven in Unix to give *_r() ones in such a case.
        
             | jeroenhd wrote:
             | It solves hard crashes during DNS lookups by wasting a few
             | kilobytes of RAM. Seems like a fine solution to me. The
             | memory leak only occurs in circumstances where the program
             | would've crashed or started messing with random memory
             | anyway.
             | 
             | A proper solution would be to either nuke put/setenv() in
             | the C standard library or redesign the *env() calls
             | entirely, but that would break existing programs.
        
             | slaymaker1907 wrote:
             | There are circumstances where it's a perfectly valid
             | solution. For example, suppose you're trying to acquire the
             | lock on something to destroy it. It can be the lesser of
             | two evils just to leak that memory instead of just waiting
             | forever/a very long time to acquire that lock. You just
             | need to ensure that you aren't leaking memory too quickly
             | for whatever your constraints are. For example, most
             | programs wouldn't care about a 1kb/day leak of memory
             | because that would take a very long time to actually become
             | noticeable. Furthermore, there's pretty much always some
             | degree of memory growth just due to heap fragmentation (at
             | least if you're using a language like C which can't do
             | memory compaction via GC).
        
           | lelanthran wrote:
           | Yeah, but the program may not be broken until you, the glibc
           | maintainer, call `raise(SIGSEGV)`.
           | 
           | Most programs using setenv call it before starting any
           | threads. That is not broken.
           | 
           | Detecting the linkage of thread support and crashing that
           | program on purpose is, frankly, a pathological way to fix a
           | non-broken program.
           | 
           | Besides which, your proposal won't work, because this remains
           | a potential problem in single-threaded programs too: a
           | program that calls getenv, stores the result, then calls
           | setenv on the same variable and uses the previously stored
           | result will break anyway.
           | 
           | In summary, your proposal is broken in two different ways: 1)
           | it breaks well-defined programs, and 2) it fails to break
           | broken programs.
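
The single-threaded hazard described above (keeping a getenv() result across a setenv() on the same variable) can be avoided by copying the value before the environment is changed. A minimal sketch, where `getenv_dup` is a hypothetical helper name and not a libc function:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of NAME's value, or NULL if the
 * variable is unset or memory is exhausted. The caller owns the result
 * and must free() it. */
char *getenv_dup(const char *name)
{
    const char *value = getenv(name);
    return value != NULL ? strdup(value) : NULL;
}
```

The copy stays valid no matter what later setenv() or unsetenv() calls do to the variable. It does nothing for the multi-threaded case, since another thread could still modify the environment between the getenv() and the strdup().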
        
             | jeroenhd wrote:
             | I wouldn't implement it during linkage, obviously single
             | threaded putenv/setenv calls should still be permitted as
             | part of initialisation routines. Count the number of
             | children in /proc/self/task for all I care, the detection
             | needs to happen during runtime.
             | 
             | You're right that putenv/setenv are also horribly broken in
             | other ways, and doing multi thread detection doesn't
             | prevent those problems. In a perfect world we would just
             | kill off these two functions all together, replacing them
             | with either crashes or no-ops, but that'd be an even harder
             | sell.
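
A rough sketch of the runtime check mentioned above: counting the entries of /proc/self/task. This assumes Linux with /proc mounted; `count_threads` is a hypothetical helper name. The result is only a heuristic, because another thread can be created right after the count is taken.

```c
#include <dirent.h>
#include <stddef.h>

/* Hypothetical helper: number of threads in this process according to
 * /proc/self/task, or -1 if the directory cannot be read (for example
 * on a non-Linux system). */
int count_threads(void)
{
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;

    int n = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')   /* skip "." and ".." */
            n++;
    }
    closedir(dir);
    return n;
}
```

A setenv() wrapper could treat any result greater than 1 as "threads are present", which is exactly the condition whose usefulness is disputed in the replies.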
        
               | lelanthran wrote:
               | > I wouldn't implement it during linkage, obviously
               | single threaded putenv/setenv calls should still be
               | permitted as part of initialisation routines. Count the
               | number of children in /proc/self/task for all I care, the
               | detection needs to happen during runtime.
               | 
               | That still breaks well-defined, non-broken programs which
               | _don't_ call getenv/setenv in racing ways. There is no
               | way for you to do a conditional-upon-threads mechanism
               | without false positives.
               | 
               | > You're right that putenv/setenv are also horribly
               | broken in other ways, and doing multi thread detection
               | doesn't prevent those problems. In a perfect world we
               | would just kill off these two functions all together,
               | replacing them with either crashes or no-ops, but that'd
               | be an even harder sell.
               | 
               | But you don't need to in order to meet your original goal
               | - breaking programs which _do_ call setenv/getenv in the
               | wrong order. Proposing to remove them altogether doesn't
               | fulfill the goal of finding the breakages immediately and
               | introduces breakages in existing programs.
               | 
               | My alternative: use LD_PRELOAD and provide alternative
               | setenv/getenv functions which raise SIGSEGV when setenv
               | is called on a variable more than once, and when getenv
               | is called on a variable that was already set once. It
               | requires nothing more than a counter for each of
               | setenv/getenv per variable.
               | 
               | That finds programs which actually are broken, with no
               | false positives, and ignores threads altogether because
               | they don't matter under the counter system[1].
               | 
               | Best of all, you can implement this in an afternoon,
               | without needing to modify glibc, and then test it with
               | every single executable on your system to see which ones
               | break.[2]
               | 
               | [1] Since the caller knows they are not thread-safe
               | anyway, we aren't looking for the error where the caller
               | calls setenv concurrently in different threads. That's a
               | different problem.
               | 
               | [2] I would wager good money that few, if any systems,
               | will break under this test.
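
The per-variable counter check can be sketched without the interposition part. The code below only does the bookkeeping and reports a violation through its return value. A real LD_PRELOAD shim would additionally have to forward to the original functions and raise the signal, and would need its own synchronisation; none of that is shown. All names (`audit_setenv`, `audit_getenv`, the table sizes) are hypothetical.

```c
#include <string.h>

#define AUDIT_MAX_VARS 64
#define AUDIT_MAX_NAME 128

/* Hypothetical bookkeeping: call counters for one variable name. */
struct audit_entry {
    char name[AUDIT_MAX_NAME];
    unsigned sets;
    unsigned gets;
};

static struct audit_entry audit_table[AUDIT_MAX_VARS];
static size_t audit_used;

/* Find the entry for NAME, creating it on first use. Returns NULL if the
 * table is full or the name is too long; such names are not tracked. */
static struct audit_entry *audit_find(const char *name)
{
    for (size_t i = 0; i < audit_used; i++)
        if (strcmp(audit_table[i].name, name) == 0)
            return &audit_table[i];
    if (audit_used == AUDIT_MAX_VARS || strlen(name) >= AUDIT_MAX_NAME)
        return NULL;
    struct audit_entry *e = &audit_table[audit_used++];
    strcpy(e->name, name);
    e->sets = 0;
    e->gets = 0;
    return e;
}

/* Returns 1 if this is the second or later set of NAME, else 0. */
int audit_setenv(const char *name)
{
    struct audit_entry *e = audit_find(name);
    if (e == NULL)
        return 0;
    return ++e->sets > 1;
}

/* Returns 1 if NAME is read after it has been set at least once, else 0. */
int audit_getenv(const char *name)
{
    struct audit_entry *e = audit_find(name);
    if (e == NULL)
        return 0;
    e->gets++;
    return e->sets > 0;
}
```

In a shim, a return value of 1 would be the point at which SIGSEGV is raised.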
        
         | fch42 wrote:
         | you're right; in addition to that though, I'd like to highlight
         | that the use of some form of locking "inside" set/getenv would
         | gain you nothing at all. That is not because of setenv, but
         | because of getenv. The latter returns you a _pointer_. Whether
         | you call that a reference leak, an ownership breakage ... it's
         | not "yours" and when you have it, you don't "hold" it even if
         | getenv internally were to lock whatever the underlying data
         | structure might be.
         | 
         | _That_ is the issue. You can only solve that if you change the
         | interface. Make a new one, getenv_r(), have it _copy_ the env
         | var value into a user-provided, user-owned buffer. In that
         | case, you can then assure the returned value is both point-in-
         | time correct and immutable. You can never achieve that with
         | getenv() because if you copy/make the returned pointer owned,
         | the owner needs to free it, which is a break from the current
         | behaviour and so not backwards-compatible ... and hence out of
         | the question.
         | 
         | Lamenting about how broken the interfaces might be and then
         | insisting that the implementation should be fixed is ...
         | "conveniently shortsighted". Not saying this isn't worth
         | fixing, but fix it the right way in the right place.
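
A minimal sketch of a copying interface of the kind described above, written as an application-level wrapper. The names `my_getenv_r` and `my_setenv`, the error codes chosen, and the single global mutex are all assumptions made for this illustration. The lock only helps if every reader and writer in the process goes through these wrappers, which libc and third-party libraries will not do.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical application-wide lock. libc itself never takes it. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copy NAME's value into the caller-owned buffer buf.
 * Returns 0 on success, ENOENT if NAME is unset, ERANGE if buf is too
 * small to hold the value and its terminating NUL. */
int my_getenv_r(const char *name, char *buf, size_t buflen)
{
    int rc = 0;

    pthread_mutex_lock(&env_lock);
    const char *value = getenv(name);
    if (value == NULL) {
        rc = ENOENT;
    } else {
        size_t len = strlen(value);
        if (len + 1 > buflen)
            rc = ERANGE;
        else
            memcpy(buf, value, len + 1);   /* copy while the lock is held */
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}

/* Writers take the same lock, so a concurrent my_getenv_r never copies
 * from a string that is being replaced. */
int my_setenv(const char *name, const char *value, int overwrite)
{
    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, overwrite);
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

Because the caller owns the buffer, the value it holds is a point-in-time copy that no later environment change can invalidate.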
        
           | SAI_Peregrinus wrote:
           | Make getenv copy to an OS-provided buffer. Free at program
           | exit, like any other memory leak. There's an obvious
           | drawback, but it's not changing the function signature.
        
           | cryptonector wrote:
           | > you're right;
           | 
           | Wrong. As I've pointed out several times in this thread and
           | in other recent threads about getenv(), Solaris/Illumos has
           | an implementation that is lock-less to read (except when you
           | change `_environ`, then it takes a lock at most once until
           | the next time you change `_environ`). It's made safe by
           | "leaking", and by locking in the functions that write. It's
           | only unsafe if you replace the value of `_environ` repeatedly
           | _and_ free the old settings (which I've never seen any code
           | do, and which if you do then you get what you deserve).
        
         | cryptonector wrote:
         | All of this is false, including the last statement. Proof by
         | existence: Solaris/Illumos has a thread-safe (leaking, though
         | the leaks are hidden from memory debuggers), lock-less
         | (whenever you _don't_ write to `environ`) `getenv()`, and
         | thread-safe, locking `setenv()`/`putenv()`/`unsetenv()`:
         | https://src.illumos.org/source/xref/illumos-gate/usr/src/lib...
         | 
         | Yes, it can be done and has been done. glibc has no excuse.
        
       | krackers wrote:
       | Cool, ever since that rachelbythebay article [1] I was wondering
       | how different libcs handle the issue! Nice to see someone else
       | confirming the behavior of Apple's libc. It's not mentioned in
       | the article, but while Apple's libc seems to suffer from the
       | use-after-free issue, if I'm reading it right it does seem to
       | have locking for setenv/getenv [3]
       | 
       | [1] https://news.ycombinator.com/item?id=37908655
       | [2] https://news.ycombinator.com/item?id=37952916
       | [3] https://github.com/apple-open-source-mirror/Libc/blob/master...
        
       | xxs wrote:
       | The described problem (thread safety) for a global configuration
       | seems mostly a misunderstanding by the author.
       | 
       | The usual case for modifying a global state is: modify once, then
       | proceed (e.g. start new threads). Even if all the calls become
       | thread safe, the behavior would be inconsistent, still.
        
         | hddqsb wrote:
         | It is perfectly reasonable and consistent for one thread to set
         | an environment variable while other threads are reading
         | _different_ environment variables.
        
       | rpcope1 wrote:
       | I wonder if you could work around this by using LD_PRELOAD to
       | load in a shim around getenv and setenv. You'd still have the
       | problem of environ potentially getting mutated, but it very well
       | may solve the problem if it's limited to those two functions.
        
         | jeroenhd wrote:
         | You would still need to design a fix, and you'll probably
         | break programs that modify the pointer returned by getenv()
         | while doing so.
         | 
         | However, this only makes sense for other people's software
         | crashing on getenv() related memory bugs. If you control the
         | software, you can simply prevent the setenv() call yourself. No
         | need to LD_PRELOAD anything, just load a library or write your
         | own hooking code to work around the POSIX madness.
        
       | eqvinox wrote:
       | > This is a list of some uses of environment variables from
       | fairly widely used libraries and services. This shows that
       | environment variables are pretty widely used.
       | 
       | Widely used, yes. Used as in read. Why do any of these need to
       | change at runtime? And if they do - why are they environment
       | variables?
       | 
       | (NB: starting a new process is not "at runtime")
        
         | withinboredom wrote:
         | Changing the env during runtime is actually quite handy for
         | debugging and forcing the program into specific states.
         | 
         | Other than that, it can also be handy in k8s with a VPA. You
         | get more/less memory and then update the env to reflect that.
         | Your service picks up the env change and updates the runtime.
         | 
         | IIRC, there is/was some way to listen to those changes in C#,
         | and automatically update runtime settings.
        
           | eqvinox wrote:
           | > Other than that, it can also be handy in k8s with a VPA.
           | You get more/less memory and then update the env to reflect
           | that. Your service picks up the env change and updates the
           | runtime.
           | 
           | You... can't change the env from outside the process...
           | 
           | are you saying this is used by disjoint components within a
           | single process? Or is this just a misunderstanding?
        
             | withinboredom wrote:
             | You can spawn as many processes as you want in a container,
             | did you not know that?
             | 
             | But you only need access to the /proc/pid directory to
             | change another process's env.
        
               | eqvinox wrote:
               | > But you only need access to the /proc/pid directory to
               | change another process's env.
               | 
               | /proc/$pid/environ is not writable
               | 
               | (and as a matter of fact, due to how the environment
               | works, it cannot be writable.)
        
               | LegionMammal978 wrote:
               | But /proc/pid/mem is, if you like living dangerously!
               | You'd just have to parse the dynamic-linker metadata to
               | find where libc's environ is hiding. (Though statically-
               | linked programs would be tougher.)
        
               | SAI_Peregrinus wrote:
               | Spawning a new process doesn't _require_ changing the
               | parent's environment.
        
             | pjc50 wrote:
             | > You... can't change the env from outside the process...
             | 
             | Not with that attitude you can't.
             | 
             | (OK, without the joke: you can do this with an interactive
             | debugger. But I think OP just meant "change it in the
             | container and then restart the child process")
        
           | rewmie wrote:
           | > Changing the env during runtime is actually quite handy for
           | debugging and forcing the program into specific states.
           | 
           | Most debuggers nowadays support altering variables at runtime
           | after hitting breakpoints. In the meantime this was the very
           | first time I ever heard anyone even considering changing env
           | vars at runtime, let alone use it to debug stuff. Sounds like
           | an ass-backwards way of going about debugging.
        
           | riffraff wrote:
           | > Changing the env during runtime is actually quite handy for
           | debugging and forcing the program into specific states.
           | 
           | Wait, why would this need to happen at runtime? I have used
           | env vars a lot to trigger specific cases but why would you
           | want to do this while the process is running from within the
           | process itself?
           | 
           | If you control the process you can start it with the right
           | env to begin with, no?
        
         | oefrha wrote:
         | Have you ever exported anything in a shell script? Sure you can
         | keep the necessary changes in local state and pass those to
         | execve(2)/execvpe(3)/posix_spawn(3), and that would be safe
         | AFAIK, but setenv(3) is there and more convenient if you're
         | unaware of the hidden dangers. Also that doesn't work for PATH
         | in execvp/execvpe, which is read from the current process; how
         | do you change search paths for execvp without setenv (short of
         | doing the search yourself)?
         | 
         | Edit: I just realized macOS/FreeBSD have execvP() that allows
         | passing a custom search path, so PATH is now safe, but without
         | a -e variant, everything else is again unsafe.
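
A minimal sketch of keeping the changes in local state and handing them to posix_spawn(3) instead of calling setenv(3). The helper name `run_sh_with_env` is hypothetical, and using an absolute path to /bin/sh is an assumption that sidesteps the PATH lookup question raised above rather than answering it.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: run `/bin/sh -c script` with exactly the
 * environment given in envp. Returns the child's exit status, or -1 if
 * spawning or waiting failed or the child did not exit normally. The
 * parent's own environment is never modified. */
int run_sh_with_env(const char *script, char *const envp[])
{
    char *const argv[] = { "sh", "-c", (char *)script, NULL };
    pid_t pid;

    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, envp) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller builds the child's environment as an ordinary NULL-terminated array, for example `{ "FOO=bar", "PATH=/usr/bin:/bin", NULL }`, so no other thread in the parent ever observes the change.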
        
           | quickthrower2 wrote:
           | Shell scripts are different as you are likely exporting
           | environment variables and then starting new processes.
        
             | oefrha wrote:
             | Shell scripts aren't different from "real" programs using
             | exec or posix_spawn in this regard, it's just that fewer
             | people have done the latter than the former, so the former
             | is a more relatable example. "Real" programs spawn other
             | processes too you know, sometimes with modified environ.
        
               | quickthrower2 wrote:
               | Just so I understand this right: I thought the issue is
               | about multiple threads, but in a shell you wouldn't have
               | those, just new processes.
               | 
               | In a program you could have either.
        
           | rcxdude wrote:
           | Shell scripts are not really prone to this problem because
           | AFAIK no shells are multithreaded: subshells and the like are
           | implemented with fork()
        
             | oefrha wrote:
             | Yes, I'm not saying shell scripts are affected, merely
             | using them as an example to answer the question "Why do any
             | of these [env vars] need to change at runtime?"
        
               | xxs wrote:
               | The discussion is only relevant for a shared unguarded
               | resource (the env) modified and read by multiple threads.
               | Single threaded operations are just fine.
        
               | Someone wrote:
               | > Single threaded operations are just fine.
               | 
               | Sort of. https://pubs.opengroup.org/onlinepubs/9699919799
               | /functions/g...:
               | 
               |  _"The returned string pointer might be invalidated or
               | the string content might be overwritten by a subsequent
               | call to getenv()"_
               | 
               | There's little you can do with a broken API, so Linux has
               | that 'feature', too.
               | https://man7.org/linux/man-pages/man3/getenv.3.html:
               | 
               |  _"The string pointed to by the return value of getenv()
               | may be statically allocated, and can be modified by a
               | subsequent call to getenv(), putenv(3), setenv(3), or
               | unsetenv(3)."_
               | 
               | FreeBSD chooses to leak memory, instead.
               | https://man.freebsd.org/cgi/man.cgi?getenv(3):
               | 
               |  _"Successive calls to setenv() that assign a larger-
               | sized value than any previous value to the same name will
               | result in a memory leak. The FreeBSD semantics for this
               | function (namely, that the contents of value are copied
               | and that old values remain accessible indefinitely) make
               | this bug unavoidable"_
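               | 
               | Given those caveats, the usual defensive habit in
               | single-threaded code is to copy the value right away
               | instead of holding on to the returned pointer. A minimal
               | sketch in C (getenv_copy is a hypothetical name); it does
               | nothing for the multi-threaded race, it only avoids
               | keeping a pointer that a later call may invalidate:
               | 

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated copy of the variable's
 * value, or NULL if the variable is unset or allocation fails.
 * The caller owns the copy and must free() it. */
static char *getenv_copy(const char *name)
{
    assert(name != NULL);

    const char *value = getenv(name);
    if (value == NULL)
        return NULL;

    /* Copy immediately; the pointer from getenv() is not kept. */
    size_t len = strlen(value) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, value, len);
    return copy;
}
```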
        
           | xxs wrote:
           | >Have you ever exported anything in a shell script
           | 
           | So, shells use a single thread that can safely modify the
           | environment - then start new child processes by the same
           | thread. The child processes get a =copy= of the said
           | environment. That's a textbook example how to use env.
           | 
           | Starting multiple threads on your own, then modifying env
           | should be considered a textbook example how not to do things
           | - env is not intended for interprocess communication.
        
           | anttihaapala wrote:
           | In the case of execvp, you would pretty much be required to
           | _fork_ before it and _then_ you could change PATH.
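           | 
           | A rough sketch of that pattern in C, with a hypothetical
           | helper name run_with_path. One assumption to flag loudly: in
           | a multi-threaded parent, POSIX only allows async-signal-safe
           | calls between fork() and exec, and setenv() is not on that
           | list, so treat this as a single-threaded-parent sketch.
           | 

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `file` (looked up via search_path) in a child
 * process. Returns the child's exit status, 127 if the exec failed, or
 * -1 if fork/waitpid failed. Assumes a single-threaded parent. */
static int run_with_path(const char *search_path, const char *file,
                         char *const argv[])
{
    assert(search_path != NULL && file != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the environment here is a private copy, so the
         * parent's PATH is left untouched. */
        if (setenv("PATH", search_path, 1) != 0)
            _exit(127);
        execvp(file, argv);
        _exit(127); /* only reached if execvp failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```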
        
             | oefrha wrote:
             | Yeah, fork()+immediately exec() should be safe, but those
             | use cases are almost always better with posix_spawn(), due
             | to issues with fork(), like memory copying. And if you want
             | to use the p-variant of posix_spawn you're back to setting
             | PATH beforehand. These APIs designed back in Stone Age just
             | aren't very well thought-out wrt concurrency and high
             | performance.
        
               | jstimpfle wrote:
               | Why would you change the path just to call
               | posix_spawnp()? If you want that control, that is an
               | indication that you want to specify the path to the
               | executable, not use PATH.
        
           | eqvinox wrote:
           | Shells don't generally use the libc environment; this would
           | be too limited to implement even standard POSIX shell
           | functions with local variables, or non-exported variables.
           | It's much easier to set up purpose-built data structures to
           | track variables, and construct an argument for execve().
           | 
           | (Edit: removed unneeded pointing out execve)
           | 
           | Also shells generally have their own program search anyway
           | since they need to support built-in commands. It's not
           | particularly hard to implement PATH search.
        
             | oefrha wrote:
             | Once again, the OP asked why setenv is even needed, which
             | implies they likely don't have much experience with
             | spawning processes in low level languages, so I used the
             | more familiar shell script setting as an illustrative
             | example, as setenv is analogous to export in POSIX sh. I
             | never said export is implemented with setenv, or shell
             | script exports aren't thread safe. Unfortunately, replies
             | hung up on shell scripts.
             | 
             | As for my supposedly not being aware of execve etc.: you
             | need to re-read my comment, which clearly mentions execve,
             | execvpe, posix_spawn, as well as implementing PATH search on
             | your own.
        
               | eqvinox wrote:
               | > Once again, the OP asked why setenv is even needed,
               | which implies they likely don't have much experience with
               | spawning processes in low level languages
               | 
               | I am the OP and your assumption is incorrect. You may
               | consider why the post ends with:
               | 
               |     (NB: starting a new process is not "at runtime")
        
               | Izkata wrote:
               | "export" in shells has to change the environ before they
               | start the new process. It may not be "at runtime" for the
               | new process, but it would be for the shell.
        
               | account42 wrote:
               | Wrong, export does not _have_ to change the shell's
               | environment at all. There are plenty of exec variants
               | that accept a different environment pointer, same for
               | posix_spawn.
        
         | dmytroi wrote:
         | Mostly integration, for example some library can only be
         | configured via env variables, but a developer might want to
         | configure it from within the app it's integrated into and used
         | from.
         | 
         | Also, a few weeks ago I found a use for them when trying to pass
         | configuration from Java/Kotlin to C++ library to be used during
         | static constructors (invoked during dlopen) on Android, because
         | at that phase native code cannot call back to JVM.
        
           | guappa wrote:
           | > for example some library can only be configured via env
           | variables
           | 
           | library has already loaded when you call setenv, so what
           | you're saying doesn't work in most cases.
           | 
           | It seems to be a need to use poorly written libraries. You
           | might consider fixing them instead.
        
             | jonhohle wrote:
             | I agree that would be a poor implementation, but the
             | library could be loaded at runtime using dlopen or
             | equivalent.
             | 
             | The issue with that "interface" is that the environment is
             | process-global. If the library is being loaded dynamically
             | (specifically for some task) it would seem that the
             | parameters are local to that task and should be taken by
             | some reentrant init method. Alternatively, the process
             | could be forked and environment set in the child without
             | concern for thread safety or polluting the environment
             | (think of the children!).
        
               | guappa wrote:
               | The only library I've seen to use env vars is libc, which
               | uses them to decide how malloc should behave for example.
        
             | the_svd_doctor wrote:
             | Some libraries' behaviour/API can be tweaked with env vars.
             | Env vars are read at runtime, not at load time.
        
         | wzdd wrote:
         | Indeed -- it's an extremely unconvincing list, because any
         | sensible library which may require a library user to set env
         | variables (which includes all the ones I checked on the list)
         | can also be configured without setting env variables. Most of
         | the time the env variables set fallback defaults for parameters
         | not specified by the caller. In these cases, the sane thing to
         | do, regardless of the thread-safety of setenv(), is simply to
         | supply the parameter in code.
         | 
         | The only exception is things like debug logging, which is
         | unlikely even to work dynamically.
         | 
         | On the other hand, setenv() is clearly broken in modern code,
         | particularly in a library context, and the man page (at least
         | on my Linux machine) does not make that particularly obvious --
         | "Thread safety: MT-Unsafe" is the only note, with a reference
         | to attributes(7) for more information. It could definitely be
         | made more obvious.
        
         | qwertox wrote:
         | Just asking: If you pass security tokens via environment
         | variables to the process, doesn't it make sense to delete them
         | from within the process after they have been used?
        
           | eqvinox wrote:
           | Yes it _would_ make sense, but no there is no way to actually
           | ensure they have been deleted. A trivial but nonetheless very
           | common case would be if your process is started with a
           | wrapper shell script. But even just within your process,
           | there is no guarantee at all against some random library (or
           | the kernel) making a copy of the entire environment.
           | 
           | If you want to pass secrets into a process at startup, I
           | would strongly recommend passing a pipe as an additional open
           | file descriptor (e.g. fd #4, but this _FD number_ you can
           | then put in an env variable) and writing it onto the pipe. It
           | can only be read once, and you can control where the value
           | propagates.
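           | 
           | A minimal sketch of both ends in C. The variable name
           | SECRET_FD and both helper names are made up for illustration.
           | The parent would write the secret into a pipe, leave the read
           | end open across exec, and put the string from make_fd_env()
           | into the child's envp; the child then calls read_secret() on
           | the variable's value.
           | 

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Parent side: format "SECRET_FD=<n>" for the child's envp.
 * SECRET_FD is a hypothetical name. Returns 0 on success, -1 if the
 * buffer is too small. */
static int make_fd_env(int fd, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    int n = snprintf(out, cap, "SECRET_FD=%d", fd);
    return (n > 0 && (size_t)n < cap) ? 0 : -1;
}

/* Child side: fd_text is the variable's value (the decimal descriptor
 * number). Drains the pipe into buf, NUL-terminates it, closes the
 * descriptor, and returns the number of bytes read, or -1 on error. */
static long read_secret(const char *fd_text, char *buf, size_t cap)
{
    assert(fd_text != NULL && buf != NULL && cap > 0);

    char *end;
    long fd = strtol(fd_text, &end, 10);
    if (end == fd_text || *end != '\0' || fd < 0 || fd > INT_MAX)
        return -1;

    size_t total = 0;
    while (total < cap - 1) {
        ssize_t n = read((int)fd, buf + total, cap - 1 - total);
        if (n < 0) {
            close((int)fd);
            return -1;
        }
        if (n == 0)
            break; /* writer closed its end */
        total += (size_t)n;
    }
    close((int)fd); /* the data is consumed; nothing left to re-read */
    buf[total] = '\0';
    return (long)total;
}
```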
        
             | Zandikar wrote:
             | Damn, learning new tricks everyday, thanks for the tip.
        
         | leoh wrote:
         | Testing, for one thing...
         | 
         | I mean YES you can factor your code (tests, whatever) to make
         | this a non-issue but supposing some person wrote some code 10
         | years ago in an OSS project or on your team and you start
         | banging into this issue.
         | 
         | It's not going to be trivial to unwind let alone find the root
         | issue.
         | 
         | Let's start fixing things like this for our future selves,
         | right?
         | 
         | Digging heels in and saying "eh, you just got to learn this one
         | weird quirk.. oh yeah this other one too.." is kind of a fun
         | glass bead game until it's not, and it is not a winning way to
         | win hearts and minds.
        
       | pitdicker wrote:
       | This also caused a lot of trouble for time libraries in Rust. The
       | two foundational libraries, chrono and time, rely on localtime_r
       | to get the local time instead of the clock value in UTC.
       | localtime_r reads the TZ environment variable (and optionally
       | others like TZ_DIR). Rust declares it safe to modify the
       | environment, while POSIX declares it unsafe.
       | 
       | CVE-2020-26235, RUSTSEC-2020-0071 and RUSTSEC-2020-0159 were
       | opened against the crates. That left the Rust ecosystem with a
       | pretty much unsolvable issue for many months. Chrono went with
       | the solution to parse the timezone database of the OS natively
       | and read the environment using the Rust locks. Time tries to
       | detect if the libc version has thread-safety guarantees to access
       | the environment, and otherwise panics if there are multiple
       | threads.
       | 
       | More reading:
       | https://docs.rs/chrono/latest/chrono/#security-advisories
        
         | rewmie wrote:
         | > Rust declares it safe to modify the environment, while POSIX
         | declares it unsafe.
         | 
         | There's your problem right there, and it ain't the behavior
         | specified in the standard.
        
           | pitdicker wrote:
           | You are right. POSIX specifies one thing, the standard
           | library in Rust and some other libraries specify something
           | different. 'Safe to use unless there are other threads' is
           | not really something you can or want to encode in a type
           | system.
           | 
           | But libraries and users are caught in the middle.
        
             | eptcyka wrote:
             | It is safe to use the Rust standard library interface.
        
               | pitdicker wrote:
               | Unless the environment is also touched by a part of the
               | program written in Go, Julia, I don't know... The lock is
               | not shared across languages.
        
               | the_mitsuhiko wrote:
               | > The lock is not shared across languages.
               | 
               | Which just to be clear: it cannot without changing the
               | standard. There is really nothing anyone can do without a
               | change in the standard.
        
               | bbatha wrote:
               | However, to access any of those languages from rust you
               | need to use unsafe.
        
               | eptcyka wrote:
               | There is no safe way to access the environment, even if
               | you mark this API unsafe, what are you going to do?
        
               | bbatha wrote:
               | You can safely access the environment so long as you use
               | the rust apis and don't have unsafe code that calls
               | `setenv` without synchronization.
        
           | Sytten wrote:
           | There is an issue in the std to name setenv unsafe but that
           | is a breaking change so it's complicated.
        
             | kibwen wrote:
             | One problem is that marking that function as unsafe would
             | unfairly penalize platforms like Windows that don't have
             | this issue. Even if it turns out to be the least-bad
             | compromise solution, it sure would be nice if we could have
             | nice things.
        
           | kibwen wrote:
           | But Rust doesn't declare it safe to modify the environment in
           | general. It declares it safe to modify the environment using
           | std::env::set_var, which uses locking internally. The docs
           | explicitly note that there's potential unsafety if non-Rust
           | code modifies the environment:
           | 
           |  _"Note that while concurrent access to environment
           | variables is safe in Rust, some platforms only expose
           | inherently unsafe non-threadsafe APIs for inspecting the
           | environment. As a result, extra care needs to be taken when
           | auditing calls to unsafe external FFI functions to ensure
           | that any external environment accesses are properly
           | synchronized with accesses in Rust."_
           | 
           | https://doc.rust-lang.org/std/env/fn.set_var.html
           | 
           | Ultimately the problem here is with Posix. Rust can only do
           | so much to paper over the pitfalls in the underlying
           | platform.
           | 
           | Although note that if you replace libc with eyra, then the
           | behavior goes from thread-unsafe to "just" a memory leak:
           | https://blog.sunfishcode.online/eyra-does-the-impossible/
        
         | SkiFire13 wrote:
         | > Rust declares it safe to modify the environment, while POSIX
         | declares it unsafe.
         | 
         | Arguably, Rust declares it is safe to modify the environment
         | through its stdlib methods. The tricky detail is that this
         | means it is unsafe to read/modify the environment through other
         | means, but sometimes this is really hard to avoid.
        
           | asveikau wrote:
           | > The tricky detail is that this means it is unsafe to
           | read/modify the environment through other means, but
           | sometimes this is really hard to avoid.
           | 
           | If you have C and Rust in the same process and C code calls
           | setenv(3), for one ...
           | 
           | Edit: why downvotes? It's very typical to link to C libraries
           | which may call the libc environment stuff ... My point is you
           | can't control library code as easily, if it's some dependency
           | of a dependency eventually calling libc.
        
           | manwe150 wrote:
           | Does rust also add a pthread_atfork handler? Otherwise, it
           | seems likely still unsafe for rust to claim to support
           | calling fork (for execv) or posix_spawn, as most libcs call
           | realloc on the `environ` contents, but do not appear to take
           | any care to ensure that (v)fork/posix_spawn doesn't happen
           | concurrently with that. Worse yet, the `posix_spawnp` API
           | takes an `envp` parameter and expects you to pass it the
           | global pointer `environ`, which is completely unsynchronized
           | across that fork call. It is not obvious to me that this is a
           | security gap, but certainly it seems to me that this would
           | violate rust's safety claim, if it is not taking added
           | precautions there.
           | 
           | The Apple Libc appears to just unconditionally drop the
           | environ lock in the child
           | (https://github.com/apple-oss-distributions/Libc/blob/c5a3293...),
           | while glibc doesn't appear to even bother with that
           | (https://github.com/bminor/glibc/blob/6ae7b5f43d4b13f24606d71...)
        
             | connicpu wrote:
             | I don't think Rust's stdlib provides any kind of safe way
             | to call just fork(), it only has methods for creating child
             | processes because that's the only interface that works on
             | every supported Tier 1 platform. Calling fork is always
             | going to necessarily be an unsafe{} libc call or syscall,
             | and the caller will have to take care to ensure nothing
             | funny is going on.
        
               | namibj wrote:
               | There are OS specific APIs where needed, probably also
               | for threads.
        
               | connicpu wrote:
               | `std::os::unix` does add some additional methods in that
               | vein like exec(), but no fork(). `std::os::linux` only
               | adds the ability to get `pidfd`s for child processes you
               | create. There's simply no safe way for the stdlib to
               | provide safe fork() without knowing a lot of things about
               | how you're going to set up your process and what other
               | libraries you might pull in that may not be fork-safe. If
               | you're willing to ensure you only call it in a safe way,
               | you can still call fork, the language just cannot
               | guarantee it will be safe, same as when you're doing it
               | in C.
        
         | tsukikage wrote:
         | If you are modifying TZ while another thread is relying on it
         | to calculate time, those threads are racing, and hiding the
         | crash won't solve the race: the reading thread will now
         | randomly return values in the wrong timezone instead,
         | subsequent code will use it in whatever operation it is it
         | wanted the time for, the end result will be garbage, and this
         | will be super hard to debug because there won't be a loud
         | obvious crash pointing to the root cause and also depending on
         | the winner of the race the symptoms will be
         | random/intermittent.
         | 
         | Fix the high level race, and suddenly you no longer need the
         | low level mutex.
        
           | formerly_proven wrote:
           | > If you are modifying TZ while another thread is relying on
           | it to calculate time
           | 
           | environ is a single contiguous null-terminated segment of
           | null-terminated key-value pairs; any change of any
           | environment variable might reallocate it, changing the
           | address and invalidating the old address.
           | 
           | Also why it's a bad idea to store the pointer returned by
           | getenv, it might be invalidated by any environment
           | modification.
        
             | Sprocklem wrote:
             | The strings in environ are only contiguous at program start.
             | In every libc I'm aware of, both putenv and setenv replace
             | only the specified key-value pair (and possibly environ
             | itself, if it needs to be larger) and should not affect the
             | address of any other environment variables. It is still
             | thread-unsafe, but far more limited in its unsafety.
        
               | comex wrote:
               | In current glibc master, it's unsafe for any
               | putenv/setenv to race with any getenv, even if the
               | variable names are different, for two reasons. (Note that
               | multiple calls to putenv/setenv are serialized by a lock,
               | but getenv does not take the lock.)
               | 
               | (1) setenv resizes environ using realloc, which frees the
               | old buffer, so getenv can end up reading from a freed
               | array.
               | 
               | (2) The code does not use atomics or memory barriers, so
               | on weakly ordered architectures, getenv could observe
               | another thread's write to one of the pointers in the
               | environ array, or to the environ pointer itself, while
               | observing stale values for the memory behind it.
               | 
               | In both cases, getenv could end up returning a bogus
               | pointer or just crashing.
               | 
               | However, those issues _can_ be fixed without changing the
               | API, and at least Apple's libc seems to do the right
               | thing here. On the other hand, other libcs such as musl,
               | FreeBSD libc, and even OpenBSD libc (!) do worse than
               | glibc and have no locking at all.
               | 
               | If someone could convince the maintainers of all those
               | libcs to add a lock and make getenv/setenv 'thread safe
               | as long as you're not racing on the same variable name',
               | then that would be a good starting point. But in my
               | opinion it would still be a half-measure. We need a fully
               | thread-safe environment.
               | 
               | And honestly, it might be _easier_ to convince the
               | maintainers to add a full solution than a half-measure,
               | even if it involved API changes. (But it may be hard
               | either way. Rich Felker showed up in a Rust thread a
               | while back and was highly negative on the idea of making
               | any changes to musl.)
        
               | mjevans wrote:
               | IMHO - I am sympathetic to the BSDs, Apple (presumably
               | forked BSD), and musl approach.
               | 
               | In what sane world would someone reasonable treat
               | (initial shell) Environment Variables as a proper
               | ACID-compliant database? About the 'best' solution I can see
               | for preventing segmentation faults related to resizing
               | the env array during runtime is to defer reclaiming freed
               | memory chunks until after all in-process threads have
               | been given another uninterrupted timeslice to process.
               | Even that wouldn't be 100% but probably would cover any
               | not pathological case.
        
               | bensecure wrote:
               | atomic ordering is very easy if you don't care about
               | performance. So on the other hand we could ask why
               | get/put/setenv have such a terrible need for performance
               | that we can't afford to put a simple lock around them.
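               | 
               | For illustration, a "simple lock" at the application
               | level might look like the C sketch below (locked_setenv
               | and locked_getenv are hypothetical names). As noted
               | elsewhere in the thread, it only protects callers that go
               | through these wrappers: libc internals and code in other
               | runtimes that touch environ directly bypass it.
               | 

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One process-wide lock for every wrapped environment access. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: set name=value while holding the lock. */
static int locked_setenv(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);

    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_lock);
    return rc;
}

/* Hypothetical wrapper: copy the value into `out` while holding the
 * lock, so the pointer returned by getenv() never escapes.
 * Returns 0 on success, -1 if the variable is unset or does not fit. */
static int locked_getenv(const char *name, char *out, size_t cap)
{
    assert(name != NULL && out != NULL && cap > 0);

    int rc = -1;
    pthread_mutex_lock(&env_lock);
    const char *value = getenv(name);
    if (value != NULL && strlen(value) < cap) {
        strcpy(out, value);
        rc = 0;
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```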
        
           | asveikau wrote:
           | > the reading thread will now randomly return values in the
           | wrong timezone instead, subsequent code will use it in
           | whatever operation it is it wanted the time for, the end
           | result will be garbage,
           | 
           | I really strongly disagree with how bad you seem to think
           | this is. If you are designing your application to use the
           | timezone and modify it at the same time, it is a _totally
           | natural_ consequence that you may see the previously set time
           | zone in a timing dependent fashion. That's the nature of the
           | beast. To "solve this" is seemingly to make that other thread
           | capable of time travel or something. It read something before
           | it was written, and acted on it. Reasonable!
           | 
           | The harmful data races are when you read _intermediate_
           | results. If setting the timezone is a multi-step process, or
           | involves manipulation on complex data structures with
           | pointers that might be deallocated, then you are in grave
           | danger. Seeing a previously valid result is ... I honestly
           | don't know how you'd expect to solve it without threads
           | being able to see the future, or some other unreasonable
           | expectation.
        
         | lifthrasiir wrote:
         | To be exact, it was Chrono and time-rs 0.1, while time-rs 0.2
         | and later was rewritten from scratch and didn't have that
         | issue... because the new time-rs didn't yet support general
         | time zones other than fixed offsets. The accepted solution for
         | Chrono surprised me a lot, because as far as I reckon it was
         | the hardest solution. (Disclaimer: I'm the original author of
         | Chrono.)
         | 
         | But a bad API design doesn't end at environment variables. Many
         | POSIX systems rely on `/etc/localtime` to define the system-
         | wide time zone, and every `localtime` call has to check if the
         | file has been changed or not because there is no way to
         | subscribe to the system-wide time zone change event. Of course
         | there is a cache, but many libcs call at least `stat` per each
         | `localtime` call AFAIK. I had even experienced a possible glibc
         | bug due to the lack of guard against I/O error during this
         | process [1]. Windows got this right, I can't see why POSIX
         | couldn't do the same when it does have an asynchronous signal
         | delivery mechanism anyway.
         | 
         | [1] https://news.ycombinator.com/item?id=9953898
        
           | pitdicker wrote:
           | My respects for your work on Chrono!
           | 
           | And you are right about time-rs (or I think you are). Version
           | 0.1 was never fixed, and version 0.3 does the OS and thread
           | count checks.
           | 
           | It does have some advantage for chrono to do everything in
           | Rust: it can now return two results for ambiguous local time
           | during DST transition fold, and properly return None during a
           | transition gap.
        
             | lifthrasiir wrote:
             | > My respects for your work on Chrono!
             | 
             | Thank you. To be frank as a first-time maintainer I did a
             | mediocre job---my biggest regret for Chrono is that I did
             | know most forthcoming issues beforehand and yet didn't take
             | enough time to make them public and explicit so that
             | someone else could prepare for the future.
        
           | wavesquid wrote:
           | I believe systemd has a way to subscribe to timezone changes.
        
           | account42 wrote:
           | > Many POSIX systems rely on `/etc/localtime` to define the
           | system-wide time zone, and every `localtime` call has to
           | check if the file has been changed or not because there is no
           | way to subscribe to the system-wide time zone change event.
           | 
           | But you _can_ subscribe to file change events so why not do
           | that?
        
             | lifthrasiir wrote:
             | I did seriously consider inotify back then, but in order
             | to take advantage of inotify I had to parse _all_ binary
             | TZif files (because otherwise I still had to call
             | `localtime` that would `stat` every time anyway). It was so
             | cumbersome, that was only halfway finished when I stepped
             | down as a maintainer. Hence my surprise when I learned that
             | someone actually did implement all of them.
        
           | mprovost wrote:
           | Unix was "designed" (if you can call it that) a long time
           | before it was possible to move a running system between
           | timezones. So many of these decisions were made in completely
           | different circumstances (I almost said environment) and are
           | lying around like old WWII bombs just waiting for someone to
           | dig one up.
        
             | mcguire wrote:
             | Presumably, GNU Hurd will fix these issues without
             | introducing fun new ones.
        
           | mjevans wrote:
           | Offhand, and after a quick Google search, I was unable to find the
           | exact definition / specification for how time-zone data must
           | be obtained rather than how it happens to conventionally be
           | obtained.
           | 
           | It is entirely reasonable that any of the following _might_
           | be valid behavior.
           | 
           | * Simple but syscall heavy approach which re-reads the env,
           | and possibly /etc/localtime each call and has no stability.
           | (Results may mutate as other processes / threads change
           | things.)
           | 
           | * Same as above, then caches the decision result for some
           | application specific reasonable time; which may be until the
           | application exits.
           | 
           | * The elsewhere mentioned stat / inotify approaches that only
           | track updates to /etc/localtime (and ideally update the
           | cached decision result when notified).
           | 
           | All approaches seem valid. It's sort of like the hostname or
           | any other system level configuration where a reboot may be a
           | reasonable expectation for a complete update.
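The third option in the list above (re-check the zone file with `stat` and refresh only when it changes) could be sketched roughly as follows. The names `tz_watch`, `tz_watch_changed` and `tz_refresh_if_needed` are hypothetical, and this is not a description of how any particular libc does it; it only illustrates the caching idea.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: remembers the identity of the zone file last seen. */
struct tz_watch {
    bool   seen;    /* false until the first successful stat() */
    dev_t  dev;
    ino_t  ino;
    off_t  size;
    time_t mtime;
};

/* Returns true on the first successful call and whenever `path` looks
 * different from what was recorded; records the new identity in that case.
 * Returns false if nothing changed or if the file cannot be stat'ed
 * (the previously cached state is kept). */
bool tz_watch_changed(struct tz_watch *w, const char *path)
{
    struct stat st;

    assert(w != NULL && path != NULL);
    if (stat(path, &st) != 0)
        return false;
    if (w->seen && st.st_dev == w->dev && st.st_ino == w->ino &&
        st.st_size == w->size && st.st_mtime == w->mtime)
        return false;

    w->seen  = true;
    w->dev   = st.st_dev;
    w->ino   = st.st_ino;
    w->size  = st.st_size;
    w->mtime = st.st_mtime;
    return true;
}

/* Re-runs tzset() only when /etc/localtime appears to have been replaced. */
void tz_refresh_if_needed(struct tz_watch *w)
{
    if (tz_watch_changed(w, "/etc/localtime"))
        tzset();
}
```

This trades one `stat` per call for not re-reading the zone data each time; the inotify variant mentioned above would avoid even the `stat`.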
        
         | Sytten wrote:
         | The shame of those CVEs is that they created a split in the Rust
         | community between chrono and time. For a time it looked like
         | people were all moving to time (whose handling of TZ is a bit
         | stupid IMO since it just refuses to work if there is more than
         | one thread). But with chrono 0.4 now things are stale and there
         | is no clear winner anymore.
         | 
         | I would argue that those splits are in great part responsible
         | for the feeling that Rust is hard to learn. I remember having
         | to dig into pretty complex time code to understand why it
         | broke our program that relied on timezone when we switched from
         | chrono to time. It hinders your productivity for sure even if
         | you learn the how.
        
       | vintermann wrote:
       | Using environment variables for global mutable state isn't
       | exactly good practice, is it?
       | 
       | I can't think of any time I had wanted to do that.
       | 
       | What exactly are the programs that break if this changes?
        
         | okr wrote:
         | I can think of using libraries that get their config through
         | environment variables. So you start your program, modify
         | environment, and then start up the rest.
        
           | quickthrower2 wrote:
           | Does the "and then" protect you here?
        
             | okr wrote:
             | No, not really. You never know when the library reads the
             | environment variables. But I can make sure that the parts
             | under my control do not modify them after start.
        
           | johanbcn wrote:
           | Libraries shouldn't be reading environment variables, that
           | responsibility should be only for the application.
        
             | kuon wrote:
             | It is common; off the top of my head, SDL and PipeWire do it.
        
             | lelanthran wrote:
             | They shouldn't, but if even the C standard library reads
             | environment variables, then the bar is set pretty low for
             | library designers.
             | 
             | Look up the mess around LOCALE sometime.
             | 
             | The best the programmer can do is perform all the setenv
             | calls before spawning any threads or making any library
             | calls.
        
       | Someone wrote:
       | I may misunderstand what "Extension to the ISO C standard" can
       | mean, but _getenv_ isn't thread-safe, either.
       | 
       | https://pubs.opengroup.org/onlinepubs/9699919799/:
       | 
       |  _"The getenv() function need not be thread-safe"_
       | 
       | I expect most if not all implementations are more robust.
        
         | numeromancer wrote:
         | It says more than that:
         | 
         | "The returned string pointer might be invalidated or the string
         | content might be overwritten by a subsequent call to getenv()"
         | 
         | You don't even need threads for this to be unsafe; another call
         | to the same function may invalidate earlier-gotten pointers. I
         | don't see how to interpret this as anything but broken.
        
       | yason wrote:
       | Come on! Of all C/Posix thread-safety issues in circulation this
       | can plausibly be considered the most moot one.
       | 
       | Environment variables are not meant to be an inter-thread
       | communication channel and the documentation that points out
       | setenv() is not thread-safe is very much a fair shot.
       | 
       | You rarely, if ever, need to setenv() anything, except maybe if
       | you're a shell. For spawning children execve() already takes an
       | envp parameter. For debugging I think I've mostly set in-process
       | environment variables manually from gdb.
       | 
       | Further, because environment variables are an interface between
       | the process and its environment you typically read environment
       | variables at start and cache the parsed values in some internal
       | location. If you need to change that global state on the go you
       | should do it using your own internal variables instead of
       | recycling it through the environment and having the program
       | threads repeatedly getenv() the updated values.
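The "read at start, cache the parsed values" pattern described above might look like the sketch below. The variable names `APP_LOG_LEVEL` and `APP_WORKERS` and the `app_config` struct are made up for illustration. Each `getenv()` result is consumed before the next `getenv()` call, since POSIX allows a later call to overwrite the earlier string.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical application settings, parsed once and then read-only. */
struct app_config {
    char log_level[16];
    long workers;
};

void config_defaults(struct app_config *cfg)
{
    assert(cfg != NULL);
    strcpy(cfg->log_level, "info");
    cfg->workers = 1;
}

/* NULL or over-long input leaves the current value untouched. */
void config_set_level(struct app_config *cfg, const char *s)
{
    if (s != NULL && strlen(s) < sizeof cfg->log_level)
        strcpy(cfg->log_level, s);
}

/* NULL, non-numeric or non-positive input leaves the current value untouched. */
void config_set_workers(struct app_config *cfg, const char *s)
{
    char *end;
    long n;

    if (s == NULL)
        return;
    errno = 0;
    n = strtol(s, &end, 10);
    if (errno == 0 && end != s && *end == '\0' && n > 0)
        cfg->workers = n;
}

/* Call once from main(), before any thread is created. After this the
 * program reads cfg, never the environment. */
void config_from_env(struct app_config *cfg)
{
    config_defaults(cfg);
    config_set_level(cfg, getenv("APP_LOG_LEVEL"));
    config_set_workers(cfg, getenv("APP_WORKERS"));
}
```

If a setting has to change at runtime, the program updates its own `app_config` (under whatever locking it already uses) rather than round-tripping through `setenv()`.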
        
       | usrbinbash wrote:
       | > C Doesn't Want to Fix It
       | 
       | Or: C knows that _it doesn't need fixing._
       | 
       | How often do I need to `setenv()` anything? The answer is "Never"
       | in the vast majority of programs, because envvars are usually read
       | rather than set, so this issue is nonexistent for them.
       | 
       | For the vast majority of the small number of programs that
       | actually need to use `setenv()`, the answer is: "Maybe once or
       | twice during the entire lifetime of the process, and then only at
       | the very start, probably even before running any threads",
       | meaning this issue is nonexistent for them as well.
       | 
       | So, is there a potential issue with thread safety? Yes. Does it
       | matter given where and under what circumstances it occurs? Not
       | really.
       | 
       | > such as Go's os.Setenv (Go issue)
       | 
       | Here is the link to the "issue":
       | 
       | https://github.com/golang/go/issues/63567
       | 
       | What kind of actual real life production code would continuously
       | set envvars while simultaneously calling a function that tries to
       | read the environment?
       | 
       | Yes, this is a footgun. But even the issue's author acknowledges,
       | in the issue thread:
       | 
       |     Realistically: this is a pretty rare problem, and documenting
       |     it is probably a fine solution. This is probably going to cost
       |     someone else a couple of days of debugging every couple of
       |     years
       | 
       | > It has wasted thousands of hours of people's time, either
       | debugging the problems, or debating what to do about it.
       | 
       | Source?
        
         | MrBuddyCasino wrote:
         | Unpopular opinion: Neither Go's _os.Setenv_ nor Rust's
         | _std::env::set_var()_ should exist. I was pleased to find that
         | Java only has _System.getenv()_, but not a setter.
        
           | usrbinbash wrote:
           | That is an unpopular opinion for the simple reason that some
           | programs do in fact need to set envvars, particularly
           | programs that will start child processes.
        
             | MrBuddyCasino wrote:
             | That is still possible using the _java.lang.ProcessBuilder_
             | API: you can launch a child process and give it a modified
             | environment, but just at launch time. This side-steps the
             | issue.
        
             | ryan-c wrote:
             | Programs that need to set environment variables for child
             | processes should use `execvpe` or `execle`.
        
               | account42 wrote:
               | Or posix_spawn / posix_spawnp.
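As a rough illustration of the point made in this sub-thread, a child can be given its own environment through the `envp` argument of `posix_spawn`, without the parent ever calling `setenv()`. The helper name `run_with_env` is hypothetical, and the use of `/bin/sh` is an assumption about the host system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Runs `/bin/sh -c script` with exactly the environment given in envp.
 * Returns the child's exit status, or -1 if it could not be spawned or
 * did not exit normally. The parent's own environment is never modified. */
int run_with_env(const char *script, char *const envp[])
{
    char *const argv[] = { "sh", "-c", (char *)script, NULL };
    pid_t pid;
    int status;

    assert(script != NULL && envp != NULL);
    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, envp) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, passing `{ "GREETING=hello", NULL }` as `envp` gives the child a `GREETING` variable and nothing inherited from the parent.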
        
             | badsectoracula wrote:
             | Or programs that rely on libraries that for some
             | unfathomable reason expose some functionality only via
             | environment variables without an API.
             | 
             |  _looks at SDL_
        
               | account42 wrote:
               | There is SDL_SetHint [0] which doesn't modify the
               | environment but instead changes the value internally to
               | SDL only.
               | 
               | [0] https://wiki.libsdl.org/SDL2/SDL_SetHint
        
             | pajko wrote:
             | Nope, execve() and friends ending in 'e' accept a pointer
             | to a completely new set of environment variables, no need
             | to do setenv. Windows has _execve() too.
        
               | usrbinbash wrote:
               | The fact that there is an alternative doesn't change the
               | fact that a lot of software relies on the worse method to
               | work.
        
               | naniwaduni wrote:
               | It makes it a pretty silly idea to invite people writing
               | new programs in a new language to use that method,
               | though.
        
               | rerdavies wrote:
               | A lot of software doesn't modify the environment when
               | exec-ing.
        
             | kevincox wrote:
             | I think the problem is really that there are 2 times where
             | you should be setting env vars 99% of the time.
             | 
             | 1. Right after program startup before any threads are
             | spawned.
             | 
             | 2. After a fork before an exec.
             | 
             | In both cases it can be known that no threads are running.
             | (Ok, for 1 it can actually be non-trivial if you have code
             | before main or if you call functions that spawn helper
             | threads, but let's assume that you can know this).
             | 
             | However no languages actually have ways to enforce this. So
             | the APIs can be called at any time and are huge footguns.
             | 
             | I think that the proposed improvement of `getenv_s` is
             | great. It is cheap and easy to use, then software can
             | slowly migrate off of the less safe stuff. You can imagine
             | that if libc stopped using `getenv` internally most of this
             | problem would be solved.
        
               | Blikkentrekker wrote:
               | No, in many cases one needs to set them interactively.
               | 
               | Consider for instance something as simple as implementing
               | a shell. Such a program needs to be able to set the
               | environment based on user interaction and this change
               | needs to show up in /proc/$pid/environ.
        
               | __david__ wrote:
               | Why does a shell need its current environment to be
               | visible in /proc/$pid/environ (as opposed to just its initial
               | environment)?
        
             | rerdavies wrote:
             | If you need to set environment variables for child
             | processes in a thread-safe manner, use execvpe or execle.
        
           | jeroenhd wrote:
           | I think there are good reasons for Setenv and set_var to
           | exist, but if they are implemented, they shouldn't be
           | wrappers around POSIX's shitty API; they should implement
           | their own environment variable system instead (one whose
           | initial variables are possibly initialised by a call to
           | getenv to make them compatible).
           | 
           | There's no reason why these languages need to restrict
           | themselves the same way C does.
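A runtime-private environment of the kind suggested above could be sketched like this: a mutex-protected table seeded once from `environ`, with a getter that hands out owned copies. The `penv_*` names are hypothetical. As the replies note, C libraries living in the same process still call the real `getenv()` and will not see changes made here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

struct env_entry {
    char *name;
    char *value;
    struct env_entry *next;
};

static struct env_entry *env_head;
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold env_lock. */
static struct env_entry *find_locked(const char *name)
{
    for (struct env_entry *e = env_head; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Sets name=value in the private table. Returns 0, or -1 if out of memory. */
int penv_set(const char *name, const char *value)
{
    int rc = 0;
    char *v;

    assert(name != NULL && value != NULL);
    v = strdup(value);
    if (v == NULL)
        return -1;

    pthread_mutex_lock(&env_lock);
    struct env_entry *e = find_locked(name);
    if (e != NULL) {
        free(e->value);
        e->value = v;
    } else {
        e = malloc(sizeof *e);
        char *n = strdup(name);
        if (e == NULL || n == NULL) {
            free(e); free(n); free(v);
            rc = -1;
        } else {
            e->name = n; e->value = v;
            e->next = env_head; env_head = e;
        }
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}

/* Returns a heap copy the caller must free(), or NULL if unset. */
char *penv_get(const char *name)
{
    pthread_mutex_lock(&env_lock);
    struct env_entry *e = find_locked(name);
    char *out = e ? strdup(e->value) : NULL;
    pthread_mutex_unlock(&env_lock);
    return out;
}

/* Seeds the table from the process environment. Call once, before threads. */
void penv_init(void)
{
    for (char **p = environ; p != NULL && *p != NULL; p++) {
        const char *eq = strchr(*p, '=');
        if (eq == NULL)
            continue;
        char *name = strndup(*p, (size_t)(eq - *p));
        if (name != NULL) {
            penv_set(name, eq + 1);
            free(name);
        }
    }
}
```

Because `penv_get` returns a copy, no caller ever holds a pointer into storage that a later `penv_set` might free.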
        
             | MrBuddyCasino wrote:
             | The bug in Golang was because DNS lookups interact with the
             | C library, which looks up environment variables. As long as
             | everything happens in Go-land, there is no problem - but
             | this is simply not good enough.
        
               | jeroenhd wrote:
               | Go makes the assumption that the DNS lookups are thread-
               | safe, but it doesn't have that guarantee (or the C
               | library is spec-incompliant, but I doubt that). It's
               | still something Go can fix.
               | 
               | You can't fix C libraries loaded into Go programs (e.g.
               | an external library calling C's setenv, or I suppose
               | explicit FFI calls by the user), but Go can be
               | responsible for the APIs it calls itself. That may
               | necessitate writing a thread-safe alternative for DNS
               | lookups, or documenting and/or adding compile time
               | warnings that threaded programs doing DNS lookups will
               | just crash sometimes, but the language's standard library
               | can still make it much harder for developers to write
               | buggy code.
        
               | MrBuddyCasino wrote:
               | My impression is that this was Golang's plan from the
               | start - this is why they didn't want to use the C stdlib
               | at all, issuing the Kernel syscalls directly from the
               | Golang runtime. A good idea, but then they had to
               | backpedal to solve issues such as DNS resolution
               | respecting certain OS settings, and this bug is a symptom
               | of that.
        
               | fch42 wrote:
               | Yes, there are certain things in UNIX which _are_ part of
               | the standard (POSIX / IEEE1003) but _aren't_ usually
               | implemented as system calls.
               | 
               | Name lookups (whether user identities or network
               | resources) are the biggest chunk of these. You have a
               | "choice" as a user/programmer here. Say, the existing
               | name lookup interfaces in most libc implementations don't
               | do DNS-over-HTTP (DoH); you can implement that yourself
               | and just use the addresses returned by your
               | library/package where the system calls ... want
               | addresses.
               | 
               | If you have the go stance, go all the way. Don't say "the
               | C runtime is sh*te but I really really really want that
               | one particular teensy tiny bit of it could someone
               | somewhere somehow please do something to make it a little
               | less sh*te". Legacy baggage is a burden and backwards
               | compatibility shackles you. The C/Unix interfaces are
               | full of this, and with the hindsight of 50 years no one
               | today, not even "C programmers", would implement them all
               | the same way again. But that doesn't mean their behaviour
               | can be arbitrarily changed.
        
               | o11c wrote:
               | > Go makes the assumption that the DNS lookups are
               | thread-safe
               | 
               | DNS functions _are_ thread-safe.
               | 
               | The thing people aren't understanding here is when you
               | set loose nasal demons (such as by calling `setenv` in a
               | multithreaded program), they can cause problems even in
               | safe code.
        
               | dwattttt wrote:
               | If a function is safe only if everyone else you rely on
               | never calls a particular function, it's not that safe.
               | Certainly less safe than other functions guaranteed not
               | to result in crashes if you use them right.
        
             | OskarS wrote:
             | That doesn't fix the problem: these languages have to be
             | able to coexist peacefully with C in the same address
             | space. You can have a dynamically linked library written in
             | Rust in a host program written in C, you can use C
             | libraries in Go, etc.
             | 
             | Even if that wasn't an issue: this is a bug in C as well!
             | You should absolutely be able to use setenv/getenv safely
             | in multi-threaded C, it's insanity that you can't.
        
           | grodriguez100 wrote:
           | I fully agree with your unpopular opinion.
        
         | jeroenhd wrote:
         | > Or: C knows that it doesn't need fixing.
         | 
         | People don't like APIs that can randomly crash your program
         | while there's no good technical reason for why they should. Why
         | not fix the problem? People like you, who have no issues with
         | the current implementation, won't see any regressions because
         | you're already a good citizen, and myriad other programmers
         | whose programs do occasionally crash because of this will be
         | helped.
         | 
         | > So, is there a potential issue with thread safety? Yes. Does
         | it matter given where and under what circumstances it occurs?
         | Not really.
         | 
         | "The unpredictable crashes only happen very rarely" doesn't
         | mean the crashes go away.
         | 
         | > What kind of actual real life production code would
         | continuously set envvars while simultaneously calling a function
         | that tries to read the environment?
         | 
         | The reproduction sample calls setenv in a loop so the issue can
         | be reproduced. A single setenv anywhere in the code is enough
         | to trigger the crash, but then you would get one of those "you
         | need to run the program a million times to reproduce it" bug
         | reports that gets pushed down the line.
        
           | usrbinbash wrote:
           | > Why not fix the problem?
           | 
           | Because doing so breaks backwards compatibility, simple as
           | that.
           | 
           | The problem isn't even that `setenv` isn't thread safe. The
           | problem is that `getenv` returns a `char *` directly into the
           | environment memory space. Many many many programs rely on
           | that being the case.
           | 
           | > People like you
           | 
           | People like me would like every software to be perfect, but
           | that's not the world we live in, so we are forced to be
           | pragmatic. When fixing something causes more problems by
           | breaking backwards compatibility promises, than it prevents,
           | then there is no good argument for a fix, and the correct
           | approach is to say "yes, this sucks, let's document it well
           | so people don't waste too much time on this".
           | 
           | The setenv/getenv problem is such a case. Anyone who
           | disagrees is free to fork glibc, implement whatever fix they
           | think is adequate, and then try to compile the software
           | packages found on a typical Linux server against the result.
           | 
           | > so the issue can be reproduced.
           | 
           | "Can be reproduced" and "is a common issue in production
           | code" are not the same.
           | 
           | Fact is, almost all production programs that set envvars, do
           | so once, very early in the process lifecycle, and then never
           | again, and so are never affected by this.
        
             | mastax wrote:
             | So why not implement the fix suggested in the article:
             | improve the existing interface to the extent possible, and
             | introduce a new interface which is easier to use correctly.
        
               | rerdavies wrote:
               | So why not implement it yourself, instead of polluting
               | the standard runtime with functionality that nobody
               | needs?
        
               | fch42 wrote:
               | There is nothing to "improve" on the existing interface,
               | really. From a C point of view ... a _hidden_ global lock
               | is worse than no lock at all. Because in the latter case
               | ... you, as the programmer, have a choice what to do. If
               | you never call setenv(), no locks. If you only ever call
               | setenv() in your startup code, no locks. If you only ever
               | call setenv() after fork&co, no locks. And if you do
               | believe you need to call it at runtime, but are
               | singlethreaded ... still no locks. And if you really
               | really really need to call it from a multithreaded
               | process, concurrently with getenv(), then lock around
               | both and make your getenv() "safe" wrapper create you an
               | owned point-in-time copy - basically a getenv_r().
               | 
               | Note also that "global references" like getenv() returns
               | and point-in-time owned snapshots don't behave the same
               | way. Say, a library initializer code could retrieve a
               | number of env var references by calling getenv(), and
               | then use those at runtime. No more need/use for getenv()
               | again after - and even perf-sensitive code could look at
               | the env var. With a func that copies, the perf-sensitive
               | code would need to do that each time (lock, lookup,
               | copy). Not strongly desirable.
               | 
               | Also ... UNIX is rather flexible ... and if you so wish,
               | you _can_ substitute _your own_ setenv()/getenv() by the
               | magic of dynamic linking. To create a set that locks and
               | returns you leaked copies (changes the semantics of
               | getenv so that the caller must free the pointer to avoid
               | a leak). It's all possible to do this.
               | 
               | I'm getting the impression from this that we see a "go
               | tantrum" here. "I make my own standards but I wanna use
               | that C/Unix standard thing as well but not how it is
               | because it's not nice it should take go into account
               | waaaahwaaah ...".
               | 
               | It is not _nice_ to modify your own env at runtime.
               | Maybe, just maybe ... that's for reasons. Because not
               | everything that can be done is also a great idea.
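The "lock around both and return an owned point-in-time copy" approach from the first paragraph of this comment might be sketched as follows. The names `locked_setenv` and `locked_getenv_copy` are hypothetical. The protection only holds if every reader and writer in the process goes through these wrappers; any library that calls plain `getenv()` or `setenv()` bypasses the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t env_mutex = PTHREAD_MUTEX_INITIALIZER;

/* setenv() with overwrite, serialised against locked_getenv_copy(). */
int locked_setenv(const char *name, const char *value)
{
    int rc;

    assert(name != NULL && value != NULL);
    pthread_mutex_lock(&env_mutex);
    rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_mutex);
    return rc;
}

/* Returns a malloc'ed snapshot of the value, or NULL if the variable is
 * unset or memory is exhausted. The copy is taken while the lock is held,
 * so a later locked_setenv() cannot invalidate it. Caller must free(). */
char *locked_getenv_copy(const char *name)
{
    char *copy;

    assert(name != NULL);
    pthread_mutex_lock(&env_mutex);
    const char *v = getenv(name);
    copy = v ? strdup(v) : NULL;
    pthread_mutex_unlock(&env_mutex);
    return copy;
}
```

This also shows the cost the comment points out: every lookup now pays for a lock, a search and an allocation, where a cached `getenv()` pointer would have cost nothing.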
        
           | wredue wrote:
           | The real skinny of it is that it's in the name:
           | "Environment".
           | 
           | If you're calling setenv in the middle of your program, you
           | fucked up.
           | 
           | There are those things in programming that should be
           | extremely triggering to your "what the actual fuck?!" senses,
           | and "setenv in the middle of runtime" is one of those things.
        
             | kstrauser wrote:
             | True, but for every envvar a program reads, _something_
             | called setenv on it originally. It's not like _no_ programs
             | call setenv in the middle of runtime. Examples:
             | 
             | - Shells
             | 
             | - CI runner
             | 
             | - Container launchers
             | 
             | - IDEs
        
               | tsukikage wrote:
               | The child process's environment for these purposes is
               | constructed without mutating its parent's environment - a
               | copy is used - and before the child process actually runs
               | the target code it was created to run. So there is no
               | possibility of race between mutations to the environment
               | and reads of the environment. If you are writing such a
               | tool but doing something other than this, you are doing
               | it wrong.
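Building the child's environment as a modified copy, as described above, could look roughly like this. The helper name `env_with` is hypothetical; the resulting array is what one would pass as `envp` to `execve()` or `posix_spawn()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a new NULL-terminated array containing every entry of `base`
 * except any existing definition of `name`, followed by "name=value" as
 * the last element. Only the array and that last string are allocated;
 * the other entries still point into `base`, which is left untouched.
 * Returns NULL on allocation failure. */
char **env_with(char *const base[], const char *name, const char *value)
{
    size_t n = 0, j = 0, nlen;
    char **out;
    char *kv;

    assert(base != NULL && name != NULL && value != NULL);
    nlen = strlen(name);
    while (base[n] != NULL)
        n++;

    out = malloc((n + 2) * sizeof *out);
    kv = malloc(nlen + strlen(value) + 2);
    if (out == NULL || kv == NULL) {
        free(out);
        free(kv);
        return NULL;
    }
    memcpy(kv, name, nlen);
    kv[nlen] = '=';
    strcpy(kv + nlen + 1, value);

    for (size_t i = 0; i < n; i++) {
        /* Skip an existing "name=..." entry so the override wins. */
        if (strncmp(base[i], name, nlen) == 0 && base[i][nlen] == '=')
            continue;
        out[j++] = base[i];
    }
    out[j++] = kv;
    out[j] = NULL;
    return out;
}
```

A launcher would typically call this with `environ` as `base` while no other thread is modifying the environment, hand the result to the exec or spawn call, and then free the last element and the array.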
        
               | DSMan195276 wrote:
               | > but for every envvar a program reads, something called
               | setenv on it originally
               | 
               | That's not true, that's just misunderstanding how it
               | works. `execve()` takes an entirely new copy of
               | environment variables to give to the child, that's the
               | "real" way to do it.
        
               | stefan_ wrote:
               | No, a process gets its environment variables from the
               | operating system (just like argc, argv) before any code
               | is ever executed and the majority never change them.
        
             | ric2b wrote:
             | Then why does setenv even exist? Maybe that's the issue and
             | it should be deprecated and throw compilation warnings?
        
           | zare_st wrote:
           | > People don't like APIs that can randomly crash your program
           | while there's no good technical reason for why they should.
           | Why not fix the problem?
           | 
           | I think you're not seeing this from the right POV. People
           | that consume POSIX API need to know POSIX API.
           | 
           | https://pubs.opengroup.org/onlinepubs/009604499/functions/se.
           | ..
           | 
           | It says loud and clear "The setenv() function need not be
           | reentrant. A function that is not required to be reentrant is
           | not required to be thread-safe."
           | 
           | > "The unpredictable crashes only happen very rarely" doesn't
           | mean the crashes go away.
           | 
           | If you get a crash over setenv(), reading the manual page for
           | setenv() should be your first step. And the only step. The
           | bigger issue is the design of an application that has wrongly
           | assumed setenv() is thread-safe. That requires a refactoring
           | and is solely due to the developer misunderstanding the API.
        
             | loup-vaillant wrote:
             | You're blaming the victim here.
             | 
             | Not being re-entrant makes the user-facing API
             | unnecessarily complicated. It creates an avoidable foot
             | gun. It trips people up for no good reason. And unlike
             | stuff like signed integer overflow, there doesn't even seem
             | to be a (dubious) performance argument to justify this
             | insanity.
             | 
             | The standard should be fixed and that's the end of it.
        
             | tikhonj wrote:
             | "RTFM" is not a coherent defense for awful API design and
             | we shouldn't accept it as such.
        
               | zare_st wrote:
               | Who is "we"?
               | 
               | I'm a UNIX/C programmer for decades and I don't care
               | about this.
               | 
               | There is no such thing as beautiful API design. Every
               | design is a compromise. If you think non-reentrant calls
               | should be deprecated in POSIX take it to the committee.
               | 
               | There is a myriad of non-reentrant code both in POSIX
               | spec and in libc implementations. You need to RTFM, I'm
               | sorry.
               | 
               | There is no "coherent API" as far as null termination
               | goes too. Some library functions deal with it, some calls
               | don't. You need to RTFM.
               | 
               | I also want to know OP's reason to even use setenv() in a
               | multithreaded piece of software. It's like an oxymoron.
               | setenv and vars are useful to pass on data from parent
               | process to forked children because they inherit the
               | environment. If you use the threading model you don't
               | need it. If your application is a single process setenv()
               | is useless.
        
               | usrbinbash wrote:
               | What should we accept? That every library is made under
               | the assumption that it has to work as expected, even if
               | people ignore the documentation?
               | 
               | As someone who made and maintains multiple libraries: No.
               | Not gonna happen.
        
               | JohnFen wrote:
               | Putting aside whether or not the design is awful, the
               | fact that it's standardized and documented is absolutely
               | a valid argument. Changing it now would break backward
               | compatibility. That should always be a showstopper.
               | 
               | Programmers who are using any library code without
               | reading and understanding the documentation are asking
               | for trouble regardless of language.
               | 
               | The correct solution to your objections is to create new
               | functions that behave as you prefer.
        
         | xbar wrote:
         | While I am sure that thousands of hours have been spent
         | debugging threaded setenv() attempts (and developing &
         | discarding Annex K), it is clearly not a problem that needs a
         | solution.
         | 
         | Languages that compile to C need to be careful not to promise
         | thread-safe implementations of POSIX or C functions that are
         | explicitly documented as not reliably thread-safe, including
         | setenv(). The author seems to want to change C, and POSIX, so
         | that Go can reliably do so.
        
         | loup-vaillant wrote:
         | You are literally putting forth arguments in favour of fixing
         | the thread safety issue, and then conclude it's not worth the
         | effort.
         | 
         | It's simple, really: we indeed rarely call `setenv()`. So it's
         | not a performance problem. So we can make it thread safe, and
         | the performance impact will be negligible. In exchange for this
         | small price, safety will increase.
         | 
         | Sacrificing any amount of safety for a negligible improvement
         | in performance is flat out unprofessional, and should be
         | grounds for immediate termination in most contexts.
        
           | DSMan195276 wrote:
           | How do you propose making it thread-safe? The real problem
           | here is that `getenv()` was designed around it returning a
           | `char *` into some read-only memory. It's a bad API if the
           | backing data can change because the returned pointer is
           | assumed to exist 'forever'.
           | 
           | `setenv()` has no way of knowing where those pointers are
           | floating around so there's no way to safely change the
           | environment variables. The best you could do would be to leak
           | memory every time you set new environment variables so that
           | the old pointers don't get invalidated, and that just creates
           | a new problem and reason not to use `setenv()` (that's
           | arguably worse).
        
             | cnity wrote:
             | Here's my proposal: Introduce a new threadsafe API
             | (`tgetenv` or whatever) which takes _two_ `char *`s, one of
             | which is a return buffer. This leaves allocation as a
             | responsibility of the caller.
             | 
             | And then you can leave the existing functions as they are
             | (thread unsafe) while having a separate thread safe
             | version.
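The proposed caller-buffer API could have a shape like the sketch below. `tgetenv` is the name used in the comment; the `buflen` parameter, the companion `tsetenv`, and the error codes are assumptions added here, since a bare buffer pointer cannot be filled safely without a size. This user-space version is only as safe as the discipline of routing all environment access through it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t tenv_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copies the value of `name` into the caller's buffer.
 * Returns 0 on success, ENOENT if the variable is unset, or ERANGE if
 * the value (plus its terminating NUL) does not fit in buflen bytes.
 * On failure buf is left untouched. */
int tgetenv(const char *name, char *buf, size_t buflen)
{
    int rc = 0;

    assert(name != NULL && buf != NULL);
    pthread_mutex_lock(&tenv_lock);
    const char *v = getenv(name);
    if (v == NULL)
        rc = ENOENT;
    else if (strlen(v) >= buflen)
        rc = ERANGE;
    else
        strcpy(buf, v);
    pthread_mutex_unlock(&tenv_lock);
    return rc;
}

/* Writer counterpart, serialised against tgetenv(). Returns 0 or -1. */
int tsetenv(const char *name, const char *value)
{
    int rc;

    assert(name != NULL && value != NULL);
    pthread_mutex_lock(&tenv_lock);
    rc = setenv(name, value, 1);
    pthread_mutex_unlock(&tenv_lock);
    return rc;
}
```

Leaving allocation to the caller means no pointer into the environment ever escapes, which is the property the existing `getenv()` cannot offer.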
        
               | DSMan195276 wrote:
               | I agree that would be the way to do it, but now we're no
               | longer talking about simply 'fixing' the implementation
               | of the existing API but rather introducing a new function
               | you have to use.
               | 
               | `setenv()` would only be safe if your program never uses
               | `getenv()`, and calls to `getenv()` are so numerous and
               | all over the place that for most non-trivial programs it
               | would be hard to ensure they never happen.
               | 
               | There's also the rub that `setenv()` is not part of the C
               | standard, it's POSIX. I don't think the C standard would
               | ever introduce `tgetenv()` to fix a problem it doesn't
               | have, so non-POSIX code would have to continue to call
               | `getenv()` since that's all that is available to them.
        
               | josefx wrote:
               | > I don't think the C standard would ever introduce
               | `tgetenv()` to fix a problem it doesn't have
               | 
               | The C standard has no problem acknowledging that getenv
               | is subject to data races for most of its implementations.
               | As far as I can tell that part was even added at the same
               | time as threading support.
        
               | DSMan195276 wrote:
               | Well actually I'll have to eat my words on that one - I
               | didn't catch that Annex K in C11 includes `getenv_s`
               | (even if it is optional).
        
               | leoh wrote:
               | >you have to use
               | 
               | I mean, why not just deprecate the old one; add a warning
               | if it's used
        
               | DSMan195276 wrote:
               | That doesn't really help you determine whether a given
               | library is using `getenv()` or not. That also requires
               | that things are actually recompiled/updated, which for
               | some C libraries is not that often.
               | 
               | There's also the rub that many C programs do not target
               | the latest standard (for a variety of reasons). I didn't
               | realize `getenv_s` was added in C11 (though it's
               | optional), but it doesn't really matter because
               | programs/libraries that target C89 or C99 can't use it
               | anyway.
        
               | salawat wrote:
               | GNU convention is <funcname>_r for a reentrant version of
               | a non reentrant function.
               | 
               | I'm in the process of working on a tool in C at the
               | moment, so for once I actually have some context on
               | what's being grumped about here!
        
               | JohnFen wrote:
               | Entirely this. It works so well that I've seen this in
               | various utility libraries for decades.
        
             | forrestthewoods wrote:
             | > to safely change the environment variables. The best you
             | could do would be to leak memory every time you set new
             | environment variables so that the old pointers don't get
             | invalidated, and that just creates a new problem and reason
             | not to use `setenv()` (that's arguably worse).
             | 
             | Arguably worse? My goodness no.
             | 
             | This is a rare edge case that most programs don't
             | encounter. Option 1 is to crash and explode and die. Option
             | 2 is to leak tens of bytes.
             | 
             | Leaking tens of bytes is for sure NOT worse than crashing.
        
               | marcosdumay wrote:
               | > Leaking tens of bytes is for sure NOT worse than
               | crashing.
               | 
               | I do really disagree here. The answer is not clear at
               | all.
               | 
               | But then, you are mischaracterizing the problem. The
               | issue is not with crashing, you can get plain bad data
               | too, and this is clearly worse than both leaking memory
               | and crashing.
               | 
               | Also, the GP is mischaracterizing the options. You don't
               | need to leave the old values around, you can just copy
               | them into userspace memory.
        
               | DSMan195276 wrote:
               | My reasoning is simple - the issues here can be avoided
               | if you're careful about how you use `setenv()` and
               | `getenv()`, which many programs already are. The memory
               | leak in contrast would never be avoidable regardless of
               | how you use it.
        
               | forrestthewoods wrote:
               | The problem with "be careful" is that libraries often
               | want to use the very unsafe API and there is no standard
               | mechanism to expose safety. It's fundamentally a bad API
               | design. It could be good. But it is not.
               | 
               | There's a reason this problem comes up on HN once or twice
               | a year. And don't even get me started about printf
               | grabbing a mutex for a stupid locale...
        
               | none_to_remain wrote:
               | You could even put a note in the man page that `setenv()`
               | will leak memory. Then ten or twenty years from now there
               | will be a blog post about how a currently trendy
               | language/runtime can be manipulated into looping over
               | `setenv()` zillions of times and OOM'ing, and comments
               | about how no one can possibly be expected to read the man
               | page for this horrible footgun, and it's wrong to expect
               | developers to have any idea about what they're doing,
               | give a shit, or pay attention at all.
        
           | leoh wrote:
           | This is not a good argument imo. Its "rarity" still affects a
           | tremendous number of folks in profoundly vexing ways that are
           | difficult to debug on account of this not only affecting C
           | but innumerable other languages' compilers and interpreters
           | that rely on the stock getenv implementation.
           | 
           | I wouldn't be surprised if a good chunk of compilers and
           | interpreters in other languages suffer from this gotcha.
           | 
           | I mean, I wouldn't even be surprised if some _JVM_
           | implementations silently expose their users to bugs on
           | account of this implementation.
           | 
           | EDIT: ... ha, sure looks like it
           | https://github.com/openjdk/jdk/blob/a2c0fa6f9ccefd3d1b088c51...
        
           | usrbinbash wrote:
           | > You are literally putting forth arguments in favour of
           | fixing the thread safety issue, and then conclude it's not
           | worth the effort.
           | 
           | Yes. I do. These two concepts don't contradict each other.
           | 
           | > No it's not a performance problem. So we can make it thread
           | safe, and the performance impact will be negligible
           | 
           | Who said anything about performance being the problem, or a
           | reason not to change it!?
           | 
           | The problem is _BACKWARDS COMPATIBILITY_. The issue is that
           | `getenv` returns a `char *` into the envvar array. Basically
           | every application that uses this function relies on this
           | fact.
           | 
           | So we have:
           | 
           | a) A potential issue that occurs only in very unusual
           | circumstances, most of which will never occur in production
           | code and on the odd chance that they do, they can easily be
           | avoided. Documenting that well can help prevent time wasted
           | in debugging.
           | 
           | b) A fix that may prevent a) but breaks backwards
           | compatibility promises, and would necessitate reworking god
           | knows how many programs, the vast majority of which were
           | never impacted by the issue in the first place.
           | 
           | Of these 2 options, a) is just the better one. Yes, in an
           | ideal world, we could have pure, 100% bug free code, and spend
           | an unlimited amount of time on fixing every last problem.
           | That's not the world we live in however, and so a pragmatic
           | approach is simply a necessity.
        
           | zlg_codes wrote:
           | Ooh, the old 'unprofessional' epithet! What do you mean by
           | that slur here? Most can't agree on what professional even
           | means. Additionally, why should one be held to artificial,
           | inconsistent, and poorly defined standards of
           | 'professionalism' when they aren't a professional?
           | 
           | My care for code robustness scales with income.
        
       | Galanwe wrote:
       | I don't quite get why this is a problem per se.
       | 
       | I mean, the environment is just a chunk of memory made available
       | to the process by the OS. It is no more, no less thread safe than
       | any other chunk of memory.
       | 
       | Why would the libc need to protect it more than any other memory
       | location?
        
       | fooker wrote:
       | Changing any kind of global state is fundamentally not thread
       | safe.
       | 
       | Sure you could use locks, stop the world, etc, but there is no
       | way you can ensure that all the data and information you had
       | derived from the old state is going to be valid.
       | 
       | A better solution is to not rely on global state like this.
        
       | larschdk wrote:
       | What is the use case for a mutable environment past
       | initialization?
       | 
       | It seems like a complicated and error prone thing to be using no
       | matter if it is thread safe or not. You can set up your own
       | environment before you launch threads, and you can launch child
       | processes with a different environment from the current process
       | without modifying your own. If you fork, you can modify the
       | environment in the child without affecting the parent until you
       | exec.
       | 
       | And even if setenv() were reentrant and couldn't cause crashes,
       | it wouldn't be thread safe, since threads share the environment
       | and could get their environment changed under their feet.
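       | 
       | A small sketch of the fork case, assuming a POSIX system; the
       | helper name is made up. One caveat: in a multithreaded parent,
       | POSIX only guarantees async-signal-safe calls between fork() and
       | exec(), and setenv() is not one of them, so this pattern is
       | safest in a single-threaded parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, set `name=value` in the child only, and have the child report
 * (via its exit status) whether it sees the new value.
 * Returns 1 if the child saw it, 0 if it did not, -1 on error. */
int child_sees_private_env(const char *name, const char *value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: this changes the child's copy of the environment only.
         * A real program would typically call exec() at this point. */
        if (setenv(name, value, 1) != 0)
            _exit(2);
        const char *seen = getenv(name);
        _exit(seen != NULL && strcmp(seen, value) == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

       | After the call returns, getenv() in the parent still reports
       | whatever it reported before.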
        
       | 1over137 wrote:
       | >We should apparently read every function's specification
       | carefully, not use software written by others, and not use
       | threads. These are unrealistic assumptions in modern software.
       | 
       | The first of the three listed items you should certainly do. I
       | hope this author is not writing medical software, or anything
       | important.
        
       | faiD9Eet wrote:
       | I am trying to make sense of the argument of pushing
       | configuration into a library:
       | 
       | * if the library is just a dependency, the Linux loader will set
       |   it up. It will have the same environment as the other libraries
       |   and as the main program.
       | * if the library is set up by dlopen(), there is no way to
       |   provide an environment pointer
       | 
       | Altering the global environment variable for child processes
       | makes no sense, for execve() accepts a char* envp[]. So I guess
       | we need to talk about issues with a specific use case of
       | dlopen().
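       | 
       | For the child-process half of this, a hedged sketch of passing
       | an explicit envp to execve() instead of touching the parent's
       | environment. The helper name is invented and it assumes
       | /bin/sh exists.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `/bin/sh -c script` with exactly the environment in `envp`.
 * Returns the child's exit status, or -1 on error. */
int run_sh_with_env(const char *script, char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const argv[] = { "sh", "-c", (char *)script, NULL };
        execve("/bin/sh", argv, envp); /* the child sees only envp */
        _exit(127);                    /* reached only if execve failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

       | The parent never calls setenv(), so nothing in its own process
       | can race with the change.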
        
         | planede wrote:
         | Maybe the dlopen issue could be hacked around by dlmopen and
         | injecting getenv and setenv symbols that access a different
         | environment variable list than the application's.
        
       | devnonymous wrote:
       | Every time I hear someone lament about something not being thread
       | safe, what I actually hear is - I want this shared invariant
       | global state to be modifiable but it isn't. Which makes me ask
       | the question why would you want that ?
       | 
       | The process environment should not be the mechanism for threads
       | to communicate with each other.
        
       | sebstefan wrote:
       | >The argument is that the specification clearly documents that
       | setenv() cannot be used with threads. Therefore, if someone does
       | this, the crashes are their fault.
       | 
       | Oh, C is taking the php approach :)
        
         | Ayesh wrote:
         | In fairness to PHP, almost every PHP 7+ version from the last 5-6
         | years had things deprecated especially to "cure" these things.
        
       | fsckboy wrote:
       | the env is unix. It's not C's job to fix. Keep studying it till
       | you understand that.
       | 
       | and don't forget, you are using unix because it defeated all the
       | other options, because it was better and they were worse, so also
       | keep studying till you understand why that is too.
       | 
       | then this problem with env will fix itself.
       | 
       | unix gives you tools to handle threads. C gives you tools to
       | handle threads. Learn them, use them.
        
       | jeroenhd wrote:
       | On Windows you can use
       | GetEnvironmentVariable/SetEnvironmentVariable (on XP and later),
       | which do implement some locking and don't run into this issue
       | because GetEnvironmentVariable copies the data out into a caller-
       | supplied buffer. getenv_s was a nice effort, but it failed.
       | 
       | I don't really understand why other languages such as Go and Rust
       | decided to call the weird POSIX API rather than implementing
       | their own API, which matches the semantics they expect. In cross
       | platform C you'll be stuck with the outdated POSIX API design,
       | but there's no reason why other languages should accept those
       | same limitations.
       | 
       | We're not running on PDP-11s anymore. You can afford a thread-
       | safe hash map in your standard library. Ignore the limitations of
       | the old C library. Twenty years ago, Microsoft released a better
       | API, keep the crashy old API with tons of deprecation warnings
       | (hell, add a compiler flag --enable-broken-c-api-designs) and
       | just provide new APIs that are actually usable in modern
       | programming environments.
        
         | pjmlp wrote:
         | Because as languages born from UNIX culture, the community
         | usually sees POSIX everywhere, and then there are the other
         | OSes.
         | 
         | Have some fun reading on how Go handles files.
        
           | jeroenhd wrote:
           | Go likes to ignore edge cases ("all file names are UTF-8 and
           | if they aren't then we'll just pretend they are") to make it
           | easier to write code, so I'm not very surprised that it got
           | caught in a POSIX related crash here.
           | 
           | It's hard to tell if Microsoft altered the source code since,
           | but the leaked XP source code
           | (https://github.com/tongzx/nt5src/blob/master/Source/XPSP1/NT...)
           | doesn't seem to do any
           | getenv() calls for DNS lookups. The specific bug that started
           | all this nonsense only triggers on (specific) Unix
           | implementations. Unfortunately, Go opts to call the POSIX
           | methods rather than
           | GetEnvironmentVariable/SetEnvironmentVariable on Windows, so
           | I suppose it's still possible that somewhere in the chain
           | this bug gets triggered by Go code.
        
             | pjmlp wrote:
             | On non-UNIX platforms stuff like getenv() belongs to the
             | specific compiler C library, not the OS API, hence why
             | Windows doesn't use it.
        
               | SAI_Peregrinus wrote:
               | Non-Linux. Most other UNIX platforms also have syscalls
               | depend on the specific C library.
        
               | pjmlp wrote:
               | On UNIX there is no distinction between C standard
               | library and OS APIs, hence POSIX as the UNIX bits missing
               | from ISO C.
               | 
               | I feel you're mixing OS APIs, with the low level
               | mechanism to enter into the kernel space.
        
         | lifthrasiir wrote:
         | Once you start to provide C interoperability, it is inevitable
         | because C programs still rely on that broken API and many users
         | would expect that Rust will give the same time zone as C. And
         | with the exception of Windows, that API is often the single
         | existing API throughout the entire system.
        
         | Measter wrote:
         | Rust's stdlib's API is completely safe here. On Windows[1], it
         | uses the GetEnvironmentVariable/SetEnvironmentVariable API,
         | which as you noted doesn't have this problem. On Unix[2], it
         | maintains its own RwLock to provide synchronisation.
         | Additionally, Rust's API only gives out copies of the data, it
         | never gives you a pointer to the original.
         | 
         | The problem comes when you do FFI on *nix systems, because
         | those foreign functions may start making unsynchronised calls
         | to getenv/setenv.
         | 
         | [1] https://github.com/rust-lang/rust/blob/master/library/std/sr...
         | 
         | [2] https://github.com/rust-lang/rust/blob/master/library/std/sr...
        
       | masklinn wrote:
       | Doesn't the essay contradict itself?
       | 
       | It states that glibc "never free[s] environment variables", but
       | then goes on to state
       | 
       | > [in glibc] if a thread calling setenv() needs to resize the
       | array of pointers, it copies the values to a new array and frees
       | the previous one
       | 
       | Since envvars cause crashes under glibc, I assume the initial
       | assertion is incorrect.
        
         | Tobu wrote:
         | There are two levels of pointers: the environment block points
         | to an array of pointers to C strings, this higher-level pointer
         | can be updated and the previous one freed, which is a problem
         | when it is being iterated on (which getenv does). The C strings
         | themselves aren't freed by glibc, though some applications do
         | modify them in place.
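         | 
         | To make the two levels concrete, here is a rough sketch of
         | the kind of walk a getenv() does over the pointer array. It is
         | not glibc's code and `lookup_env` is a made-up name; it only
         | relies on the POSIX `environ` variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

extern char **environ; /* POSIX: array of pointers to "NAME=value" strings */

/* Walk the pointer array and compare each entry's name. Nothing here is
 * synchronised: if another thread makes setenv() replace and free the
 * array mid-loop, this loop reads freed memory. */
const char *lookup_env(const char *name)
{
    size_t n = strlen(name);
    if (environ == NULL)
        return NULL;
    for (char **entry = environ; *entry != NULL; entry++) {
        if (strncmp(*entry, name, n) == 0 && (*entry)[n] == '=')
            return *entry + n + 1; /* points into the environment itself */
    }
    return NULL;
}
```

         | The first level is `environ` (the array), the second is each
         | `*entry` (a string). The returned pointer aliases the second
         | level, which is why in-place modification is also a hazard.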
        
       | tsukikage wrote:
       | Hot take: if a crash can be made to disappear just by adding a
       | mutex inside setenv(), this means code reading the environment is
       | racing with code writing to it, and in this situation adding a
       | mutex inside setenv() will generally make things worse instead of
       | better: it may hide the immediate symptom, but the underlying
       | race between reading and writing to the environment remains, your
       | program will behave differently each run depending on who wins
       | the race (with potentially catastrophic results depending on what
       | the writes are doing), and the cause will be much harder to debug
       | from cold due to the lack of a smoking gun pointing at
       | environment manipulation.
       | 
       | The multithreaded program needs to be restructured so that the
       | parts that communicate via the environment are properly
       | serialised with respect to each other, just as would be needed
       | for any other communication via global state and/or access to a
       | shared resource.
       | 
       | This has to happen at a higher level than the individual
       | getenv/setenv calls: entire blocks of logic containing the calls
       | need to be made atomic (or otherwise refactored; perhaps you
       | could do all the environment writes before spawning any threads)
       | so that no other thread can blow away the environment contents in
       | between the code that sets it up for some purpose and the code
       | that implements that purpose; and once this is properly done, the
       | individual calls themselves do not need further protection.
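       | 
       | One way to picture "entire blocks made atomic", as a sketch
       | only: an application-level lock held across set, use and
       | restore. All names here (`call_with_env`, `env_block_lock`,
       | `DEMO_MODE`) are invented, and it only works if every part of
       | the program that touches the environment takes the same lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical application-wide lock. It serialises whole blocks of
 * logic, not individual getenv/setenv calls. */
static pthread_mutex_t env_block_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set name=value, run fn(arg), then restore the previous state, all as
 * one critical section. Returns 0 on success, -1 on failure. */
int call_with_env(const char *name, const char *value,
                  void (*fn)(void *), void *arg)
{
    int rc = -1;
    pthread_mutex_lock(&env_block_lock);

    /* Copy the old value: the getenv() pointer may not survive setenv(). */
    const char *old = getenv(name);
    char *saved = old ? strdup(old) : NULL;

    if ((old == NULL || saved != NULL) && setenv(name, value, 1) == 0) {
        fn(arg); /* the code that depends on the variable */
        rc = saved ? setenv(name, saved, 1) : unsetenv(name);
    }

    free(saved);
    pthread_mutex_unlock(&env_block_lock);
    return rc;
}

/* Example callback: copies DEMO_MODE into the caller's 32-byte buffer. */
void read_demo_mode(void *arg)
{
    const char *v = getenv("DEMO_MODE");
    char *out = arg;
    strncpy(out, v ? v : "", 31);
    out[31] = '\0';
}
```

       | Inside the critical section the individual calls need no
       | further protection, which is the point being made above.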
        
         | hddqsb wrote:
         | Sure, some applications might require custom higher-level
         | synchronisation, but it's still important for getenv/setenv to
         | be thread-safe (i.e. not crash):
         | 
         | - The race might be irrelevant (e.g. simultaneous calls that
         | access different variables are fine).
         | 
         | - The application author might not have complete control over
         | all calls to getenv/setenv (e.g. if using a third-party
         | library).
        
       | dark-star wrote:
       | If you're doing setenv in multiple threads in parallel and can't
       | afford the overhead of wrapping it in a mutex or whatever, then
       | you're clearly doing something wrong...
        
         | kukkamario wrote:
         | That isn't enough. You'd need to wrap getenv calls as well and
         | that can't be done as there are hidden getenv calls in stdlib
         | implementation. There isn't a trivial fix.
        
         | tsukikage wrote:
         | if you're doing setenv in multiple threads in parallel, you
         | clearly don't actually care about what state the environment is
         | in at any given time, and therefore a better optimisation would
         | be to not bother with the setenv calls at all
        
       | planede wrote:
       | So you make it thread-safe. Now what? Just because you don't get
       | a data-race or undefined behavior, it doesn't make setenv/getenv
       | usable across threads without any synchronization anyway.
       | 
       | My take on it is that global mutable state is owned by the
       | application, library code should never ever mutate it. Applies to
       | the environment variables, stdout/stderr, locale.
       | 
       | The application must ensure that when these are mutated they are
       | not read concurrently by another thread. As external libraries
       | rarely document the exact conditions when they read environment
       | variables, the best is to only update the environment when no
       | other thread is running. The absolute best is to avoid mutating
       | it altogether.
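       | 
       | A sketch of that ordering, assuming POSIX threads and assuming
       | nothing else in the process has started a thread yet. The
       | variable `APP_LOG_LEVEL` and both function names are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Worker: only reads the environment and copies the value out. */
static void *worker(void *arg)
{
    const char *level = getenv("APP_LOG_LEVEL"); /* hypothetical variable */
    char *out = arg;
    strncpy(out, level ? level : "", 15);
    out[15] = '\0';
    return NULL;
}

/* Do all environment writes first, then start the thread.
 * `seen` must have room for 16 bytes. Returns 0 on success. */
int configure_then_run(const char *level, char *seen)
{
    /* Phase 1: no other thread is running, so mutation is safe. */
    if (setenv("APP_LOG_LEVEL", level, 1) != 0)
        return -1;

    /* Phase 2: threads exist; the environment is treated as read-only. */
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, seen) != 0)
        return -1;
    return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```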
        
       | lifthrasiir wrote:
       | > My understanding is the people responsible for the Unix POSIX
       | standards did not like the design of these functions, so they
       | refused to implement them.
       | 
       | And herein lies the actual issue: C has a sh*tton of API issues
       | in its standard library, and people _really_ want to fix as many
       | of them as possible whenever possible, but doing so will
       | destabilize the standard so most of them won't ever be accepted.
       | In the case of Annex K many clearly felt that the size-restricted
       | API alone is not enough, because it is still easy to
       | desynchronize the buffer and the allocated size, and it's a good
       | point if we ignore an obvious counterpoint of the lack of safe
       | `getenv` alternatives in the standard library at all... I wonder
       | about the alternative universe where we have two distinct
       | standards for the C language and C standard library so that the
       | library standard is much easier to fix and adapt.
        
       | FounderBurr wrote:
       | So much cringe, I can't believe this wasn't written as satire.
        
       | ChrisRR wrote:
       | Lots of C isn't thread safe, and that's the point. It's supposed
       | to be close to the metal, and it's supposed to be small. If you
       | want to add additional functionality then that's on the
       | user/library. If we add thread safe/memory safe versions of every
       | function, then we're moving away from C and into Rust territory.
       | 
       | C is a powerful language, but it's not a language designed to
       | hold your hand.
       | 
       | Edit: Especially in this case where the environment is maintained
       | by the OS, that puts the onus of safety on the OS to ensure that
       | different processes can't modify and read the env simultaneously.
       | 
       | If you're worried about reading and writing the env in different
       | threads within the same process, then you need to reconsider your
       | design.
        
         | rini17 wrote:
         | We're talking about C standard library here. Which has streams
         | (struct FILE) that are threadsafe and use locks by default -
         | even in single-threaded programs. They could have extended the
         | interface to access the environment too. But idk, reasons.
        
           | jstimpfle wrote:
           | Contrary to fread(), getenv() does not copy data. It returns
           | a pointer to memory that might be invalidated by the
           | following setenv(). In other words, it can't be really made
           | threadsafe -- it's fundamentally a thread-unsafe API.
           | 
           | What _could_ be done is offering an alternative, thread-safe
           | API that takes an RW mutex on get/set, where get copies into a
           | user-provided buffer. However, that would be complicated as
           | well because the user can't know the size of any envvar, and
           | getting the size before reading the var is racy. So maybe
           | there needs to be env_lock()/env_unlock() and
           | getenv_unlocked()/setenv_unlocked(). Or a version that locks
           | and strdups() the var. But that still leaves the problem that
           | existing software does not use this API.
           | 
           | And really, why would you set envvars instead of global
           | variables? Seriously? A data structure that is a list of
           | KEY=VALUE formatted zero-terminated string pointers? Just
           | don't do it. Use setenv() only at startup and after fork()
           | and before exec(), done.
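           | 
           | The "locks and strdup()s" variant could look roughly like
           | this, assuming POSIX rwlocks. `getenv_dup`, `setenv_locked`
           | and `env_rwlock` are invented names, and it assumes
           | concurrent getenv() calls do not race with each other, only
           | with writers.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_rwlock_t env_rwlock = PTHREAD_RWLOCK_INITIALIZER;

/* Returns a malloc'd copy of the value, or NULL if unset or out of memory.
 * The caller owns the result and must free() it. */
char *getenv_dup(const char *name)
{
    pthread_rwlock_rdlock(&env_rwlock);  /* many readers may hold this */
    const char *value = getenv(name);
    char *copy = value ? strdup(value) : NULL;
    pthread_rwlock_unlock(&env_rwlock);
    return copy;
}

/* Writer side: takes the lock exclusively. Returns 0 on success. */
int setenv_locked(const char *name, const char *value)
{
    pthread_rwlock_wrlock(&env_rwlock);
    int rc = setenv(name, value, 1);
    pthread_rwlock_unlock(&env_rwlock);
    return rc;
}
```

           | Copying under the lock sidesteps the "get the size, then
           | read" race, at the cost of an allocation per lookup. It
           | does nothing for code that still calls plain getenv().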
        
         | lifthrasiir wrote:
         | My design is perfect, but libraries have no idea and can and
         | will break my perfect design.
         | 
         | Joke aside though, for this reason thread safety is one of
         | general requirements for the composability that any large
         | enough software project would definitely need.
        
         | another2another wrote:
         | I don't think C standard library particularly wanted to be
         | problematic to use, but they just weren't prepared for
         | multithreaded general programming, and tried to patch it with
         | e.g. the _r() version of methods.
         | 
         | An example where I recently made a change is localtime() where
         | I changed to use the _r() variant after wrongly just calling
         | localtime(), but only later realised they were using a global
         | per-process buffer which _might_ produce wrong results in my
         | logging code (e.g. if another thread calls localtime then my
         | time buffer would be updated after I populated it 2 seconds
         | prior).
         | 
         | Now the clib could have made a per thread TLS slot for each
         | thread's time values without too much overhead, which would
         | have automatically fixed any careless uses of localtime() in
         | multiple threads, but instead opted for the separate thread
         | safe function calls.
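         | 
         | For reference, a minimal sketch of the _r() pattern in
         | logging-style code. `format_timestamp` is a made-up helper;
         | localtime_r() and strftime() are the standard POSIX/C calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Format `t` as "YYYY-MM-DD HH:MM:SS" in the local time zone.
 * Returns the number of characters written, or 0 on failure. */
size_t format_timestamp(time_t t, char *out, size_t len)
{
    struct tm tm_buf; /* caller-owned storage, not libc's shared buffer */
    if (localtime_r(&t, &tm_buf) == NULL)
        return 0;
    return strftime(out, len, "%Y-%m-%d %H:%M:%S", &tm_buf);
}
```

         | Because the broken-down time lives on the caller's stack,
         | another thread calling localtime() cannot overwrite it.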
        
         | SAI_Peregrinus wrote:
         | C is not supposed to be close to "the metal" unless that's a
         | single-core unified-memory processor. It's supposed to be
         | maximally portable while retaining reasonable performance, and
         | it does that very well.
        
         | andrewaylett wrote:
         | > It's supposed to be close to the metal, and it's supposed to
         | be small.
         | 
         | Unless you're developing for the PDP-11, I'm afraid I have some
         | bad news for you.
         | 
         | We absolutely want -- and occasionally get -- safe versions of
         | unsafe functions, like strlcpy. And we rely on _quite a lot_ of
         | code to maintain the appearance that C is close to the metal.
        
       | 0xbadcafebee wrote:
       | Much ado about nothing. If you have to set environment, do it at
       | the beginning of a program, before threading. When you execute an
       | application, pass a new environment, without setting it. There's
       | really no need to set environment from threads.
       | 
       | The only use case where this bug happens seems to be threaded
       | programs that load libraries _after threading has initialized_,
       | and want to configure those libraries with environment variables,
       | that the user/parent program hasn't specified, rather than
       | calling their APIs with specific arguments. If a library provides
       | no way to set some option other than environment variables, their
       | API is simply incomplete and needs fixing. An incomplete library
       | is not a good enough reason to amend the C standard.
        
       | dahfizz wrote:
       | > We should apparently read every function's specification
       | carefully.... These are unrealistic assumptions in modern
       | software.
       | 
       | Lol
        
       | ewst wrote:
       | why are you calling setenv in threads?
        
         | amelius wrote:
         | Possibly because some library they are using calls getenv.
        
       | whalesalad wrote:
       | Environment is considered an immutable space in all the systems I
       | build. It's set before any program code / processes are even
       | started.
        
       | Joker_vD wrote:
       | And to re-iterate my point from another thread on setenv, no,
       | "just don't call setenv() after creating threads" is not a
       | solution because even if the code _you_ wrote may be single-
       | threaded, your application as a whole is not composed entirely of
       | code written by you: the moment you link against _any_ 3rd-party
       | library, your program can have arbitrarily many threads before it
       | even reaches main().
        
         | account42 wrote:
         | > And to re-iterate my point from another thread on setenv, no,
         | "just don't call setenv() after creating threads" is not a
         | solution
         | 
         | Yes it is.
         | 
         | > the moment you link against any 3rd-party library, your
         | program can have arbitrarily many threads before it even
         | reaches main().
         | 
         | So don't call setenv after you load any libraries. If that
         | means you can't call setenv at all for your linking setup then
         | so be it.
        
         | grayhatter wrote:
         | > the moment you link against any 3rd-party library, your
         | program can have arbitrarily many threads before it even
         | reaches main().
         | 
         | what?! name one library that does this?
        
           | aidenn0 wrote:
           | I think the C++ version of the zeromq library could
           | initialize a context in a static initializer, and zmq
           | contexts involve creating a thread.
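           | 
           | A C analogue of the same effect, as a sketch only: GCC and
           | Clang support a constructor attribute (an extension, not
           | ISO C) that runs a function before main(). Every name here
           | is hypothetical and this is not how zeromq does it.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

static int background_started; /* written once, before main() runs */

static void *background_task(void *arg)
{
    (void)arg; /* a real library might run an I/O loop here */
    return NULL;
}

/* Runs during program start-up, before main(), like a C++ static
 * initializer in a linked library would. */
__attribute__((constructor))
static void library_init(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, background_task, NULL) == 0) {
        pthread_detach(tid);
        background_started = 1;
    }
}

/* Lets the application ask whether a thread was already created. */
int library_started_thread_before_main(void)
{
    return background_started;
}
```

           | By the first line of main(), the process has already
           | created a second thread without the application asking.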
        
             | grayhatter wrote:
             | https://zeromq.org/get-started/?language=cpp#
             | 
             | there's 5 listed for c++ which one creates threads before
             | main()?
        
               | aidenn0 wrote:
               | It's been a long time, so I might be wrong, but with
               | zmqpp[1] make a global zmqpp::context and I believe it
               | will create threads before main().
               | 
               | 1: https://github.com/zeromq/zmqpp
        
           | slaymaker1907 wrote:
           | Windows will have a few threads for doing IO at program
           | startup.
        
       | egberts1 wrote:
       | I am only more surprised that we have not added a new #ifdef
       | UNSAFE_THREAD to stdhdr.h, so we can wrap setenv(), et al.
        
       | Const-me wrote:
       | Interestingly, Windows developers made much better choices back
       | in the 1990s.
       | 
       | The GetEnvironmentStrings() API comes with a
       | FreeEnvironmentStrings() counterpart. Whoever calls
       | GetEnvironmentStrings() to get the entire environment is then
       | responsible for calling FreeEnvironmentStrings(), which allows
       | the GetEnvironmentStrings() API to be thread-safe.
       | 
       | The GetEnvironmentVariable() API is even simpler: it doesn't
       | return a pointer, instead it fills a caller-provided buffer.
        
         | NekkoDroid wrote:
         | GetEnvironmentVariable() also has a subtle problem from what I
         | can see: You first check how big the buffer has to be and in
         | the second call you actually fill out the buffer, which allows
         | for the variable to change between those 2 calls (I guess it's
         | a variant of the time-of-check time-of-use race condition).
         | 
         | There is getenv_s that doesn't have this problem to my
         | knowledge, but it also doesn't exactly allow you to control the
         | allocation of the memory (how important that is in cases like
         | this is a different question)
        
           | Const-me wrote:
           | It seems the getenv_s and GetEnvironmentVariable API
           | functions are 100% equivalent; the API is only slightly
           | different.
           | 
           | When you don't know the length of the variable you query, you
           | guessed wrong on the first attempt, and other threads are
           | changing the environment in the background, you might need 3
           | or even more calls to either function (passing longer output
           | buffers) to successfully retrieve the complete variable.
        
             | NekkoDroid wrote:
             | You are right, I mixed up my functions. What I actually
             | meant was _dupenv_s
        
       | InfiniteRand wrote:
       | Beyond availability, anyone have issues with getenv_s?
        
       | account42 wrote:
       | > Since many libraries are configured through environment
       | variables, a program may need to change these variables to
       | configure the libraries it uses. This is common at application
       | startup. This causes programs to need to call setenv(). Given
       | this issue, it seems like libraries should also provide a way to
       | explicitly configure any settings, and avoid using environment
       | variables.
       | 
       | This is the only correct solution. Even threadsafe, setenv
       | doesn't guarantee anything about when the variable will take
       | effect. There is no way for consumers to be notified of a
       | changed variable. For that you need guarantees from the library
       | and at that point the library can just as well provide a better
       | configuration interface. Keep environment variables static for
       | the process lifetime and there are no issues.
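       | 
       | One possible shape for this, sketched in C with made-up names
       | (mylib_config, mylib_init, mylib_log_level and MYLIB_LOG_LEVEL
       | are hypothetical, not from any real library): the library takes
       | explicit settings and consults the environment at most once, at
       | init time, so the host program never has to call setenv() to
       | configure it.

```c
#include <stdlib.h>

/* Hypothetical library configuration; all names are illustrative. */
struct mylib_config {
    int log_level;   /* negative means "not set by the caller" */
};

static struct mylib_config active = { 0 };

/* Initialize the library. Explicit settings win; the environment is only
 * a fallback and is read once here, never again afterwards. */
void mylib_init(const struct mylib_config *cfg)
{
    int level = (cfg != NULL) ? cfg->log_level : -1;
    if (level < 0) {
        const char *s = getenv("MYLIB_LOG_LEVEL");
        level = (s != NULL) ? atoi(s) : 0;
    }
    active.log_level = level;
}

/* Return the level chosen at init time. */
int mylib_log_level(void)
{
    return active.log_level;
}
```

       | A caller that wants level 3 passes it in directly and leaves
       | the environment alone.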
        
       | cryptonector wrote:
       | I'm glad TFA mentions Solaris/Illumos' implementation, so I don't
       | have to.
       | 
       | Java doesn't allow one to set environment variables for the
       | running process, but it does allow setting env vars for processes
       | being spawned. It would be better if all C libraries did what
       | Illumos' does.
        
       | kibwen wrote:
       | Eyra is a new implementation of libc in Rust that addresses this
       | in its default configuration:
       | 
       |  _" Eyra solves this by having setenv etc. just leak the old
       | memory. That ensures that it stays valid for as long as any
       | thread needs it. Granted, leaking isn't great, and Eyra makes it
       | configurable with the threadsafe-setenv cargo feature, so it can
       | be disabled in favor of the thread-unsafe implementation. "_
       | 
       | https://blog.sunfishcode.online/eyra-does-the-impossible/
        
         | cryptonector wrote:
         | It was never impossible. As TFA notes, Solaris/Illumos got it
         | right 15 years ago.
        
       | StillBored wrote:
       | The problem isn't setenv() so much as getenv returns raw pointers
       | to the data structure that is potentially being manipulated.
       | 
       | But this is/was strictly a glibc/Linux bug because maintainers in
       | the past didn't want to improve the situation (read: add locks,
       | which weren't 100% reliable) nor add thread-safe versions of the
       | calls, e.g. getenv_s/getenv_r, as pretty much every other
       | POSIX-compliant system has done.
       | 
       | And so the situation I hit many years ago, a proprietary library
       | doing setenv's before fork() (and me calling those routines from
       | multiple threads), has now been fixed, and setenv() on
       | linux/glibc now works if it's built with locking:
       | 
       | https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/sete...
       | 
       | So the remaining issue is making the environment thread safe,
       | which means adding a getenv_r/s call and ensuring it's being used
       | everywhere, which is probably a more complex problem than tossing
       | the setenv() lock. But then in the setenv/fork case the forked
       | process is a crapshoot whether it gets the "right" environment.
       | In my case above it didn't really matter because the library was
       | doing the equivalent of `export YOURACHILD=1` so the value wasn't
       | being changed from invocation to invocation.
       | 
       | But there are dozens and dozens of other similar gotchas in the
       | spec, where error conditions or threading races exist and aren't
       | noticeable until one understands how it is being implemented.
       | There isn't a way to fix it with the library itself because many
       | of the calls need more stored context space (a la win32 handles).
       | So one ends up doing things like building serialization locks
       | into the application to ensure certain subsets of the posix/c/etc
       | libraries aren't being called in parallel (in this case
       | getenv/setenv/fork).
        
       | dpc_01234 wrote:
       | We need to fork libc/posix, name it something like `libc2`, fix
       | versioning, and start making changes, while allowing legacy
       | software to deal with the old cruft.
       | 
       | It's never gonna happen if we're waiting for some committee to
       | deal with it, because they will be too afraid of breaking
       | backward compatibility.
        
       | leoh wrote:
       | Yeah, it's lame. Linters and static analyzers should probably
       | warn in addition to updating as much living documentation as
       | possible.
       | 
       | And, idk, suggest something like a method with a singleton mutex
       | for getting/setting?
        
       | worik wrote:
       | > We should apparently read every function's specification
       | carefully,
       | 
       | Yes.
       | 
       | You should.
        
       | zare_st wrote:
       | I don't see the problem here. I don't see why you need setenv.
       | Please tell me why it's used in conjunction with pthreads. I
       | think whoever is doing this is designing their software under
       | wrong assumptions.
       | 
       | Can you read/write to same fd socket across threads? No? So
       | what's the issue then?
        
       | Groxx wrote:
       | Is anyone aware of a language that simply does not have globals
       | like this? Or globals at all? The more I deal with other people's
       | code, the more I want one.
       | 
       | Obviously there are some semantic-globals (ports, env, main
       | thread, etc) that are unavoidable, but we have a way to deal with
       | that: dependency injection. Allow it in main, everything else has
       | zero access unless it is given the instance representing it.
       | 
       | Obviously that would be pretty painful in practice without some
       | boilerplate-reducing tools, but... would it be worse _in
       | aggregate_? Or would having _real_ control over all this at last
       | pay off? I'm quite curious.
        
       | andrewla wrote:
       | A lot of comments in here seem to be saying that "setenv is
       | rarely used, so we can make setenv threadsafe even if it is
       | costly", but I think that's missing the point.
       | 
       | The thread-safety is in the operators getenv/setenv (and putenv
       | and unsetenv). Thread safety has to apply to all operators, and
       | the functionality of getenv() (which is by far the most commonly
       | used of these operators) is what has to be fixed.
       | 
       | You simply cannot make setenv() thread-safe so long as getenv()
       | has its current interface. You need to make getenv() safe first;
       | ideally with a getenv_r() call that fills in a user-supplied
       | buffer. From that point making the rest of the calls thread safe
       | is trivial.
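       | 
       | A minimal sketch of that idea in C, assuming a hypothetical
       | env_get_r()/env_set() pair (neither is a POSIX or glibc
       | function) that serializes access through one process-wide mutex
       | and copies the value out before unlocking. It only protects
       | callers that go through these wrappers; any code that still
       | calls getenv()/setenv() directly races as before.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrappers, not part of any standard library. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copy the value of `name` into `buf` while holding the lock.
 * Returns 0 on success, ENOENT if unset, ERANGE if `buf` is too small. */
int env_get_r(const char *name, char *buf, size_t buflen)
{
    int rc = 0;
    pthread_mutex_lock(&env_lock);
    const char *val = getenv(name);
    if (val == NULL) {
        rc = ENOENT;
    } else if (strlen(val) + 1 > buflen) {
        rc = ERANGE;
    } else {
        strcpy(buf, val);   /* the copy happens before the lock is dropped */
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}

/* Set `name` to `value` under the same lock. Returns 0 or -1 like setenv(). */
int env_set(const char *name, const char *value)
{
    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

       | The caller never holds a pointer into the environment itself,
       | which is the property the current getenv() interface cannot
       | offer.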
        
       | jcalvinowens wrote:
       | This is silly, setenv() isn't reentrant for the same reason that
       | getopt() isn't reentrant: there's no valid reason to use it
       | except at the very beginning of the program.
       | 
       | The most common misuse I see is changing env before forking a
       | child: nobody has to do that, execve() lets you pass arbitrary
       | envp to the new process without changing yours.
       | 
       | If you need to change env in threaded tests... frankly I think
       | there was probably a better way to do whatever you're doing, but
       | you can just declare a global lock and use it. I bet you could
       | even LD_PRELOAD a custom setenv() that uses your lock.
       | 
       | Nobody is pointing at concrete problems outside of Rust. Rust is
       | just wrong here, sorry, the manpage has said this for a long
       | time:
       | 
       | > POSIX.1 does not require setenv() or unsetenv() to be
       | reentrant.
       | 
       | I think a more intellectually honest version of this article
       | would have been "POSIX should have made setenv() reentrant", not
       | "C is buggy": it's not buggy, it obviously complies with the
       | standard. There's nothing to "fix", he wants to change the
       | standard.
        
         | kibwen wrote:
         | _> the manpage has said this for a long time_
         | 
         | Nobody is disputing what the manpage says. What people are
         | arguing is that the specification should be improved so as to
         | no longer say that. Please stop quoting the Posix docs, as it
         | merely broadcasts that one has missed the point here.
         | Documenting the behavior does not automatically excuse the
         | behavior.
         | 
         | As for Rust, the Rust docs make it clear that the underlying
         | mechanism is fraught. Rust is well-acquainted with trying to
         | find satisfactory solutions to the unhelpful and nonsensical
         | tech stack that has been foisted upon the world by decades of
         | worse-is-better laziness. And even if Rust were to mark
         | std::env::set_var as unsafe, that doesn't magically fix
         | anything; the underlying mechanism is broken, and actually
         | fixing it is beyond Rust's control. Only the platforms can fix
         | it.
        
           | jcalvinowens wrote:
           | > Documenting the behavior does not automatically excuse the
           | behavior.
           | 
           | It's a _standard_, and I'm citing the _standard_. The
           | non-reentrant behavior doesn't need to be "excused", _it is
           | correct_!
           | 
           | If a microcontroller used a different bit than you expected
           | it to for something, would the documentation need to be
           | "excused" for "disagreeing" with you? That's how absurd what
           | you're saying here sounds.
           | 
           | > What people are arguing is that the specification should be
           | improved
           | 
           | That would be more reasonable, but that's not the argument.
           | They're saying the standard is wrong. Standards can't be
           | wrong, they're tautological. All that can be wrong is the
           | programmer's understanding of them.
        
             | paulddraper wrote:
             | If your point is "the standard is correct as judged by the
             | standard" your point is accurate and meaningless.
        
         | loevborg wrote:
         | You nailed it. The real villain in this story is mutability.
         | We're addicted to changing variables in place, which is
         | inherently complex - especially so in multithreaded
         | environments. Environment variables are clearly best treated as
         | immutable. Rust, despite its advances in some areas,
         | perpetuates our addiction to mutable variables.
        
           | orange_fritter wrote:
           | Every attempt at escaping mutability basically kills the
           | language in the mainstream because so much of "real"
           | programming is just bit-twiddling that gets too verbose when
           | immutability is involved. It's a good question whether Rust
           | nudges the world toward functional/declarative spiritual
           | purity by placing constraints on mutation. I'm betting that
           | No, it doesn't.
        
         | beltsazar wrote:
         | > This is silly, setenv() isn't reentrant for the same reason
         | that getopt() isn't reentrant: there's no valid reason to use
         | it except at the very beginning of the program.
         | 
         | It is not, unless you'd argue that "there's no valid reason to
         | use [anything that transitively uses setenv()] except at the
         | very beginning of the program." Did you even read the article?
         | The author and the GitHub links mentioned provide some examples
         | that use setenv() not directly, but transitively.
        
           | jcalvinowens wrote:
           | > Did you even read the article?
           | 
           | Of course. I also clicked through and looked at his examples,
           | did you?
           | 
           | > The author and the GitHub links mentioned provide some
           | examples that use setenv() not directly, but transitively.
           | 
           | The big list at the end of the article? That's absolutely not
           | true: they're all read only usecases that prove my point.
           | Nobody should be changing any of those in the middle of the
           | program. If you disagree, please point out specifically which
           | one and explain why, because I don't see it.
        
             | beltsazar wrote:
             | Ah yes, silly mistakes--I mixed up getenv and setenv. Those
             | examples transitively use getenv, not setenv.
             | 
             | Having said that, my point still stands. You can only
             | control what you directly use / don't use. Third-party
             | libraries you use might use setenv at times other than "the
             | very beginning of the program."
        
         | duped wrote:
         | > there's no valid reason to use it except at the very
         | beginning of the program.
         | 
         | POSIX doesn't define the "beginning of a program." Nor do you,
         | if you're compiling C code. Libraries can (and do) spawn
         | threads before main, so it's not even safe to use if you
         | restrict yourself to "only the beginning of the program."
         | 
         | > The most common misuse I see is changing env before forking a
         | child: nobody has to do that, execve() lets you pass arbitrary
         | envp to the new process without changing yours.
         | 
         | Just remember to do it before fork() and not between fork() and
         | exec() because you probably want to copy the existing envp and
         | allocators usually aren't async signal safe. If you want to be
         | sure to be correct, use posix_spawn to create a child process.
         | 
         | --
         | 
         | I think it's fair to say "setenv is buggy" in the sense that
         | the POSIX specification for setenv guarantees it to be buggy in
         | most programs that think they should be using it. What makes it
         | an even bigger footgun is that the path of least resistance for
         | people who need/want the behavior is the most difficult to use
         | correctly. POSIX is full of shit like this, and "you should
         | know better" isn't a good enough excuse.
         | 
         | It's like saying "hey you should have known unless you press
         | the doohickey for 5 seconds and turn the chainsaw at 90 degrees
         | when you start it, the chain is going to fly off."
        
           | jcalvinowens wrote:
           | > Libraries can (and do) spawn threads before main
           | 
           | I don't think any library should be calling setenv(), there's
           | always a better way. If you know of a counterexample, please
           | share it, I'd like to see it.
           | 
           | > Just remember to do it before fork() and not between fork()
           | and exec() because you probably want to copy the existing
           | envp and allocators usually aren't async signal safe.
           | 
           | Why would you go to all that trouble? Most implementations
           | just use the stack to build the arguments for execve(),
           | making all that irrelevant.
           | 
           | > If you want to be sure to be correct, use posix_spawn to
           | create a child process.
           | 
           | That's not the purpose of posix_spawn(), it exists to deal
           | with vfork-only nommu architectures:
           | 
           | >> [posix_spawn() was] specified by POSIX to provide a
           | standardized method of creating new processes on machines
           | that lack the capability to support the fork(2) system call.
           | 
           | >> These functions are not meant to replace the fork(2) and
           | execve(2) system calls. In fact, they provide only a subset
           | of the functionality that can be achieved by using the system
           | calls.
           | 
           | Anyway... back to our regularly scheduled programming:
           | 
           | > the POSIX specification for setenv guarantees it to be
           | buggy in most programs that think they should be using it.
           | [...] POSIX is full of shit like this, and "you should know
           | better" isn't a good enough excuse.
           | 
           | I completely disagree.
           | 
           | The C standard library is chock full of non-reentrant APIs.
           | Nobody who has read more than ten manpages would reasonably
           | assume that _anything_ is reentrant without an explicit
           | assurance. Locks are trivial. POSIX expects you to use a lock
           | if you want to use setenv() like this.
           | 
           | I also want to point out that (at least on Linux) getenv()
           | returns a pointer to the stack! How could anybody with basic
           | programming literacy reasonably expect that to be thread
           | safe? It's exactly analogous to getopt() and argv.
           | 
           | No, the authors obviously couldn't imagine that $FAANG would
           | be building 4GB binaries with 1000+ recursive library
           | dependencies, each of which has its own chance to reenact the
           | printer scene from Office Space with your shared envp. I think
           | that's an organizational problem, not a POSIX problem.
           | 
           | I'm not saying the standard shouldn't change: I would love to
           | see argv and envp become immutable. If we're going to change
           | it, that's the right move IMHO. But I don't really think
           | that's practical...
           | 
           | > It's like saying "hey you should have known unless you
           | press the doohickey for 5 seconds and turn the chainsaw at 90
           | degrees when you start it, the chain is going to fly off."
           | 
           | I think it's more like saying "we can't stop people from
           | getting hurt jaywalking, so we're going to solve the problem
           | by legally requiring everybody to wear helmets at all times
           | outdoors".
        
             | layer8 wrote:
             | > I don't think any library should be calling setenv()
             | 
             | It's already a problem if the library is calling getenv(),
             | because this could happen concurrently to the main program
             | calling setenv(). The only universally safe solution is to
             | not use setenv()/putenv() at all.
             | 
             | Which I think is actually reasonable. But yes it makes
             | those functions broken in a multithreaded program.
        
         | zlg_codes wrote:
         | That's essentially the problem with any technology, or
         | community around said technology, that's primed against some
         | target they want to replace.
         | 
         | It has kept me away from Rust for years. If its fans weren't
         | such fanboys for disruptive activity like rewriting things in
         | Rust for the fuck of it, I might look closer into it. But
         | considering where it came from and their politics, it doesn't
         | seem like Rust is actually for everyone.
         | 
         | Its angle of picking on C is also rich. If C is so bad, why has
         | no major language supplanted it or C++? Anything with high
         | importance on performance is written in low level languages,
         | unconcerned with pushing a narrative or making BS 'more
         | inclusive' which is really another view on affirmative action.
         | 
         | My identity alone makes me unfit for the project.
        
           | mike_hock wrote:
           | > If C is so bad, why has no major language supplanted it or
           | C++?
           | 
           | Because no one, not even Rust, aims for feature parity with C
           | ( _except_ on the language level).
        
           | jcalvinowens wrote:
           | > or community around said technology, that's primed against
           | some target they want to replace.
           | 
           | It's a vocal minority of the community, in my experience.
           | Most people I've personally met who are passionate about Rust
           | have a much more reasonable attitude about the whole thing,
           | and see it as moving the needle in the right direction rather
           | than a fully formed solution.
           | 
           | It also really is interesting: obviously it's not the panacea
           | it is sometimes made out to be, but it truly does eliminate a
           | class of error. I admit to a bit of stubbornness myself, but
           | I'm trying to work with it more.
        
           | JohnFen wrote:
           | > It has kept me away from Rust for years.
           | 
           | Yes. The Rust community is what has kept me away from Rust
           | for a long, long time. Now I'm learning Rust because it may
           | become an important skill to have, but it's despite the
           | community. They're very hard to put up with.
        
           | Georgelemental wrote:
           | > But considering where it came from and their politics
           | 
           | > unconcerned with pushing a narrative or making BS 'more
           | inclusive' which is really another view on affirmative action
           | 
           | For what it's worth: I am conservative, right-wing, as
           | opposed to "affirmative action" as it is possible to be--and
           | I love Rust. Don't judge the language by the politics of a
           | tiny section of its community, judge it on the technical
           | merits.
        
         | boring_twenties wrote:
         | > The most common misuse I see is changing env before forking a
         | child: nobody has to do that, execve() lets you pass arbitrary
         | envp to the new process without changing yours.
         | 
         | That's pretty much never what you would want. You want to set a
         | single variable while inheriting the rest of the existing
         | environment. In order to do that with execve() you would have
         | to copy the existing environment first, yuck.
         | 
         | And you wouldn't use setenv() before forking anyway, you would
         | do it after forking, in the child, before exec.
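         | 
         | For comparison, the copy-the-environment route might look
         | like the rough sketch below. env_with() is a hypothetical
         | helper (not a libc function) that builds a new envp from
         | `environ` with one NAME=value entry added or replaced,
         | leaving the parent's environment untouched. It reads
         | `environ` without any locking, so it assumes nothing else is
         | modifying the environment at the same time, and it calls
         | malloc(), so in a threaded program it belongs before fork().

```c
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical helper: returns a malloc'd, NULL-terminated envp holding
 * every entry of `environ` plus `assignment` ("NAME=value"); an existing
 * entry for NAME is replaced. Strings are shared, not copied, so the
 * caller only free()s the returned array. Returns NULL on bad input or
 * allocation failure. */
char **env_with(const char *assignment)
{
    const char *eq = strchr(assignment, '=');
    if (eq == NULL || eq == assignment)
        return NULL;
    size_t namelen = (size_t)(eq - assignment) + 1;   /* includes '=' */

    size_t n = 0;
    while (environ != NULL && environ[n] != NULL)
        n++;

    char **envp = malloc((n + 2) * sizeof *envp);
    if (envp == NULL)
        return NULL;

    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (strncmp(environ[i], assignment, namelen) != 0)
            envp[out++] = environ[i];   /* keep unrelated entries */
    }
    envp[out++] = (char *)assignment;   /* add or replace NAME */
    envp[out] = NULL;
    return envp;
}
```

         | The result would then be passed as the third argument of
         | execve() in the child.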
        
       | JohnFen wrote:
       | getenv/setenv is also part of the library, not the language. If
       | it presents a problem for a particular program, it's easy enough
       | to implement your own variety that behaves as you need, or to
       | wrap the library call in something that provides the needed
       | thread safety.
       | 
       | This seems like a bit of a tempest in a teapot to me.
        
       ___________________________________________________________________
       (page generated 2023-11-20 23:02 UTC)