[HN Gopher] Stack size invisibility in C and the effects on "por...
___________________________________________________________________
Stack size invisibility in C and the effects on "portability"
Author : todsacerdoti
Score : 157 points
Date : 2021-09-28 14:07 UTC (8 hours ago)
(HTM) web link (utcc.utoronto.ca)
(TXT) w3m dump (utcc.utoronto.ca)
| shadowgovt wrote:
| I once worked on a project where we were building a browser
| plugin in IE (back when plugins existed) and found ourselves
| needing to use Open Dynamics Engine within the plugin. Under the
| hood, ODE uses `alloca` a _lot_... Once per frame to set aside
| enough RAM to compute a Jacobian that solves for all active
| collisions. But on Windows, you can configure "redzone
| protection" to fault the program if the stack takes a flying leap
| of a (relatively) paltry few kB off the current top of the stack,
| relative to the default configuration of redzone protection on
| most single-threaded Windows apps. From ODE's point of view this
| was fine; they didn't really care about the "compiling on
| Windows" use case at all and considered our problem to be ours.
| But this proved disastrous in the context of running on Windows
| in a thread, and since we were running in IE's process context,
| we couldn't change the thread allocation rules because they're
| baked into the process itself.
|
| The nice thing about C++ is that the preprocessor is a monster.
| We #define'd all references to alloca in the ODE codebase away
| into calls to our own function that returned, instead of a bare
| pointer, an object with the right pointer-dereference semantics
| to act like a bare pointer but that was really a reference to a
| slice of a colossal chunk of heap memory. When the context of the
| alloca call closed, the returned object was destroyed, and its
| destructor moved our imaginary "stack" pointer in the big chunk
| of heap memory back into place.
|
| Ugliest thing I ever wrote. ;)
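The trick described above can be sketched in plain C (the real fix used C++ destructor semantics to release the memory automatically when the scope closed; here the release is explicit, and all names are invented):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical replacement arena: one big heap block with a moving
   "stack" pointer, standing in for the memory alloca would have
   carved out of the real stack. */
static unsigned char *arena;
static size_t arena_top, arena_cap;

static void fake_stack_init(size_t cap) {
    arena = malloc(cap);
    arena_cap = cap;
    arena_top = 0;
}

/* Like alloca: bump-allocate from the arena. */
static void *fake_alloca(size_t n) {
    assert(arena && arena_top + n <= arena_cap);
    void *p = arena + arena_top;
    arena_top += n;
    return p;
}

/* The C++ version restored the pointer in a destructor when the
   enclosing scope closed; in C we save and restore a mark by hand. */
static size_t fake_stack_mark(void)        { return arena_top; }
static void   fake_stack_release(size_t m) { arena_top = m; }
```

Per "frame" the caller would take a mark, do its `fake_alloca` calls, and release the mark on the way out, mimicking stack discipline without touching the real (and, in IE's process, tightly constrained) thread stack.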
| pikhq wrote:
| This article is a bit strange. It's referring to the thread stack
| size but acting as though it's the main stack size. It's true
| there's no portable C API for doing anything with the main stack
| size, but musl libc doesn't behave any differently than glibc
| here -- the main stack is allocated by the kernel. The thread
| stack size on musl is small, yes, but the only way to even end up
| in a thread involves using the pthread API, which lets you
| configure the stack size for a thread via
| pthread_attr_setstacksize. You can find it here:
| https://pubs.opengroup.org/onlinepubs/9699919799/functions/p...
|
| Granted, it is maddening that C does not offer any great ways to
| reason about what stack size is needed, but it is simply nonsense
| to imply (as people seem to be inferring here) that there's no
| way for an application to handle running into musl's stack size
| limits.
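For concreteness, a minimal sketch of the API pikhq names, requesting a 64 MiB stack for a worker thread (the size and the non-pthread names are invented):

```c
#include <pthread.h>
#include <stddef.h>

/* Run work on a thread with an explicitly chosen 64 MiB stack,
   instead of relying on the libc's default (which on musl is small). */
static void *worker(void *arg) {
    (void)arg;
    /* ... deep recursion that might overflow a default stack ... */
    return NULL;
}

static int spawn_with_big_stack(void) {
    pthread_attr_t attr;
    pthread_t tid;
    if (pthread_attr_init(&attr) != 0) return -1;
    /* Must be at least PTHREAD_STACK_MIN. */
    if (pthread_attr_setstacksize(&attr, 64u * 1024 * 1024) != 0) return -1;
    if (pthread_create(&tid, &attr, worker, NULL) != 0) return -1;
    pthread_attr_destroy(&attr);
    return pthread_join(tid, NULL);
}
```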
| unwind wrote:
| It's interesting that nobody has mentioned the fact that "stack"
| is a word that does not appear in the C standard (caveat: I
| don't own it, but I just checked a draft copy). In other
| words, there is no assumption or guarantee that a C compiler
| will rely on the machine stack to implement features such as
| automatic variables, parameter passing, and function
| call/return.
|
| So I guess that means that an imaginary API to query these things
| should not talk about "stack", since that is making an assumption
| that is a bit out of line. That does not make naming any easier
| ... and, as pointed out by vasama, it would be hard to use an API
| that queries about the current function only. Perhaps something
| like "autosizeof f" where f is a function could return, at
| compile-time, the projected "automatic storage size" of the given
| function (perhaps making it into a compilation error if f() uses
| VLAs, or calls functions that do), and a runtime "autoleft()"
| function could be used to determine how much space is left.
|
| Uh, no, probably not. These things are hard, and I'm making
| assumptions about code analysis capabilities that are probably
| not realistic.
| Joker_vD wrote:
| LLVM-MOS targets the 6502 and so goes out of its way to use as
| little of the hardware stack as possible: it does global
| program optimization, figuring out what calls what and which
| functions don't need to be reentrant, so it can allocate as
| many activation records as possible statically, FORTRAN-style.
| silon42 wrote:
| I can easily imagine a compiler+linker being able to forbid
| recursion and statically calculate the maximum stack needed.
| Probably you need to ban general function invocation via
| pointers... methods on objects could be fine, you just evaluate
| all virtual method override combinations.
| noneeeed wrote:
| The SPARK Ada language prevents recursion by only allowing
| you to call methods that are defined above the method you are
| in. It has a similar rule to prevent recursive calls between
| packages although I forget the details of that. From this you
| can calculate the worst-case stack size. As all other memory
| has to be statically allocated, you can always calculate the
| maximum memory footprint of any program.
|
| Obviously those restrictions mean you have to write your code
| without recursion, but in practice the uses that SPARK is put
| to don't tend to need it. A statically allocated stack in
| your code and a loop can solve a lot of problems (along with
| suitable logic for gracefully handling that stack filling
| up).
| Ironlikebike wrote:
| The stack isn't in the C standard, but it is often minimally
| defined in the psABIs (e.g., the ELF ABI on Linux) for the
| architecture. The psABI is generally trying to balance
| performance of addressability (on the architecture), with
| security (position-independence), while constraining the
| toolchain developers' implementation as little as possible. As
| a programmer you don't have to implement to the ABI, but if you
| want interoperability and you want to use a C language
| compiler, you should (good luck creating your own ABI
| otherwise).
|
| I'm not aware of any way to programmatically query the rules
| laid out in the psABI. The implementation of the corner cases
| is often based on mutual agreement (convention) between the
| compiler and the link-editor.
|
| You could probably write a tool that maintains a representation
| of these rules and returns the results in some sort of parse-
| friendly EBNF, but you'd need a convention linter to make sure
| that the implementation of your tool tracks the change in
| convention in the toolchain components.
|
| On the other hand, you could write an extension to the compiler
| that warns you when certain C code results in behaviors around
| the stack that might be undesirable, i.e., if you're passing
| too many parameters to a function and it's going to spill those
| to the stack.
|
| The standard way to avoid unknown behavior in this regard is to
| explicitly tell the compiler what you want to accomplish by not
| relying on auto-allocation or parameter passing rules (as
| stated in the top thread). This means explicit allocation on
| the heap, rather than allowing the compiler to auto-
| allocate on the stack, and using pass-by-reference to avoid
| parameters spilling to the stack. All of this is slower since
| accessing values in the stack is often a single instruction
| operation (small number of cycles), vs multi-instruction loads
| from far-away addresses (large number of cycles).
| pjc50 wrote:
| Yeah, I've written C for the PIC16 which has a hardware stack
| .. of depth 8.
| https://electronics.stackexchange.com/questions/158307/is-th...
|
| I'm not entirely sure it was a standards-compliant compiler,
| though. I think it may have disallowed recursion?
|
| Mind you, I also think people have unreasonably high
| expectations of the C standard; in order to get things done you
| inevitably have to make compromises and use features that
| aren't available on _every_ platform. Which leads to autoconf
| hell, but if you're still developing in C you've accepted its
| imperfections.
| cryptonector wrote:
| You ask for an automatic allocation of N bytes. You can use
| alloca() if you have that available, or VLAs if you have that
| available.
|
| If you have neither, then you still have to worry that fixed
| sized automatic allocations might be too large, but at least
| this can be checked statically.
|
| Given either alloca() or VLAs, those can be allocated on a
| stack or on a heap, and the spec does not say either way. The
| OS might over-commit memory, too, and if the C run-time has
| fixed-sized stacks there may or may not be a guard page at the
| end.
|
| However, even with all of that, using memcpy() or similar (see
| elsewhere in this thread) to touch every byte or every 4KB's
| worth of bytes in what might be the direction of stack growth
| (if there's a stack, otherwise the direction is irrelevant), is
| something you can do to make sure that the memory allocated is
| committed, and that any failure due to over-committing (bounded
| stacks look like over-committed unbounded stacks) is detected
| as early as possible and without corrupting memory (e.g., by
| wildly exceeding stack bounds). Failure may ensue, but it
| should be behavior defined by the machine rather than undefined
| behavior.
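A rough sketch of that probing idea, assuming 4 KiB pages, a downward-growing stack, and a buffer kept deliberately modest for the example (all names invented):

```c
#include <string.h>

#define PAGE 4096

static volatile unsigned char sink; /* defeat dead-store elimination */

/* Touch one byte per page of a large automatic buffer, so that any
   guard-page or over-commit fault happens here, predictably, rather
   than via a wild access far past the stack bounds later on. */
static void probe(unsigned char *buf, size_t len) {
    for (size_t off = 0; off < len; off += PAGE)
        buf[off] = 0;            /* commit one byte per page */
    if (len) buf[len - 1] = 0;   /* and the final byte */
    sink = buf[0];
}

static int use_big_buffer(void) {
    unsigned char buf[16 * PAGE]; /* 64 KiB: safe under default limits */
    probe(buf, sizeof buf);       /* fault early if this won't fit */
    memset(buf, 0xAB, sizeof buf);
    return buf[sizeof buf - 1];
}
```

The point, as the comment says, is not to avoid failure but to make it happen at a defined place, before any writes land in unrelated mappings.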
| alcover wrote:
| > fixed sized automatic allocations [...] can be checked
| statically
|
| May I ask how you'd do this ? I need this for optimizing a C
| project.
| cryptonector wrote:
| Check every function's automatic variables' sizes, add them
| all up, and add whatever function call frame size the
| compiler and ABI are using. Getting an exact accounting of
| function call stack memory usage is tricky because you
| might have framepointer-less code, and you might have
| inlining of function calls, but you can figure it out
| assuming no inlining and assuming you don't enable
| framepointer-less-ness.
| alcover wrote:
| Thank you. It doesn't err.. look super straightforward. I
| was hoping for an automated way of some sort.
|
| This stack affair of `local arrays are super fast! but
| keep it small, 1024 should be okay` is very puzzling to
| aspiring C lovers like me.
| cryptonector wrote:
| There's a way to dump DWARF debug information that you
| might then be able to parse and use to do just this.
| jart wrote:
| You can guarantee undetected stack overflow is impossible
| if you do two things. First, you mprotect() a 4096-byte
| guard page at the bottom of the stack. Second, you program
| the compiler to fail if an individual function allocates a
| stack frame larger than that.
|
|     #ifndef STACK_FRAME_UNLIMITED
|     #pragma GCC diagnostic error "-Wframe-larger-than=4096"
|     #if __GNUC__ >= 9
|     #pragma GCC diagnostic error "-Walloca-larger-than=1024"
|     #pragma GCC diagnostic error "-Wvla-larger-than=1024"
|     #endif /* GCC 9+ */
|     #endif /* STACK_FRAME_UNLIMITED */
| b3morales wrote:
| Conversely, I don't think `malloc` et al. say anything about a
| "heap". All they promise is they'll allocate some memory, with
| certain size/alignment and lifetime guarantees. It's up to the
| platform to figure out how to fulfill those guarantees.
| zabzonk wrote:
| How can you predict how much stack you need? Isn't this basically
| the halting problem?
| [deleted]
| phkahler wrote:
| With the exception of recursion, static analysis can answer
| that question.
| Verdex wrote:
| As far as I know, there aren't any languages which provide a
| way to do this out of the box. If a language wants to provide
| an environment where no stack overflows can happen, then it
| needs to provide a runtime where the stack programs use is
| transplanted over to the heap as it gets too large (or maybe it
| just starts out there). At this point the answer is that you
| only need a stack big enough to hold your runtime (which you
| can engineer to be sufficiently small always).
|
| Of course you can imagine a language where your function
| signature includes how much stack space it takes up. The stack
| size being recursively calculated by looking at all the
| functions your function calls etc.
|
| If recursion is included, then you're correct that this would
| require solving the halting problem. The obvious solution would
| be to just not allow recursion. Although, if you also consider
| languages like Idris (which are technically sometimes not
| turing complete) you can setup a termination checker and then
| certain rules can give you an upper bound on how long a
| recursive function will run (and therefore how much stack it
| takes up).
| rurban wrote:
| Well, there's still libsigsegv and its
| stackoverflow_install_handler. You won't get the size, but get
| the overflows.
| CJefferson wrote:
| The invisibility of stack size is one of my continual
| irritations. It makes it difficult to write tree searches in
| the obvious recursive way, because there is a good chance that
| one day you will have an input big enough to overflow the
| stack. I wish that on modern 64-bit OSes the default thread
| stack size were just made decently large (128MB or even 1GB)
| -- would it really cost that much, given that until the memory
| was touched it wouldn't really be allocated?
|
| Your choices are:
|
| * Make a "stack" yourself, in heap memory, and implement your
| algorithm to manually push/pop from the stack you implemented
| (often not hard to do, but feels like reimplementing for no
| reason).
|
| * Spawn a new thread, where in most languages you can set the
| stack size. This is what I tend to do, although it does seem like
| a waste to have an extra thread for no reason other than to get
| more stack.
|
| You can (in principle) set the stack size when you compile a
| program, but how to do it seems to differ between OSes and
| compilers, so I prefer the new thread strategy as an in-language
| option.
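The first option (a manual stack in heap memory) for, say, summing a binary tree might look like this sketch (all names invented):

```c
#include <stdlib.h>

struct node { int value; struct node *left, *right; };

/* Iterative depth-first sum using a heap-allocated stack of pending
   nodes, immune to call-stack depth limits.  Returns -1 on OOM. */
static long tree_sum(struct node *root) {
    size_t cap = 64, top = 0;
    struct node **stack = malloc(cap * sizeof *stack);
    long sum = 0;
    if (!stack) return -1;
    if (root) stack[top++] = root;
    while (top > 0) {
        struct node *n = stack[--top];
        sum += n->value;
        if (top + 2 > cap) {          /* grow: the step the call
                                         stack can't do for us */
            struct node **grown = realloc(stack, 2 * cap * sizeof *stack);
            if (!grown) { free(stack); return -1; }
            stack = grown;
            cap *= 2;
        }
        if (n->left)  stack[top++] = n->left;
        if (n->right) stack[top++] = n->right;
    }
    free(stack);
    return sum;
}
```

Unlike the recursive version, a degenerate (list-shaped) tree of any depth only costs heap, which fails with a NULL from malloc rather than a stack overflow.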
| LegionMammal978 wrote:
| I recall that the Rust compiler uses the stacker crate [0] to
| grow the stack into the heap, and it's under the rust-lang
| organization on GitHub, so it's probably the closest you can
| get to an "in-language option" without it actually being in the
| standard library. However, it currently isn't supported for
| several platforms due to the lack of a suitable OS interface.
|
| [0] https://docs.rs/stacker/0.1.14/stacker/
| kevincox wrote:
| > would it really cost that much (as until the memory was
| touched it wouldn't really be allocated).
|
| The problem here is that once it is used once it is never
| freed. (Other than being swapped out). So you effectively cache
| all of the space your program needs forever.
|
| There are some solutions, like marking the stack below the
| current frame as freeable, but it is hard to do this in a way
| that doesn't cause huge performance problems if you happen to
| cross whatever boundary is picked often. IIRC Golang used to
| do this but switched to copying, growable stacks because no
| matter how hard they tried they always ended up in scenarios
| where they were allocating and deallocating stacks in a hot
| loop.
| owl57 wrote:
| > IIRC Golang used to do this but switched to copying,
| growable stacks because no matter how hard they tried they
| always ended up in scenarios where they were allocating and
| deallocating stacks in a hot loop.
|
| They used to do segmented stacks. The bad overhead is not in
| allocation, but in switching between segments when the stack
| pointer is oscillating around the boundary -- but if you want
| to allocate really many stacks that could grow large, you
| don't have much choice: either this, or tracking the pointers
| that need to be updated on reallocation. A normal contiguously
| mapped stack, but with dynamic growing and shrinking of the
| actually mapped pages, is an entirely different proposition.
| Dylan16807 wrote:
| Ah, that makes more sense, since you could easily limit
| stack deallocation to once a second or whatever.
| Pxtl wrote:
| > The problem here is that once it is used once it is never
| freed. (Other than being swapped out). So you effectively
| cache all of the space your program needs forever.
|
| What do you mean? It will get freed once you pop that stack
| frame, won't it?
| marcosdumay wrote:
| No. It was never allocated either.
|
| It was virtual memory before it was used, but once you use
| it, it's mapped into physical memory and marked as dirty.
| So it may be swapped out or live there, those are the only
| two options.
|
| It would take your OS being fully aware of your usage
| schema for it to be able to free the memory.
| LeifCarrotson wrote:
| > So you effectively cache all of the space your program
| needs forever.
|
| All the space your program uses at the largest stack size,
| you mean, which is different. Until you touch it, the huge
| stack is purely virtual. And it only lasts until you restart
| the program.
|
| Yes, if you default to a 10 GB stack, then recursively tree-
| search over 9 GB of data, then close the data and spend the
| next week with 1 MB data that only touches the bottom of the
| stack, you'll have 8.999 GB swapped out, which does suck.
|
| But that only happens if you run the program on that 9 GB of
| data. The alternative was to fault when you overflowed the
| smaller stack when that data was imported.
|
| It does mean if there's an infinite recursion bug, instead of
| crashing quickly with a stack overflow, you have to wait
| while it pages out other programs and fills the whole 10 GB
| of stack...
| kevincox wrote:
| Yes, that is basically what I mean. If you are going to use
| a very large amount of stack space you should consider
| alternatively allocating something on the heap and freeing
| it afterwards.
|
| Yes, if most of your program is using roughly the same
| amount of stack space then it is great. However, I think if
| you are worried about running out then you are likely using
| far more than the average part of your program.
|
| So yes, it isn't intrinsically bad, but context is important
| to evaluate the decision.
| ximeng wrote:
| Zig appears to have a proposed solution to this, similar to
| your option 1: move recursion to the heap.
|
| https://github.com/ziglang/zig/issues/1639
|
| They're also looking at static analysis of the maximum stack
| depth which might help in some cases.
| ziml77 wrote:
| Being able to make recursive functions use the heap would be
| great. I assume it would need to be opt-in since you'll
| change performance characteristics, but I almost always
| rearrange recursive algorithms to use a loop and a heap
| allocated stack structure instead. There's no reason the
| compiler couldn't help do that for me.
| phkahler wrote:
| A good tree will have something like log(n) depth so normal
| recursion should be fine.
|
| Apart from recursive algorithms, stack size can be determined
| by static analysis.
|
| Which reminds me: what is a good tool for analyzing / viewing
| a C++ program's call tree?
| wallstprog wrote:
| > Which reminds me: what is a good tool for analyzing /
| viewing a C++ program's call tree?
|
| As already mentioned, perf can capture call stacks, but those
| are statistical samples, so not necessarily complete.
|
| callgrind (https://www.valgrind.org/docs/manual/cl-
| manual.html) by contrast can capture all call-stacks.
|
| kCacheGrind/qCacheGrind
| (https://kcachegrind.github.io/html/Home.html) can be used to
| view the output from either perf or callgrind. Sources for
| kcachegrind, including converters from perf, oprofile, etc.,
| can be found at https://github.com/KDE/kcachegrind
| xfer wrote:
| https://llvm.org/docs/Passes.html#dot-callgraph-print-
| call-g...
| AnimalMuppet wrote:
| Yeah. You'll fail to fit the tree in memory long before you
| overflow the stack while recursing down it.
| stncls wrote:
| > A good tree will have something like log(n) depth so normal
| recursion should be fine.
|
| Yes, this is true for comparison/search/sort/bucketing
| algorithms. But in other applications (e.g. graph theory,
| networks, sparse matrices) a depth of n, for n elements, can
| be perfectly legitimate. Or it can even be an edge case, but
| one you don't want to handle by just segfaulting.
|
| > Which reminds me: what is a good tool for analyzing /
| viewing a C++ program's call tree?
|
| Linux perf can collect the necessary data. Then for
| frontends, I am not sure. KDAB's "hotspot" (
| https://www.kdab.com/tag/perf/ ) can do flamegraphs, but this
| may be more performance-oriented than what you're looking
| for.
|
| Edit: perf can collect the data _statistically_. It will miss
| many calls, the ones that are the least time-consuming.
| dllthomas wrote:
| > A good tree will have something like log(n) depth so normal
| recursion should be fine.
|
| But in many contexts you have to be robust against a bad
| tree.
|
| > Apart from recursive algorithms, stack size can be
| determined by static analysis.
|
| Not always, in the presence of dynamically sized arrays (or
| alloca and friends).
| foobiekr wrote:
| Honestly, there is almost no difference between a purely
| recursive implementation on the stack and one that uses a
| passed-in stack structure (whether extensible or fixed) in
| terms of complexity. Though I can see why it would be nice if
| the languages supported some sugar for this.
| likeabbas wrote:
| Could also be solved if the compiler supported tail-call
| optimization.
| Jensson wrote:
| The main reason to use recursion is to use the stack
| variables; when tail call optimization is possible, people
| usually just prefer loops.
| dnautics wrote:
| I don't think tail call will help depth-first ("obvious
| recursive way") tree algos.
| aidenn0 wrote:
| It doesn't help with breadth-first searches either. A non-
| degenerate tree of depth N will use O(N) space, and that
| space will be on the call-stack if you use a recursive
| algorithm to access it.
|
| As long as you use less than 1 page per activation record
| though, most modern libc implementations behave safely
| under unbounded recursion as they will allocate a guard-
| page around each stack.
|
| [edit]
|
| It would also be useful if the compiler could warn about
| activation records larger than 1 page. Absent alloca (and
| C99 VLAs, made optional in C11), that should be statically
| knowable.
| dnautics wrote:
| A BFS is essentially a while loop with a queue
| datastructure. This is (relatively) easy to implement in
| TCO-recursive erlang/elixir, which is what I'm currently
| most proficient in. And, yeah, TCO would help.
| Dylan16807 wrote:
| What does a recursive breadth-first search look like? I
| think you'd have to contort yourself to write that, which
| is why people are specifically mentioning depth-first
| where recursion is easy and natural.
| aidenn0 wrote:
| Ah you're right; the most natural BFS would be tail-
| recursive or iterative.
| cryptonector wrote:
| TCO is an optimization. It's not always possible.
| jhgb wrote:
| It's a misnomer. It's not just "an optimization"; in some
| cases, it alters program semantics.
| cryptonector wrote:
| Usually the only visible side-effect of TCO is stack
| trace compression and failure to exhaust the stack. You
| can call that semantics if you like, but I think most
| people wouldn't. It's absurd that Java doesn't do TCO
| because of the stack compression "issue" (as I
| understand, that's why Java doesn't TCO).
| jhgb wrote:
| Termination instead of divergence definitely is a change
| in semantics.
| adrianN wrote:
| C++ compilers are free to optimize infinite loops without
| side effects away and people are generally fine with
| that.
| jhgb wrote:
| Well, I'm sure this is not the worst thing about C++ that
| C++ people are "generally fine with", so that really
| doesn't tell me a lot. Not quite sure if this applies to
| useful programs, though. Lack of tail call elimination
| can definitely break a class of useful programs even with
| side effects.
| cryptonector wrote:
| No, failure to fail is not a change in semantics.
|
| EDIT: "My program used to fail with a stack size
| exhaustion error, but now it works -- I'm so angry about
| this!" -no one ever.
| jhgb wrote:
| But they'd complain about the opposite -- for example if
| the program were an external-event-driven state
| automaton.
| cryptonector wrote:
| TCO does not do the opposite. TCO helps avoid stack
| exhaustion without introducing new faults.
|
| TCO might cause your program to avoid one stack
| exhaustion fault only to reveal a different fault
| (possibly a different stack exhaustion fault). This is
| not a semantics change. And it's not a bug either.
| jhgb wrote:
| What do you mean by "reveal a different fault"? A program
| with an indefinite amount of tail calls does not have "a
| fault".
| cryptonector wrote:
| If you have a program that fails due to stack exhaustion,
| but fixing it by enabling TCO causes it to run longer and
| therefore run into some other problem...
| fsckboy wrote:
| > You can call that semantics if you like, but I think
| most people wouldn't.
|
| Tail Call Optimization (TCO) is an optimization, yes;
|
| but if Tail Call Elimination is required in the
| implementation of a language as it is in Scheme, then
| programmers in the language can rely on it (along with
| lexical scope) to write iterative loops instead as
| recursive calls. That is a change in program language
| semantics.
|
| see Guy Steele's "Debunking the 'Expensive Procedure
| Call' Myth, or, Procedure Call Implementations Considered
| Harmful, or, Lambda: The Ultimate GOTO"
| http://dspace.mit.edu/handle/1721.1/5753
| haberman wrote:
| > but if Tail Call Elimination is required in the
| implementation of a language as it is in Scheme
|
| This is also supported as a non-standard extension in C,
| C++, and Objective-C as of Clang 13: https://clang.llvm.o
| rg/docs/AttributeReference.html#musttail (I implemented
| this).
|
| > see Guy Steele's "Debunking the 'Expensive Procedure
| Call' Myth
|
| Yes! :) I cited this paper in my blog article describing
| how we are using musttail:
| https://blog.reverberate.org/2021/04/21/musttail-
| efficient-i...
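A small illustration of the guaranteed-tail-call extension described above. The MUSTTAIL macro is an invented convenience: on Clang 13+ it expands to the musttail attribute, which makes the compiler either reuse the current frame or refuse to compile; elsewhere it degrades to a plain return (no guarantee, just a hope of optimizer TCO).

```c
#if defined(__clang__) && __clang_major__ >= 13
#define MUSTTAIL __attribute__((musttail))
#else
#define MUSTTAIL
#endif

/* Accumulator-style sum of 1..n: the recursive call is in tail
   position, so with musttail each call reuses the current frame
   and stack use stays constant regardless of n. */
static long sum_to(long n, long acc) {
    if (n == 0) return acc;
    MUSTTAIL return sum_to(n - 1, acc + n);
}
```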
| cryptonector wrote:
| My point is that adding TCO is not a breaking semantics
| change, as its only visibility will be through a) the
| fact that more code works with it than w/o it, b) elision
| of tail-calling frames in stack traces. In particular I
| don't consider (b) to be a problem. I'm aware (b) is a
| matter of opinion, but really, TCO is much too valuable
| to use (b) as an argument against TCO.
| ploxiln wrote:
| The initial/main thread stack size is dynamic: it grows until
| it runs into another allocation (and these days there are
| guard pages and such). You can already use 128MB on the
| initial/main stack, even with musl-libc / Alpine Linux (it
| needs to be linearly probed if it's unreasonably huge, so the
| OS isn't surprised by a random memory access out in the
| middle of nowhere, but the compiler should do that
| automatically these days).
|
| You can also request whatever stack size you need for
| additional threads when you create them, this has always been a
| good idea.
|
| It's only the default stack size for non-main threads, if you
| don't specify it, which may be considered reasonably limited.
| masklinn wrote:
| > The initial/main thread stack size is dynamic
|
| That seems doubtful; AFAIK all OSes have a static allocation
| for the main thread, just one that is commonly (though not
| necessarily) much larger than what they give off-threads. As
| https://ariadne.space/2021/06/25/understanding-thread-
| stack-... notes, Windows gives 1MB to all threads, while
| macOS or OpenBSD 4.6+ give 8MB to the main thread and 512k
| to off-threads.
|
| The former would not merit consideration if it was
| automatically increased.
|
| > it grows until it runs into another allocation
|
| Which it won't because vmem.
|
| > and these days there are guard pages and such
|
| That just makes the access segfault if the stack allocation
| is not large enough to straight jump over the guard page.
| ploxiln wrote:
| I'm not so familiar with other operating systems, but on
| Linux and presumably other unixes, the only thing holding
| back the initial/main "process stack" is ulimit. Try
| "ulimit -s unlimited". (On macOS that increases it from 8MB
| to 64MB, but on linux it seems to really go unlimited)
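The limit being adjusted with "ulimit -s" above can at least be queried from C via getrlimit (a sketch; the helper name is invented):

```c
#include <sys/resource.h>

/* Query the soft limit on the main thread's stack -- the value
   "ulimit -s" controls.  Returns the size in bytes, or -1 when the
   limit is RLIM_INFINITY ("unlimited") or the query fails. */
static long long main_stack_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) return -1;
    if (rl.rlim_cur == RLIM_INFINITY) return -1;
    return (long long)rl.rlim_cur;
}
```

Note this is the limit on the mapping, not a promise about how much of it is usable: environment variables and argv live at the top, and nothing portable says where the current stack pointer sits within it.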
| grumpyprole wrote:
| A brilliantly written article. Couldn't have said it better
| myself.
| ghoward wrote:
| TFA is right, sadly.
|
| One of the projects I want to do in the future is to implement
| boringcc [1], and this sounds like something I would want in it.
|
| The question then becomes: would it even be possible to do?
|
| [1]: https://groups.google.com/g/boring-
| crypto/c/48qa1kWignU/m/o8...
| qsort wrote:
| Wouldn't the majority of it just be, effectively, "all pointers
| are now fat"?
|
| It could disable some (many?) optimizations, but I don't see it
| as a particularly hard thing to do (at least conceptually).
| ghoward wrote:
| No, not a hard thing to make a boringcc compiler, but I was
| referring specifically to querying stack size.
| hdjjhhvvhga wrote:
| Probably the only way is to use platform-specific quirks, but
| that's basically the same with everything else. Portability
| only works for some generic code; pretty much any non-
| trivial, real-world software uses a lot of platform-specific
| code.
| cryptonector wrote:
| TFA is wrong. The C run-time's call frame allocation method and
| actual resource limits leak past the C function call and
| variable-length-array (VLA) abstractions. You can write
| portable code to force defined (by the platform, not by the C
| spec) resource exhaustion behavior sooner than undefined
| behavior caused by memory corruption caused by use of much-
| larger-than-the-stack stack allocations.
| ghoward wrote:
| Can you really? If you depend on a SIGSEGV handler, that
| handler needs stack space unless you set up a separate stack
| for it, and I am not sure that is portable.
| cryptonector wrote:
| The point is to cause a defined (by the platform) stack
| exhaustion failure mode before possibly causing undefined
| behavior by corrupting memory.
|
| The C specs do not define any resource exhaustion failure
| modes. But actual operating systems, ABIs, platforms --
| they do.
|
| If you are on a system that gives you a SIGSEGV but no way
| to catch it on an alternate stack -or if you don't catch or
| ignore it- then your process will crash. This is fine. It's
| better than memory corruption.
| ghoward wrote:
| Can you give me an example of a platform where your
| program cannot cause memory corruption if you access an
| array on the stack out of bounds far enough to corrupt
| actual existing memory on the heap? Because a SIGSEGV is
| the best case scenario; the other scenarios are not
| helped at all by platforms.
| cryptonector wrote:
| Say you alloca(1GB), and alloca() is naive and gives it
| to you even though that 1GB spans several other stacks,
| and now you write somewhere in the middle. If you're
| lucky you'll hit unmapped space and trigger a segfault.
| If you're unlucky you'll scribble over another thread's
| stack or whatever.
|
| But if you progressively write into the alloca()ed area
| in the direction of stack growth then you cannot corrupt
| other mappings, and you can only segfault if you exceed
| your stack's maximum size. Of course, there can be
| platforms with no SIGSEGV or whatever.
| miloshadzic wrote:
| The spam on that thread is really something.
| yholio wrote:
| Seems like a moot point on any 64 bit platform, where you can
| effectively reserve GBytes of virtual memory for each of
| thousands of threads, and page in physical memory as required.
|
| It's the only sane interpretation I can give to a standard that
| hides any way to manage stack usage - the compiler/OS layer takes
| responsibility.
| noobermin wrote:
| If it's the OS's responsibility then Alpine Linux shouldn't
| blame developers or say their code isn't "portable."
| aidenn0 wrote:
| Gigabytes for thousands of threads is just barely true; if you
| reserved say 8GB each for 16k threads, you would just about use
| up the 47 bits of address space reserved for userland on Linux
| leaving nothing for text or heap.
|
| Certainly a few MB per thread isn't too crazy though; at 8MB
| per thread, you can have a million threads and use only a tiny
| fraction of the address space available.
| yholio wrote:
| You could reserve, say, 46 bits for heap and 46 for stack, 70
| terabytes each _ought to be enough for anybody_. Then start
| allocating thread stack in successive halvings: 2 threads get
| 35TB each, 65536 threads get 1GB each etc.
|
| Effectively, you can make sure the program runs out of
| available memory long before a stack exhaustion issue, unless
| the stack usage pattern is severely unbalanced for some
| thread.
| jart wrote:
| Virtual memory isn't free. 16k threads with 8gb stacks
| technically needs like 250gb of page tables. You must be
| using some strange memory model and microprocessor that
| lets you have 92 bits of address space.
| Koshkin wrote:
| How do functional languages and, in general, the languages where
| the use of recursion is pervasive - how do they deal with this?
| Verdex wrote:
| So, you can't answer "in general" because things are handled
| differently depending on the language.
|
| I believe Scheme basically only uses the heap (which is
| necessary because of call/cc iirc).
|
| Haskell uses a variation of a G machine (looks like a spineless
| tagless g machine to be specific). The way that works is that
| they convert your program into a giant lambda term and then
| they work on reducing it. This of course looks nothing like a
| traditional call stack, so you can replace that whole thing
| with the G machine. Recursion (at least of the non-tail
| recursive variety) will look like a lambda term slowly getting
| bigger in heap space.
|
| And then I also heard of someone (I forget which language)
| talking about setting up some logic to look for stack overflows
| and then throw all the current stack onto a heap and then start
| over.
| dfox wrote:
| The part about detecting stack overflows probably refers to
| implementation strategy in some scheme implementations (IIRC
| chicken scheme does this) which involve using the
| platform/machine stack as effectively hardware-assisted
| nursery generation for generational copying GC, when it gets
| full (usually well before true overflow) the minor GC cycle
| gets done and the stack gets completely emptied. In these
| designs the "stack" is usually used for all allocations, not
| only for activation records.
| miloignis wrote:
| Tail Call Optimization & allocating their own stacks,
| generally.
| elipsitz wrote:
| In many cases, with tail call optimization, turning specific
| types of recursion into iteration (which doesn't use more than
| a constant amount of stack space).
| masklinn wrote:
| Not tail call optimisation, tail call _elimination_.
|
| If the language is based on recursion, you can't just hope it
| happens commonly enough, it has to happen all the time, even
| for corecursive functions.
|
| Except I've never been super clear on Haskell having TCE or
| not, because laziness means there are limited opportunities
| for it to even fire.
| cryptonector wrote:
| As long as there's a guard page and as long as you can assume
| that page is at least 4KB in size, you can detect C stack
| boundaries safely. Basically, don't use alloca() or C99 VLAs, or
| huge automatics, or, if you do, touch a byte every 4KB in the
| direction of stack growth before you use bulky stack allocations.
| I have a macro for this somewhere.
|
| Also, this isn't limited to C... The real problem is not that the
| stack size is invisible to the program, or that you can run out.
| The real problem is that you can make a stack allocation so large
| that you might miss the end of the stack and scribble over some
| other memory. This is why you need to touch a byte every
| guard-page-size bytes (in the direction of stack growth) in large
| stack allocations before using those allocations.
|
| What's specific to C is alloca() and C99 VLAs. Not that other
| languages don't have those, but that in their absence it's
| extremely unlikely that any function would have a frame so large
| as to run past the guard page. A maximum frame size can be
| computed by the compiler in the absence of alloca() or VLAs.
|
| Also, the compiler could (and arguably should) generate code to
| do this touching of a byte every guard-size-bytes thing for
| dynamically sized stack allocations, which would make the problem
| disappear. (You could still run out of stack space, but more
| safely.)
| Joker_vD wrote:
| > touch a byte every 4KB in the direction of stack growth
| before you use bulky stack allocations. I have a macro for this
| somewhere.
|
| > you need to touch a byte ever guard-page-size bytes (in the
| direction of stack growth) in large stack allocations before
| using those allocations
|
| > Also, the compiler could (and arguably should) generate code
| to do this touching of a byte every guard-size-bytes thing for
| dynamically sized stack allocations, which would make the
| problem disappear.
|
| That's precisely what MSVC does since 1995 or so: inserting the
| calls to the _chkstk() function that touches the guard pages. I
| believe gcc also does this when it targets Windows.
| cryptonector wrote:
| Excellent. Thanks for the tip. Some references:
|
| https://docs.microsoft.com/en-
| us/windows/win32/devnotes/-win...
|
| https://metricpanda.com/rival-fortress-update-45-dealing-
| wit...
| titzer wrote:
| This doesn't work in practice, see my above comment. You can't
| put guard pages into the middle of stack segment (using
| mprotect()--the kernel will just override that mapping, happily
| growing right over top of it). Even putting a guard page at the
| end doesn't work, as you can still have spurious SIGSEGVs, as
| you mention, because programs might just stride more than 4KiB
| at a time; you can't trust a C compiler to insert strided
| accesses for you, and that's just overhead.
|
| The only way you can reliably do this, I've found, is to
| allocate your own stack segment.
| cryptonector wrote:
| Of course you can have a SIGSEGV when you exceed the max
| stack size. That's fine and to be expected.
|
| What's NOT fine is for a large stack allocation to cause the
| program to silently scribble over another stack or heap.
| masklinn wrote:
| > Of course you can have a SIGSEGV when you exceed the max
| stack size. That's fine and to be expected.
|
| It's more that you can't expect anything better, having an
| actual stack overflow error would be a lot more helpful.
| cryptonector wrote:
| Right. With 64-bit ISAs you can address 2^64 bytes,
| but... you can't exactly _have_ 2^64 bytes. At some point
| the system will fail. How it fails is important. A
| ran-out-of-resources error is a lot better than undefined
| behavior resulting from random memory corruption.
|
| And yes, portable C has finite size_t. Therefore portable
| C must fail in some way when limits are exceeded.
| gpderetta wrote:
| Have you considered --split-stack? It provides infinitely
| growable stacks (up to the heap limit I guess) and a defined
| ABI to find the stack boundaries.
|
| The cost is slower function calls and an ABI break.
| chrisseaton wrote:
| > As long as there's a guard page and as long as you can assume
| that page is at least 4KB in size, you can detect C stack
| boundaries safely.
|
| How do you do this in pure C?
| cryptonector wrote:
| You're gonna make me dig it out, ain't ya. I don't know where
| I put it, but I've written a macro and function that does
| this. Basically, given a pointer and allocation size, if it's
| smaller than 4KB, do nothing, if it's larger, loop for
| allocation size div 4KB touching a byte from one end of the
| allocation to the other. The tricky things are: determining
| in which direction the stack grows, and the page size (which,
| arguably, you can hard-code to 4KB) if you really don't want
| to hard-code 4KB.
|
| You can determine stack growth direction by having one
| function call another with a pointer to a local variable in
| the former and then compare that to a pointer to a local
| variable in the latter. To do this correctly you have to cast
| to uintptr_t.
|
| And, yes, pure C does not even assume a stack, much less a
| bounded stack. In pure C it's possible to have every call
| frame allocated on the heap, which means that assuming that
| they are allocated contiguously is not safe. Moreover, even
| with stacks, the run-time could allocate a second stack when
| the first is exhausted and push new call frames on the second
| rather than die. In pure C the only assumptions about limits
| have to do with the size of size_t. The real world intrudes
| though, and in practice all C run-times allocate call frames
| on bounded stacks.
| chrisseaton wrote:
| None of this is possible in a well-defined pure C program.
| Pages with special memory protection aren't a thing C knows
| anything about.
|
| That's the point - you can't do it without sacrificing
| portability.
| cryptonector wrote:
| What I'm describing is portable. It may be unnecessary in
| C run-times that allocate call frames on the heap, but
| those aren't realistic C run-times.
| chrisseaton wrote:
| Say I ran your code on a special C implementation that
| tracks all memory access and disallows it when you try to
| read from stack memory that doesn't contain a live
| object.
|
| That'd be a completely standards-compliant C
| implementation.
|
| Your code wouldn't run on it. Your code would not be
| portable to my standards-compliant implementation.
| Therefore your code is not portable.
| cryptonector wrote:
| You can almost certainly make it portable. After all, the
| allocation has a pointer, size, and type, and if it is at
| all an array, you could dereference every Nth element as
| needed, possibly with memcpy(), and avoid aliasing rules
| via casting to unsigned char *.
| Hercuros wrote:
| Wouldn't the compiler be allowed to elide the memory
| accesses that are being used to poke the memory region?
| If you are doing the poking before the memory has been
| initialized, then I'm not sure there are any guarantees
| that the compiler will read from the actual memory region
| (since it's uninitialized). And if you do writes to do
| the poking then there might be similar issues if the
| result of the write is never used. Even for reads you
| would have to make sure that the result gets used. Also
| not sure if casting to volatile is allowed if the object
| itself is not volatile. This is all true regardless of
| UB.
|
| It seems to me like it is impossible to do poking in a
| way that could not be elided by the compiler, because
| removing poking never alters the semantics of the program
| (stack overflow or guard pages are not part of the
| semantics of C, I believe).
|
| Of course, in practice you'll probably get the right
| behavior, but I'm personally always interested in
| language lawyery questions. I do believe that there are
| cases of security issues introduced by compilers eliding
| memset operations on password buffers, because the
| compiler can tell that the result of memset is never used
| and so it doesn't actually clear the memory. These sorts
| of things are tricky to get right.
| cryptonector wrote:
| Yes, you need to find a way to have the compiler not
| elide memset() or this particular function. That part is
| probably not portable.
|
| You can cast to volatile, but casting away volatile-ness
| and dereferencing the resulting pointer is UB.
| chrisseaton wrote:
| > That part is probably not portable.
|
| If you need that part, and that part is not portable,
| then your solution is not portable is it!
| chrisseaton wrote:
| So you're going to touch every byte of the memory, to
| check it's been successfully allocated.
|
| What happens when you access memory which has not been
| successfully allocated?
| cryptonector wrote:
| > So you're going to touch every byte of the memory, to
| check it's been successfully allocated.
|
| Strictly less (not more than every 4KB), but, yes :)
|
| > What happens when you access memory which has not been
| successfully allocated?
|
| Portable C has finite size_t. Actual machines, virtual or
| otherwise, have finite resources. At some point you may
| run out of resources. With a dynamic stack allocation
| checker you'll fail with a segmentation fault when you
| run out. Without a dynamic stack allocation checker (but
| _with_ dynamic stack allocations) you may well fail in
| random ways due to corruption of other stacks, heap,
| mmaps, etc.
|
| If your C run-time doesn't have stacks then a dynamic
| stack allocation checker may be pointless... unless your
| OS is over-committing memory, then it may cause failure
| to happen sooner than it might otherwise.
| chrisseaton wrote:
| > you'll fail with a segmentation fault
|
| Who says you will?
|
| Nothing in the C specification says that.
| cryptonector wrote:
| The C specs assume unlimited (up to size_t) resources.
|
| Real life has no machines with 64-bit ISAs and 2^64
| bytes' worth of resources.
|
| Real life isn't portable. Who knew.
|
| If there's no guard pages, no segmentation faults, no OOM
| killers, etc., then an allocation checker will cause
| failure in much the same kinds of undefined ways as not
| having the checker.
|
| A segmentation fault _is_ what you can expect in real
| systems, and it 's absolutely better than undefined
| behavior.
|
| You can do this portably, in C. It may not do you any
| good, and it may waste CPU cycles and memory bus usage.
| But it shouldn't harm. If the alternative is to not use
| alloca() or VLAs, and to make it so you can statically
| know your program's resource needs (which is _very_
| limiting, and not really attempted outside contexts where
| such limitations are very strongly desired), then do
| that.
| Joker_vD wrote:
| > A segmentation fault _is_ what you can expect in real
| systems,
|
| Nope. Just today I've seen a stack corruption bug that
| instead of producing a segmentation fault caused the
| process to go into uninterruptible sleep, somehow.
| chrisseaton wrote:
| You're saying 'if run on a system that does things in a
| way I expect then it works' but the point of the article
| is 'but what about if you want to run on a system that's
| not as you expect and you want to run pure C'.
|
| We all know what you're saying works in practice on most
| machines.
|
| But it's not pure C and it's not portable if you follow
| the spec.
| cryptonector wrote:
| It's portably assuming that you can cast an allocation's
| pointer to a pointer to unsigned char. It's like assuming
| you can memset(), since that's effectively what it's
| doing. In the absence of memory over-commit, this will
| have no effect beyond wasting resources. In the presence
| of over-commit -bounded stacks resemble memory over-
| commitment!- it will have the effect of committing the
| memory OR failing in whatever way the machine fails when
| resources are exhausted. However, in the latter case, and
| when the machine uses stacks for C function calls, the
| failure will be defined _by the machine_ rather than
| undefined (due to memory corruption).
|
| Also, in the absence of stacks, there's no "direction in
| which the stack grows", and any attempt to determine that
| direction will yield a nonsensical result, but doing
| memset() in one direction or the other will also then be
| equivalent anyways.
|
| EDIT: There is one non-portable thing here: the
| assumption that machines are more limited than size_t
| implies. However, that's a distinction without a
| difference for any 64-bit ABIs or larger because all such
| machines must be more limited than size_t implies. The
| objection that assuming any particular failure mode on
| resource exhaustion is non-portable is uselessly true. Not
| being able to assume defined resource exhaustion failure
| modes would be destructive -- that the C spec does not
| define them is irrelevant.
| khrbtxyz wrote:
| _determine stack growth direction by having one function
| call another with a pointer to a local variable in the
| former and then compare that to a pointer to a local
| variable in the latter._
|
| This is undefined behavior.
|
| See n1570.pdf, Annex J page 560, J.2 Undefined behavior
|
| Pointers that do not point to the same aggregate or union
| (nor just beyond the same array object) are compared
| using relational operators (6.5.8).
| cryptonector wrote:
| Comparing the uintptr_t cast of two disparate pointers is
| not undefined behavior. A comparison may not yield a
| portable result because the integer representation of a
| pointer may not be a plain memory address.
|
| So such a simplistic stack growth detector may get the
| wrong answer, but it's not undefined behavior:
|
|     char stack_grows_up(uintptr_t callers_local) {
|         char my_local = (uintptr_t)&my_local > callers_local;
|         return my_local;
|     }
|
| EDIT: Ah, and uintptr_t is an optional type. If the
| platform doesn't have it (e.g., because it's a segmented
| memory architecture) then this won't work, so it's not
| portable, but aside from that it's not undefined
| behavior, it's just that the computation of stack growth
| direction may well be wrong.
| [deleted]
| shadowgovt wrote:
| In pure C, there is no stack abstraction. Technically
| speaking, the stack is an extremely common implementation
| detail shared by almost all C runtimes.
|
| There's no real way to answer this question at that layer of
| abstraction; the minute you add any library calls that deal
| with a stack at all (such as alloca), we're no longer talking
| about pure C.
| cryptonector wrote:
| Correct. Note that alloca() can be an alias for malloc() in
| stackless Cs. If the stack assumption is invalid, then a
| checker will not trip over a guard page, and it will at
| best trip over memory over-commit (which may still be
| desirable). If the guard page assumption is invalid, then a
| checker will cause failure just as surely as not checking
| (provided you intended to use the whole allocation
| anyways), and either way the failure will be undefined,
| therefore this is still portable C!
| chrisseaton wrote:
| I was going to reply that alloca is a standard function,
| but then I checked and it's actually not is it!
| cryptonector wrote:
| However VLAs _are_ in C99 and up. They 're rather similar
| to alloca() in that whatever mechanism you'd use to
| implement alloca() (automatic stack allocation, or heap
| allocation), you'd also use to implement VLAs.
| wahern wrote:
| Except alloca'd space is only dropped at the end of a
| function. VLAs are dropped at the end of a block. This
| makes a huge difference in loops as alloca can quickly
| exhaust the stack.
| cryptonector wrote:
| No real difference:
|
|     for (size_t i = 1; i < WHATEVER; i <<= 1) {
|         char vla[i];
|         vla[0] = 0;
|         vla[i - 1] = 0;
|         ...
|     }
|
| that is, you can still exhaust your stack trivially with
| VLAs.
|
| Sure, the two things have a different "API", but as far
| as allocating arbitrary amounts of memory goes, they both
| let you do it.
| wahern wrote:
| I'm not sure what your point is. You could replace the
| VLA with malloc in that example and exhaust your entire
| VM space.
|
| In the rare cases where VLAs might prove useful, e.g. a
| vector math library (IIRC, a major reason for VLAs was to
| help FORTRAN programmers transition to C), this
| difference absolutely does matter in reality.
| cryptonector wrote:
| Maybe I missed the point of your comment. Mine is that
| the scoping difference between VLAs and alloca() isn't
| relevant to the question at hand of how to protect
| against memory corruption due to large stack allocations.
| steveklabnik wrote:
| Notably, C11 moved alloca to an optional feature, not a
| required one.
| Someone wrote:
| > touch a byte every 4KB in the direction of stack growth
| before you use bulky stack allocations
|
| I don't think you can portably (1) do that. You have to
| allocate before you can touch those bytes.
|
| Also, if you do _before_ , I don't think there's a guarantee
| that you won't hit some volatile memory that interacts with
| hardware, and of course, there's no portable way to get the
| stack pointer, in the first place (if only because it,
| conceptually, doesn't exist in C)
|
| (1) specific compilers can safely do that because they know how
| they are implemented, and many compilers are fairly similar, so
| one can make this "quite portable", but not really portable.
| cryptonector wrote:
| One way is to allocate first. alloca(1GB) typically isn't
| going to cause a SEGV, so you can check the allocation after
| making it. With VLAs you have to make sure (this is not
| portable) that the compiler will put your loop iterator state
| variables ahead of the VLA.
|
| If you can assume a linear stack (which is kinda the whole
| point of this discussion) then you can call a function that
| will repeatedly alloca() in small chunks and touch a byte and
| return on success or SEGV on failure, and then when it
| returns it will release that memory but then you know you can
| safely alloca() that much.
| Spivak wrote:
| I don't get Alpine's reasoning here. In order to be portable does
| every C program have to work with any stack size limit? How could
| that even work? Without a guaranteed minimum there's nothing to
| do except "portable to common platforms-ish."
| gpderetta wrote:
| Do you know about the shortest fully conforming C compiler?
|     #include <stdio.h>
|
|     int main(int argc, char *argv[]) {
|         printf("ERROR: resources exhausted!");
|         return -1;
|     }
| jeff-davis wrote:
| How does something like Ada help you measure and control stack
| usage?
| nagisa wrote:
| Measuring of stack requirements is typically implemented by the
| way of walking the call graph and adding up functions' stack
| sizes (for usual function these are well known at compile
| time). There are "only" a couple problematic cases where this
| algorithm fails: recursion, `alloca` and the likes, and
| indirect calls.
|
| For all of these problems there are more complex analyses that
| can be done to determine e.g. maximum recursion depth or
| possible functions that an indirect call could resolve to. And
| failing all of these there is usually a way for an engineer to
| override or provide this information.
| trasz wrote:
| What's the point of a low stack limit, apart from places like the
| kernel?
| m_eiman wrote:
| ... and embedded, of course. The main problem is that there IS
| a limit, and you have no idea what the limit is. The only way
| you find out is when you crash.
| kevin_thibedeau wrote:
| > The only way you find out is when you crash.
|
| You can fill the stack region with sentinel data and
| periodically scan it to warn of impending doom.
| trasz wrote:
| There is - in embedded (also kernel), but not in general
| purpose systems, like the Unix userspace. And the same piece
| of software very rarely runs on both. So, again, what's the
| point of this exercise?
| StillBored wrote:
| People have differing opinions of "embedded" but I would
| claim that most embedded uses cases the stack is actually
| specified by the programmer via linker scripts and/or
| manually setting it up during early bootstrap. Plus, its not
| exactly hard in most of these environments to write target
| and/or compiler specific macros which detects top of stack
| and can runtime check it via DEBUG() like macros or modifying
| function entry macros. None of this is of course portable,
| but neither is code designed to run in 1-16k of RAM and a few
| more K of flash.
|
| OTOH, there seem to be a lot of "we don't control anything"
| IoT devices which are random overly powerful processors with
| random uboot+linux kernels+rootfs's running
| my_first_c_program.c level applications. All of it tossed
| together with very little engineering effort, and frequently
| less low level knowledge, and left to rot because its
| basically unmaintainable, even if it happens to be the 1% of
| devices that can get over the air updates.
| formerly_proven wrote:
| gcc's -fstack-usage works well, at least for the kind of C
| I write on small microcontrollers. You still have to add it
| up manually across the callgraph and of course take ISRs
| into account.
| ridaj wrote:
| Any environment with restricted memory resources compared to
| needs, and a need for reliable behavior
|
| Embedded applications such as automotive, or industrial control
| come to mind (or even consumer electronics, with less life-or-
| death consequences possible)
| jameshart wrote:
| A microcontroller programmer says: "What kernel?"
| JoeAltmaier wrote:
| Time to abandon stack as a 'display'? That's why they overflow
| (mostly) - lots of large items or lots of recursion and local
| variables.
|
| Also having the addressable method arguments in the same space as
| the return address, allows a whole host of other bugs and cracks.
|
| I have long thought, the return stack should be a memory region
| un-addressable by the code except via call and return.
|
| Local method arguments could then revert to ordinary allocations
| or some other check-able allocation, and have normal error
| recovery.
| QuadrupleA wrote:
| Never thought about it but yes, what a huge (and easily
| implementable) oversight - stackSize(), stackUsed(),
| stackTopPtr() would be a few asm instructions in most cases.
| PerkinWarwick wrote:
| It seems to me that in the case of the x86, the virtual memory
| system could detect overrunning the end of the stack and
| announce it to you.
|
| The meta-problem is what to do with the information.
| masklinn wrote:
| A clean / clear error would already be better than just
| segfaulting with no indication (that's why e.g. Ruby or
| Python have explicit stack depth check and will raise an
| exception hopefully before they run out of C stack).
| vasama wrote:
| There is no useful way to employ these functions. It's
| equivalent to asking if a mutex is locked. That piece of
| information is worthless.
| OskarS wrote:
| What are you talking about? For lots of functions, it is
| perfectly possible to statically determine a maximum stack
| size (not all functions obviously, that would be the halting
| problem). A trivial example would be a function that calls
| no other functions, but you can imagine others: functions
| with static call trees (i.e. no function pointers), don't use
| VLAs or alloca, and that are not recursive (mutually,
| indirectly, or otherwise) will have a static limit on
| possible stack usage.
|
| If you've formally verified that function X uses at most Y
| pages of stack, and the remaining stack is smaller than that,
| you could return an OUT_OF_STACK error instead of calling it.
| Most libraries/functions wouldn't need this, but there are C
| libraries which place an extraordinary importance on
| reliability, and it seems like it would be a useful feature
| for them.
|
| There might be other reasons why these functions shouldn't be
| in the standard (or are impossible on some platforms), but
| "the information is worthless" is not one of them. Of course
| this information can be useful.
| adrianN wrote:
| "I can't complete this recursive algorithm because I ran out
| of stack at this particular step" is a much nicer error
| message than "segmentation fault". For that it would be
| useful if you could check remaining stack space before
| recursing.
| sumtechguy wrote:
| I could see that sort of thing being very useful for a
| debugger.
| notact wrote:
| If writing a recursive algorithm on arbitrary input, check
| remaining stack size upon recursive function entry. If space
| is depleted, allocate a new fiber (aka a brand new clean
| stack), switch to it, and resume. Switch back to calling
| fiber on stack unwind.
| 10000truths wrote:
| If you're writing a recursive algorithm on arbitrary input,
| you should be using a stack container to store your state,
| not the call stack.
| Y_Y wrote:
| Could you explain this please?
| lifthrasiir wrote:
| As mentioned in the OP though, there is zero guarantee about
| the stack usage by library functions.
| PerkinWarwick wrote:
| ...or much of anything else. I've seen commercial libraries
| monkey with processor state in some nasty ways.
| OskarS wrote:
| No, but that's a pretty weak argument not to have them:
| they're better than nothing. Not all code uses libraries. And
| if it was part of the C standard, libraries could use them as
| well: libraries could fail gracefully instead of blowing the
| stack. Many libraries already do this when malloc fails,
| after all.
| nybble41 wrote:
| There is also no guarantee that the stack is contiguous--see
| the "-fsplit-stack" GCC option, or search for "segmented
| stacks". Libraries for coroutines and delimited continuations
| also routinely segment the stack. A simple API based around
| easy access to "top of stack", "bottom of stack", and
| "current stack pointer" addresses is thus non-universal.
| There may not be a meaningful limit to the amount of stack
| available, aside from available heap memory or address space.
| And this can change over time in response to actions other
| than allocating or freeing memory on the stack.
| titzer wrote:
| I just wrestled with this. You cannot effectively handle stack
| overflow situations if you use the system stack on Linux, because
| stack segments are in magical "MAP_GROWSDOWN" regions. These
| regions are managed by the kernel itself. There is a base address
| and a size, and the kernel initially only maps some small number
| of pages near the upper end (high addresses). Faulting accesses
| to addresses that are not mapped, but in the region, just below
| what is currently mapped cause the kernel to map in new pages
| automatically, "growing down".
|
| The customary way to handle stack overflow gracefully in userland
| is to install a SIGSEGV handler and check the faulting address to
| see if it is near the stack segment end, handling the signal on
| an alternate stack. Except that determining if the access was in
| the stack segment requires knowing the stack segment start. The
| signal API doesn't allow querying that; you'd have to query the
| process's mappings by other means. It's also not robust because a
| program with big frames might just stride too far and confuse the
| kernel's automatic growing, causing spurious SIGSEGVs when in
| fact, there is stack remaining.
|
| I found it was just easier to allocate my own stack segment and
| immediately switch to it and just leave the loader-provided stack
| segment alone. Virgil does this in the first 50 or so
| instructions upon entry.
|
| This also makes stack overflow very deterministic (only depends
| on the program) and controllable--there is a compiler option to
| set the stack size in KiB. There's no need for the compiler to
| insert strided accesses either, as the whole segment is mapped,
| and you can have more than a single guard page at the beginning
| (think: 1MiB guard region).
|
| Stack segments in C (and Linux) are madness if you use very much
| and want anything to be robust to overflow.
| matheusmoreira wrote:
| Nice to see you and the Virgil project again!! Really
| interesting to read about your Linux systems programming
| experiences. I learned a lot reading your source code but this
| bit about stacks escaped me.
|
| Also I see x86_64 Linux targets are supported now and that's
| awesome. Do you plan to support ARM architectures?
| titzer wrote:
| Hey, thanks! I wrote about 1/3rd of an arm32 backend but it's
| languished because I've focused Virgil energy on only those
| things I need for my Wizard Engine project. Probably the next
| port I'll do will be arm64 to run on the M1 and Raspberry Pi,
| but that'll be some time in the future.
|
| Nice to hear someone has read the code. Glad you like it.
| Cheers :)
| whoisburbansky wrote:
| Tangent, because this is the first I've heard of Virgil. From
| the main README, what do you mean when you say you would like
| programs written in Virgil to be dependency-free?
| titzer wrote:
| You can write programs that depend on no libraries at all,
| not even an SDK. They just need a compiler binary (and a
| machine to run that--x86-linux, x86-darwin, jvm, or even a
| Wasm engine) to compile, and compile to a binary that also
| only needs a machine to run--no shared objects, external
| jars, etc.
|
| The language has no built-in classes, interfaces, or
| functions. In contrast to, say, Java, where the language (and
| compiler) know about java.lang.Object, java.lang.String, etc.
| The compiler knows how to _compile_ classes, but it doesn't
| know the name of any built-in ones.
|
| Virgil is really designed to build the lowest level of
| software. Thus all my focus on low-level things like calling
| the kernel and mapping stacks, etc. The only thing that
| depends on more than just the kernel is the test suite, which
| uses bash and a little bit of C.
| whoisburbansky wrote:
| Ah, that's really cool. The clarification about it
| targeting the lowest level really helps explain the
| motivation, thanks, and all the best!
| jeffbee wrote:
| Managing your own pthread stacks can make threads more costly
| though, can't it? pthread_attr_setstack requires a stack size
| of at least PTHREAD_STACK_MIN, which on glibc Linux tends to be
| 16384 -- four times larger than the initial cost of a demand-
| paged stack map. I understand that on musl this is 2048, so
| that's nice.
| titzer wrote:
| Yeah, I think there would be some overhead, because it
| doesn't look like pthread_create allows specifying a stack
| segment. But I wouldn't be using pthread_create (or any C
| library for that matter), because Virgil goes all the way down
| to the kernel; there is no C environment.
|
| I only experimented briefly[1] with getting Virgil threads to
| run on Linux. You need to call SYS_clone, which forks a
| lightweight process (i.e. shares all virtual memory), and
| specify a stack segment. The segment in question would, of
| course, be mapped by Virgil and not have MAP_GROWSDOWN, so it
| could be made robust to stack overflow along the lines above.
|
| [1] https://github.com/titzer/virgil/blob/master/apps/Multi/M
| ult...
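|
| The glibc clone() wrapper makes the same point concisely: the
| caller, not the kernel, supplies the child's stack. A rough
| Linux-only sketch (not how Virgil itself does it, since Virgil
| issues the raw syscall; the sizes and names are illustrative):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (1 << 20)  /* 1 MiB, caller's choice */

static volatile int child_ran;

static int child_fn(void *arg) {
    (void)arg;
    child_ran = 1;  /* visible to the parent: CLONE_VM shares memory */
    return 0;
}

int run_demo(void) {
    /* The caller maps the child's stack itself -- with no
     * MAP_GROWSDOWN, so its size is fixed and known. */
    char *stack = mmap(NULL, CHILD_STACK_SIZE,
                       PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;
    /* clone() takes the HIGH end of the stack on x86. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, NULL);
    if (pid < 0)
        return -1;
    waitpid(pid, NULL, 0);
    return child_ran ? 0 : -1;
}
```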
| stedolan wrote:
| The way to handle stack overflow gracefully on Linux is to
| check in your signal handler whether the faulting address was
| between the beginning of the stack and the current stack
| pointer. You can read the stack pointer register from the
| signal handler's third parameter.
|
| This is also how the kernel grows the stack. When there's a
| fault, it compares the faulting address to the stack pointer
| register. This way, big frames don't confuse the automatic
| growing. (On Windows, by contrast, stack growth is detected
| using a single 4k guard page, so compilers must be careful with
| big frames and insert strided accesses)
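|
| On x86-64 Linux that check looks roughly like the sketch below.
| The REG_RSP index is the Linux/x86-64 specific part (other ports
| differ), and stack_top is assumed to be recorded at startup; the
| 128-byte slack below the stack pointer is only illustrative:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <ucontext.h>
#include <stdint.h>
#include <unistd.h>

static char *stack_top;  /* high end of this thread's stack,
                            assumed recorded at startup */

static void on_segv(int sig, siginfo_t *info, void *vctx) {
    (void)sig;
    ucontext_t *ctx = vctx;
    /* The third handler argument carries the saved registers. */
    uintptr_t sp   = (uintptr_t)ctx->uc_mcontext.gregs[REG_RSP];
    uintptr_t addr = (uintptr_t)info->si_addr;
    /* Same idea as the kernel: a fault between (roughly) the
     * stack pointer and the top of the stack is stack growth or
     * overflow; anything else is a stray access. */
    if (addr <= (uintptr_t)stack_top && addr >= sp - 128) {
        static const char msg[] = "stack overflow\n";
        write(2, msg, sizeof msg - 1);
    }
    _exit(1);
}

int install(void) {
    struct sigaction sa = { 0 };
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;  /* needs a sigaltstack too */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```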
| jeffffff wrote:
| i think maybe you can do it with userfaultfd, performance won't
| be as good as using a signal handler because of the context
| switch but you aren't restricted in what you can do in in the
| handler thread like you are with a signal handler.
| zwieback wrote:
| Kids these days - let grandpa tell you a story about writing OS/2
| 1.x device drivers pre-virtual memory. Stack was obsessively
| monitored, especially in interrupt handlers where you were called
| with just a few hundred bytes of stack or so. If running low you
| had to manually switch to your own stack and keep track of
| which one you were on.
|
| Excellent point being made about C, though. Amazing how we've
| lived with this limitation for so long. I get that C is supposed
| to be a portable assembler, but the cases where the stack wasn't
| used for return addresses and local-variable frames are rare, so
| some support should have been added early on.
| scoutt wrote:
| Wanna feel that thrill again? Do some bare-metal embedded.
| zwieback wrote:
| Yeah, I do that quite a bit, actually. When I write code for
| smaller embedded systems I get the same rush I got back on my
| old Apple ][.
| silon42 wrote:
| I believe even later, userspace code had to grow stack page by
| page, no skipping.
| skitter wrote:
| What are the advantages of low maximum stack size limits (given
| growable stacks)?
| jart wrote:
| Being able to build web servers with 10,000 threads with clean
| imperative code rather than event-driven yo-yo-pattern finite
| state machines. It's what Google does; they limit stack size to
| 60 kB for just that reason.
| PerkinWarwick wrote:
| One obvious area is in small embedded systems.
|
| It's surprising how hard it can be to track down a stack-caused
| bug when you've got a bunch of interrupts banging on it.
|
| Generally it's a little unnerving to just pray that you've got
| enough stack, given its hidden usage and the possibility of
| things like recursion. Kind of a shame, considering what a
| slick way it is to avoid memory leaks.
| [deleted]
| DrJokepu wrote:
| This is speculation, but maybe it catches unintended infinite
| recursion earlier? I'm willing to bet that the vast majority of
| stack overflows happen due to programming errors rather than
| genuine uses running out of stack.
| projektfu wrote:
| This is the best answer I could think of. We have to assume
| the question is about Alpine Linux, not about embedded
| systems per se. Small thread stacks run out of space quickly
| (fail fast), help prevent a malicious process from gobbling
| up lots of VM, and force you to program in a style that
| doesn't use much stack space, although that's questionable
| because most memory leaks are heap-based.
|
| It's worth noting that the limits are adjustable on a per-
| package basis in Alpine, based on the compile-time link
| options.
| Joker_vD wrote:
| This also forces Chicken Scheme to basically guess the stack
| size: it assumes 1 MiB on 64-bit platforms and 256 KiB on
| 32-bit ones. Of course, you can also manually override it, at
| your own risk.
___________________________________________________________________
(page generated 2021-09-28 23:01 UTC)