[HN Gopher] Where the top of the stack is on x86 (2011)
___________________________________________________________________
Where the top of the stack is on x86 (2011)
Author : cassepipe
Score : 73 points
Date : 2021-05-07 15:51 UTC (7 hours ago)
(HTM) web link (eli.thegreenplace.net)
(TXT) w3m dump (eli.thegreenplace.net)
| rdhatt wrote:
| Remember:
|
| "The x86 architecture is the weirdo" --Raymond Chen
|
| https://devblogs.microsoft.com/oldnewthing/20040914-00/?p=37...
|
| https://devblogs.microsoft.com/oldnewthing/20130320-00/?p=48...
| Narishma wrote:
| In this case, it's not a weirdo, is it? I don't know of any
| popular ISA with a stack that grows upward.
| colejohnson66 wrote:
| It also doesn't help that debuggers show the call stack with the
| current function at the top and the entry point at the bottom
| (the opposite of how it works in memory).
| TrianguloY wrote:
| It's easier to understand if your diagram show the starting
| memory positions (address 0x00000000) at the top, and final
| positions (address 0xFFFFFFFF) at the bottom. This way the top of
| the stack is, precisely, at the top.
|
| It doesn't seem to be an standard and there are almost the same
| number of diagrams with memory from 0 to F than from F to 0. Some
| of them are more useful in specific contexts, but most are simply
| drawn the way the author is used to.
|
| Personally I prefer increasing numbers go down, like lines in a
| file, numbers on a numbered list, timestamp on a log, etc. Having
| the highest memory value 0xFF on the top seems...odd.
| ellis-bell wrote:
| but then the heap would grow down in such a diagram...
|
| btw my understanding is that the heap came first in the logical
| development of C and other systems programming. so it made
| sense to have .text and other program data at the lowest
| virtual memory addresses and then have the heap grow towards
| higher memory addresses.
|
| then when the stack became a thing it had to grow down...
| monocasa wrote:
| The stack might have actually come before the heap in
| history, and actually proceeds any systems programming
| language that was a higher level than assembly. They were
| seen in the Z4, and Turing wrote about them in 40s.
| anyfoo wrote:
| We call things "base of memory" and "top of memory" though,
| with the latter one being at the highest address of memory.
| efaref wrote:
| I never understood why people ever put memory upside down on
| diagrams.
|
| In fact, I think the only time I see diagrams with F at the top
| and 0 at the bottom is in explanations of the stack, or in
| explanations of how "the stack is so weird it grows the wrong
| way!".
|
| When describing, e.g., a memory map for a custom ASIC, you
| would _always_ put the low addresses first and the high
| addresses later. That 's just how numbers work. The "stack
| grows the wrong way" issue seems to be an invented problem.
| toast0 wrote:
| If you're writing a diagram for little endian, you end up
| with the bytes going right to left, if you have multibyte
| values, so you may as well read bottom to top, right to left.
| billforsternz wrote:
| Exactly. After all we write programs by starting at the top of
| the screen and working our way down, and happily of course
| program execution increments in the same direction. So small
| addresses at the top and big addresses at the bottom is the
| most natural way of presenting any memory map.
| albntomat0 wrote:
| That way also makes reading/writing a buffer go from top to
| bottom, similar to reading/writing normally.
| moonchild wrote:
| From a footnote of TFA:
|
| > You may try to fix the confusion by viewing memory with its
| low addresses at the top and high addresses at the bottom.
| While this would indeed make stack movement more natural, it
| would also mean that increasing some memory address would take
| it _down_ in the graphical representation, which is probably
| even more counter-intuitive.
|
| Ultimately, my approach is to avoid directional terms such as
| 'top' and 'bottom' or 'high' and 'low' in the first place; they
| only cause confusion. Prefer 'greater' or 'smaller' addresses.
|
| (Similarly 'left' and 'right' when applied to bits, which gets
| especially confusing if endianness is involved; prefer 'more
| significant' and 'less significant'.)
| roelschroeven wrote:
| > Having the highest memory value 0xFF on the top seems...odd
|
| Dunno ... the highest being on top feels very normal ... .
| Isn't that more or less the definition of "highest" and "top"?
|
| We have conflicting conventions in all kinds situations:
|
| Numbers increase towards the top in a class Cartesian
| coordinate system, in the numbering of floors in buildings, on
| calculator keypads, ...
|
| Numbers increase towards the bottom in coordinate systems in
| many computer graphics contexts, in numbered lists, on
| telephone keypads, ...
|
| Sometimes these conventions meet and clash and there is no one
| right way to handle that.
| [deleted]
| azhenley wrote:
| The title should be: Where the top of the stack is on x86 (2011)
| acchow wrote:
| Just flip your diagram upside down and there's no confusion.
| amelius wrote:
| Shouldn't there be two stacks? One to contain stuff like return
| addresses, and the other to contain data? And the stack
| containing addresses in a separate address space that can only be
| accessed through stack manipulation/return instructions.
| moonchild wrote:
| This has been suggested, including by the proposed mill and
| forwardcom architectures. It's also the programming model used
| by stack-based languages such as forth.
|
| Another nice side effect aside from security is that it
| simplifies the return predictor.
| jonsen wrote:
| And where does a branch go? It leaves the stem of consecutive
| memory numbers into nowhere and mysteriously _comes back_ to the
| stem at an arbitrary number.
| glhaynes wrote:
| Easy to get confused similarly about trees that grow down in
| computer scientists' diagrams versus growing up in nature.
| eatonphil wrote:
| My confusion about statements like this are understanding where
| the convention begins. Isn't it libc that sets up the stack?
| Couldn't it decide to set up the stack starting from the bottom
| of memory and put heap allocations at the top? I guess
| instructions like PUSH/POP and derivatives wouldn't be useful
| anymore so you'd have to recreate them. So I guess that means
| that the convention starts at the processor? You could store
| memory in the opposite way but it would just be slower since you
| wouldn't be able to use built in operations. Do I have that
| right?
| ajross wrote:
| > Isn't it libc that sets up the stack?
|
| In coordination with the kernel, yes. Different OSes do this
| differently, but many will set up the stack pointer for you.
| Linux historically has not, the new thread keeps the parent
| stack pointer and has to implement careful entry code to switch
| it without messing up the parent thread.
|
| > Couldn't it decide to set up the stack starting from the
| bottom of memory and put heap allocations at the top?
|
| In theory, but as you mention the architecture makes some clear
| demands here. On x86 CALL/RET and PUSH/POP both require a
| SP/ESP/RSP register with free space below it. Likewise x86
| interrupts are handled on the stack and so you need to keep the
| stack pointer initialized for them (modern CPUs will
| automatically switch the stack for you from whatever user code
| is running, but they still need to switch it to a grows-down
| area you initialized for them).
|
| Broadly, sure, you can come up with a software abstraction that
| acted as a "stack", but it would have to be in addition to the
| CPU stack you already need. In effect you'd have to burn a
| register for this extra stack pointer, which has performance
| implications.
| toast0 wrote:
| > Likewise x86 interrupts are handled on the stack and so you
| need to keep the stack pointer initialized for them (modern
| CPUs will automatically switch the stack for you from
| whatever user code is running, but they still need to switch
| it to a grows-down area you initialized for them).
|
| If your kernel is re-entrant, you need to keep platform stack
| conventions in the kernel, or an interrupt (or exception)
| during the kernel will overwrite your backwards stack.
|
| I don't know for sure, but I think aignal handing in user
| processes runs on the user stack too (but I could easily be
| wrong).
|
| If you really wanted, you could use the platform stack for
| call/ret only and have a separate data stack with whatever
| conventions you like.
| monocasa wrote:
| > Linux historically has not, the new thread keeps the parent
| stack pointer and has to implement careful entry code to
| switch it without messing up the parent thread.
|
| Well, for calls to fork(2) and clone(2), sure, but the kernel
| will setup a stack for you on exec(2). It has to have
| somewhere to stick the command line args, env, and some other
| extra bits of information like a random seed.
| eatonphil wrote:
| That's helpful! Then why is it that libc is the one setting
| up the stack if this is a convention basically required by
| the processor?
| ajross wrote:
| Because the runtime linker (e.g. ld-linux.so) is
| implemented in the C library, as is the user-callable
| implementation of pthread_create() or whatever[1]. It's
| definitely the application's job to decide on its own
| memory layout in any case. The kernel doesn't tell you
| where your stack should be, it's your address space.
|
| [1] vs. the clone() Linux syscall, which isn't really
| useful to regular code because of the complexity.
| astrobe_ wrote:
| Yes, you clearly see that when you implement a stack-based
| virtual machine (as in bytecode interpreter, not VMWare and
| the likes, although it is on the principle the same thing) :
| your bytecode (or whatever technique you are using except
| perhaps JIT) for push/pop/call/ret must agree on how to use
| the stack.
|
| It is particularly the case in a language like Forth, which
| has two stacks that can be manipulated directly by the user
| (actually the user _has_ to if they want to get anything
| done...) : one for the parameters /arguments, and one for the
| return addresses. This deviates from the usual single
| hardware stack processors. Forth processors (there are still
| some in operation) fully support those two stacks.
|
| When you implement a Forth interpreter in assembler, you
| generally use the hardware stack for either the parameters or
| for the return addresses, while the other is managed manually
| (usually using another addressing-capable register such as
| EBP, ESI or EDI on x86).
|
| If for instance you dedicate one segment (either in the x86
| sense or in the common meaning), you typically use the push
| down "hardware" stack that you initialize at the top of
| memory, while the return addresses stack is a "push up"
| software-managed stack and initialized at the bottom of the
| memory.
| ylyn wrote:
| Yes, more or less. I'm not sure if push and pop is
| significantly faster than a mov and add. But there are also
| instructions like leave and ret that follow this same fully
| descending stack convention. So if you deviate from that you
| can't use any of those.
|
| It's worth mentioning what fully descending means here. I'm
| surprised the article didn't mention it.
|
| "Fully" means that the stack pointer points to the last entry
| on the stack (the top). "Empty" means it points to the next
| entry just after the last (top) entry. That is, it points to
| where the next push would be written to.
|
| Anyway, you can have whatever calling convention you want,
| really. All you need is a way to pass arguments, return values,
| and to know where to return to. You could have a linked list of
| frames allocated by malloc() with the head pointer in EBP, for
| example. Say the return address and caller frame pointer is
| stored in the start of the frame. Then you would return by mov
| ebx, [ebp]; mov ebp, [ebp+4]; jmp ebx. Or something lile that.
| This is a pretty ridiculous calling convention though, but it'd
| work.
| AshamedCaptain wrote:
| In ARM you do have pop/push (and call/ret) instructions that
| can go in either direction, and most platforms I know still
| have a stack that grows to 0.
|
| TBH, dunno where the difficulty is.
| monocasa wrote:
| Sort of. The ARM stuff was allowed other uses because they
| were pretty generic load and store multiple instructions with
| a lot of increment/decrement options to make up for the
| original Acorn's lack of a DMA engine.
|
| On anything resembling a recent ARM core though, using the
| stack pointer register with anything other than a descending
| stack has you falling off the perf wagon as it's backed by a
| hardware stack engine.
| JoeAltmaier wrote:
| There is an argument for making the call/return stack a separate
| non-addressable region from argument passing. So a malicious app
| can't overwrite return addresses and execute arbitrary code.
| Intel considered this early on, and rejected it.
|
| Why? Because of Fortran. There are (were?) cases where a Fortran
| app would reach back on the stack to retrieve values from before
| the last call or some such. Rather than find some other way of
| making that work for those folks, the separate-return-stack idea
| was shelved.
|
| And we all live with this debacle for decades.
| stevemk14ebr wrote:
| Checkout the shadow stack used for CET (control flow
| enforcement technology). It is literally this idea
| Someone wrote:
| I don't understand that argument. All function arguments still
| would be on a single stack (just as in Forth). That would only
| be a valid argument if some FORTRAN code inspected the return
| address, for example to behave differently, depending on the
| caller.
|
| I would guess it's more because early systems had small address
| spaces. If your heap grows up from the bottom of memory, and
| one stack grows down from top of your address space, how do you
| know where to place that second stack, especially in a system
| with, say, a 64kB address space?
| JoeAltmaier wrote:
| Yes, exactly. Some FORTRAN code did exactly that.
|
| And the stack could go in a page not addressable by general
| purpose instructions, save call and return instructions (and
| perhaps some kernel debug).
|
| Some limited-RAM embedded devices have dedicated stack RAM.
| But my point was about Intel x86
| monocasa wrote:
| If that was an issue, they managed to fix it decades ago.
| Itanium had separate return address and data stacks, and
| scientific computing was one of it's few competitive
| strongsuits.
| titzer wrote:
| You cannot address the call stack in WebAssembly, thus there
| can be no stack-smashing attacks (that overwrite return
| addresses). For languages that pass pointers into the stack,
| they must use a "shadow" stack allocated as a separate region
| and managed with an explicit stack pointer.
| Denvercoder9 wrote:
| Is there a reason why a stack growing towards the beginning of
| memory is better than one growing towards the end of memory, or
| is that just an arbitrary choice the x86 designers made?
| vlmutolo wrote:
| I wonder if it's at all related to the reasoning in this post
| about writing a bump allocator.
|
| https://fitzgeraldnick.com/2019/11/01/always-bump-downwards....
| bonzini wrote:
| It's mostly because it allows you to put code at a fixed
| address at the beginning of the memory and the stack at the
| opposite end. That lets the OS place the stack at a constant
| address (for MS-DOS COM files the stack pointer starts at
| 0xFFFE, with a zero already pushed so that a RET instruction
| exits the program; a similar convention existed in CP/M).
| pcwalton wrote:
| I don't know if this is the real reason, but it's more natural
| for stack-relative addressing to have the stack grow downwards,
| because you can write e.g. [rsp+10] instead of [rsp-10].
|
| Apparently on PA-RISC (hppa) the stack grows the other way, so
| it is arbitrary.
| tom_mellior wrote:
| Why is rsp+10 more natural? If rsp is the top of the stack,
| everything is below it, and "below" and "minus" go well
| together.
|
| Though I would agree that for the topmost value specifically,
| rsp+0 feels more natural than rsp-4 or rsp-8.
| xanathar wrote:
| Bonus (?): the stack going down rather than up means that
| overflowing a stack-allocated buffer will overwrite the contents
| that are already on the stack (as they have a higher address than
| the last item in the buffer), likely changing the return address
| of the function and thus making arbitrary code execution a breeze
| (see: https://en.wikipedia.org/wiki/Return-to-libc_attack).
|
| Most operating systems and standard libraries have checks and
| countermeasures to make this a lot harder nowadays, but still.
| ajross wrote:
| In fact the mitigations have been so effective that in practice
| stack smash attacks are mostly historical at this point. But
| yes: having the direction of "natural memory copies" be the
| same direction as "back into the caller memory on the stack"
| was clearly a really bad mistake in hindsight.
|
| I actually don't know where it came from. It was true in
| original PDP-11 Unix for sure: you had the program text
| followed by a grows-up heap, with the grows-down stack placed
| at the top of the program segment. Interestingly PDP-11
| addressing was general enough to have implemented a grows-up
| stack, so this is clearly a mistake Unix could have corrected.
| I just don't know if this was the original use of the
| convention or if it inherited it from elsewhere.
| vlovich123 wrote:
| Interestingly, per my reading of [1], these attacks are now
| easily available again in WASM due to misplaced confidence in
| the security of the language. It's a fantastic paper.
|
| [1]
| https://www.unibw.de/patch/papers/usenixsecurity20-wasm.pdf
| xanathar wrote:
| I _THINK_ that the reason is: 8086 inherited it from the 8085
| which inherited it from the 8080. The next parent in line
| would be the 8008, but that has a small call stack in the CPU
| registers rather than RAM, so the ancestor would be the 8080.
|
| The 8080 had 64KB address space. I bet the rationale was to
| partition memory so that classic memory usage goes upwards
| from 0000 to FFFF and the stack goes downwards from FFFF to
| 0000. This removes the need to define a boundary between the
| two beforehand.
|
| Of course this is totally speculation on my part, I might be
| super wrong.
| bonzini wrote:
| The 8008 instruction set was not designed by Intel IIRC, so
| the lineage ends at the 8080.
|
| Independently the 6502 also had a downward stack. I think
| the only modern machine with am upward-growing stack is the
| HP PA-RISC.
___________________________________________________________________
(page generated 2021-05-07 23:00 UTC)