hngopher.com

       [HN Gopher] i386 Assembly Language trick for storing data in .text
       ___________________________________________________________________
        
       i386 Assembly Language trick for storing data in .text
        
       Author : ingve
       Score  : 127 points
       Date   : 2023-11-09 06:47 UTC (16 hours ago)
        
 (HTM) web link (ratfactor.com)
 (TXT) w3m dump (ratfactor.com)
        
       | majke wrote:
       | Yeah, in i386 syntax there is no way to address EIP directly.
       | Popping EIP after a call is a common trick.
       | 
       | In newer processors there is a cache for return addresses, the
       | Return Stack Buffer (RSB).
       | 
       | But there is a penalty for doing call and never doing ret.
       | 
       | From Intel's Optimization Reference Manual:[2]
       | 
       | "The return address stack mechanism augments the static and
       | dynamic predictors to optimize specifically for calls and
       | returns. It holds 16 entries, which is large enough to cover the
       | call depth of most programs. If there is a chain of more than 16
       | nested calls and more than 16 returns in rapid succession,
       | performance may degrade.
       | 
       | [...] To enable the use of the return stack mechanism, calls and
       | returns must be matched in pairs. If this is done, the likelihood
       | of exceeding the stack depth in a manner that will impact
       | performance is very low."
       | 
       | As I understand it, this trick is also what retpolines are about.
       | 
       | Citing the kernel docs:
       | 
       | The kernel can protect itself against consuming poisoned branch
       | target buffer entries by using return trampolines (also known as
       | "retpoline") for all indirect branches. Return trampolines trap
       | speculative execution paths to prevent jumping to gadget code
       | during speculative execution. x86 CPUs with Enhanced Indirect
       | Branch Restricted Speculation (Enhanced IBRS) available in
       | hardware should use the feature to mitigate Spectre variant 2
       | instead of retpoline. Enhanced IBRS is more efficient than
       | retpoline.
       | 
       | See also Retbleed: https://lwn.net/Articles/901834/
       | 
       | [2] https://discourse.llvm.org/t/is-pic-code-defeating-the-
       | branc...
        
         | pjc50 wrote:
         | > there is no way to address EIP directly
         | 
         | In general this is a "thing" on pipelined processors,
         | especially once you have out-of-order, because "current
         | instruction" starts to smear out across a bunch of
         | instructions. But for CALL the processor has to dump the
         | pipeline and pick a specific return address to put on the
         | stack.
         | 
         | (unless you're MIPS with the branch delay slot nonsense)
        
           | planede wrote:
           | My impression was that pipelined processors pipeline across
           | CALL instructions. They even speculate across indirect calls.
           | How you describe this is as if there was a significant
           | penalty to reading EIP or calling a function on a pipelined
           | processor, but I don't think that's true.
        
           | phire wrote:
           | Don't forget, the CPU needs to know the current PC for
           | relative branch instructions too.
           | 
           | Relative branch instructions and calls are pretty common, so
           | flushing the entire pipeline just to get the current PC would
           | be way too expensive.
           | 
           | So pipelined CPUs actually implement extra resources to
           | track the current PC of every single instruction (at least as
           | far as the execute stage) just so they can get the PC
           | rapidly.
           | 
           | And you will find almost every single modern CPU has an
           | instruction to copy the current PC to a register.
        
           | titzer wrote:
           | > has to dump the pipeline
           | 
           | Modern Intel processors have a "stack engine" (similar to the
           | return stack buffer) that speeds up access through RSP. But
           | regardless, there's no need to dump any pipeline; a call just
           | has an implicit store to memory and the RSP gets updated
           | (register-renamed, really). Calls are very, very fast these
           | days.
        
           | adrian_b wrote:
           | There was no way in 80386, except by pushing EIP on the stack
           | within a CALL.
           | 
           | In 64-bit Intel/AMD CPUs, the instruction LEA (load effective
           | address), with an address relative to RIP, can be used to
           | save the instruction pointer in a register, then a jump to
           | any address will arrive there providing the old instruction
           | pointer in the register, ready to be used without popping it
           | from the stack.
           | 
           | In all modern pipelined CPUs there are multiple copies of the
           | instruction pointer, one that runs ahead providing the
           | instruction fetch addresses, and an older value that provides
           | base addresses for relative jumps and for relative loads or
           | stores.
        
           | ajross wrote:
           | > In general this is a "thing" on pipelined processors
           | 
           | It's actually not? Or, indeed, it's a hard problem. And
           | modern designs certainly don't expose the instruction counter
           | as a general register anymore (ARM32 PC is the last of its
           | kind).
           | 
           | But having an IP-relative data addressing mode is a critical
           | feature for any reasonable modern device, for exactly the
           | reasons detailed here: you want constants stored with your
           | compiled code without having to incur overhead (c.f. the
           | linked article, or the GOT/PLT indirection in shared
           | libraries, etc...) to get it.
        
           | sweetjuly wrote:
           | This is generally not true. While OOO processors generally
           | don't love passing PC around (and they usually don't since
           | most instructions won't need it), RISC-V, ARMv7, and
           | ARMv8 all provide mechanisms to access PC:
           | 
           | ARMv7: you can just use PC directly as a register
           | 
           | ARMv8: adr Rd, #0 will move PC into Rd
           | 
           | RISC-V: auipc Rd, 0 will move PC into Rd
           | 
           | Usually what implementers tend to do to avoid piping around
           | massive 64-bit addresses for no good reason is that they form
           | fetch groups (a linear series of instructions starting at a
           | base PC) and then number instructions both by their group ID
           | and their position in the group. The base PC for each group
           | is then stored in an array indexed over the fetch group. If
           | an instruction later needs its PC (such as to perform a
           | branch or because it caused an exception), a request is sent
           | to the fetch group array to fetch the base PC for the
           | operation which can then be used in conjunction with the
           | offset to reconstruct the original PC.
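
The fetch-group bookkeeping described above can be sketched in a few lines. This is only a toy model under stated assumptions: the `FetchGroupTable` name, the eight-entry table, and the fixed 4-byte instruction size are illustrative choices, not details of any particular core.

```python
class FetchGroupTable:
    """Toy model: one base PC per fetch group, instructions tagged (group, index)."""

    def __init__(self, num_groups=8, insn_bytes=4):
        self.base_pc = [0] * num_groups   # array indexed by fetch group ID
        self.insn_bytes = insn_bytes      # assumption: fixed-width instructions
        self.next_id = 0

    def allocate(self, base_pc):
        """Record the base PC of a new fetch group and return its ID."""
        group_id = self.next_id
        self.base_pc[group_id] = base_pc
        self.next_id = (self.next_id + 1) % len(self.base_pc)
        return group_id

    def pc_of(self, group_id, index):
        """Rebuild the full PC of an instruction from its small tag."""
        return self.base_pc[group_id] + index * self.insn_bytes


table = FetchGroupTable()
group = table.allocate(0x8000_0000)
print(hex(table.pc_of(group, 3)))  # 0x8000000c
```

Each in-flight instruction then carries only a few bits of group ID and index instead of a full 64-bit address.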
        
         | im3w1l wrote:
         | > In newer processors there exists a cache for return addresses
         | Return Stack Buffer (RSB). But there is a penalty for doing
         | call and never doing ret.
         | 
         | I think you could play nice by actually doing the ret instead
         | of popping EIP. Something like
         | 
         |     GET_EIP:
         |         mov eax, [esp]
         |         ret
         | 
         | And then
         | 
         |     call GET_EIP
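
The call/ret pairing concern in this subthread can be illustrated with a toy model. The sketch below is not how any real predictor is implemented: the `ReturnStackBuffer` class and the addresses are illustrative assumptions, and the default depth of 16 is simply taken from the Intel quote above. It only counts how often a last-in-first-out predictor would guess the wrong return target.

```python
from collections import deque


class ReturnStackBuffer:
    """Toy return-address predictor: a bounded LIFO of predicted targets."""

    def __init__(self, depth=16):
        self.entries = deque(maxlen=depth)  # oldest entry is dropped on overflow
        self.mispredicts = 0

    def call(self, return_addr):
        self.entries.append(return_addr)

    def ret(self, actual_return_addr):
        predicted = self.entries.pop() if self.entries else None
        if predicted != actual_return_addr:
            self.mispredicts += 1


def run(use_thunk):
    rsb = ReturnStackBuffer()
    rsb.call(0x1005)      # some caller -> main
    rsb.call(0x2005)      # main -> f
    rsb.call(0x3005)      # f: "call GET_EIP" (thunk) or "call next" (call/pop)
    if use_thunk:
        rsb.ret(0x3005)   # the thunk returns, so the entry is consumed
    # otherwise "pop eax" fixes the real stack but the predictor entry stays
    rsb.ret(0x2005)       # f returns to main
    rsb.ret(0x1005)       # main returns to its caller
    return rsb.mispredicts


print(run(use_thunk=True), run(use_thunk=False))  # 0 2
```

In this model the thunk keeps every `ret` matched with its entry, while the unmatched call leaves a stale entry that shifts every later prediction by one.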
        
       | nynyny7 wrote:
       | I don't get the purpose, at least of his minimal example. The
       | author says he wants to make his code position-independent, i.e.,
       | so that it can be executed from everywhere in memory (without
       | relocation). But that is defeated by the...
       | 
       | mov edx, print
       | 
       | ... in the example.
        
         | yenz0r wrote:
         | Yeah, the example won't work, but since it's only used for
         | getting the length of the string it's an easy fix to instead
         | use pascal/counted strings with a length prefix byte.
        
         | 0x0 wrote:
         | They should have put labels in front of and after the string
         | bytes, then most assemblers would evaluate "(labelafter -
         | labelbefore)" to a constant integer giving the length as
         | needed. No need for a runtime sub instruction either, then.
        
       | messe wrote:
       | I love using a variant on that trick in real mode code:
       | 
       |     print:
       |         pop si
       |     .loop:
       |         lodsb
       |         jz .end
       |         mov ah, 0x0E
       |         xor bh, bh
       |         int 10h
       |         jmp .loop
       |     .end:
       |         push si
       |         ret
       | 
       | I'm writing this from memory, so there may be an off by one error
       | in the above code.
       | 
       | It's used like this, with a null terminated string, rather than a
       | hardcoded length:
       | 
       |     call print
       |     db "hello, world", 0
       | 
       | This can even be transformed into something like
       | 
       |     puts "hello, world"
       | 
       | with the aid of NASM macros. I can't recall where I saw this
       | trick originally. Maybe some FreeDOS or GRUB code.
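
For readers who do not read x86, the control flow of the inline-string routine can be modelled as follows. This is a sketch under assumptions: `memory`, `stack`, and the helper names are hypothetical stand-ins for the machine state, the three-byte call encoding is a placeholder, and the zero check corresponds to the `test al, al` that the author notes is missing further down the thread.

```python
def print_inline(memory, stack, out):
    """Callee: treat the return address as a pointer to a NUL-terminated string."""
    si = stack.pop()              # pop si: return address == start of the string
    while True:
        al = memory[si]           # lodsb: load a byte, then advance si
        si += 1
        if al == 0:               # test al, al / jz .end
            break
        out.append(chr(al))       # stands in for the int 10h teletype call
    stack.append(si)              # push si: resume just past the terminator


def call_print(memory, call_addr, call_len=3):
    """Caller side: 'call print' followed directly by the string bytes."""
    stack, out = [], []
    stack.append(call_addr + call_len)   # call pushes the address after itself
    print_inline(memory, stack, out)
    resume = stack.pop()                 # ret pops the adjusted return address
    return "".join(out), resume


# Placeholder call bytes, then the inline string, then the next instruction (a nop).
image = bytes([0xE8, 0x00, 0x00]) + b"hello, world\x00" + bytes([0x90])
text, resume = call_print(image, call_addr=0)
print(text, resume)  # hello, world 16
```

Execution resumes at the byte after the terminator, which is why the caller needs no instructions at all to pass the argument.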
        
         | stevekemp wrote:
         | I first saw this in virus code from the 80s, where you'd have
         | code to get the current location:
         | 
         |     call next
         |     next:
         |         pop ax
         | 
         | I've used the same approach for printing "inline" strings
         | myself, though in my case I tend to be working with CP/M and
         | there the strings are terminated with "$".
        
           | EvanAnderson wrote:
           | This is exactly what I thought of. Learning x86 assembler in
           | the context of reverse engineering MS-DOS made this trick
           | seem perfectly normal (as did the idea of writing position
           | independent code).
        
             | stevekemp wrote:
             | A later comment in this discussion reminds me that this was
             | called "calculating the delta-offset".
        
               | EvanAnderson wrote:
               | Yep. That's the terminology I'd expect to see in a 40Hex!
               | >smile<
        
         | akoboldfrying wrote:
         | Cute! And the time overhead vs. the usual stack-based parameter
         | passing convention is roughly zero, since even though the
         | callee has to "push si" at the end, the caller needs zero
         | instructions to pass the argument, instead of the usual one.
        
         | messe wrote:
         | Too late for me to edit, but my code is missing a "test al, al"
         | after "lodsb".
        
         | amluto wrote:
         | The fact that code like this would get acceptable performance
         | is amazing! By modern standards, there's maybe a few cycles
         | (assuming no cache misses) in the loop body plus that INT
         | instruction. That's maybe 20k cycles for the round trip (read
         | IDT and GDT, go through the whole awful ucode flow, and jump to
         | kernel, then do the work, and then do IRET, which is, again,
         | amazingly slow).
         | 
         | Fortunately, I'm pretty sure the CPUs that were intended to run
         | this code were rather more efficient at interrupts (in terms of
         | cycles) than modern x86 monsters.
         | 
         | Intel is at least trying to fix this with FRED.
        
       | OhNoNotAgain_99 wrote:
       | interesting but can you still get i386's?
        
         | blueflow wrote:
         | This article is referring to Intel's 32-bit instruction set,
         | which seemingly all x86 machines still support.
        
           | phire wrote:
           | All 32bit/64bit x86 machines.
           | 
           | I believe you can still buy new 8086 class and 286 class
           | 16bit x86 cores in random SoCs.
           | 
           | And I know you can buy SoCs with 386 class x86 cores. It
           | might be more accurate to describe them as 486 class cores
           | that don't implement the full 486 instruction set.
        
         | Dwedit wrote:
         | It's what the architecture is named. 32-bit mode code for intel
         | processors is still called i386, even if that processor is
         | decades old.
         | 
         | The most significant (non-SIMD) change was adding in CMOV.
        
       | irdc wrote:
       | Tricks like these are going to become less common with execute-
       | only mapping of .text slowly proliferating through the industry
       | (iOS, OpenBSD).
       | 
       | Though i386 is unlikely to ever become execute-only.
        
         | H8crilA wrote:
         | Is that a security measure? What would execute-only prevent?
        
           | irdc wrote:
           | Yeah, it makes constructing ROP chains slightly more
           | difficult when combined with ASLR and the like as you cannot
           | defeat the randomisation by inspecting the running binary.
        
             | H8crilA wrote:
             | As in you already roughly know where code is mapped, but
             | need the lower bits of the offset? Or also to learn the
             | specific version of the running code?
        
               | irdc wrote:
               | A successful ROP attack requires the exact addresses of
               | the various gadgets used (refer to a definition of ROP if
               | this is unclear, as I'm currently on mobile). ASLR
               | thwarts this, as does the libc layout randomisation that
               | OpenBSD does on every boot. However, it's not perfect,
               | and if you can read program memory you could scan for
               | gadgets at run-time. This last point is prevented by
               | execute-only.
        
               | H8crilA wrote:
               | Ah, but you first need to have even an approximate idea
               | of where some code is mapped, otherwise you'll fault on
               | nearly all requests into a 64 bit space.
        
               | irdc wrote:
               | Yes, that's true. That's where infoleaks come in. Plus a
               | lot of crashes are likely not even noticed, or blamed on
               | the software just being buggy. Repeatedly crashing a
               | fork()'ing server might just give you enough information
               | to reconstruct its memory layout (which doesn't vary
               | between parent and child processes after a fork(), which
               | is why OpenSSH does an execve() of itself after
               | fork()'ing).
        
               | H8crilA wrote:
               | I see, but for a 1GiB mapped code space we're talking
               | here about 2^64/(1 Gi) = 17'179'869'184 attempts, or
               | perhaps about half of that with average luck.
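
The arithmetic in this exchange is easy to check. The helper below is only a back-of-the-envelope sketch; the function name is made up, and it assumes the mapping is aligned to its own size so that the address space divides into equal slots.

```python
def probes_to_find_mapping(address_bits, mapped_bytes):
    """Return (worst-case, average) number of guesses to hit a mapped region."""
    slots = 2 ** address_bits // mapped_bytes
    return slots, slots // 2


worst, average = probes_to_find_mapping(64, 2 ** 30)
print(worst, average)  # 17179869184 8589934592
```

Passing a smaller `address_bits` (for example 47, which is an assumption about the usable user-space range rather than something stated in the thread) shrinks the count accordingly.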
        
             | Findecanor wrote:
             | There are also attacks such as "JIT spraying" where JIT-
             | compiled code contains large constants that the runtime
             | gets tricked into jumping into. Execute-only would make
             | that attack a little less likely.
        
         | PrimeMcFly wrote:
         | > Tricks like these are going to become less common with
         | execute-only mapping of .text slowly proliferating through the
         | industry (iOS, OpenBSD).
         | 
         | Give the PaX project some credit, since they had it before
         | OpenBSD did. Windows has had it for a while also, since XP.
        
           | taway1237 wrote:
           | Is this some obscure feature of Windows? In my experience,
           | while code sections are almost never writable, they're always
           | readable.
        
             | PrimeMcFly wrote:
             | I was wrong, I was thinking of DEP which is quite
             | different.
        
           | irdc wrote:
           | > Give the PaX project some credit, since they had it before
           | OpenBSD did.
           | 
           | I didn't know that and cannot find anything that confirms
           | this. You have a source?
        
         | adastra22 wrote:
         | 32-bit intel ISA supports execute-only memory pages.
        
           | irdc wrote:
           | Only through segmentation hacks right? In page table entries,
           | execute doesn't have a separate flag but shares it with read.
        
             | blibble wrote:
             | execute disable was added with the pentium 4 (but needs pae
             | page tables)
        
               | irdc wrote:
               | Yes, but execute _disable_ is not the same as execute
               | _only_. AFAIK there's no way to prevent executable pages
               | from being readable using only the i386/amd64 page table.
        
               | amluto wrote:
               | You can fudge it with protection keys (poorly), and you
               | can do it for real with EPT tricks.
        
           | Findecanor wrote:
           | Only through segmentation, I think. However x86-S is supposed
           | to force a flat memory model for 32-bit programs as well.
           | 
           | On recent Intel processors, it is possible to execute-only
           | protect pages using Intel MPK (Memory-Protection Keys) by
           | having pages with a key be read-only in the page table but
           | "access disable" in the PKRU register. PKRU is accessible
           | from user mode though.
           | 
           | AFAIK, the only (still) mainstream CPU arch with reliable
           | execute-only protection is RISC-V. (I would like to be wrong,
           | and see it on e.g. ARM as well)
        
         | ehaliewicz2 wrote:
         | execute-only as in no reading?
        
       | qweqwe14 wrote:
       | This trick isn't i386-specific. In general, you can merge .data,
       | .rodata etc into one section with a linker script and it will
       | just work, pretty useful for saving a few bytes.
       | 
       | Also see sstrip for ELF files and this legendary writeup
       | https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...
        
       | flohofwoe wrote:
       | A similar "trick" was used on some 8-bit home computers for
       | passing (optionally variable-length) data to operating system
       | calls.
       | 
       | For instance on the KC85/2..4 operating system (CAOS) the
       | equivalent of "puts()" expects the "syscall index" and zero-
       | terminated text to print after the call instruction, e.g.:
       | 
       |     CALL 0F003H    ; call into generic "syscall" entry
       |     DEFB 23H       ; "syscall" identifier
       |     DEFM 'HELLO WORLD!'
       |     DEFW 0D0AH     ; newline
       |     DEFB 00        ; end of text
       |     NOP            ; execution continues here
       | 
       | The syscall dispatcher would pop the return address from the
       | stack and that way discover the data. Before the syscall returns,
       | a modified return address which points to the first byte after
       | the data is pushed back on the stack.
       | 
       | Only downside of this approach was that disassemblers would get
       | terribly confused, unless they had specific knowledge about this
       | CAOS peculiarity.
        
         | warpspin wrote:
         | Yes. C64 GEOS also used this a lot. They used to call it
         | "inline calls":
         | 
         | https://archive.org/details/The_Official_GEOS_Programmers_Re...
        
         | n_plus_1_acc wrote:
         | The TI 83 family of calculators with a Z80 also use this.
        
         | PinguTS wrote:
         | Oh, yeah, KC85. We had them in school, when I was in my early
         | teens.
        
         | jsymolon wrote:
         | Apple II, DOS and PRODOS calls do that too.
         | 
         | https://prodos8.com/docs/techref/calls-to-the-mli/
        
         | tenebrisalietum wrote:
         | The C128 had a KERNAL routine called PRIMM that did that.
        
         | cancerhacker wrote:
         | Classic Mac used this for some toolbox traps as well. Most apps
         | used a jump table at some offset from the A5 register, which
         | looked like:
         | 
         |     addr: _LoadSeg
         |           dc.w segmentNumber
         |           dc.w segmentOffset
         | 
         | The _LoadSeg trap would ensure that 'CODE'(segmentNumber) was
         | loaded from disk and then modify the jump code @addr to become
         | an absolute JMP (0x4ef9 + 32 bits) and then set the PC back to
         | @addr and return from the trap. There was also an _UnloadSeg
         | mechanism that would reverse this!
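
The self-patching jump table can be mimicked in a high-level language to show the idea. Everything here is a hypothetical stand-in: `JumpTable`, `fake_loader`, and the entry names are invented for illustration, and the real mechanism patched machine code rather than swapping Python callables.

```python
class JumpTable:
    """Toy lazy-loading jump table: a stub replaces itself on first use."""

    def __init__(self, load_segment):
        self.load_segment = load_segment   # hypothetical: segment number -> routines
        self.entries = {}
        self.loads = 0

    def add_stub(self, name, segment_number, routine_index):
        def stub(*args):
            routines = self.load_segment(segment_number)    # the "_LoadSeg" step
            self.loads += 1
            self.entries[name] = routines[routine_index]    # patch into a direct jump
            return self.entries[name](*args)                # re-dispatch via the entry
        self.entries[name] = stub

    def call(self, name, *args):
        return self.entries[name](*args)


def fake_loader(segment_number):
    # Stand-in for reading a 'CODE' resource from disk.
    return [lambda x: x + segment_number]


table = JumpTable(fake_loader)
table.add_stub("add_one", segment_number=1, routine_index=0)
print(table.call("add_one", 41), table.call("add_one", 1), table.loads)  # 42 2 1
```

Only the first call pays for the load; later calls go straight to the routine.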
        
       | BruceEel wrote:
       | Well yes, as said here, it's more of a linker thing and not so
       | much a language or assembly thing. On Windows you could do the
       | below to have a single, executable, readable and writable
       | section. Not sure whether it still works anno 2023.
       | 
       | It's generally considered a bad idea from a security standpoint.
       | 
       |     #pragma comment(linker,"/MERGE:.data=.text /MERGE:.rdata=.text /MERGE:.flat=.text  /SECTION:.text,EWR ")
        
         | taway1237 wrote:
         | The article is about call+pop "trick" in assembly, linker is
         | not relevant here.
        
           | _nalply wrote:
           | Right, but that trick is not so useful if you have a
           | different section than .text only, and that's what GP is
           | referring to.
        
             | taway1237 wrote:
             | I disagree. For me it's useful mostly for position
             | independent shellcode prologue, which has no sections to
             | speak of, and may get embedded in a "normal" executable or
             | something that is not an executable at all (useful in a
             | bootloader, or for injecting code to another process, or
             | self-relocating code, etc). I use this "trick" all the time
             | and I never felt the need to mess with a linker for this.
             | 
             | But it's a good hint, I hope I didn't sound overly
             | negative.
        
               | _nalply wrote:
               | Your point is interesting. I didn't think about this use
               | case. Inject code with ptrace. Like the LD_PRELOAD trick
               | but you don't even need LD_PRELOAD, just attach and
               | bamboozle the running process into running some code you
               | provided. In such cases sections don't exist, but pages.
               | Right.
        
         | Dwedit wrote:
         | It happens to be a lot easier to reverse-engineer a program
         | where the sections are not combined, and you can predict that
         | strings will reliably be in the .rdata section. While it does
         | save a few KB, it just makes things so much nicer for the next
         | people who need to patch features into the binary manually.
        
       | PinguTS wrote:
       | It seems I'm getting old. What's the trick here? Is this a trick?
       | Yeah, that is how we did things in the past. That's when I
       | learned programming and then "hacked in" the hex codes on a hex
       | keypad on a Z80. That's when I learned programming on my first
       | 8086. You tried to figure out what caused the least overhead.
       | That meant saving space on instructions and in processing
       | power/speed. But then I learned that this is called spaghetti
       | code.
        
         | self_awareness wrote:
         | You knew it. New generation didn't. This is how the world
         | works.
        
           | polynomial wrote:
           | This may be one of the most surprising things I have learned
           | in my life.
        
       | self_awareness wrote:
       | Reading EIP through CALL is called "delta addressing", and it was
       | a common technique in malware back in the days when viruses were
       | infecting executable files (nowadays this doesn't exist because
       | of digitally signed code on all major platforms except Linux)
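
The "delta" idea can be written out as plain arithmetic. In this sketch `NEXT_LABEL` and `MESSAGE` are hypothetical link-time addresses, and `call_pop` merely stands in for what the `call next` / `pop` pair yields at run time.

```python
NEXT_LABEL = 0x0105   # hypothetical link-time address of the label after the call
MESSAGE = 0x0140      # hypothetical link-time address of some data


def call_pop(load_base):
    """Stand-in for `call next` / `next: pop reg`: run-time address of `next`."""
    return load_base + NEXT_LABEL


def data_address(load_base):
    delta = call_pop(load_base) - NEXT_LABEL   # the delta offset
    return MESSAGE + delta                     # link-time address, corrected


for base in (0x0, 0x7C00, 0x400000):
    print(hex(base), hex(data_address(base)))
```

The code never needs to know `load_base` in advance; it recovers it from the difference between where the label is and where it was expected to be.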
        
       | hun3 wrote:
       | Reading the call return address is basically how you write
       | position-independent code (relocatable without modifying the on-
       | memory executable image).
       | 
       | On Linux there's a stub subroutine that does exactly that:
       | __i686.get_pc_thunk.<reg>.
       | 
       | Here's the entire subroutine:
       | 
       |     __i686.get_pc_thunk.bx:
       |         MOV EBX, DWORD PTR [ESP]
       |         RET
       | 
       | Yup, that's all. If you compile with gcc -m32 -fPIC, you'll see a
       | call to that thunk whenever a function accesses GOT or other
       | relocatable symbols.
        
         | russdill wrote:
         | I've also seen:
         | 
         |     call 1f
         |     1: <next instruction>
         | 
         | So commonly I hadn't considered that people thought getting the
         | EIP on x86 was an obstacle.
        
       | krylon wrote:
       | I discovered this trick in 2008, during my one excursion into
       | assembly programming. But it was for a different purpose. I even
       | did it in inline binary. I felt so clever. X-D
        
       | iefbr14 wrote:
       | In '75 we re-used the memory of code that was executed only once
       | to store stuff with IBM's system/370 assembler.
        
       | layer8 wrote:
       | I never get used to the fact that the segment for executable
       | machine code is called "text". Anyone know the history of that?
        
         | projektfu wrote:
         | It goes back to at least OS/360 (TXT Record), probably earlier.
         | It follows from referring to the text of the program vs the
         | data.
        
       | rbanffy wrote:
       | One interesting advantage of very small programs in the age of
       | slow storage was that, if they fit in one disk block, they'd skip
       | one drive seek and read the whole file from the block indicated
       | in the directory entry.
        
       | snickerbockers wrote:
       | I see this a lot reverse engineering programs made for an older
       | ISA from the 90s called SH4. It's a 32-bit RISC that uses 16-bit
       | instructions[1] and is therefore unable to load more than 8 bits
       | of arbitrary immediate data (sometimes 12 but usually 8) into a
       | register without spreading the operation over several
       | instructions so most functions will have large blocks of data at
       | the end (and sometimes even in the middle, because it needs to
       | get a pointer to the data by offset from the PC and the
       | instruction format is limited to 8 bit offsets) where they load
       | in constant values and pointers. I'm pretty sure gcc even does
       | this. I see it so often it never occurred to me this would be
       | unusual on other CPUs.
       | 
       | [1] doubles the effective size of the instruction cache and also
       | makes dual pipelines easier to implement because they can both
       | fetch from the same bus at the same time. Legend has it that this
       | was successful enough to be the inspiration behind thumb mode on
       | ARM, which is also a 32 bit ISA with 16 bit instructions.
        
         | projektfu wrote:
         | It's 8086's history as a descendant of a limited 8-bit ISA that
         | made it lack PC-relative addressing. I'm not sure why it was
         | never added in all of its iterations until x64.
         | 
         | Other ISAs from the minicomputer age (PDP-11) and their
         | descendants and inspirations (H8, 68k) had it. Zilog added PC
         | relative loads and address calculations to the Z8000, and it's
         | a generally popular form now in x64.
         | 
         | The Unix V6 assembly source code is very readable because of
         | this and also I think it was unfortunately responsible for the
         | 0-terminated string use because of the ease of writing it that
         | way.
        
         | NobodyNada wrote:
         | This is also very common on ARM; they call it a "literal pool":
         | https://developer.arm.com/documentation/dui0473/m/writing-ar...
        
           | duskwuff wrote:
           | The Thumb encoding for ARM also has some _very_ clever
           | encodings for inline constants:
           | 
           | https://developer.arm.com/documentation/ddi0308/d/Thumb-
           | Inst...
           | 
           | Specifically, it can encode any 8-bit value rotated by any
           | number of bits, as well as any value of the form 0x00XY00XY,
           | 0xXY00XY00, or 0xXYXYXYXY. Combined with the use of inverted
           | instructions (e.g. MVN instead of MOV, SUB instead of ADD,
           | BIC instead of AND, etc), this covers a surprising number of
           | the 32-bit constants which are likely to appear in a program.
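
A rough checker for these constant forms might look like the following. It reflects one reading of the Thumb-2 "modified immediate" rules (an 8-bit value placed at any bit position without wrapping around, plus the three replicated-byte patterns) and should be checked against the ARM reference manual before being relied on. The function names are made up.

```python
MASK32 = 0xFFFFFFFF


def is_modified_immediate(value):
    """True if value matches one of the constant forms described above."""
    value &= MASK32
    if value <= 0xFF:                            # plain 8-bit constant
        return True
    low = value & 0xFF
    high = (value >> 8) & 0xFF
    if value == (low << 16) | low:               # 0x00XY00XY
        return True
    if value == (high << 24) | (high << 8):      # 0xXY00XY00
        return True
    if value == low * 0x01010101:                # 0xXYXYXYXY
        return True
    lowest_bit = (value & -value).bit_length() - 1
    span = value.bit_length() - lowest_bit       # width covered by the set bits
    return span <= 8                             # fits in a shifted 8-bit window


def fits_mov_or_mvn(value):
    """MVN loads the bitwise inverse, so the inverted constant may fit instead."""
    return is_modified_immediate(value) or is_modified_immediate(~value & MASK32)


print(is_modified_immediate(0x00AB00AB), is_modified_immediate(0x12345678))  # True False
print(fits_mov_or_mvn(0xFFFFFF00))  # True
```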
        
       | ipython wrote:
       | This has been used for decades by malware and shellcode that
       | needs to be compact and position-independent (loaded at any
       | virtual address). It is clever, and a side effect is that the
       | string's address is already on the stack, so if your next step
       | is to call a function with that string as an argument, you can
       | just call that function directly.
       | 
       | It used to confuse a lot of disassemblers, where you'd have to
       | re-synchronize the disassembly after the string and disambiguate
       | between 'code' and 'data' by hand.
        
       | ithkuil wrote:
       | pdp-11 had a very elegant unification of "immediate operand" and
       | "pc-relative addressing".
       | 
       | Basically one of the addressing modes is "access word pointed to
       | by register+offset and post-increment the register by word size
       | (2 bytes)".
       | 
       | That can be used to pop off a word from the stack if the register
       | is the stack pointer, but if the register is the program counter,
       | that basically reads the word following the instruction and
       | causes the CPU to continue execution after the immediate data.
       | 
       | A truly orthogonal instruction set :-)
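
The unification can be shown with a very small model of the post-increment operand fetch. This is a sketch, not an emulator: the register file, memory layout, and addresses are illustrative assumptions, and only the register numbering (R6 as stack pointer, R7 as program counter) and little-endian words follow the PDP-11.

```python
SP, PC = 6, 7  # PDP-11 register numbers for the stack pointer and program counter


def read_word(memory, addr):
    """Read a little-endian 16-bit word."""
    return memory[addr] | (memory[addr + 1] << 8)


def fetch_autoincrement(regs, memory, reg):
    """Read the word the register points to, then advance the register by 2."""
    addr = regs[reg]
    regs[reg] = (addr + 2) & 0xFFFF
    return read_word(memory, addr)


def demo():
    memory = bytearray(0x400)
    memory[0x100:0x102] = (0x1234).to_bytes(2, "little")   # top of stack
    memory[0x202:0x204] = (0xBEEF).to_bytes(2, "little")   # word after an instruction
    regs = [0] * 8
    regs[SP] = 0x100
    regs[PC] = 0x202   # PC has already moved past the instruction word at 0x200
    popped = fetch_autoincrement(regs, memory, SP)      # behaves like a pop
    immediate = fetch_autoincrement(regs, memory, PC)   # behaves like an immediate
    return popped, immediate, regs[SP], regs[PC]


print([hex(v) for v in demo()])  # ['0x1234', '0xbeef', '0x102', '0x204']
```

The same function serves both cases; only the register number differs.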
        
       | higherhalf wrote:
       | This can be done even simpler:
       | 
       |     global _start
       |     _start:
       |         jmp next
       |     string:
       |         db `Hello World!\n`
       |     len: equ $ - string
       |     next:
       |         mov ecx, string
       |         mov edx, len
       |         mov ebx, 1
       |         mov eax, 4
       |         int 80h
       | 
       |         mov ebx, 0
       |         mov eax, 1
       |         int 80h
       | 
       | For NASM, it can also be put into a macro, for example printing
       | to video memory at 0xb8000:
       | 
       |     %macro print 1
       |         mov ecx, %%loop_start - %%strdata
       |         mov eax, 0x0700
       |         jmp %%loop_start
       |     %%strdata: db %1
       |     %%loop_start:
       |         mov al, [%%strdata + ecx - 1]
       |         mov [0xb8000 + ecx * 2 - 2], ax
       |         loop %%loop_start
       |     %endmacro
        
         | projektfu wrote:
         | That will require a fixup or a fixed load address. The example
         | in the article is position independent.
        
         | fargle wrote:
         | the author wanted it to be position-independent (PIC), so it
         | works no matter what address the .text segment is loaded to and
         | run.
         | 
         | This example uses a fixed symbolic reference ("string:") and is
         | the normal way to do it. The trick is to do it in a PC-relative
         | way.
        
       | hota_mazi wrote:
       | The first time I saw this trick was with ProDOS on the Apple ][,
       | circa 1983.
       | 
       | ProDOS came up with this new call syntax where the parameters to
       | the API follow the call to it.
       | 
       | For example:
       | 
       |            ldx #$00
       |            ldy #$10
       |            sty params+4
       |            stx params+5    ; setup number of bytes to read (16)
       |            jsr $BF00       ; call ProDOS
       |            .BYTE $CA       ; ProDOS command number = CA (read)
       |            .WORD params    ; address of parameter table, lo/hi
       |            bcs error       ; carry set, error
       |            .
       |            .
       |     params .BYTE $04       ; number of parameters for a read
       |            .BYTE $00       ; file reference number, 0, 1, 2 in MacQForth
       |            .WORD BUFFER    ; pointer to data buffer
       |            .WORD $0000     ; requested number of bytes to read, fill in
       |            .WORD $0000     ; number actually read, returned by ProDOS
        
       ___________________________________________________________________
       (page generated 2023-11-09 23:01 UTC)