[HN Gopher] A Proper x86 Assembler in Haskell Using the Escardo-...
___________________________________________________________________
A Proper x86 Assembler in Haskell Using the Escardo-Oliva
Functional
Author : Smaug123
Score : 85 points
Date : 2025-01-20 13:40 UTC (3 days ago)
(HTM) web link (blog.vmchale.com)
(TXT) w3m dump (blog.vmchale.com)
| netr0ute wrote:
| This is basically irrelevant now that better ISAs like RISC-V
| have a fixed instruction length (2 or 4 bytes) so the fancy
| algorithm here isn't necessary.
| Coolbeanstoo wrote:
| As a result of RISC-V existing, all x86 processors have ceased
| to exist or be produced.
| snvzz wrote:
| Accurate, if said sometime in the future rather than today.
| saagarjha wrote:
| There are still people making z80 machines today, so no.
| nicebyte wrote:
| ARM would have been a better example because the amount of
| people that care about RISC-V is a rounding error compared to
| x86 or ARM.
| lifthrasiir wrote:
| That fancy algorithm is relevant to RISC-V (and in fact, most
| fixed-length ISAs) because loading an immediate into a register
| needs one or two instructions depending on the immediate; you
| surely want to elide a redundant LUI instruction if you can. Of
| course such redundant instructions don't harm by itself, but
| that equally applies to x86 as the algorithm _is_ an
| optimization.
| remexre wrote:
| This same problem applies to RISC-V with the C extension,
| because the J and JAL instructions have a larger range than the
| C.J and C.JAL instructions.
| tliltocatl wrote:
| Having fixed instruction length doesn't make the need to load
| large constants magically disappear. These just get split
| between multiple instructions. If anything, RISC-V might be
| worse. See also https://maskray.me/blog/2021-03-14-the-dark-
| side-of-riscv-li....
| ggm wrote:
| One pass, but.. I'm willing to bet there is some transitional
| internal state which had to be looped over to do this, subsuming
| the "pass" inside lazy eval. You have to know the indeterminate
| "where" to jump to, which depends on subsequent state. So you
| preserve execution state, calculate on, and return at end to fill
| in the gaps.
| lmm wrote:
| That bigotimes function that it's talking about is doing
| something like tree search with backtracking - lazy evaluation
| isn't really relevant as far as I can see. I don't think the
| statement that it's "one pass" is meant to mean that it's a
| closed form solution or something, the point is that you can
| write your code as plain single pass logic and this generic
| functional tool does the rest.
| ggm wrote:
| It's a showcase for Haskell and well worth reading. I should
| be less snide.
|
| Nowadays "multipass" means Mila Jovovich but it used to mean
| "another complex language with massive implications we deal
| with by divide-and-conquer-through-the-filesystem" and in the
| case of the York ada compiler that was some amazing number
| like 10 or more passes.
|
| There was a fashion for running your C instrumented and then
| recompiling it after runtime branch choice evidence was
| collected. I think the Dec Alpha OSF/1 compiler did it
| optionally.
| astrange wrote:
| > There was a fashion for running your C instrumented and
| then recompiling it after runtime branch choice evidence
| was collected. I think the Dec Alpha OSF/1 compiler did it
| optionally.
|
| This is called profile-guided optimization and new
| compilers (well, GCC and LLVM) have it.
|
| I mostly think it's bad and not statistically sound; given
| profile data the compiler is both overly trusting of it (no
| error bars) and can't find much useful to do with it
| (because there's no way to hint a modern OoO CPU).
| rhet0rica wrote:
| Yes, that seems to be what the aptly-named "tardis" monad does,
| but the article notes that it isn't sufficient because of the
| minutiae of how jump instructions themselves can vary in length
| under x86.
|
| In an act of brilliance that surely proves beyond all doubt
| that Haskell's main reason for existing is to upstage the
| esolang scene, the author solves the problem by trotting out a
| special-purpose constraint satisfaction algorithm. (You can
| imagine what kind of computational complexity this implies. The
| creators of this algorithm note that it can enumerate all tic-
| tac-toe games in just over a second!)
| asplake wrote:
| Explanation here: https://kcsongor.github.io/time-travel-in-
| haskell-for-dummie...
| codebje wrote:
| Yes - it's pretty opaque to trace through, but you can see in
| `valid` that for a given move history, a "good" test is
| performed on each instruction in the assembly listing - a full
| pass. This is invoked, asymptotically at least, once per jump
| per level of the search tree explored in `bigotimes`. Each
| level 'i' of the search tree has a history of `i` jumps decided
| to be short or near and branches 'n-i' times; both of those
| average out to n/2, so the tree searches n^2 nodes with each
| node doing 'n' iterations over the list of instructions to
| check all jumps are valid, so the total running time is O(n^3).
|
| (With some modifications to the code as shown, anyway: `lookup`
| is linear but used in such a way it can be replaced with a
| constant vector index, instruction lists can be compressed to
| just jumps & labels with byte counts following, avoiding the
| duplicate invocations of `p` in `arginf`, etc).
|
| I'd be dubious that there's an algorithm to find the true
| minimum in less than cubic time, but there's plenty of "good
| enough" approaches: most branches will be definitely short or
| definitely not; jumps backwards can be resolved when
| encountered; jumps forward can be written long and replaced
| with a short jump plus no-op padding when the target is
| encountered within short jump distance.
| vdupras wrote:
| Or you could just be pessimistic in your offset calculation and
| assume that all "in between" unconditional jumps are 5 bytes in
| length. I know I know, not proper. But I doubt that in real code,
| that simpler algo would make you miss a lot of opportunities for
| 2 bytes jumps.
| guerrilla wrote:
| Yeah, this is a fun problem. This is where the majority of my
| work went when writing an x86-virus that lived in the NOP areas
| between functions. Compilers produced those blocks of NOPs after
| function bodies to make the following function addresses aligned
| to to 16-byte boundaries. I chained all of those cavities
| together with jumps, distributing as much code as would fit and
| then putting a jump to the next cavity at the end. The trick is
| that you could fit more instructions (3 more bytes?) if you had a
| shorter jump.
|
| I still want to write my own assembler some day.
| tliltocatl wrote:
| I know I might be barking up "just a showcase" tree, but what's
| the memory complexity of this? E g. a nice property of multipass
| assemblers is that these only need to keep a list of symbols in
| the memory rather than the whole program. Not really a concern
| for modern systems with large memories, but then those neat
| haskell tricks usually carry some extra O(n)s - in addition to
| (avoidable) extra constant factors due to hidden linked lists and
| immutability. Aka the infamous haskell quicksort - much shorter
| and clear, but also not really a quicksort.
___________________________________________________________________
(page generated 2025-01-23 23:03 UTC)