hngopher.com

       [HN Gopher] A Proper x86 Assembler in Haskell Using the Escardo-...
       ___________________________________________________________________
        
       A Proper x86 Assembler in Haskell Using the Escardo-Oliva
       Functional
        
       Author : Smaug123
       Score  : 85 points
       Date   : 2025-01-20 13:40 UTC (3 days ago)
        
 (HTM) web link (blog.vmchale.com)
 (TXT) w3m dump (blog.vmchale.com)
        
       | netr0ute wrote:
       | This is basically irrelevant now that better ISAs like RISC-V
       | have a fixed instruction length (2 or 4 bytes) so the fancy
       | algorithm here isn't necessary.
        
         | Coolbeanstoo wrote:
         | As a result of RISC-V existing, all x86 processors have ceased
         | to exist or be produced.
        
           | snvzz wrote:
           | Accurate, if said sometime in the future rather than today.
        
             | saagarjha wrote:
             | There are still people making z80 machines today, so no.
        
         | nicebyte wrote:
         | ARM would have been a better example because the amount of
         | people that care about RISC-V is a rounding error compared to
         | x86 or ARM.
        
         | lifthrasiir wrote:
         | That fancy algorithm is relevant to RISC-V (and in fact, most
         | fixed-length ISAs) because loading an immediate into a register
         | needs one or two instructions depending on the immediate; you
         | surely want to elide a redundant LUI instruction if you can. Of
         | course such redundant instructions don't harm by itself, but
         | that equally applies to x86 as the algorithm _is_ an
         | optimization.
        
         | remexre wrote:
         | This same problem applies to RISC-V with the C extension,
         | because the J and JAL instructions have a larger range than the
         | C.J and C.JAL instructions.
        
         | tliltocatl wrote:
         | Having fixed instruction length doesn't make the need to load
         | large constants magically disappear. These just get split
         | between multiple instructions. If anything, RISC-V might be
         | worse. See also https://maskray.me/blog/2021-03-14-the-dark-
         | side-of-riscv-li....
        
       | ggm wrote:
       | One pass, but.. I'm willing to bet there is some transitional
       | internal state which had to be looped over to do this, subsuming
       | the "pass" inside lazy eval. You have to know the indeterminate
       | "where" to jump to, which depends on subsequent state. So you
       | preserve execution state, calculate on, and return at end to fill
       | in the gaps.
        
         | lmm wrote:
         | That bigotimes function that it's talking about is doing
         | something like tree search with backtracking - lazy evaluation
         | isn't really relevant as far as I can see. I don't think the
         | statement that it's "one pass" is meant to mean that it's a
         | closed form solution or something, the point is that you can
         | write your code as plain single pass logic and this generic
         | functional tool does the rest.
        
           | ggm wrote:
           | It's a showcase for Haskell and well worth reading. I should
           | be less snide.
           | 
           | Nowadays "multipass" means Mila Jovovich but it used to mean
           | "another complex language with massive implications we deal
           | with by divide-and-conquer-through-the-filesystem" and in the
           | case of the York ada compiler that was some amazing number
           | like 10 or more passes.
           | 
           | There was a fashion for running your C instrumented and then
           | recompiling it after runtime branch choice evidence was
           | collected. I think the Dec Alpha OSF/1 compiler did it
           | optionally.
        
             | astrange wrote:
             | > There was a fashion for running your C instrumented and
             | then recompiling it after runtime branch choice evidence
             | was collected. I think the Dec Alpha OSF/1 compiler did it
             | optionally.
             | 
             | This is called profile-guided optimization and new
             | compilers (well, GCC and LLVM) have it.
             | 
             | I mostly think it's bad and not statistically sound; given
             | profile data the compiler is both overly trusting of it (no
             | error bars) and can't find much useful to do with it
             | (because there's no way to hint a modern OoO CPU).
        
         | rhet0rica wrote:
         | Yes, that seems to be what the aptly-named "tardis" monad does,
         | but the article notes that it isn't sufficient because of the
         | minutiae of how jump instructions themselves can vary in length
         | under x86.
         | 
         | In an act of brilliance that surely proves beyond all doubt
         | that Haskell's main reason for existing is to upstage the
         | esolang scene, the author solves the problem by trotting out a
         | special-purpose constraint satisfaction algorithm. (You can
         | imagine what kind of computational complexity this implies. The
         | creators of this algorithm note that it can enumerate all tic-
         | tac-toe games in just over a second!)
        
           | asplake wrote:
           | Explanation here: https://kcsongor.github.io/time-travel-in-
           | haskell-for-dummie...
        
         | codebje wrote:
         | Yes - it's pretty opaque to trace through, but you can see in
         | `valid` that for a given move history, a "good" test is
         | performed on each instruction in the assembly listing - a full
         | pass. This is invoked, asymptotically at least, once per jump
         | per level of the search tree explored in `bigotimes`. Each
         | level 'i' of the search tree has a history of `i` jumps decided
         | to be short or near and branches 'n-i' times; both of those
         | average out to n/2, so the tree searches n^2 nodes with each
         | node doing 'n' iterations over the list of instructions to
         | check all jumps are valid, so the total running time is O(n^3).
         | 
         | (With some modifications to the code as shown, anyway: `lookup`
         | is linear but used in such a way it can be replaced with a
         | constant vector index, instruction lists can be compressed to
         | just jumps & labels with byte counts following, avoiding the
         | duplicate invocations of `p` in `arginf`, etc).
         | 
         | I'd be dubious that there's an algorithm to find the true
         | minimum in less than cubic time, but there's plenty of "good
         | enough" approaches: most branches will be definitely short or
         | definitely not; jumps backwards can be resolved when
         | encountered; jumps forward can be written long and replaced
         | with a short jump plus no-op padding when the target is
         | encountered within short jump distance.
        
       | vdupras wrote:
       | Or you could just be pessimistic in your offset calculation and
       | assume that all "in between" unconditional jumps are 5 bytes in
       | length. I know I know, not proper. But I doubt that in real code,
       | that simpler algo would make you miss a lot of opportunities for
       | 2 bytes jumps.
        
       | guerrilla wrote:
       | Yeah, this is a fun problem. This is where the majority of my
       | work went when writing an x86-virus that lived in the NOP areas
       | between functions. Compilers produced those blocks of NOPs after
       | function bodies to make the following function addresses aligned
       | to to 16-byte boundaries. I chained all of those cavities
       | together with jumps, distributing as much code as would fit and
       | then putting a jump to the next cavity at the end. The trick is
       | that you could fit more instructions (3 more bytes?) if you had a
       | shorter jump.
       | 
       | I still want to write my own assembler some day.
        
       | tliltocatl wrote:
       | I know I might be barking up "just a showcase" tree, but what's
       | the memory complexity of this? E g. a nice property of multipass
       | assemblers is that these only need to keep a list of symbols in
       | the memory rather than the whole program. Not really a concern
       | for modern systems with large memories, but then those neat
       | haskell tricks usually carry some extra O(n)s - in addition to
       | (avoidable) extra constant factors due to hidden linked lists and
       | immutability. Aka the infamous haskell quicksort - much shorter
       | and clear, but also not really a quicksort.
        
       ___________________________________________________________________
       (page generated 2025-01-23 23:03 UTC)