[HN Gopher] Examples of RISC-V Assembly Programs
       ___________________________________________________________________
        
       Examples of RISC-V Assembly Programs
        
       Author : azhenley
       Score  : 46 points
       Date   : 2021-04-26 19:36 UTC (3 hours ago)
        
 (HTM) web link (marz.utk.edu)
 (TXT) w3m dump (marz.utk.edu)
        
       | squarefoot wrote:
       | I see there are several three-operand instructions; is that
       | common among modern architectures? I only wrote some M68K
       | assembly ages ago, so it seems rather unusual to me.
        
         | Taniwha wrote:
         | it's very common - risc-v also has a 2-instruction compressed
         | subset
         | 
         | Note: "jalr zero, 1b" can also be written as "j 1b", "jalr
         | zero, 0(ra)" can be written as "ret"
        
           | gioele wrote:
           | > it's very common - risc-v also has a 2-instruction
           | > compressed subset
           | >
           | > Note: "jalr zero, 1b" can also be written as "j 1b", "jalr
           | > zero, 0(ra)" can be written as "ret"
           | 
           | `j` and `ret` are so-called "pseudo instructions" [1], not
           | compressed instructions.
           | 
           | Pseudo instructions are just shortcuts used in assembly
           | language to pretend that some common operations really
           | "exist" without the need to type (or display) the
           | corresponding more complex (but actually existing)
           | instructions. `nop` is a common pseudo instruction. RISC-V
           | has no real `nop` instruction, but, instead, the "do nothing
           | instruction" is canonically encoded as `addi x0, x0, 0`. The
           | programmer can write a more understandable `nop`, and the
           | assembler will write instead the binary code equivalent to
           | `addi x0, x0, 0`.
           | 
           | The compressed instruction set (a.k.a. "extension C"),
           | instead, is a subset of the full [2] instruction set, in
           | which only restricted combinations of operands are possible.
           | The assembly (human-readable) code of the compressed
           | instruction set looks similar to that of the full instruction
           | set (including pseudo instructions), but the instructions are
           | encoded as completely different binary sequences.
           | 
           | [1] https://github.com/riscv/riscv-asm-manual/blob/master/riscv-...
           | 
           | [2] https://riscv.org/wp-content/uploads/2019/06/riscv-spec.pdf#...
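To make the pseudo-instruction idea concrete, here is a minimal Python sketch (not a real assembler) that expands two pseudo instructions and encodes the result as a 32-bit I-type word. The `PSEUDO` table and the `encode_i` and `assemble` helpers are hypothetical names made up for illustration; the opcode and funct3 values are the standard base-ISA ones for `addi` and `jalr`.

```python
# Sketch only: a tiny, hypothetical expansion table, not a real assembler.
ABI = {"zero": 0, "ra": 1}  # only the ABI register names used below

# pseudo instruction -> (real mnemonic, rd, rs1, immediate)
PSEUDO = {
    "nop": ("addi", "zero", "zero", 0),  # addi x0, x0, 0
    "ret": ("jalr", "zero", "ra", 0),    # jalr x0, 0(ra)
}

# (opcode, funct3) for the two I-type instructions used above
I_TYPE = {"addi": (0b0010011, 0b000), "jalr": (0b1100111, 0b000)}


def encode_i(mnemonic, rd, rs1, imm):
    """Pack an I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode."""
    opcode, funct3 = I_TYPE[mnemonic]
    return (
        ((imm & 0xFFF) << 20)
        | (ABI[rs1] << 15)
        | (funct3 << 12)
        | (ABI[rd] << 7)
        | opcode
    )


def assemble(pseudo):
    """Expand a pseudo instruction and return its 32-bit encoding."""
    return encode_i(*PSEUDO[pseudo])


print(f"nop -> {assemble('nop'):#010x}")
print(f"ret -> {assemble('ret'):#010x}")
```

Under these assumptions the sketch prints `nop -> 0x00000013` and `ret -> 0x00008067`: the assembler never emits a dedicated "nop" or "ret" opcode, only the underlying `addi` and `jalr` encodings.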
        
         | monocasa wrote:
         | Yeah, it's really common, for RISCs in particular, to use
         | three-address instructions. Two-address means a
         | read-modify-write of one of the registers, which can make it
         | harder to split dependencies. That's one example of how it
         | complicates higher-perf designs.
        
           | wk_end wrote:
           | Not an expert but I don't think that's quite right. No three-
           | address ISAs (that I know of?) do anything to forbid using
           | one of the sources as a destination, I'm not even aware of a
           | microarch that punishes you for doing so, and the three-
           | address design goes back to the original RISC/MIPS designs
           | where any sort of advanced dependency analysis in the
           | pipeline wasn't even a consideration (it was considered
           | somewhat un-RISC-y maybe even).
           | 
           | I think the motivation was something more like: there's room
           | in the instruction encoding and it's more flexible. Like you
           | can do things like move a value from one register to another
           | using an `add` instruction instead of a dedicated `move` by
           | using `dest = source + zero`. Internally I think some CPUs
           | were doing that sort of thing anyway to simplify the
           | datapath, and the basic point of RISC was to expose the
           | microarch in the instruction set.
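The flexibility argument can be sketched with a toy register machine in Python. Everything here is hypothetical and made up for illustration (the `add3`, `add2`, and `mov` operation names, the tuple format, and the `run` helper); it models no real ISA. It only shows that a three-address add computes `c = a + b` in one step without clobbering a source, that a two-address machine needs a move first, and that a three-address add with a zero register doubles as a move.

```python
# Toy register machine; instruction names and tuple layout are illustrative only.
def run(program, regs):
    """Execute a list of instruction tuples and return the final registers."""
    regs = dict(regs, zero=0)  # copy the inputs and add a hard-wired zero register
    for op, *args in program:
        if op == "add3":       # three-address: dest = src1 + src2
            dest, src1, src2 = args
            value = regs[src1] + regs[src2]
        elif op == "add2":     # two-address: dest = dest + src (read-modify-write)
            dest, src = args
            value = regs[dest] + regs[src]
        elif op == "mov":      # dedicated move, needed by the two-address form
            dest, src = args
            value = regs[src]
        else:
            raise ValueError(f"unknown op: {op}")
        if dest != "zero":     # writes to the zero register are discarded
            regs[dest] = value
    return regs


start = {"a": 2, "b": 3, "c": 0}

three_address = [("add3", "c", "a", "b")]              # one instruction
two_address = [("mov", "c", "a"), ("add2", "c", "b")]  # move, then add in place
move_via_add = [("add3", "c", "a", "zero")]            # dest = source + zero

print(run(three_address, start)["c"])
print(run(two_address, start)["c"])
print(run(move_via_add, start)["c"])
```

Both add programs leave `a` and `b` untouched and put 5 in `c`; the last program copies `a` into `c` without any dedicated move operation.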
        
             | monocasa wrote:
             | I gave it as an example of one way it can complicate a
             | higher-perf design, and why it has stuck around in new ISAs.
             | There are a lot of choices old-school RISC made that happen
             | to still be valid at much higher gate counts, but for
             | different reasons. Three-address is one of those. "Why do we
             | still have it?" is the more interesting question compared to
             | "why did we do it in the first place?" IMO.
             | 
             | You see CISC designs hacking in the "xor REG" instruction as
             | an 'erase dependency in OoO hardware' hint to get around the
             | dependency issues I talked about. Three-address doesn't have
             | that root issue because you can always specify a destination
             | that wasn't a source.
             | 
             | Additionally, it's not a case of exposing the CISC internal
             | datapath as RISC instructions. CISC vertical
             | microinstructions tend to be two-address as well. Horizontal
             | microinstructions can't really be said to have a clear
             | enough source and destination to be either two- or
             | three-address.
        
       | makach wrote:
       | Reading the source code of essential functions like 'strlen',
       | 'strcpy', etc. is like a magician showing you how the trick is
       | performed.
       | 
       | And it feels utterly devastating, like I have been cheated: the
       | banality of how it works and how simple it really is...
        
         | f00zz wrote:
         | If that's too simple for you, you may have fun with the strlen
         | implementation in glibc:
         | 
         | https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=strin...
        
           | makach wrote:
           | Excellent reply! Thanks! ..and also thanks for pointing out
           | the obvious *magic in the code!
        
       | atq2119 wrote:
       | Neat, though they look unoptimized at a glance. For example, I'd
       | expect the strlen loop to be rotated so that there is only a
       | single backwards branch instruction per iteration. Basically, you
       | want the loop structure to be:
       | 
       |         count = 0
       |         goto check
       |     body:
       |         count++, ptr++
       |     check:
       |         if *ptr != 0 goto body
       | 
       | (The initial jump could be a copy of the check instead)
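A rough way to see the effect of rotating the loop is to count executed branches in a Python simulation. This is only a sketch: `mem` stands in for memory as a NUL-terminated `bytes` object, the function names are hypothetical, and the "unrotated" form (test at the top, unconditional jump at the bottom) is an assumed baseline, not a transcription of the article's code.

```python
def strlen_unrotated(mem, ptr):
    """Test at the top of the loop, unconditional jump back at the bottom."""
    count = 0
    branches = 0
    while True:
        branches += 1          # conditional exit test at the loop top
        if mem[ptr] == 0:
            break
        count += 1
        ptr += 1
        branches += 1          # unconditional jump back to the top
    return count, branches


def strlen_rotated(mem, ptr):
    """Rotated loop: one initial jump, then one backward branch per check."""
    count = 0
    branches = 1               # the initial "goto check"
    while True:
        branches += 1          # check: if *ptr != 0 goto body
        if mem[ptr] == 0:
            break
        count += 1             # body: count++, ptr++
        ptr += 1
    return count, branches


text = b"hello\0"
print(strlen_unrotated(text, 0))
print(strlen_rotated(text, 0))
```

For the five-character string, both versions return a length of 5, but the unrotated form executes 11 branches in this model and the rotated form 7: two per iteration versus one, plus the fixed entry and exit overhead.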
        
         | azhenley wrote:
         | They're for teaching sophomores, so I don't think optimizations
         | would help.
        
         | a1369209993 wrote:
         | > count++, ptr++
         | 
         | Also, you should avoid this; it adds an extra addition per
         | iteration for no real benefit. Try:
         | 
         |         base = ptr
         |         ptr = ptr - 1  # also avoid branching[0] over ptr++
         |     body:
         |         ptr++
         |         if *ptr != 0 goto body
         |         return ptr - base
         | 
         | 0: I assume a single addition is cheaper than an unconditional
         | branch, which is probable, but not nearly as big a difference
         | as with conditional branches.
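The pointer-difference idea can be sketched in Python as follows. This is one reading of the intent, stated as an assumption: the original pointer is kept as `base`, and the loop pointer is stepped back by one so that the first increment lands on the first byte. `mem` is a NUL-terminated `bytes` object standing in for memory, and `strlen_ptr_diff` is a hypothetical name.

```python
def strlen_ptr_diff(mem, ptr):
    """Length via pointer difference: one addition per iteration, no counter."""
    base = ptr        # remember where the string starts
    ptr -= 1          # step back so the first ptr++ lands on the first byte
    while True:
        ptr += 1      # the only addition inside the loop
        if mem[ptr] == 0:
            break
    return ptr - base  # a single subtraction after the loop


print(strlen_ptr_diff(b"hello\0", 0))
print(strlen_ptr_diff(b"xxabc\0", 2))
```

The stepped-back pointer is never dereferenced before being incremented, so the empty string is handled too: the first load reads the terminator and the function returns 0.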
        
       ___________________________________________________________________
       (page generated 2021-04-26 23:00 UTC)