[HN Gopher] ARM vs. RISC-V Vector Extensions
       ___________________________________________________________________
        
       ARM vs. RISC-V Vector Extensions
        
       Author : zdw
       Score  : 86 points
       Date   : 2021-05-06 14:26 UTC (8 hours ago)
        
 (HTM) web link (erik-engheim.medium.com)
 (TXT) w3m dump (erik-engheim.medium.com)
        
       | kingsuper20 wrote:
       | I hit so many cases in SSE (and others) where I needed 'just one
       | more instruction' and instead had a bit of a mess. My bet is that
       | simple SIMD instruction sets will always grow over time.
       | 
       | For fun, maybe they could bolt on a VLIW set (or KLIW, 'Kinda
       | Large') and push some of the ordering work onto the
       | programmer/compiler.
        
       | ChuckMcM wrote:
       | One of the more interesting and fun things you can do these days
       | is get a ULX3S[1], use the community RISC-V project to turn it
       | into a fully functional RISC-V system, and then design your own
       | vector instructions to play around with different ways of doing
       | things. All for < $200 US which is always amazing to me.
       | 
       | [1] https://www.crowdsupply.com/radiona/ulx3s (85T version
       | recommended)
        
       | CalChris wrote:
       | I think the RISC-V Vector Extensions are very elegant. However,
       | I'm more interested in what hard core practitioners think and the
       | ones I follow are nonplussed.
       | 
       | https://gist.github.com/erincandescent/8a10eeeea1918ee4f9d99...
       | 
       | https://twitter.com/geofflangdale/status/1155122593369710592
       | 
       | I'm reminded of a compliment for Vishy Anand's middlegame
       | technique. _He plays very well with the knight pair._
        
         | jpfr wrote:
         | RISC-V really shines when the compressed-instruction (C)
         | extension and macro-op fusion are taken into account.
         | 
         | https://news.ycombinator.com/item?id=25542963
         | 
         | Many of the early critiques of RISC-V have not considered that.
         | And many points become entirely moot in light of these
         | techniques.
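Here is a toy Python sketch of the macro-op fusion idea. The decoder spots a shift whose result feeds a dependent add and issues the pair as one internal operation, so a simple two-instruction idiom can cost about as much as one complex instruction. The tuple encoding, the single recognized pair, and the `fused_shadd` name are all hypothetical. Real cores choose their own fusion pairs.

```python
from typing import List, Tuple

# Hypothetical encoding: (mnemonic, rd, rs1, rs2_or_imm).
Insn = Tuple[str, str, str, object]


def fuse(stream: List[Insn]) -> List[tuple]:
    """Toy decoder pass: fuse `slli rd, rs, k` followed by `add rd, rd, rb`
    into one shifted-add macro-op. Only this one pair is recognized."""
    out: List[tuple] = []
    i = 0
    while i < len(stream):
        a = stream[i]
        b = stream[i + 1] if i + 1 < len(stream) else None
        if (
            b is not None
            and a[0] == "slli"
            and b[0] == "add"
            and b[1] == a[1]  # the add overwrites the shift's destination
            and b[2] == a[1]  # and consumes it, so the temporary is dead afterwards
        ):
            # rd = (rs << k) + rb, issued as a single internal operation
            out.append(("fused_shadd", b[1], a[2], a[3], b[3]))
            i += 2
        else:
            out.append(a)
            i += 1
    return out


program = [
    ("slli", "a0", "a1", 3),    # a0 = a1 << 3 (scale an index by 8)
    ("add", "a0", "a0", "a2"),  # a0 = a0 + a2 (add the base address)
    ("ld", "a3", "a0", 0),      # a3 = mem[a0]
]
print(fuse(program))  # [('fused_shadd', 'a0', 'a1', 3, 'a2'), ('ld', 'a3', 'a0', 0)]
```

A pair whose add writes a different register is left alone, because the shift's result would still be architecturally visible.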
        
       | Scene_Cast2 wrote:
       | I think some of the complaints are about the number of
       | instructions. But when you need to e.g. load a masked set of
       | vectors, it's really nice to have such an instruction handy.
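The following is a minimal sketch of what a masked (predicated) vector load does, treating a flat Python list as memory. Only active lanes read memory, so a partially filled final vector never touches the elements past the end of the array. The function name and the two inactive-lane policies (keep the old value, or zero it) are illustrative. Each ISA defines its own rules for inactive lanes.

```python
from typing import List, Sequence


def masked_load(memory: Sequence[float], base: int, mask: Sequence[bool],
                old_dest: Sequence[float], zero_inactive: bool = False) -> List[float]:
    """Load len(mask) consecutive elements starting at memory[base].
    Only active lanes touch memory; inactive lanes keep old_dest's value
    or, if zero_inactive is set, become 0.0."""
    result: List[float] = []
    for lane, active in enumerate(mask):
        if active:
            result.append(memory[base + lane])
        elif zero_inactive:
            result.append(0.0)
        else:
            result.append(old_dest[lane])
    return result


data = [1.0, 2.0, 3.0]            # only three valid elements
mask = [True, True, True, False]  # the fourth lane would run past the end
print(masked_load(data, 0, mask, [9.0] * 4))                      # [1.0, 2.0, 3.0, 9.0]
print(masked_load(data, 0, mask, [9.0] * 4, zero_inactive=True))  # [1.0, 2.0, 3.0, 0.0]
```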
        
       | Symmetry wrote:
       | >If you are a hobbyist like me, who just wants to keep up to date
       | with how technology is evolving and what things like vector
       | processing are, then save yourself a lot of trouble and just read
       | a RISC-V book.
       | 
       | Yes, I'd absolutely agree. If you're a professional, you'll
       | probably find the hundreds of ARM instructions useful for
       | optimizing your kernels if you're hand-crafting assembly, but
       | you'll be pretty far up the learning curve before you get to the
       | crossover point. That goes not just for hobbyists but also for
       | most academics exploring vectorization.
       | 
       | But if I'm just making use of the effort someone else has put
       | into BLAS for my robot, then I'll probably be better off with an
       | SVE processor, both from a code-optimization-scope standpoint and
       | from an ease-of-hardware-implementation standpoint.
        
       | sanxiyn wrote:
       | Note that while the article is broadly correct, the RISC-V Vector
       | Extension is still in development and the article is based on an
       | old version. For example, SETVL now takes three arguments (not
       | two) and has been renamed to VSETVL.
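In the current draft, `vsetvli rd, rs1, vtypei` has three operands. `rd` receives the new vl, `rs1` holds the application vector length (AVL), and the immediate encodes vtype (element width SEW and grouping LMUL). What follows is a minimal Python model of what it computes, together with the strip-mining loop it enables. It assumes vl = min(AVL, VLMAX) with VLMAX = LMUL * VLEN / SEW. The spec allows some implementation latitude in certain cases, and this sketch ignores that. The function names are hypothetical.

```python
from typing import List, Tuple


def vlmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Elements per register group: LMUL * VLEN / SEW (integer LMUL only)."""
    return lmul * vlen_bits // sew_bits


def vsetvl(avl: int, vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Simplified vsetvl(i): the new vl, taken here as min(AVL, VLMAX)."""
    return min(avl, vlmax(vlen_bits, sew_bits, lmul))


def strip_mined_add(a: List[int], b: List[int], vlen_bits: int = 128,
                    sew_bits: int = 32, lmul: int = 1) -> Tuple[List[int], List[int]]:
    """Add two arrays in chunks of vl elements; also return each pass's vl."""
    out: List[int] = []
    vls: List[int] = []
    i = 0
    while i < len(a):
        vl = vsetvl(len(a) - i, vlen_bits, sew_bits, lmul)  # AVL = elements left
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        vls.append(vl)
        i += vl
    return out, vls


result, passes = strip_mined_add(list(range(10)), [1] * 10)
print(passes)  # [4, 4, 2] with VLEN=128, SEW=32, LMUL=1
```

The same loop works unchanged on hardware with a different VLEN, because the code never hard-codes the vector length. Only the sequence of vl values changes.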
        
         | _chris_ wrote:
         | It looks like RVV will be going up for ratification this
         | summer, so the current draft
         | (https://github.com/riscv/riscv-v-spec/blob/master/v-spec.ado...)
         | is pretty close to the final version.
        
       | jcranmer wrote:
       | > RISC-V however does not work like this. The RISC-V vector
       | registers are in a separate register file not shared with the
       | scalar floating point registers.
       | 
       | Honestly... in hardware, they probably _are_ actually in the same
       | register file. It just now means you have two sets of
       | architectural registers that rename to the same register file.
       | 
       | As for the rest of the article, it looks like it mostly boils
       | down to "I'm intimidated by assembly programming" as opposed to
       | any actual critique of the strengths and weaknesses of the vector
       | ISAs. There are superficial complaints about the number of
       | instructions, or different ways to write (the same? I only know
       | scalar ARM assembly, not any vector extensions) instructions. On
       | a quick reread, I see a complaint that's entirely due to how ARM
       | represents indexed load operations, which has absolutely nothing
       | to do with the vector ISA whatsoever.
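Suppose the complaint is about SVE's scalar-plus-scalar form, for example `LD1D {Z0.D}, P0/Z, [X0, X1, LSL #3]`. That instruction is our own example, and the article's exact case may differ. Its address part is the same scaled register-offset arithmetic that scalar A64 loads such as `LDR X2, [X0, X1, LSL #3]` use. The sketch below computes only the element addresses and ignores the predicate. The function name and the values are hypothetical.

```python
from typing import List


def ld1d_scalar_plus_scalar(xn: int, xm: int, lanes: int) -> List[int]:
    """Element addresses for a contiguous 64-bit load addressed as
    [Xn, Xm, LSL #3]: base Xn plus index Xm scaled by 8, then one
    8-byte element per lane (predicate ignored)."""
    start = xn + (xm << 3)  # LSL #3 scales the element index by 8 bytes
    return [start + 8 * lane for lane in range(lanes)]


# Hypothetical values: array at 0x1000, starting at element 2, 4 lanes (VL = 256 bits).
print([hex(addr) for addr in ld1d_scalar_plus_scalar(0x1000, 2, 4)])
```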
       | 
       | If your goal is to understand how hardware SIMD works, you're
       | probably better off sticking to C code with intrinsics; that way
       | you're not distracted by the extra hoops that arise just from
       | translating C into assembly.
        
         | _chris_ wrote:
         | >> The RISC-V vector registers are in a separate register file
         | not shared with the scalar floating point registers.
         | 
         | > Honestly... in hardware, they probably are actually in the
         | same register file. It just now means you have two sets of
         | architectural registers that rename to the same register file.
         | 
         | You could have a single unified pool of physical registers that
         | can be handed out to any register, but there's only some
         | advantage to doing so and a lot of advantages in keeping them
         | separate. Either way, that's a micro-architectural detail that
         | the designers are free to adopt or not.
         | 
         | From the software's point of view, there's a lot of advantages
         | in keeping different architectural registers separate.
        
           | jcranmer wrote:
           | > You could have a single unified pool of physical registers
           | that can be handed out to any register, but there's only some
           | advantage to doing so and a lot of advantages in keeping them
           | separate. Either way, that's a micro-architectural detail
           | that the designers are free to adopt or not.
           | 
           | What's the advantage to keeping them separate? If you're
           | implementing vector instructions, then your scalar floating-
           | point units are probably going to be the same as the vector
           | floating-point units, with zero-extension for the unused
           | vector slots. At that point, keeping them separate hardware
           | register slots is detrimental: it's now costing you extra
           | area as well, with concomitant power costs. You also need
           | larger register files to accommodate all of the vector
           | registers _and_ the floating-point registers, when you're
           | only likely to use half of them at any time. If you're
           | pushing the vector units to full throttle, you'll have
           | little scalar code that needs all the renaming; if you're
           | pushing the scalar units to full throttle, you'll similarly
           | have little vector code.
           | 
           | From a software viewpoint, eh, there's not really any
           | advantage to keeping them separate. You tend to use scalar
           | xor vector floating point code, not both (this isn't true for
           | integer, though), so there's little impact on actual register
           | pressure. More architectural registers means more state to
           | spill on context switches.
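The context-switch point can be put in rough numbers. The sketch below counts only register-file bytes. It assumes RV64 with the D extension (FLEN = 64) and, for comparison, a RISC-V VLEN equal to SVE's VL. In SVE the scalar FP/SIMD registers alias the low bits of Z0-Z31, so they add nothing. Control/status registers and any lazy-save tricks an OS might use are ignored, and the function names are hypothetical.

```python
def riscv_f_v_state_bytes(vlen_bits: int, flen_bits: int = 64) -> int:
    """f0-f31 (FLEN bits each) plus v0-v31 (VLEN bits each). CSRs ignored."""
    return 32 * flen_bits // 8 + 32 * vlen_bits // 8


def sve_state_bytes(vl_bits: int) -> int:
    """Z0-Z31 (VL bits each; scalar FP/SIMD registers alias their low bits),
    plus P0-P15 and FFR (VL/8 bits each). Control registers ignored."""
    return 32 * vl_bits // 8 + 17 * (vl_bits // 8) // 8


for bits in (128, 512):
    print(bits, riscv_f_v_state_bytes(bits), sve_state_bytes(bits))
# 128 768 546
# 512 2304 2184
```

The separate 256-byte FP file is a fixed overhead, so its relative weight shrinks as the vector length grows.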
        
         | swiley wrote:
         | Undergrad computer architecture class freaked me out. Assembly
         | doesn't even accurately describe what the machine does.
        
       | brigade wrote:
       | SVE was designed mindful of how CPUs currently operate, whereas
       | RISC-V vector extensions were designed with fondness for how CPUs
       | operated decades ago.
       | 
       | Well that's somewhat of an exaggeration, but XT-910 speculatively
       | executes vector instructions based on a prediction of how
       | vsetvl(i) modifies register configurations in order to achieve
       | good performance, so changing this register configuration causes
       | speculation failures as though it were a mispredicted branch.
       | You need to reconfigure for any mixed-precision or mixed
       | integer/floating-point operations, which discourages small SIMD
       | functions. Quote from their white paper: "this is not
       | friendly to deeply pipelined processor architecture."
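The toy model below shows why mixed precision hurts under that scheme. It is not XT-910's actual mechanism, and none of its details are taken from the white paper. A last-value predictor assumes every vector op runs under the most recently seen vtype, and each change counts as one misprediction. The (SEW, LMUL) tuple standing in for vtype and the function name are hypothetical simplifications.

```python
from typing import List, Tuple

VType = Tuple[int, int]  # (SEW in bits, LMUL); a simplified stand-in for vtype


def count_vtype_changes(required: List[VType], initial: VType) -> int:
    """Toy last-value predictor: each vector op is predicted to run under the
    most recently seen vtype. Every change counts as a misprediction
    (a pipeline flush in this model)."""
    predicted = initial
    changes = 0
    for vt in required:
        if vt != predicted:
            changes += 1
            predicted = vt
    return changes


uniform = [(32, 1)] * 8         # every op uses 32-bit elements
mixed = [(16, 1), (32, 2)] * 4  # alternating 16-bit and 32-bit ops
print(count_vtype_changes(uniform, (32, 1)))  # 0
print(count_vtype_changes(mixed, (16, 1)))    # 7
```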
       | 
       | Fundamentally, I dislike how completely the meaning of RISC-V
       | vector instructions depends on what instructions were executed an
       | arbitrarily long time beforehand. Also he's really complaining
       | about register indexing in load/stores?
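The next example makes concrete how an instruction's meaning depends on earlier state. The same element-wise add applied to the same register bytes gives different results depending on the SEW set by an earlier vsetvl(i). This is a minimal model that assumes little-endian element packing and wraparound arithmetic. `vadd_vv` is a hypothetical Python function named after the instruction, not a real API.

```python
def vadd_vv(a: bytes, b: bytes, sew_bits: int) -> bytes:
    """Toy model of an element-wise add over the same register bytes,
    split into little-endian SEW-bit elements with wraparound."""
    width = sew_bits // 8
    mask = (1 << sew_bits) - 1
    out = bytearray()
    for i in range(0, len(a), width):
        x = int.from_bytes(a[i:i + width], "little")
        y = int.from_bytes(b[i:i + width], "little")
        out += ((x + y) & mask).to_bytes(width, "little")
    return bytes(out)


v1 = bytes([0xFF, 0x00])
v2 = bytes([0x01, 0x00])
print(vadd_vv(v1, v2, 8).hex())   # 0000: two 8-bit lanes, the carry is dropped
print(vadd_vv(v1, v2, 16).hex())  # 0001: one 16-bit lane, the carry propagates
```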
        
         | SuchAnonMuchWow wrote:
         | Also, the RISC-V vector extension runs opposite to the RISC
         | philosophy in the RISC vs. CISC debate.
        
         | _chris_ wrote:
         | > Quote from their white paper: "this is not friendly to deeply
         | pipelined processor architecture."
         | 
         | There is zero reason you can't rename the VL register.
         | "Speculating" that it doesn't change is only one design point.
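The renaming alternative can be sketched as a toy rename stage for vl. Each vsetvl allocates a fresh physical entry, and each later vector op records which entry it reads, so vl becomes an ordinary data dependency rather than something to speculate on. The class and method names are hypothetical, and entries are never freed in this toy.

```python
from typing import List, Optional


class VLRenamer:
    """Toy rename stage for vl. Each vsetvl gets a fresh physical entry whose
    value arrives when it executes; vector ops record which entry they read."""

    def __init__(self, initial_vl: int) -> None:
        self.phys: List[Optional[int]] = [initial_vl]  # never freed in this toy
        self.current = 0  # tag of the newest vl mapping

    def rename_vsetvl(self) -> int:
        self.phys.append(None)  # value unknown until the vsetvl executes
        self.current = len(self.phys) - 1
        return self.current

    def execute_vsetvl(self, tag: int, vl: int) -> None:
        self.phys[tag] = vl

    def rename_vector_op(self) -> int:
        return self.current  # an ordinary data dependency on one vl entry


r = VLRenamer(initial_vl=4)
older_op = r.rename_vector_op()  # decoded before the vsetvl
setvl = r.rename_vsetvl()
newer_op = r.rename_vector_op()  # decoded after it
r.execute_vsetvl(setvl, 3)
print(r.phys[older_op], r.phys[newer_op])  # 4 3
```

This handles the *value* of vl. It says nothing about vtype fields that change which registers an instruction names.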
        
           | brigade wrote:
           | Well... when the operation of register renaming itself
           | depends on the vtype register, that isn't a complete
           | solution.
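Renaming depends on vtype because of register grouping. With LMUL > 1, an operand such as v4 names a group of consecutive architectural registers, so the rename stage needs LMUL before it knows how many registers an instruction reads or writes. The sketch below models integer LMUL only. The spec treats misaligned group numbers as reserved encodings, and the sketch simply raises an error for them. The function name is hypothetical.

```python
from typing import List


def register_group(vreg: int, lmul: int) -> List[int]:
    """Architectural vector registers named by operand v<vreg> under integer
    LMUL. The group base must be a multiple of LMUL."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("this sketch only handles integer LMUL")
    if vreg % lmul != 0:
        raise ValueError(f"v{vreg} is not a valid group base for LMUL={lmul}")
    return list(range(vreg, vreg + lmul))


print(register_group(4, 1))  # [4]
print(register_group(4, 4))  # [4, 5, 6, 7]
```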
        
       | klelatti wrote:
       | TL;DR: Arm SVE is a bit of a pain to hand-write assembly for
       | because there is more to look up?
       | 
       | Can anyone recommend a good introduction to SVE2?
        
         | brandmeyer wrote:
         | SVE2 isn't all that different from SVE. The early SVE
         | introductory papers and tutorials are all great information.
         | 
         | https://arxiv.org/abs/1803.06185
         | 
         | https://developer.arm.com/architectures/instruction-sets/sim...
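The core SVE loop idiom covered in the introductory material is worth seeing in miniature. In RVV the loop shrinks vl on its final pass. SVE instead keeps the hardware vector length fixed and uses a predicate, built by WHILELT, to switch off the tail lanes. The sketch below is a minimal Python model of that pattern. `whilelt` and `predicated_add` are hypothetical names that mimic the instruction's behavior, and representing the predicate as a list of bools is our assumption.

```python
from typing import List


def whilelt(i: int, n: int, lanes: int) -> List[bool]:
    """WHILELT-style predicate: lane k is active iff i + k < n."""
    return [i + k < n for k in range(lanes)]


def predicated_add(a: List[int], b: List[int], vl_bits: int = 256,
                   elem_bits: int = 32) -> List[int]:
    lanes = vl_bits // elem_bits  # fixed per machine, unknown to the compiler
    n = len(a)
    out = [0] * n
    i = 0
    pred = whilelt(i, n, lanes)
    while any(pred):  # continue while any lane is active
        for k, active in enumerate(pred):
            if active:  # inactive tail lanes do nothing
                out[i + k] = a[i + k] + b[i + k]
        i += lanes  # advance by a whole vector, like INCW for 32-bit elements
        pred = whilelt(i, n, lanes)
    return out


print(whilelt(8, 10, 4))  # [True, True, False, False]
```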
         | 
         | If the author had spent as much time reading the introductory
         | manuals as he did bitching all over that webpage, he might have
         | made some progress /rant.
        
           | klelatti wrote:
           | Thanks - these look great. Entirely justified rant too!
        
       ___________________________________________________________________
       (page generated 2021-05-06 23:02 UTC)