[HN Gopher] The Case of the Missing Increment
       ___________________________________________________________________
        
       The Case of the Missing Increment
        
       Author : eigenform
       Score  : 69 points
       Date   : 2024-09-27 13:36 UTC (4 days ago)
        
 (HTM) web link (www.computerenhance.com)
 (TXT) w3m dump (www.computerenhance.com)
        
       | vardump wrote:
       | Just when you get used to features like x86 CPUs combining two
       | instructions into one micro-op (micro-op fusing), you get
       | something like this.
       | 
       | I guess immediate addressing mode addition is a good choice to
       | execute at rename / allocation stage, as it's common, relatively
       | simple and can't generate exceptions.
        
         | eigenform wrote:
         | > immediate addressing mode addition
         | 
         | Well, except for the fact that you need to read from a register
         | before adding the immediate displacement to it. You'd have to
         | know the physical register and do the read very early (before
         | renaming), or predict the value!
        
           | eigenform wrote:
           | I just realized you were probably referring to the example
           | given from the AnandTech article with `lea r64, [r64+imm8]`.
           | 
           | Caveat is just that [presumably] the source and destination
           | registers have to be matching (since `lea rax, [rax+imm]` is
           | just `add rax, imm`).
        
         | Taniwha wrote:
         | This isn't really combining, as the result of the first
         | increment is needed by the intermediate compare, but is a
         | rewriting that removes a dependency (or moves it further back
         | in the stream).
        
           | vardump wrote:
           | Maybe it rewrites multiple immediate additions into one.
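
Editor's sketch of the "removes a dependency" point made in this subthread. It is a toy model, not the article's loop or Intel's mechanism: ops are hypothetical (dst, src) register pairs, every op is assumed to take one cycle, and `chain_depth` is an illustrative helper name.

```python
def chain_depth(ops):
    """Longest dependency chain, in ops, for a list of (dst, src) pairs.

    Each op reads `src` and writes `dst`; unit latency is assumed.
    """
    ready = {}  # register -> depth at which its latest value is available
    longest = 0
    for dst, src in ops:
        depth = ready.get(src, 0) + 1
        ready[dst] = depth
        longest = max(longest, depth)
    return longest


# Four chained increments: each one waits for the previous result.
chained = [("x", "x")] * 4

# The same four values rewritten as x0+1 .. x0+4, all reading the original
# value x0. This is what merging the immediates amounts to.
folded = [(f"x{k}", "x0") for k in range(1, 5)]

print(chain_depth(chained))  # 4
print(chain_depth(folded))   # 1
```

Both sequences produce the same four values, so an intermediate compare can still see each one. Only the length of the chain changes.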
        
       | Taniwha wrote:
       | Thinking about this - this may be a pattern that's designed to
       | match something that expands from a string instruction.
       | 
       | While the loop he's testing is a useless bit of code that does
       | nothing, the optimisation he's discovered may help speed up
       | things like scasb/stosb, allowing portions of 2 unrolled copies
       | to be processed per clock.
        
       | buttocks wrote:
       | Deep thoughts: why aren't "increment" and "excrement" opposites?
        
         | Joker_vD wrote:
         | Because "increase" and "excrete" have completely different
         | roots that only coincidentally coincide when the verbal nouns
         | corresponding to those words are formed.
        
           | knodi123 wrote:
           | now do "progress" and "congress"!
        
             | Joker_vD wrote:
             | You mean, the difference between "going forward" and
             | "coming together"? It's in the prefix, "pro-" (for,
             | forward) versus "con-" (with, together) which give you
             | different shades of the meaning. Can't really say what the
             | verb of movement was, though.
        
               | oersted wrote:
               | I think he meant it as an absurdist joke, but this is a
               | great response!
               | 
               | I looked it up, "gress" comes from "gradi" in Latin which
               | directly translates to "walk". More specifically:
               | con(pro) + gradi -> congredi (verb) -> congressus (noun)
               | 
               | Edit: Knowing this, "gradient" has an interesting flavour
               | :)
               | 
               | Edit: It looks like the path is more indirect for
               | "gradient"
               | 
               | "gradi" (walk) -> "gradus" (step) -> "grade" (french
               | influence) + "salient" -> "gradient". I like that in
               | Latin "walk" is "to step", or perhaps "step" is "the unit
               | of walking"? "A walking"? Etymology is fun!
        
               | randomdata wrote:
               | now do "flammable" and "inflammable"!
        
               | dpkirchner wrote:
               | What a country!
        
               | Joker_vD wrote:
               | > I like that in Latin "walk" is "to step", or perhaps
               | "step" is "the unit of walking"? "A walking"?
               | 
               | Consider the verb "to pace", and the corresponding noun
               | "pace": the analogy is almost perfect. Of course, Latin
               | also had other words for going places.
        
         | IWeldMelons wrote:
         | Your name checks out. You should be an expert in those
         | (excremental) matters.
        
       | leiroigh wrote:
       | That's pretty cool.
       | 
       | Normally it would be either the programmer's or the
       | compiler's job to unroll a loop and then reduce dependency chain
       | lengths.
       | 
       | But it's nice if the renamer can do that as well.
       | 
       | Presumably Intel have real-world data that suggest that
       | significant real workloads can profit from this.
       | 
       | I wonder whether that points to specific software issues, like
       | hypothetically "oh yeah, openjdk8 hotspot was a little too timid
       | at loop unrolling. It won't get that JIT improvement backported,
       | but our customers will use java8 forever. Better fix that in
       | silicon".
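
Editor's sketch of the software-side transformation mentioned above: unroll, then replace the chained index increments with fixed offsets from one base. The function names are hypothetical. Python will not show the hardware effect, so this only illustrates the shape of the rewrite; the `total` accumulation is still a chain in both versions and is left alone on purpose.

```python
def sum_chained(data):
    """Unrolled by four; each index depends on the previous increment."""
    total, i = 0, 0
    while i + 4 <= len(data):
        total += data[i]
        i += 1
        total += data[i]
        i += 1
        total += data[i]
        i += 1
        total += data[i]
        i += 1
    while i < len(data):  # remainder
        total += data[i]
        i += 1
    return total


def sum_folded(data):
    """Same unroll, but all four indices are offsets from one base `i`."""
    total, i = 0, 0
    while i + 4 <= len(data):
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3]
        i += 4  # one add of 4 replaces four dependent adds of 1
    while i < len(data):  # remainder
        total += data[i]
        i += 1
    return total


sample = list(range(10))
print(sum_chained(sample) == sum_folded(sample) == sum(sample))  # True
```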
        
       | pkhuong wrote:
       | I believe I first saw this on IACA; uops.info has the
       | measurements for zero-latency inc, add, etc on Alder Lake
       | https://uops.info/html-instr/INC_R64.html . These adds by
       | immediate are nicely closed, so I've been assuming renamed values
       | are uniformly represented in Golden Cove as register+increment.
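
Editor's sketch of the register+increment idea, assuming a rename table whose entries are (physical register, offset) pairs. `ToyRenamer`, `Mapping`, `add_imm` and `execute` are hypothetical names. This is a toy, not a description of Golden Cove: real hardware presumably bounds the offset and has other restrictions that are not modelled here. "Closed" is taken to mean that (r + a) + b is again of the form r + c.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Mapping:
    phys: int    # physical register that holds the base value
    offset: int  # accumulated immediate, not yet added by any execution unit


class ToyRenamer:
    """Hypothetical rename table; not a description of real hardware."""

    def __init__(self, arch_regs):
        self._fresh = count()
        self.table = {r: Mapping(next(self._fresh), 0) for r in arch_regs}
        self.uops = []  # micro-ops that would be sent to the execution units

    def add_imm(self, reg, imm):
        # Fold `inc reg` / `add reg, imm` into the table entry: no uop issued.
        old = self.table[reg]
        self.table[reg] = Mapping(old.phys, old.offset + imm)

    def execute(self, op, dst, *srcs):
        # Any other instruction reads its sources as (phys, offset) pairs and
        # writes a fresh physical register with a zero offset.
        sources = tuple(self.table[s] for s in srcs)
        self.table[dst] = Mapping(next(self._fresh), 0)
        self.uops.append((op, self.table[dst].phys, sources))


renamer = ToyRenamer(["rax", "rcx"])
for _ in range(3):
    renamer.add_imm("rax", 1)         # three `inc rax` in a row
renamer.execute("mov", "rcx", "rax")  # first real consumer of rax

print(renamer.table["rax"])  # Mapping(phys=0, offset=3)
print(renamer.uops)          # [('mov', 2, (Mapping(phys=0, offset=3),))]
```

In this model the three increments cost no micro-ops at all; the consumer receives the base register together with the pending offset of 3.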
        
       | zokier wrote:
       | > Since the only Alder Lake machine I had access to was a remote
       | Windows machine that didn't belong to me, I more-or-less had to
       | choose option 3, which meant subjecting myself to The Ultimate
       | Sadness
       | 
       | Well, you can pick up Sapphire Rapids instances from your
       | preferred cloud provider and avoid the sadness.
        
         | deater wrote:
         | do cloud providers give full, unrestricted access to hardware
         | performance counters?
        
           | zokier wrote:
           | It depends. On AWS you can get "metal" instances where afaik
           | you get pretty much unrestricted access. In addition on
           | certain instance types/sizes you get access to virtualized
           | counters (vPMU). See Q11 here
           | https://github.com/intel/pcm/blob/master/doc/FAQ.md#q11 or
           | tables here https://www.intel.com/content/www/us/en/developer
           | /articles/t...
           | 
           | dunno about others
        
       | mzs wrote:
       | You have to use an instruction like cpuid with rdtsc so that the
       | TSC is not read before the loop terminates. There have been
       | changes to the Intel docs and there are more options now:
       | 
       | https://stackoverflow.com/a/58146426
       | 
       | Also in the bad old days SMM would interfere on some CPUs.
        
       ___________________________________________________________________
       (page generated 2024-10-01 23:02 UTC)