[HN Gopher] Dynamic Branch Prediction with Perceptrons (2000) [pdf]
       ___________________________________________________________________
        
       Dynamic Branch Prediction with Perceptrons (2000) [pdf]
        
       Author : luu
       Score  : 40 points
       Date   : 2023-05-10 08:20 UTC (14 hours ago)
        
 (HTM) web link (www.cs.cmu.edu)
 (TXT) w3m dump (www.cs.cmu.edu)
        
       | GaggiX wrote:
       | You should add "(2000)"
        
       | jacknews wrote:
       | It's an interesting idea.
       | 
       | Did this go anywhere? Is it worth revisiting in 2023?
        
         | w1nk wrote:
         | Yes, it did:
         | https://chasethedevil.github.io/post/the_neural_network_in_y...
        
         | peterfirefly wrote:
         | That is a* very, very important paper. The answer is yes.
         | 
         | (The guy who posted the link is Dan Luu who used to work for a
         | company that made x86 CPUs. His blog is worth looking at. He
         | also used to work for Twitter -- but his predictions about the
          | Doom and Disaster after Musk's takeover haven't panned out at
         | all. Good at some things (CPU architecture and blog posts). Not
         | so good at Elon Musk predictions ;) )
         | 
         | --- Edit: put in the missing "a". My fingers and my brain don't
         | always agree on what to write.
        
           | jackmott42 wrote:
           | [flagged]
        
         | ftxbro wrote:
         | Maybe or maybe not, but they manually put it onto the front
         | page using https://news.ycombinator.com/pool
        
       | gharzol wrote:
       | [dead]
        
       | Tommstein wrote:
        | I implemented this in a CPU simulator for a computer
        | architecture class back in grad school, and as I recall I
        | couldn't reproduce their prediction success rate.
        
       | delta_p_delta_x wrote:
       | Modern CPUs use TAGE-like[1] and Perceptron-based branch
       | predictors, achieving a 99.7% prediction accuracy. Zen 2 in
       | particular uses TAGE[2].
       | 
        | I recall implementing one in software for a computer
        | architecture class; it was pretty gnarly, but the gain in
        | prediction accuracy (and therefore performance) over a simple
        | two-bit saturating counter is immense.
       | 
       | [1]: https://doi.org/10.1145/3226098 [2]:
       | https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Br...
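
The two-bit saturating counter mentioned above can be sketched in a few lines. This is a minimal illustration (the table size and the indexing-by-low-PC-bits scheme are arbitrary choices for the sketch), not any particular CPU's design:

```python
# A minimal sketch of a two-bit saturating-counter branch predictor.
# Counter states 0-1 predict not-taken, 2-3 predict taken; updates
# saturate at the ends, so a single anomalous outcome cannot flip a
# strongly biased prediction.

class TwoBitPredictor:
    def __init__(self, table_bits=10):
        self.mask = (1 << table_bits) - 1
        # Start every entry at "weakly not-taken".
        self.table = [1] * (1 << table_bits)

    def predict(self, pc):
        """Predict taken iff the counter is in a 'taken' state."""
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        """Nudge the counter toward the actual outcome, saturating."""
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

This is the baseline the perceptron predictor in the paper is compared against.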
        
       | bjourne wrote:
       | Modern branch predictors are based on relatively simple state
       | machines and already have >95% accuracy. Thus, even if machine
       | learning-based predictors can sometimes beat them, it is not
       | clear that they can beat them enough for the much more
       | complicated circuitry they need to be worth it.
        
         | mirker wrote:
         | One thing to point out is that the threshold of predictor
         | complexity is dependent on the execution pipeline. A very
         | speculative and deep architecture has a bigger need for better
         | predictors, since it has a massive penalty when there is a
         | misprediction.
        
         | albertzeyer wrote:
          | But is it really much more complicated? It looks very simple
          | to me. Maybe it's actually even simpler than such a state
          | machine?
        
           | thesz wrote:
            | There are several multipliers and adders (possibly
            | floating-point ones), a register file for perceptron
            | weights, and perceptron selection logic. The scheme also
            | needs to keep information somewhere for training - the
            | decision and the backpropagation stimulus are separated in
            | time.
           | 
           | So the circuitry is complicated despite superficial
           | simplicity of the model.
        
             | isotypic wrote:
              | The version in the paper is only ever multiplying by 1
              | or -1, so you do not need a full multiplier circuit. The
              | weights are also stored as signed integers, not floats,
              | so no complicated floating-point circuitry. I am not
              | sure what the current state of the art is, but given the
              | cost of multipliers and floating-point circuitry I would
              | be surprised if it changed to those while signed
              | integers work.
             | 
             | All branch predictors need some way of storing their state
             | and selection logic, and the way a perceptron branch
             | predictor stores its data is just a big table indexed by
             | some hash of the program counter of the branch, which is
              | pretty standard for branch predictors. Also, all branch
              | predictors have a sort of "backpropagation" in that
              | pipelined processors produce the actual result of the
              | branch (possibly many) cycles later, so this is not as
              | much of a factor either. Since the training rule is a
              | function of the weights, you do not need to store extra
              | data beyond the dot-product output compared against the
              | threshold, and that is already being computed as the
              | prediction anyway.
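
The add/subtract trick described above can be sketched as follows. The history length, table size, and the modulo hash are illustrative choices, not the paper's exact configuration:

```python
# Sketch of the perceptron prediction step. Since every input is +1 or
# -1, the dot product needs only adds and subtracts - no multipliers.

HIST_LEN = 8     # global history length (illustrative)
TABLE_SIZE = 64  # number of perceptrons in the table (illustrative)

# weights[i][0] is the bias weight; the rest pair with history bits.
weights = [[0] * (HIST_LEN + 1) for _ in range(TABLE_SIZE)]
history = [1] * HIST_LEN  # +1 = taken, -1 = not taken

def predict(pc):
    w = weights[pc % TABLE_SIZE]  # index by a hash of the branch PC
    y = w[0]                      # bias input is hardwired to +1
    for i, h in enumerate(history):
        y += w[i + 1] if h == 1 else -w[i + 1]  # add or subtract
    return y >= 0                 # predict taken if non-negative
```

In hardware, the loop becomes a tree of adders over small signed integers, which is why no multiplier or floating-point unit is needed.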
        
         | artisanspam wrote:
         | As someone who works on design verification for modern CPUs,
         | this is not true. Most modern CPUs use some form of TAGE branch
         | prediction, if not something more complex.
        
         | titzer wrote:
         | > relatively simple state machines
         | 
         | To be clear, all hardware branch predictors are "relatively
         | simple state machines"; they need storage in the branch
          | prediction tables, which must be super-fast to access, which
          | means
         | they can only store a few (sometimes dozens, but certainly not
         | hundreds) bits per branch to reach the access latency goal.
         | With input to the predictor encoded as binary, the weights
         | quantized and small and encoded into binary, and the history
         | small, even perceptrons are "relatively simple state machines".
         | After all, their implementation is just going to become some
         | combinatorial logic in the end.
        
         | p_l wrote:
          | AMD has been using perceptrons in its branch prediction
          | logic since, I think, the K6-2.
          | 
          | Some updates to it were a big point of the "what we did in
          | Zen" presentations when the first Ryzen and EPYC CPUs
          | landed.
        
         | pfdietz wrote:
         | But what's the impact of those 5% misses?
        
           | gpderetta wrote:
           | Huge. I think 95% was table stakes 30 years ago. IIRC the
           | average rate of a good predictor these days is 99.5%.
        
           | kordlessagain wrote:
           | bot> Querying GPT...
           | 
           | bot> A 5% miss rate in branch prediction leads to a 14.7%
           | improvement in misprediction rates on a trace of SPEC2000
           | benchmarks compared to the gshare predictor. The use of
           | machine learning-based predictors has the potential to
           | improve these results further.
        
       | kordlessagain wrote:
       | Using my DocGPT project to process this...
       | 
       | system> What are the microarchitectural tricks that allow
       | prediction to take place in one clock cycle?
       | 
       | bot> The microarchitectural tricks include using the branch
       | address to hash and select a perceptron from the table,
       | calculating the dot product of the perceptron and the global
       | history register, using the training algorithm to update the
       | weights in the perceptron, and writing the updated perceptron
       | back to the table.
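
The training step mentioned above can be sketched as follows, assuming the paper's update rule: update only on a misprediction or when the output magnitude is at or below a threshold theta (the paper suggests theta close to 1.93 × history length + 14). Everything else here is an illustrative choice:

```python
# Sketch of the perceptron training rule: after the branch resolves,
# reinforce the weights only if the prediction was wrong or the
# dot-product output's magnitude was at or below THETA.

HIST_LEN = 8                       # illustrative history length
THETA = int(1.93 * HIST_LEN + 14)  # threshold formula from the paper

def train(w, history, y_out, taken):
    """w: weights (bias first); history: +/-1 bits; y_out: dot product."""
    t = 1 if taken else -1
    mispredicted = (y_out >= 0) != taken
    if mispredicted or abs(y_out) <= THETA:
        w[0] += t                  # bias input is always +1
        for i, h in enumerate(history):
            w[i + 1] += t * h      # reinforce agreeing history bits
    return w
```

Because the update depends only on the stored weights, the history, and the already-computed dot product, no additional training state needs to be kept between prediction and resolution.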
        
       ___________________________________________________________________
       (page generated 2023-05-10 23:01 UTC)