[HN Gopher] Dynamic Branch Prediction with Perceptrons (2000) [pdf]
___________________________________________________________________
Dynamic Branch Prediction with Perceptrons (2000) [pdf]
Author : luu
Score : 40 points
Date : 2023-05-10 08:20 UTC (14 hours ago)
(HTM) web link (www.cs.cmu.edu)
(TXT) w3m dump (www.cs.cmu.edu)
| GaggiX wrote:
| You should add "(2000)"
| jacknews wrote:
| It's an interesting idea.
|
| Did this go anywhere? Is it worth revisiting in 2023?
| w1nk wrote:
| Yes, it did:
| https://chasethedevil.github.io/post/the_neural_network_in_y...
| peterfirefly wrote:
| That is a* very, very important paper. The answer is yes.
|
| (The guy who posted the link is Dan Luu, who used to work for a
| company that made x86 CPUs. His blog is worth looking at. He
| also used to work for Twitter -- but his predictions of doom and
| disaster after Musk's takeover haven't panned out at all. Good
| at some things (CPU architecture and blog posts). Not so good at
| Elon Musk predictions ;) )
|
| --- Edit: put in the missing "a". My fingers and my brain don't
| always agree on what to write.
| jackmott42 wrote:
| [flagged]
| ftxbro wrote:
| Maybe or maybe not, but they manually put it onto the front
| page using https://news.ycombinator.com/pool
| gharzol wrote:
| [dead]
| Tommstein wrote:
| I implemented this in a CPU simulator for a computer architecture
| class back in grad school, and I believe I couldn't reproduce
| their prediction success rate.
| delta_p_delta_x wrote:
| Modern CPUs use TAGE-like[1] and perceptron-based branch
| predictors, achieving around 99.7% prediction accuracy. Zen 2 in
| particular uses TAGE[2].
|
| I recall implementing one in software in a computer architecture
| class; it was pretty gnarly, but the improvement in prediction
| accuracy (and therefore performance) over a simple two-bit
| saturating counter is immense.
|
| [1]: https://doi.org/10.1145/3226098
|
| [2]: https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Br...
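For reference, a minimal sketch of the two-bit saturating counter
baseline mentioned above (Python; the table size and the PC-based
index are illustrative choices, not taken from any particular CPU):

    NUM_COUNTERS = 4096              # illustrative table size
    counters = [2] * NUM_COUNTERS    # 0-1 = predict not taken, 2-3 = predict taken

    def predict_2bit(pc):
        # Index the counter table with the low bits of the branch address.
        return counters[pc % NUM_COUNTERS] >= 2

    def update_2bit(pc, taken):
        # Saturate toward 3 on taken branches and toward 0 on not-taken ones.
        i = pc % NUM_COUNTERS
        counters[i] = min(counters[i] + 1, 3) if taken else max(counters[i] - 1, 0)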
| bjourne wrote:
| Modern branch predictors are based on relatively simple state
| machines and already have >95% accuracy. Thus, even if machine
| learning-based predictors can sometimes beat them, it is not
| clear that the gain is large enough to justify the much more
| complicated circuitry they need.
| mirker wrote:
| One thing to point out is that the predictor complexity worth
| paying for depends on the execution pipeline. A very
| speculative, deep architecture has a bigger need for better
| predictors, since it pays a massive penalty on a misprediction.
| albertzeyer wrote:
| But is it really much more complicated? It looks very simple to
| me. Maybe actually even simpler than such a state machine?
| thesz wrote:
| There are several multipliers and adders (possibly floating
| point ones), a register file for the perceptron weights, and
| perceptron selection logic. The scheme also needs to keep
| information somewhere for training - the decision and the
| backpropagation stimulus are separated in time.
|
| So the circuitry is complicated despite the superficial
| simplicity of the model.
| isotypic wrote:
| The version in the paper only ever multiplies by 1 or -1,
| so you do not need a full multiplier circuit. The weights
| are also stored as signed integers, not floats, so no
| complicated floating point circuitry. I am not sure what
| the current state of the art is, but considering the cost
| of multipliers and floating point circuitry, I would be
| surprised if it changed to those while signed integers
| work.
|
| All branch predictors need some way of storing their state
| and selection logic, and the way a perceptron branch
| predictor stores its data is just a big table indexed by
| some hash of the program counter of the branch, which is
| pretty standard for branch predictors. Also, all branch
| predictors have a sort of "backpropagation" in that
| pipelined processors produce the actual result of the
| branch (possibly many) cycles later, so this also is not as
| much of a factor. Since the training update is a function
| of the weights and the perceptron output compared against a
| threshold, and that output is already being computed as the
| prediction, there is little extra data to store.
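To make that concrete, a minimal sketch of the scheme described in
the paper (Python; the table size, history length, and PC-based index
are illustrative choices, and the weight saturation used in hardware
is omitted for brevity; the threshold formula is the one given in the
paper):

    HISTORY_LEN = 16                       # global history bits (illustrative)
    NUM_PERCEPTRONS = 1024                 # perceptron table size (illustrative)
    THETA = int(1.93 * HISTORY_LEN + 14)   # training threshold from the paper

    # Each perceptron is a bias weight plus one small signed integer
    # weight per history bit.
    weights = [[0] * (HISTORY_LEN + 1) for _ in range(NUM_PERCEPTRONS)]
    history = [-1] * HISTORY_LEN           # +1 = taken, -1 = not taken

    def predict(pc):
        # Select a perceptron with the low bits of the branch address,
        # then "multiply" by +/-1, i.e. conditionally add or subtract.
        w = weights[pc % NUM_PERCEPTRONS]
        y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], history))
        return y >= 0, y

    def train(pc, taken, y):
        # Update only on a misprediction or when the output was weak.
        w = weights[pc % NUM_PERCEPTRONS]
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= THETA:
            w[0] += t
            for i in range(HISTORY_LEN):
                w[i + 1] += t * history[i]
        history.pop(0)                     # shift the outcome into history
        history.append(t)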
| artisanspam wrote:
| As someone who works on design verification for modern CPUs,
| this is not true. Most modern CPUs use some form of TAGE branch
| prediction, if not something more complex.
| titzer wrote:
| > relatively simple state machines
|
| To be clear, all hardware branch predictors are "relatively
| simple state machines"; they need storage in the branch
| prediction tables, which must be super-fast to access, which
| means they can only store a few (sometimes dozens, but certainly
| not hundreds of) bits per branch to reach the access latency
| goal. With the predictor's inputs encoded in binary, the weights
| small and quantized, and the history short, even perceptrons are
| "relatively simple state machines". After all, their
| implementation is just going to become some combinatorial logic
| in the end.
| p_l wrote:
| AMD has been using perceptrons in their branch prediction logic
| since, I think, the K6-2.
|
| Some updates to it were a big point of the "what we did in Zen"
| presentations when the first Ryzen and EPYC CPUs landed.
| pfdietz wrote:
| But what's the impact of those 5% misses?
| gpderetta wrote:
| Huge. I think 95% was table stakes 30 years ago. IIRC the
| average rate of a good predictor these days is 99.5%.
| kordlessagain wrote:
| bot> Querying GPT...
|
| bot> A 5% miss rate in branch prediction leads to a 14.7%
| improvement in misprediction rates on a trace of SPEC2000
| benchmarks compared to the gshare predictor. The use of
| machine learning-based predictors has the potential to
| improve these results further.
| kordlessagain wrote:
| Using my DocGPT project to process this...
|
| system> What are the microarchitectural tricks that allow
| prediction to take place in one clock cycle?
|
| bot> The microarchitectural tricks include using the branch
| address to hash and select a perceptron from the table,
| calculating the dot product of the perceptron and the global
| history register, using the training algorithm to update the
| weights in the perceptron, and writing the updated perceptron
| back to the table.
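A toy driver for the perceptron sketch above, running the steps the
bot lists (select by address, compute the output, train, write back)
over a made-up branch trace; in hardware these steps overlap in the
pipeline rather than running back to back:

    # Repeating pattern of (branch address, outcome) pairs, purely
    # illustrative.
    trace = [(0x400a10, True), (0x400a10, True), (0x400a30, False)] * 100

    correct = 0
    for pc, taken in trace:
        predicted, y = predict(pc)     # one prediction per fetched branch
        correct += (predicted == taken)
        train(pc, taken, y)            # update when the branch resolves

    print(f"accuracy: {correct / len(trace):.2%}")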
___________________________________________________________________
(page generated 2023-05-10 23:01 UTC)