[HN Gopher] Xilinx HBM2 Internals (2023)
___________________________________________________________________
Xilinx HBM2 Internals (2023)
Author : hasheddan
Score : 67 points
Date : 2024-05-09 09:08 UTC (13 hours ago)
(HTM) web link (lovehindpa.ws)
(TXT) w3m dump (lovehindpa.ws)
| willis936 wrote:
| I'm not an expert on memory interfaces. How do you use HBM2's
| 1024-bit interface when you have ~200 I/O on a Zynq UltraScale+?
| Are these pseudo-channels a SerDes for the HBM2 bus?
| someguydave wrote:
| Look at the (non-Zynq) VCU128 board for an example. The HBM2 is
| on the PL side, and the interconnect is via a die-to-die
| interface. So the 32 AXI3 interfaces to HBM2 here are hard
| silicon, not FPGA I/O pins.
| huntero wrote:
| The HBM stacks are on-package for these parts, so you don't
| have to use any external I/O to interface with them.
|
| You end up with a similar challenge accessing that much
| bandwidth internally from your FPGA logic, though. It looks like
| the Xilinx HBM IP presents a set of 16 or 32 separate AXI
| interfaces, each of which gives you about 14.4 GB/s of bandwidth
| (https://docs.amd.com/r/en-US/pg276-axi-hbm/Introduction).
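|
| A back-of-the-envelope check of those figures, as a sketch only.
| The 256-bit port width and 450 MHz AXI clock are assumptions
| chosen because they reproduce the ~14.4 GB/s per-port number;
| they are not stated in this thread.

```python
# Assumed values (not stated in the thread): 256-bit AXI data bus at 450 MHz.
AXI_WIDTH_BITS = 256
AXI_CLOCK_HZ = 450_000_000


def port_bandwidth_gbs(width_bits: int, clock_hz: int) -> float:
    """Peak bandwidth of one port in GB/s (decimal gigabytes)."""
    return width_bits / 8 * clock_hz / 1e9


def aggregate_bandwidth_gbs(ports: int, per_port_gbs: float) -> float:
    """Peak bandwidth if every port is kept busy at the same time."""
    return ports * per_port_gbs


per_port = port_bandwidth_gbs(AXI_WIDTH_BITS, AXI_CLOCK_HZ)
for ports in (16, 32):
    total = aggregate_bandwidth_gbs(ports, per_port)
    print(f"{ports} ports x {per_port:.1f} GB/s = {total:.1f} GB/s peak")
```

| These are peak numbers; the logic still has to drive all of the
| ports in parallel to get anywhere near the total.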
| TacticalCoder wrote:
| > Conventions
|
| > MiB = Megabytes (2^20 bytes)
|
| > Gb = Gigabits (2^27 bytes, or 128MiB)
|
| > GiB = Gibibytes (2^30 bytes)
|
| Shouldn't MiB be Mebibytes then?
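|
| The quoted sizes are all powers of two, so the binary prefixes
| are the ones that fit. A small sketch checking the arithmetic
| (the constant names are just for illustration):

```python
MIB = 2**20                    # one mebibyte, in bytes
GIB = 2**30                    # one gibibyte, in bytes
GIBIBIT_BYTES = 2**30 // 8     # 2^30 bits expressed in bytes

# 2^30 bits is 2^27 bytes, which matches the quoted "Gb" definition.
assert GIBIBIT_BYTES == 2**27

print(GIBIBIT_BYTES // MIB)    # size of that "Gb" in MiB
print(GIB // GIBIBIT_BYTES)    # how many of them fit in one GiB
```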
| pclmulqdq wrote:
| I wonder if the author is doing anything to overclock the HBM
| here or if this is within the ratings of the Samsung HBM stacks.
| It's nice to be able to do this when you have a few cards, but if
| you are working with hundreds, it may not be practical to push
| the HBM this far without overvolting them a bit.
| latchkey wrote:
| I automated the tuning of 150k GPUs that were being used to
| mine Ethereum.
|
| The trick was that as a whole, you knew the limits of the
| hardware. You know how to set the knobs to max performance. Due
| to the silicon lottery, cards that can't perform at max end up
| crashing.
|
| So what I did was kind of the opposite of what everyone else
| was doing. I first set everything at max, watched for a crash,
| then tuned the knobs to be a bit lower. All of this was done
| with an automated piece of software that I built. The cards we
| used essentially had 3 knobs to twist, which resulted in
| hundreds of combinations. Eventually, the cards stop crashing,
| so you're at the right settings, for that individual piece of
| hardware.
|
| We were running in seasonal climates too... so each
| winter/summer, I'd reset things and let it auto tune back
| again. Heat plays a huge factor on stability.
|
| Of course, each workload has different settings too... so that
| plays into it, but if everything else is static, this ended up
| being a great way to do things.
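|
| A minimal sketch of that "start at max, back off on crash" loop.
| Everything here is hypothetical: the knob names, the step sizes,
| the round-robin order in which knobs are lowered, and the
| is_stable callback that stands in for "ran the workload without
| crashing". It is not the actual tool described above.

```python
from typing import Callable, Dict, List

Settings = Dict[str, int]


def tune_down(
    max_settings: Settings,
    min_settings: Settings,
    step: Settings,
    is_stable: Callable[[Settings], bool],
) -> Settings:
    """Start at max; after each crash, lower one knob (round-robin) by a step."""
    current = dict(max_settings)
    knobs: List[str] = list(current)
    turn = 0
    while not is_stable(current):
        # Find the next knob that can still be lowered without going below min.
        for offset in range(len(knobs)):
            knob = knobs[(turn + offset) % len(knobs)]
            if current[knob] - step[knob] >= min_settings[knob]:
                current[knob] -= step[knob]
                turn = (turn + offset + 1) % len(knobs)
                break
        else:
            raise RuntimeError("no stable settings found above the minimum")
    return current


def fake_card(s: Settings) -> bool:
    """Pretend card: stable only at or below these made-up limits."""
    return s["core_mhz"] <= 1150 and s["mem_mhz"] <= 2000


best = tune_down(
    {"core_mhz": 1200, "mem_mhz": 2100, "power_pct": 100},
    {"core_mhz": 900, "mem_mhz": 1700, "power_pct": 70},
    {"core_mhz": 25, "mem_mhz": 50, "power_pct": 5},
    fake_card,
)
print(best)
```

| Resetting for a new season is then just calling tune_down again
| from the max settings.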
| rowanG077 wrote:
| That seems great if a failure always results in a crash.
| There are a ton of failure modes where your result will just
| spuriously be wrong.
| latchkey wrote:
| To my knowledge, HPC rarely tunes cards for max
| performance. My MI300X cards run at stock settings and I doubt
| I'll ever modify them.
| pclmulqdq wrote:
| Interesting, I generally assumed Eth miners would undervolt
| their GPUs to get more life out of them rather than
| overclocking them for absolute max performance.
| latchkey wrote:
| Undervolt / overclock / memory timings
| Wolf9466 wrote:
| Author here. I did overclock it - that was one of the points of
| the writeup: when you modify the memory clock, you should
| change the timings along with it, because they are often
| specified in tCK (ticks of the memory clock), and as such, they
| will change when the clock changes.
|
| I have reliable information from folks with several thousand of
| these FPGAs that they reliably clock to 1100 MHz - 1150 MHz on
| the HBM2 at stock voltage (or a bit less). This falls in line
| with my personal experiences - I have seven XCVU35P FPGAs, and
| they range from doing only 1100 MHz to 1150 MHz, to some
| handling 1200 MHz.
|
| Samsung's documentation specifies this HBM2 for 1000 MHz to
| 1100 MHz, based on binning - this is why I was annoyed that
| Xilinx limited it to 900 MHz, and worked to learn how to change
| the PLL settings.
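|
| To illustrate the tCK point: a timing programmed as a cycle
| count covers less wall-clock time once the clock is raised, so
| the count has to be recomputed. A sketch, where the 14 ns
| requirement is a made-up number for illustration and not a real
| HBM2 parameter:

```python
import math


def ns_to_tck(t_ns: float, clock_mhz: float) -> int:
    """Clock cycles needed to cover t_ns at clock_mhz, rounded up."""
    cycles = t_ns * clock_mhz / 1000.0
    return math.ceil(cycles - 1e-9)  # small guard against float noise


def tck_to_ns(cycles: int, clock_mhz: float) -> float:
    """Wall-clock time covered by a cycle count at clock_mhz."""
    return cycles * 1000.0 / clock_mhz


REQUIRED_NS = 14.0  # hypothetical timing requirement, for illustration only

stock = ns_to_tck(REQUIRED_NS, 900)
print(f"900 MHz: {stock} tCK")
# Keeping the stock cycle count at 1100 MHz no longer covers the requirement.
print(f"{stock} tCK at 1100 MHz = {tck_to_ns(stock, 1100):.2f} ns")
print(f"1100 MHz: {ns_to_tck(REQUIRED_NS, 1100)} tCK")
```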
| pclmulqdq wrote:
| I am also aware that Xilinx sets their own clock specs
| annoyingly conservatively, and I think they do it to preserve
| device lifetime or something similar. However, I did want to
| clarify whether you were overvolting these things or just
| raising the clock frequency.
|
| I have run into issues where you do get a dud FPGA that is
| just a lot slower than other FPGAs of its speed bin (it must
| have come from the edge of the wafer or something), and
| debugging that is pretty annoying.
| akira2501 wrote:
| I feel like domains are pretty cheap so it would be easy to
| separate your fetishes from your work life.
| doctor_eval wrote:
| I made the mistake of looking at the gallery. NSFW.
| formerly_proven wrote:
| There are no mistakes, just happy little accidents.
| doctor_eval wrote:
| You're right, I shouldn't have said mistake.
|
| The context switch nearly gave me whiplash, tho.
___________________________________________________________________
(page generated 2024-05-09 23:01 UTC)