[HN Gopher] Partitioning in the Chiplet Era
       ___________________________________________________________________
        
       Partitioning in the Chiplet Era
        
       Author : rbanffy
       Score  : 39 points
       Date   : 2024-10-06 11:31 UTC (11 hours ago)
        
 (HTM) web link (semiengineering.com)
 (TXT) w3m dump (semiengineering.com)
        
       | yatrios wrote:
       | I find this new path pretty fascinating. Have there been any
       | recent advancements in terms of the signal integrity issue when
       | partitioning these designs? To me these chiplets still seem
       | to be very much proof of concept, and I'm not sure how
       | feasible this is in large-scale designs. Would someone care to
       | clarify?
        
         | trynumber9 wrote:
         | AMD has been shipping billions of dollars of "chiplet" GPGPUs
         | by the name of MI300A and MI300X.
         | 
         | So I think they're beyond the experimental phase.
        
           | latchkey wrote:
           | As someone deploying production supercomputers using those
           | chips, I can't agree with you more.
        
             | Numerlor wrote:
             | From their die size and power usage relative to the perf
             | compared to Nvidia, it's clear they didn't hit the goals
             | they wanted to with Navi 31 and 32, and it's definitely
             | because of the chiplet design.
             | 
             | I don't have any data center experience, but on the
             | consumer side the Navi 31 7900 XTX is also a bit of a
             | temperamental GPU, though I don't know how much of that is
             | on the silicon and how much of it is software.
             | 
             | Though it is clear that some form of chiplets will have to
             | be used, as building large portions of the chips on
             | cutting-edge nodes with higher failure rates will just
             | become more expensive as time goes on. More so with parts
             | of the chips that just don't scale down with nodes anymore
             | anyway.
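             | 
             | A rough sketch of the yield argument, assuming a simple
             | Poisson defect model (yield = exp(-area * defect density))
             | and that dies are tested before assembly. The function
             | names and all numbers are illustrative assumptions, and
             | packaging and interconnect costs are ignored:

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_product(total_area_cm2: float, n_dies: int,
                             defects_per_cm2: float) -> float:
    """Wafer area consumed per good product built from n_dies equal dies.

    Assumes each die is tested on its own, so a defect only scraps the
    die it lands on rather than the whole product.
    """
    die_area = total_area_cm2 / n_dies
    area_per_good_die = die_area / poisson_yield(die_area, defects_per_cm2)
    return n_dies * area_per_good_die


if __name__ == "__main__":
    # Hypothetical values: 6 cm^2 of logic, 0.1 defects per cm^2.
    for n in (1, 2, 4, 8):
        cost = silicon_per_good_product(6.0, n, 0.1)
        print(f"{n} die(s): {cost:.2f} cm^2 of wafer per good product")
```

             | Under this model the gap between one big die and several
             | small ones widens as defect density rises, which is the
             | cutting-edge-node case.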
        
               | latchkey wrote:
               | I ran 150,000 AMD GPUs previously for an Ethereum mining
               | operation. We ran them on the edge of crashing,
               | individually tuned to their highest clock/lowest voltage.
               | 
               | I can definitively say they are ALL snowflakes. Every
               | single one. Wide variance in each chip and across batches
               | as well. We had them OEM placed onto the boards too, and
               | there was variance between OEMs and their batches as
               | well. Then it went down to even the datacenter and PSUs,
               | and how clean the power was.
               | 
               | I actually started to collect all the data around where
               | they were cut from the wafer, but never got a chance to
               | process it and correlate it to their performance. There
               | was a running theory that the edge chips were not as
               | good.
        
         | lizknope wrote:
         | I've been in integrated circuit physical design for almost 30
         | years.
         | 
         | What signal integrity issue are you referring to? For on-chip
         | nets we have SI issues from cross-coupling capacitance. For
         | the last 25+ years the routers have tried to move these nets
         | apart and jump layers to avoid long cross-coupled nets. The RC
         | extraction tools have supported extraction of cross-coupled
         | nets, and all of the delay calculators and static timing
         | analysis tools analyze victim/aggressor net coupling and
         | filter out irrelevant nets if they don't switch in the same
         | timing windows.
         | 
         | Chiplets are combining multiple chips in the same package. I
         | had a Pentium Pro from 1996 that did that. In the last 5 years
         | chip packaging technology has continued to advance and we are
         | stacking dies and more within the same package.
         | 
         | SI between chips is not a new issue. There are tools to analyze
         | that as well.
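         | 
         | To illustrate the timing-window filtering mentioned above,
         | here is a minimal sketch. It assumes each net has a single
         | switching window (earliest and latest arrival time) and
         | treats an aggressor as relevant only if its window overlaps
         | the victim's. The names Window, overlaps and
         | relevant_aggressors are hypothetical, and real STA tools are
         | considerably more elaborate:

```python
from typing import Dict, List, NamedTuple


class Window(NamedTuple):
    """Switching window of a net, in nanoseconds (illustrative units)."""
    early: float
    late: float


def overlaps(a: Window, b: Window) -> bool:
    """True if the two closed intervals share at least one point in time."""
    return a.early <= b.late and b.early <= a.late


def relevant_aggressors(victim: Window,
                        aggressors: Dict[str, Window]) -> List[str]:
    """Keep only aggressors that can switch while the victim switches."""
    return [name for name, w in aggressors.items() if overlaps(victim, w)]


if __name__ == "__main__":
    # Made-up windows for three coupled nets.
    victim = Window(1.0, 1.4)
    coupled = {
        "net_a": Window(0.2, 0.5),  # finishes before the victim starts
        "net_b": Window(1.3, 1.8),  # overlaps the victim window
        "net_c": Window(2.0, 2.5),  # starts after the victim is done
    }
    print(relevant_aggressors(victim, coupled))
```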
        
       ___________________________________________________________________
       (page generated 2024-10-06 23:01 UTC)