[HN Gopher] How AlphaChip transformed computer chip design
___________________________________________________________________
How AlphaChip transformed computer chip design
Author : isof4ult
Score : 175 points
Date : 2024-09-27 15:56 UTC (7 hours ago)
(HTM) web link (deepmind.google)
(TXT) w3m dump (deepmind.google)
| yeahwhatever10 wrote:
| Why do they keep saying "superhuman"? Algorithms are used for
| these tasks, humans aren't laying out trillions of transistors by
| hand.
| epistasis wrote:
| Google is good at many things, but perhaps their strongest
| skill is media positioning.
| jonas21 wrote:
| I feel like they're particularly bad at this, especially
| compared to other large companies.
| pinewurst wrote:
| Familiarity breeds contempt. They've been pushing the
| Google==Superhuman thing since the Internet Boom with
| declining efficacy.
| lordswork wrote:
| The media hates Google.
| epistasis wrote:
| It's a love/hate relationship, which benefits both Google and
| the media greatly.
| jeffbee wrote:
| This is floorplanning the blocks, not every feature. We are
| talking dozens to hundreds of blocks, not billions-trillions of
| gates and wires.
|
| I assume that the human benchmark is a human using existing EDA
| tools, not a guy with a pocket protector and a roll of tape.
| yeahwhatever10 wrote:
| Floorplanning algorithms and solvers already exist:
| https://limsk.ece.gatech.edu/course/ece6133/slides/floorplan...
| jeffbee wrote:
| The original paper from DeepMind evaluates what they are
| now calling AlphaChip versus existing optimizers, including
| simulated annealing. They conclude that AlphaChip
| outperforms them while using much less compute and wall-clock
| time.
|
| https://www.cl.cam.ac.uk/~ey204/teaching/ACS/R244_2021_2022/...
| foobarian wrote:
| Randomized algorithms strike again!
| sudosysgen wrote:
| This is more amortized optimization / reinforcement learning
| than randomized algorithms.
| hulitu wrote:
| > They conclude that AlphaChip outperforms them with much
| less compute and real time.
|
| Of course they do. I'm waiting for their products.
| fph wrote:
| My state-of-the-art bubblesort implementation is also
| superhuman at sorting numbers.
| xanderlewis wrote:
| Nice. Do you offer API access for a monthly fee?
| int0x29 wrote:
| I'll need seven 5-gigawatt datacenters in the middle of major
| urban areas or we might lose the Bubble Sort race with the
| Chinese.
| gattr wrote:
| Surely a 1.21-GW datacenter would suffice!
| therein wrote:
| Have we decided when we are deprecating it? I'm already
| cultivating another team in a remote location to work on a
| competing product that we will fold into Google Cloud a month
| before deprecating this one.
| dgacmu wrote:
| Surely you'll be able to reduce this by getting TSMC to
| build new fabs to construct your new Bubble Sort
| Processors (BSPs).
| HPsquared wrote:
| Nice. Still true though! We are in the bubble sort era of AI.
| jayd16 wrote:
| "superhuman or comparable"
|
| What nonsense! XD
| thomasahle wrote:
| Believe it or not, there was a time when algorithms were worse
| than humans at laying out transistors, particularly at the
| higher-level design decisions.
| negativeonehalf wrote:
| Prior to AlphaChip, macro placement was done manually by human
| engineers in any production setting. Prior algorithmic methods
| especially struggled to manage congestion, resulting in chips
| that weren't manufacturable.
| mirchiseth wrote:
| I must be old, because the first thing I thought reading
| AlphaChip was: why is DeepMind talking about DEC Alpha chips? :-)
| https://en.wikipedia.org/wiki/DEC_Alpha.
| mdtancsa wrote:
| haha, same!
| sedatk wrote:
| I first used Windows NT on a PC with a DEC Alpha AXP CPU.
| dreamcompiler wrote:
| Looks like this is only about placement. I wonder if it can be
| applied to routing?
| amelius wrote:
| Exactly what I was thinking.
|
| Also: when is this coming to KiCad? :)
|
| PS: It would also be nice to apply a similar algorithm to graph
| drawing (e.g. trying to optimize for human readability instead
| of electrical performance).
| tdullien wrote:
| The issue is that in order to optimize for human readability
| you'll need a huge number of human evaluations of graphs?
| amelius wrote:
| Maybe start with minimization of some metric based on
| number of edge crossings, edge lengths and edge bends?
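|
| A minimal sketch of that kind of objective (for a straight-line
| drawing, so bends drop out; the crossing test and the weights
| below are illustrative assumptions, not anything from the
| article):
|
|     import itertools, math
|
|     def segments_cross(a, b, c, d):
|         # Proper-intersection test via orientation signs; ignores
|         # collinear edge cases to keep the sketch short.
|         def orient(p, q, r):
|             return (q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0])
|         return (orient(a, b, c) * orient(a, b, d) < 0 and
|                 orient(c, d, a) * orient(c, d, b) < 0)
|
|     def readability_cost(pos, edges, w_cross=10.0, w_len=1.0):
|         # pos: node -> (x, y); edges: list of (u, v) node pairs.
|         length = sum(math.dist(pos[u], pos[v]) for u, v in edges)
|         crossings = sum(
|             segments_cross(pos[u1], pos[v1], pos[u2], pos[v2])
|             for (u1, v1), (u2, v2) in itertools.combinations(edges, 2)
|             if len({u1, v1, u2, v2}) == 4)  # skip edges sharing a node
|         return w_cross * crossings + w_len * length
|
| Any black-box optimizer (annealing, RL, gradient-free search)
| could then push node positions against a cost like this.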
| hinkley wrote:
| TSMC made a point of calling out that their latest generation of
| software for automating chip design has features that allow you
| to select logic designs for TDP over raw speed. I think that's
| our answer to keep Dennard scaling alive in spirit if not in
| body. Speed of light is still going to matter, so physical
| proximity of communicating components will always matter, but I
| wonder how many wins this will represent versus avoiding thermal
| throttling.
| mikewarot wrote:
| I understand the achievement, but can't square it with my belief
| that uniform systolic arrays will prove to be the best general
| purpose compute engine for neural networks. Those are almost
| trivial to route, by nature.
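|
| For what it's worth, the regularity is easy to see in a toy
| output-stationary systolic matmul, where every cell only ever
| talks to the cell on its left and the cell above it. This is a
| software simulation sketch of the dataflow, not real hardware:
|
|     import numpy as np
|
|     def systolic_matmul(A, B):
|         # Cell (i, j) accumulates C[i][j]; operands enter skewed at
|         # the array edges and hop one neighbour per cycle.
|         N = A.shape[0]
|         acc = np.zeros((N, N))
|         a_reg = np.zeros((N, N))  # values moving left-to-right
|         b_reg = np.zeros((N, N))  # values moving top-to-bottom
|         for t in range(3 * N - 2):        # cycles to drain the pipe
|             for i in reversed(range(N)):  # reversed = read last
|                 for j in reversed(range(N)):  # cycle's neighbours
|                     a_reg[i][j] = a_reg[i][j-1] if j > 0 else \
|                         (A[i][t-i] if 0 <= t-i < N else 0.0)
|                     b_reg[i][j] = b_reg[i-1][j] if i > 0 else \
|                         (B[t-j][j] if 0 <= t-j < N else 0.0)
|                     acc[i][j] += a_reg[i][j] * b_reg[i][j]
|         return acc
|
|     A = np.arange(9.0).reshape(3, 3)
|     B = np.arange(9.0).reshape(3, 3) + 1
|     assert np.allclose(systolic_matmul(A, B), A @ B)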
| ilaksh wrote:
| Isn't this already the case for large portions of GPUs? Like,
| many of the blocks would be systolic arrays?
|
| I think the next step is arrays of memory-based compute.
| mikewarot wrote:
| Imagine a bit-level systolic array. Just a sea of LUTs, with
| latches to allow the magic of graph coloring to remove all
| timing concerns by clocking everything in 2 phases.
|
| GPUs still treat memory as separate from compute; they just
| have wider bottlenecks than CPUs.
| pfisherman wrote:
| Questions for those in the know about chip design. How are they
| measuring the quality of a chip design? Does the metric that
| Google is reporting make sense? Or is it just something to make
| themselves look good?
|
| Without knowing much, my guess is that "quality" of a chip design
| is multifaceted and heavily dependent on the use case. That is,
| the ideal chip for a data center would look very different from
| those for a mobile phone camera or automobile.
|
| So again, what does "better" mean in the context of this
| particular problem / task?
| q3k wrote:
| This is just floorplanning, which is a problem with fairly well
| defined quality metrics (max speed and chip area used).
| Drunk_Engineer wrote:
| Oh man, if only it were that simple. A floorplanner has to
| guesstimate what the P&R tools are going to do with the
| initial layout. That can be very hard to predict -- even if
| the floorplanner and P&R tool are from the same vendor.
| Drunk_Engineer wrote:
| I have not read the latest paper, but their previous work was
| really unclear about metrics being used. Researchers trying to
| replicate results had a hard time getting reliable
| details/benchmarks out of Google. Also, my recollection is that
| Google did not even compute timing, just wirelength and
| congestion; i.e. extremely primitive metrics.
|
| Floorplanning/placement/synthesis is a billion dollar industry,
| so if their approach were really revolutionary they would be
| selling the technology, not wasting their time writing blog
| posts about it.
| IshKebab wrote:
| > Floorplanning/placement/synthesis is a billion dollar
| industry
|
| Maybe all together, but I don't think automatic placement
| algorithms are a billion dollar industry. There's so much
| more to it than that.
| Drunk_Engineer wrote:
| Yes in combination. Customers generally buy these tools as
| a package deal. If the placer/floorplanner blows everything
| else out of the water, then a CAD vendor can upsell a lot
| of related tools.
| negativeonehalf wrote:
| The original paper reports P&R metrics (WNS, TNS, area,
| power, wirelength, horizontal congestion, vertical
| congestion) -
| https://www.nature.com/articles/s41586-021-03544-w
|
| (no paywall):
| https://www.cl.cam.ac.uk/~ey204/teaching/ACS/R244_2021_2022/...
| Drunk_Engineer wrote:
| From what I saw in the rebuttal papers, the Google cost-
| function is wirelength based. You can still get good TNS
| from that if your timing is very simplistic -- or if you
| choose your benchmark carefully.
| negativeonehalf wrote:
| They optimize using a fast heuristic based on wirelength,
| congestion, and density, but they evaluate with full P&R.
| It is definitely interesting that they get good timing
| without explicitly including it in their reward function!
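|
| To make that concrete, here is a hedged sketch of what a
| wirelength + congestion + density proxy can look like. The
| weights, the grid size, and the max-based penalties are made up
| for illustration; this is not Google's actual reward function.
|
|     from collections import Counter
|
|     def hpwl(pins):
|         # Half-perimeter of the bounding box of a net's pins.
|         xs, ys = zip(*pins)
|         return (max(xs) - min(xs)) + (max(ys) - min(ys))
|
|     def proxy_cost(place, nets, grid=16, w_cong=0.5, w_dens=1.0):
|         # place: macro -> (x, y); nets: lists of macro names.
|         wirelength = sum(hpwl([place[m] for m in net]) for net in nets)
|         # Crude congestion proxy: how many nets' bounding boxes
|         # cover the most-demanded grid cell.
|         demand = Counter()
|         for net in nets:
|             xs, ys = zip(*(place[m] for m in net))
|             lo_x, hi_x = int(min(xs) // grid), int(max(xs) // grid)
|             lo_y, hi_y = int(min(ys) // grid), int(max(ys) // grid)
|             for gx in range(lo_x, hi_x + 1):
|                 for gy in range(lo_y, hi_y + 1):
|                     demand[(gx, gy)] += 1
|         congestion = max(demand.values(), default=0)
|         # Crude density proxy: most macros sharing one grid cell.
|         occupancy = Counter((int(x // grid), int(y // grid))
|                             for x, y in place.values())
|         density = max(occupancy.values(), default=0)
|         return wirelength + w_cong * congestion + w_dens * density
|
| The interesting part of the debate is how well optimizing a proxy
| like this correlates with post-P&R timing, which the proxy never
| sees.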
| ilaksh wrote:
| How far are we from memory-based computing going from research
| into competitive products? I get the impression that we are
| already well past the point where it makes sense to invest very
| aggressively to scale up experiments with things like memristors.
| Because they are talking about how many new nuclear reactors they
| are going to need just for the AI datacenters.
| HPsquared wrote:
| And think of the embedded applications.
| sroussey wrote:
| The problem is that the competition (our current von Neumann
| architecture) has billions of dollars of R&D per year invested.
|
| Better architectures without that yearly investment train will
| stop being better quite quickly.
|
| You would need to be 100x to 1000x better in order to pull the
| investment train onto your tracks.
|
| Doing that has been impossible for decades.
|
| Even so, I think we will see such a change in my lifetime.
|
| AI could be that use case that has a strong enough demand pull
| to make it happen.
|
| We will see.
| ilaksh wrote:
| I think it's just ignorance and timidity on the part of
| investors. Memristor or memory-computing startups are surely
| the next trend in investing within a few years.
|
| I don't think it's necessarily demand or any particular
| calculation that makes things happen. I think people
| including investors are just herd animals. They aren't
| enthusiastic until they see the herd moving and then they
| want in.
| foota wrote:
| I don't think it's ignorant to not invest in something that
| has a decade long path towards even having a market, much
| less a large market.
| colesantiago wrote:
| A marvellous achievement from DeepMind as usual. I am quite
| surprised that Google acquired them at a significant discount,
| just $400M, when I would have expected something in the range of
| $20BN; but then again, DeepMind wasn't making any money back
| then.
| dharma1 wrote:
| It was very early, and probably one of their all-time best
| acquisitions, along with YouTube.
|
| Re: using RL and other types of AI assistance for chip design,
| Nvidia and others are doing this too.
| sroussey wrote:
| Applied Semantics for $100M, which gave them their advertising
| business, seems like their best deal.
| amelius wrote:
| Can this be abstracted and generalized into a more generally
| applicable optimization method?
| loandbehold wrote:
| Every generation of chips is used to design the next generation.
| That seems to be the root of the exponential growth in Moore's
| law.
| bgnn wrote:
| That's wrong. Chip design and Moore's law have nothing to do
| with each other.
| smaddox wrote:
| To clarify what the parent is getting at: Moore's law is an
| observation about the density (and, really about the cost) of
| transistors. So it's about the fabrication process, not about
| the logic design.
|
| Practically speaking, though, maintaining Moore's law would
| have been economically prohibitive if circuit design and
| layout had not been automated.
| negativeonehalf wrote:
| Definitely a big part of it. Chips enable better EDA tools,
| which enable better chips. First it was analytic solvers and
| simulated annealing, now ML. Exciting times!
| red75prime wrote:
| I hope I'll still be alive when they announce AlephZero.
| idunnoman1222 wrote:
| So one other designer plus Google is using AlphaChip for their
| layouts? Not sure about that title. Call me when AMD and Nvidia
| are using it.
| kayson wrote:
| I'm pretty sure Cadence and Synopsys have both released
| reinforcement-learning-based placement and floorplanning tools.
| How do they compare...?
| hulitu wrote:
| They don't. You cannot compare reality (Cadence, Synopsys) with
| hype (Google).
| pelorat wrote:
| So you're basically saying that Google should have used
| existing tools to layout their chip designs, instead of their
| ML solution, and that these existing tools would have
| produced even better chips than the ones they are actually
| manufacturing?
| RicoElectrico wrote:
| Synopsys tools can use ML, but not for the layout itself; rather,
| for tuning variables that go into the physical design flow.
|
| > Synopsys DSO.ai autonomously explores multiple design spaces
| to optimize PPA metrics while minimizing tradeoffs for the
| target application. It uses AI to navigate the design-
| technology solution space by automatically adjusting or fine-
| tuning the inputs to the design (e.g., settings, constraints,
| process, flow, hierarchy, and library) to find the best PPA
| targets.
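|
| In other words, the tool wraps the flow in a search loop. A toy
| sketch of that idea (run_flow and every knob below are made-up
| stand-ins; a real run is hours of EDA tool time, not a formula):
|
|     import itertools
|
|     def run_flow(knobs):
|         # Hypothetical scoring stand-in for a synth/place/route run
|         # that would normally report power, performance and area.
|         score = 0.3 if knobs["effort"] == "high" else 0.0
|         score += 0.7 - abs(knobs["target_utilization"] - 0.7)
|         score += 0.05 if knobs["hierarchy"] == "flat" else 0.0
|         return score  # higher = better composite PPA
|
|     search_space = {
|         "effort": ["medium", "high"],
|         "target_utilization": [0.6, 0.7, 0.8],
|         "hierarchy": ["flat", "hierarchical"],
|     }
|
|     candidates = [dict(zip(search_space, combo))
|                   for combo in itertools.product(*search_space.values())]
|     best = max(candidates, key=run_flow)
|     print("best knob settings:", best)
|
| Presumably DSO.ai replaces the brute-force product here with a
| learned search policy, but the inputs and outputs are the same
| shape: flow knobs in, PPA out.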
| cobrabyte wrote:
| I'd love a tool like this for PCB design/layout
| onjectic wrote:
| That's the first thing my mind went to as well. I'm sure this is
| already being worked on; I think it would be even more impactful
| than this.
| bgnn wrote:
| why do you think that?
| bittwiddle wrote:
| Far more people / companies are designing PCBs than there
| are designing custom chips.
| foota wrote:
| I think the real value would be in ease of use. I imagine
| the top N chip creators represent a fair bit of the
| marginal value in pushing the state of the art forward.
| E.g., for hobbyists or small shops, there's likely not
| much value in tiny marginal improvements, but for the big
| ones it's worth the investment.
| abc-1 wrote:
| Why aren't they using this technique to design better transformer
| architectures or completely novel machine learning architectures
| in general? Are plain or mostly plain transformers really peak? I
| find that hard to believe.
| jebarker wrote:
| Because chip placement and the design of neural network
| architectures are entirely different problems, so this solution
| won't magically transfer from one to the other.
| abc-1 wrote:
| And AlphaGo is trained to play Go? The point is training a
| model through self play to build neural network
| architectures. If it can play Go and architect chip
| placements, I don't see why it couldn't be trained to build
| novel ML architectures.
| vighneshiyer wrote:
| This work from Google (original Nature paper:
| https://www.nature.com/articles/s41586-021-03544-w) has been
| credibly criticized by several researchers in the EDA CAD
| discipline. These papers are of interest:
|
| - A rebuttal by a researcher within Google who wrote this at the
| same time as the "AlphaChip" work was going on ("Stronger
| Baselines for Evaluating Deep Reinforcement Learning in Chip
| Placement"): http://47.190.89.225/pub/education/MLcontra.pdf
|
| - The 2023 ISPD paper from a group at UCSD ("Assessment of
| Reinforcement Learning for Macro Placement"):
| https://vlsicad.ucsd.edu/Publications/Conferences/396/c396.p...
|
| - A paper from Igor Markov which critically evaluates the
| "AlphaChip" algorithm ("The False Dawn: Reevaluating Google's
| Reinforcement Learning for Chip Macro Placement"):
| https://arxiv.org/pdf/2306.09633
|
| In short, the Google authors did not fairly evaluate their RL
| macro placement algorithm against other SOTA algorithms: rather
| they claim to perform better than a human at macro placement,
| which is far short of what mixed-placement algorithms are capable
| of today. The RL technique also requires significantly more
| compute than other algorithms and ultimately is learning a
| surrogate function for placement iteration rather than learning
| any novel representation of the placement problem itself.
|
| In full disclosure, I am quite skeptical of their work and wrote
| a detailed post on my website:
| https://vighneshiyer.com/misc/ml-for-placement/
| s-macke wrote:
| When I first read about AlphaChip yesterday, my first question
| was how it compares to other optimization algorithms such as
| genetic algorithms or simulated annealing. Thank you for
| confirming that my questions are valid.
| gdiamos wrote:
| Criticism is an important part of the scientific process.
|
| Whichever approach ends up winning is improved by careful
| evaluation and replication of results.
| jeffbee wrote:
| It seems like this is multiple parties pursuing distinct
| arguments. Is Google saying that this technique is applicable
| in the way that the rebuttals are saying it is not? When I read
| the paper and the update I did not feel as though Google
| claimed that it is general, that you can just rip it off and
| run it and get a win. They trained it to make TPUs, then they
| used it to make TPUs. The fact that it doesn't optimize
| whatever "ibm14" is seems beside the point.
| Workaccount2 wrote:
| To be fair, some of these criticisms are a few years old. Which
| normally would be fair game, but the progress in AI has been
| breakneck. Criticisms of other AI tech from 2021 or 2022 are
| pretty dated today.
| jeffbee wrote:
| It certainly looks like the criticism at the end of the
| rebuttal that DeepMind has abandoned their EDA efforts is a
| bit stale in this context.
| porphyra wrote:
| The Deepmind chess paper was also criticized for unfair
| evaluation, as they were using an older version of Stockfish
| for comparison. Apparently, the gap between AlphaZero and that
| old version of Stockfish (about 50 elo iirc) was about the same
| as the gap between consecutive versions of Stockfish.
| nemonemo wrote:
| What is your opinion of the addendum? I think the addendum and
| the pre-trained checkpoint are the substance of the
| announcement, and it is surprising to see little mention of
| those here.
| negativeonehalf wrote:
| FD: I have been following this whole thing for a while, and
| know personally a number of the people involved.
|
| The AlphaChip authors address criticism in their addendum, and
| in a prior statement from the co-lead authors:
| https://www.nature.com/articles/s41586-024-08032-5 ,
| https://www.annagoldie.com/home/statement
|
| - The 2023 ISPD paper didn't pre-train at all. This means no
| learning from experience, for a learning-based algorithm. I
| feel like you can stop reading there.
|
| - The ISPD paper and the MLcontra paper both used much larger,
| older technology nodes, which have pretty different physical
| properties. TPU is on a sub-10nm node, whereas ISPD uses 45nm and
| 12nm; these are really different from a physical design
| perspective. Even worse, MLcontra uses a truly ancient benchmark
| with a >100nm technology node.
|
| Markov's paper just summarizes the other two.
|
| (Incidentally, none of ISPD / MLcontra / Markov were peer
| reviewed - ISPD 2023 was an invited paper.)
|
| There's a lot of other stuff wrong with the ISPD paper and the
| MLcontra paper - happy to go into it - and a ton of weird
| financial incentives lurking in the background. Commercial EDA
| companies do NOT want a free open-source tool like AlphaChip to
| take over.
|
| Reading your post, I appreciate the thoroughness, but it seems
| like you are too quick to let ISPD 2023 off the hook for
| failing to pre-train and using less compute. The code for pre-
| training is just the code for training --- you train on some
| chips, and you save and reuse the weights between runs. There's
| really no excuse for failing to do this, and the original
| Nature paper described at length how valuable pre-training was.
| Given how different TPU is from the chips they were evaluating
| on, they should have done their own pre-training, regardless of
| whether the AlphaChip team released a pre-trained checkpoint on
| TPU.
|
| (Using less compute isn't just about making it take longer -
| ISPD 2023 used half as many GPUs and 1/20th as many RL
| experience collectors, which may screw with the dynamics of the
| RL job. And... why not just match the original authors'
| compute, anyway? Isn't this supposed to be a reproduction
| attempt? I really do not understand their decisions here.)
| clickwiseorange wrote:
| Oh, man... this is the same old stuff from the 2023 Anna
| Goldie statement (is this Anna Goldie's comment?). This was
| all addressed by Kahng in 2023 - no valid criticisms. Where
| do I start?
|
| Kahng's ISPD 2023 paper is not in dispute - no established
| experts objected to it. The Nature paper is in dispute.
| Dozens of experts objected to it: Kahng, Cheng, Markov,
| Madden, Lienig, and Swartz objected publicly.
|
| The fact that Kahng's paper was invited doesn't mean it
| wasn't peer reviewed. I checked with ISPD chairs in 2023 -
| Kahng's paper was thoroughly reviewed and went through
| multiple rounds of comments. Do you accept it now? Would you
| accept peer-reviewed versions of other papers?
|
| Kahng is the most prominent active researcher in this field.
| If anyone knows this stuff, it's Kahng. There were also five
| other authors in that paper, including another celebrated
| professor, Cheng.
|
| The pre-training thing was disclaimed in the Google release.
| No code, data or instructions for pretraining were given by
| Google for years. The instructions said clearly: you can get
| results comparable to Nature without pre-training.
|
| The "much older technology" is also a bogus issue because the
| HPWL scales linearly and is reported by all commercial tools.
| Rectangles are rectangles. This is textbook material. But
| Kahng et al. prepared some very fresh examples, including
| NVDLA, with two recent technologies. Guess what, RL did
| poorly on those. Are you accepting this result?
|
| The bit about financial incentives and open-source is
| blatantly bogus, as Kahng leads OpenROAD - the main open-
| source EDA framework. He is not employed by any EDA
| companies. It is Google who has huge incentives here, see
| Demis Hassabis tweet "our chips are so good...".
|
| The "Stronger Baselines" matched compute resources exactly.
| Kahng and his coauthors performed _fair comparisons_ between
| annealing and RL, giving the same resources to each. Giving
| greater resources is unlikely to change results. This was
| thoroughly addressed in Kahng's FAQ - if only you could read
| that.
|
| The resources used by Google were huge. Cadence tools in
| Kahng's paper ran hundreds of times faster and produced better
| results. That is as conclusive as it gets.
|
| It doesn't take a Ph.D. to understand fair comparisons.
| negativeonehalf wrote:
| For AlphaChip, pre-training is just training. You train,
| and save the weights in between. This has always been
| supported by the Google's open-source repository. I've read
| Kahng's FAQ, and he fails to address this, which is
| unsurprising, because there's simply no excuse for cutting
| out pre-training for a learning-based method. In his setup,
| every time AlphaChip sees a new chip, he re-randomizes the
| weights and makes it learn from scratch. This is obviously
| a terrible move.
|
| HPWL (half-perimeter wirelength) is an approximation of
| wirelength, which is only one component of the chip
| floorplanning objective function. It is relatively easy to
| crunch all the components together and optimize HPWL ---
| minimizing actual wirelength while avoiding congestion
| issues is much harder.
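|
| A tiny worked example of why HPWL is only a proxy (the numbers
| are chosen purely for illustration):
|
|     # Four pins of one net at the corners of a 10x10 square:
|     pins = [(0, 0), (10, 0), (0, 10), (10, 10)]
|     xs, ys = zip(*pins)
|     hpwl = (max(xs) - min(xs)) + (max(ys) - min(ys))  # = 20
|     # But an actual rectilinear routing such as an "H" (two
|     # vertical sides of 10 plus a 10-unit bridge) uses 30 units of
|     # wire, half again the HPWL estimate -- and HPWL says nothing
|     # about whether the routing tracks that wire needs are free.
|
| So two placements with identical HPWL can route and time very
| differently.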
|
| Simulated annealing is good at quickly converging on a bad
| solution to the problem, with relatively little compute. So
| what? We aren't compute-limited here. Chip design is a
| lengthy, expensive process where even a few-percent
| wirelength reduction can be worth millions of dollars. What
| matters is the end result, and ML has SA beat.
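|
| For readers who haven't seen it in this setting, SA for macro
| placement really is only a few lines. A minimal sketch, with toy
| random moves, no overlap/legality handling, and total HPWL as the
| whole cost, which is exactly the simplification being argued
| about:
|
|     import math, random
|
|     def total_hpwl(pos, nets):
|         # pos: macro -> (x, y); nets: lists of macro names.
|         cost = 0.0
|         for net in nets:
|             xs, ys = zip(*(pos[m] for m in net))
|             cost += (max(xs) - min(xs)) + (max(ys) - min(ys))
|         return cost
|
|     def anneal(pos, nets, steps=20000, t0=100.0, t1=0.1, die=100):
|         pos = dict(pos)
|         cur = total_hpwl(pos, nets)
|         for s in range(steps):
|             t = t0 * (t1 / t0) ** (s / steps)   # geometric cooling
|             m = random.choice(list(pos))        # move one macro
|             old = pos[m]
|             pos[m] = (random.randrange(die), random.randrange(die))
|             new = total_hpwl(pos, nets)
|             # Always accept improvements; accept uphill moves with
|             # Boltzmann probability to escape local minima.
|             if new < cur or random.random() < math.exp((cur - new) / t):
|                 cur = new
|             else:
|                 pos[m] = old
|         return pos, cur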
|
| (As for conflict of interest, my understanding is Cadence
| has been funding Kahng's lab for years, and Markov's
| LinkedIn says he works for Synopsys. Meanwhile, Google has
| released a free, open-source tool.)
| clickwiseorange wrote:
| It's not that one needs an excuse. The Google CT repo
| said clearly you don't need to pretrain. "supported"
| usually includes at least an illustration - no such thing
| there before Kahng's paper.
|
| Everything optimized in Nature RL is an approximation.
| HPWL is where you start, and RL uses it in the objective
| function too. As shown in "Stronger Baselines", RL loses
| a lot by HPWL - so much that nothing else can save it. If
| your wires are very long, you need routing tracks to
| route them, and you end up with congestion too.
|
| SA consistently produces better solutions than RL for
| various time budgets. That's what matters. Both papers
| have shown that SA produces competent solutions. You give
| SA more time, you get better solutions. In a fair
| comparison, you give equal budgets to SA and RL. RL
| loses. This was confirmed using Google's RL code and two
| independent SA implementations, on many circuits. Very
| definitively. No, ML did not have SA beat - please read
| the papers.
|
| Cadence hasn't funded Kahng for a long time. In fact,
| Google funded Kahng more recently, so he has all the
| incentives to support Google. Markov's LinkedIn page says
| he worked at Google before. Even Chatterjee, of all
| people, worked at Google.
|
| Google's open-source tool is a head fake, it's
| practically unusable.
|
| Update: I'll respond to the next comment here since
| there's no Reply button.
|
| 1. The Nature paper said one thing, the code did
| something else, as we've discovered. The RL method does
| some training as it goes. So, pre-training is not the
| same as training. Hence "pre". Another problem with
| pretraining in Google work is data contamination - we
| can't compare test and training data. The Google folks
| admitted to training and testing on different versions of
| the same design. That's bad. Rejection-level bad.
|
| 2. HPWL is indeed a nice simple objective. So nice that
| Jeff Dean's recent talks use it. And all commercial
| placers without exception optimize it and report it. All
| EDA publications report it. Google's RL optimized HPWL +
| density + congestion
|
| 3. This shows you aren't familiar with EDA. Simulated
| Annealing was the king of placement from the mid-1980s to the
| mid-1990s. Most chips were placed by SA. But you don't have
| to go far - as I recall, the Nature paper says they used
| SA to postprocess macro placements.
| negativeonehalf wrote:
| The Nature paper describes the importance of pre-training
| repeatedly. The ability to learn from experience is the
| whole point of the method. Pre-training is just training
| and saving the weights -- this is ML 101.
|
| I'm glad you agree that HPWL is a proxy metric.
| Optimizing HPWL is a fun applied math puzzle, but it's
| not chip design.
|
| I am unaware of a single instance of someone using SA to
| generate real-world, usable macro layouts that were
| actually taped out, much less for modern chip design, in
| part due to SA's struggles to manage congestion,
| resulting in unusable layouts. SA converges quickly to a
| bad solution, but this is of little practical value.
| sijnapq wrote:
| Google can shove open-source in its a$$! Evidence will
| appear gradually. Tell your wife to be ready.
| sijnapq wrote:
| Gabe. Shut up! Your field is not even hardware.
| smokel wrote:
| I don't really understand all the fuss about this particular
| paper. Nearly _all_ papers on AI techniques are pretty much
| impossible to reproduce, due to details that the authors don't
| understand or are trying to cover up.
|
| This is what you get if you make academic researchers compete
| for citation counts.
|
| Pretraining seems to be an important aspect here, and it makes
| sense that such pretraining requires good examples, which,
| unfortunately for the free-lunch people, are not available to
| the public.
|
| That's what you get when you let big companies do fundamental
| research. Would it be better if the companies did not publish
| anything about their research at all?
|
| It all feels a bit unproductive to attack one another.
| bsder wrote:
| EDA claims in the digital domain are fairly easy to evaluate.
| Look at the picture of the layout.
|
| When you see a chip that has the datapath _identified_ and laid
| out properly by a computer algorithm, you've got something. If
| not, it's vapor.
|
| So, if your layout still looks like a random rat's nest? Nope.
|
| If even a random person can see that your layout actually
| follows the obvious symmetric patterns from bit 0 to bit 63,
| maybe you've got something worth looking at.
|
| Analog/RF is a little tougher to evaluate because the smaller
| number of building blocks means you can use Moore's Law to
| brute force things much more exhaustively, but if things "look
| pretty" then you've got something. If it looks weird, you
| don't.
| glitchc wrote:
| That doesn't mean the fabricated netlist doesn't work. I'm
| not supporting Google by any means, but the test should be:
| Does it fabricate and function as intended? If not, clearly
| gibberish. If so, we now have computers building computers,
| which is one step closer to SkyNet. The truth is probably
| somewhere in between. But even if some of the samples, with
| the terrible layouts, are actually functional, then we might
| learn something new. Maybe the gibberish design has reduced
| crosstalk, which would be fascinating.
| lordswork wrote:
| Some interesting context on this work: 2 researchers were bullied
| to the point of leaving Google for Anthropic by a senior
| researcher (who has now been terminated himself):
| https://www.wired.com/story/google-brain-ai-researcher-fired...
|
| They must feel vindicated by their work turning out to be so
| fruitful now.
| FrustratedMonky wrote:
| So AI designing its own chips. Now that is moving towards
| exponential growth. Like at the end of "Colossus" the movie.
|
| Forget LLMs. What DeepMind is doing seems more like how an AI
| will rule in the real world: building real-world models and
| applying game logic like winning.
|
| LLM's will just be the text/voice interface to what DeepMind is
| building.
| DrNosferatu wrote:
| Yet, their "frontier" LLM lags all the others...
| 7e wrote:
| Did it, though? Google's chips still aren't very good compared
| with competitors.
| ninetyninenine wrote:
| What occupation is there that is purely intellectual that has no
| chance of an AI ever progressing to a point where it can take it
| over?
| alexyz12 wrote:
| Anything that needs very real-time info. AIs will always be
| limited by us feeding them info, or by collecting it themselves.
| But humans can travel to more places than an AI can, until robots
| are everywhere too, I suppose.
| QuadrupleA wrote:
| How good are TPUs in comparison with state of the art Nvidia
| datacenter GPUs, or Groq's ASICs? Per watt, per chip, total cost,
| etc.? Is there any published data?
| jeffbee wrote:
| MLPerf is a good place to start. The only problem is you don't
| have any verifiable information about TPU energy consumption.
| https://mlcommons.org/benchmarks/inference-datacenter/
| wslh wrote:
| I have some company notes from early 2024 which may not be
| accurate but could help:
|
| TPU v5e [1]: not available for purchase, only through GCP,
| storage=5B, LLM-Model=7B, efficiency=393TFLOP.
|
| [1] https://cloud.google.com/tpu/docs/v5e
| ur-whale wrote:
| Seems to me the article is claiming a lot of things, but is very
| light on actual comparisons that matter to you and me, namely:
| how do those fabled AI-designed chips compare to their
| competition?
|
| For example, how much better are these latest gen TPU's when
| compared to NVidia's equivalent offering ?
| thesz wrote:
| Eurisko [1], if I remember correctly, was once used to perform a
| placement-and-route task and was pretty good at it.
|
| [1] https://en.wikipedia.org/wiki/Eurisko
|
| What's more, Eurisko was then used to design a fleet of battle
| spaceships for the Traveller TCS game. And Eurisko used symmetry-
| based placement learned from VLSI design in the design of that
| fleet.
|
| Can AlphaChip's heuristics be used anywhere else?
___________________________________________________________________
(page generated 2024-09-27 23:00 UTC)