[HN Gopher] Intel's Battlemage Architecture
___________________________________________________________________
Intel's Battlemage Architecture
Author : ksec
Score : 132 points
Date : 2025-02-11 16:00 UTC (6 hours ago)
(HTM) web link (chipsandcheese.com)
(TXT) w3m dump (chipsandcheese.com)
| netbioserror wrote:
| A lot of commenters have pointed out that Intel is getting
| nowhere near the performance/mm2 of Nvidia or AMD designs,
| though contrary to what I thought that might imply, power
| consumption seems very much under control on Battlemage. So the
| primary trade-off here seems to be die cost.
|
| Can anyone explain what might be going on here, especially as it
| relates to power consumption? I thought (bigger die ^ bigger
| wires -> more current -> higher consumption).
| tonetegeatinst wrote:
| It mainly seems to boil down to design choices and process
| technology.
|
| They might be targeting a lower power density per square mm
| compared to AMD or Nvidia, focusing on lower power levels.
|
| Instruction set architecture and the layout of the chips and
| PCB factor into this as well.
| MisterTea wrote:
| > I thought (bigger die ^ bigger wires -> more current ->
| higher consumption).
|
| I am not a semi expert, but a bigger die doesn't mean bigger
| wires. If you are referring to cross-section, the wires would
| be thinner, meaning less current. Power is consumed pushing and
| pulling electrons onto and off the transistor gates, which are
| all of the FET type (field effect transistor). The gate is a
| capacitor that needs to be charged to open the channel and
| allow current to flow through the transistor; discharging the
| gate closes it. That current draw then gets multiplied by a few
| billion gates, so you can see where the load comes from.
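|
| A back-of-the-envelope version of that in Python (every number
| below is a rough, made-up assumption for illustration, not a
| Battlemage figure):
|
|     # Dynamic (switching) power: P = alpha * C * V^2 * f * N
|     alpha   = 0.02      # fraction of gates switching per cycle
|     c_gate  = 0.2e-15   # switched capacitance per gate, farads
|     vdd     = 1.0       # supply voltage, volts
|     freq    = 2.5e9     # clock frequency, Hz
|     n_gates = 20e9      # ~20 billion transistors
|
|     power_w = alpha * c_gate * vdd**2 * freq * n_gates
|     print(f"~{power_w:.0f} W of dynamic power")  # ~200 W here
|
| Note the V^2 term - that's why the voltage/frequency point a
| chip runs at matters so much, as the replies below get into.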
| williamDafoe wrote:
| Actually the wires don't scale down like the transistors do.
| I remember taking VLSI circuit complexity theory in graduate
| school, and the conclusion was that for two-dimensional
| circuits the wires will end Moore's Law. However, I've seen
| articles about backside power delivery, and they are already
| using seven+ layers, so the wires are going through three
| dimensions now. Copper interconnects were a one-time bonus in
| the late 90s, and after that wires just don't scale down;
| signal delay would go up too fast. Imagine taking a city with
| all its streets and houses, where the houses shrink to the size
| of dog houses but you can't shrink the streets - they have to
| stay the same size to carry signals quickly!
| gruez wrote:
| >I thought (bigger die ^ bigger wires -> more current -> higher
| consumption).
|
| All things being equal, a bigger die would result in more power
| consumption, but the factor you're not considering is the
| voltage/frequency curve. As you increase the frequency, you
| also need to up the voltage. However, as you increase voltage,
| there's diminishing returns to how much you can increase the
| frequency, so you end up massively increasing power consumption
| to get minor performance gains.
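|
| A toy model of that curve in Python (the V/f relationship here
| is invented just to show the shape, not measured data):
|
|     # Dynamic power ~ V^2 * f, and the voltage needed to hit a
|     # given clock rises with frequency (assumed linear here).
|     def voltage_needed(freq_ghz):
|         return 0.70 + 0.15 * freq_ghz   # made-up coefficients
|
|     for f in (2.0, 2.5, 3.0):
|         v = voltage_needed(f)
|         p = v**2 * f                    # arbitrary power units
|         print(f"{f:.1f} GHz @ {v:.2f} V -> relative power {p:.2f}")
|
| With these toy numbers, going from 2.0 to 3.0 GHz (+50% clock)
| roughly doubles the power (2.00 -> 3.97), which is the
| diminishing-returns effect in action.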
| wmf wrote:
| If it's a similar number of transistors on a larger die then I
| can believe the power consumption is good. Less dense layout
| probably requires less design effort and may reduce hotspots.
|
| If Intel is getting similar performance from more transistors
| that could be caused by extra control logic from a 16-wide core
| instead of 32.
| kimixa wrote:
| Increasing clocks tends to have a greater-than-linear cost in
| power: you need transistors to switch quicker, so you often
| need a higher voltage, which causes more leakage and other
| losses on top of the switching cost itself (and it all turns
| into heat). Higher clock targets also have a cost for the
| design itself, often needing more transistors for things like
| extra redrivers to ensure fast switching speeds, or even things
| like more pipeline stages. Plus, not all area is "transistors"
| - it's often easier to place related units that need a _lot_ of
| interconnectivity with short interconnects if an adjacent, less
| interconnected unit isn't also trying to be packed into the
| same space. Routing on modern chips is _really_ difficult (and
| a place where companies can really differentiate by investing
| more).
|
| For tasks that tend to scale well with increased die area,
| which is often the case for GPUs as they're already focused on
| massively parallel tasks so laying down more parallel units is
| a realistic option, running a larger die at lower clocks is
| often notably more efficient in terms of performance per unit
| power.
|
| For GPUs generally that's just part of the pricing and cost
| balance: a larger, lower-clocked die would be more efficient,
| but would it really sell for as much as the same die clocked
| _even higher_ to get peak results?
| netbioserror wrote:
| >For tasks that tend to scale well with increased die area,
| which is often the case for GPUs as they're already focused
| on massively parallel tasks so laying down more parallel
| units is a realistic option, running a larger die at lower
| clocks is often notably more efficient in terms of
| performance per unit power.
|
| I should've considered this; I have an RTX A5000. It's a
| gigantic GA102 die (3090, 3080) that's underclocked to 230W,
| putting it at roughly 3070 throughput. That's ~15% less
| performance than a 3090 for a ~35% power reduction.
| Absolutely nonlinear savings there. Though some of that may
| have to do with the power savings of GDDR6 over GDDR6X.
|
| (I should mention that relative performance estimates are all
| over the place, by some metrics the A5000 is ~3070, by others
| it's ~3080.)
| bgnn wrote:
| Yeah, the power consumption scales, to first order, with
| Vdd^2 (the square of the power supply voltage), but performance
| scales with Vdd. Though you cannot simply reduce the Vdd and
| clock rate and then do more pipelining etc. to gain back the
| performance. If you are willing to back off on performance
| a bit, you can gain hugely on power. Plus the thermals are
| easier to manage.
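|
| To put rough numbers on that (a toy model assuming dynamic
| power dominates and the clock scales linearly with Vdd, so
| P ~ Vdd^2 * f and perf ~ f; nothing here is a measured
| A5000/3090 figure):
|
|     # Back off voltage and clock by ~10% each (assumed).
|     v_ratio, f_ratio = 0.90, 0.90
|
|     power_ratio = v_ratio**2 * f_ratio   # ~0.73
|     perf_ratio  = f_ratio                # 0.90
|     print(f"perf {perf_ratio:.0%} of stock, "
|           f"power {power_ratio:.0%} of stock")
|
| So ~10% less performance for ~27% less power in this toy model,
| which is the same kind of nonlinear trade as the A5000 example
| above.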
| bloomingkales wrote:
| They are holding back the higher vram models of this card. GPU
| makers always do some nerfing of their cards in the same
| product line. Oftentimes there's no good reason for this other
| than that they found specs they can market and sell simply by
| moving voltages around.
|
| Anyway, expecting good earnings throughout the year as they use
| Battlemage sales to hide the larger concerns about standing up
| their foundry (great earnings for the initial 12gb cards, and
| so on for the inevitable 16/24gb cards).
| elric wrote:
| I couldn't find any information regarding power consumption in
| the article. I'd love to upgrade my aging gaming rig, but all
| modern AMD/Nvidia graphics cards consume significantly more
| power than my current card.
| stoatstudios wrote:
| Is nobody going to talk about how the architecture is called
| "Battlemage"? Is that just normal to GPU enthusiasts?
| ZeWaka wrote:
| It's their 2nd generation, the 'B' series. The previous was
| their 'A' / Alchemist.
|
| > According to Intel, the brand is named after the concept of
| story arcs found in video games. Each generation of Arc is
| named after character classes sorted by each letter of the
| Latin alphabet in ascending order.
| (https://en.wikipedia.org/wiki/Intel_Arc)
| reginald78 wrote:
| The generations are all fantasy type names in alphabetical
| order. The first was Alchemist (and the cards were things like
| A310) and the next is Celestial. Actually when I think about
| product names for GPUs and CPUs these seem above average in
| clarity and only slightly dorkier than average. I'm sure
| they'll get more confusing and nonsensical with time as that
| seems to be a constant of the universe.
| Workaccount2 wrote:
| Can't wait for Dungeon architecture.
| meragrin_ wrote:
| Dungeon architecture? What's that?
| sevg wrote:
| Looks to have been a joke about the alphabetical naming:
| Alchemist, Battlemage, Celestial .. Dungeon
|
| (There's no name decided yet for the fourth in the
| series.)
| CodesInChaos wrote:
| Dragon and Druid sound like viable options.
| spiffytech wrote:
| Dorky, alphabetical codenames are a big step up from a bunch
| of lakes in no obvious order.
| PaulHoule wrote:
| Yeah, with the way Intel has been struggling I thought they
| should get it out of their system and name one of their
| chips "Shit Creek."
| ReptileMan wrote:
| It has been 20 years since Prescott, but the name is
| still suitable.
| dark-star wrote:
| A well-known commercial storage vendor gives their system
| releases codenames from beer brands. We had Becks, Guinness,
| Longboard, Voodoo Ranger, and many others. Presumably what the
| devs drank during that release cycle, or something ;-)
|
| It's fun for the developers and the end-users alike... So no,
| it's not limited to GPU enthusiasts at all. Everyone likes
| codenames :)
| B1FF_PSUVM wrote:
| > Everyone likes codenames :)
|
| Except butt-headed astronomers
| homarp wrote:
| https://www.engadget.com/2014-02-26-when-carl-sagan-sued-
| app... if you miss the ref
| monocasa wrote:
| I mean, living people seems like a dick move in general for
| codenames.
| wincy wrote:
| That's what we make sure our codenames are sensible
| things like Jimmy Carter and James Earl Jones
|
| We were actually told to change our internal names for
| our servers after someone named an AWS instance "stupid",
| and I rolled my eyes so hard - one dev ruined the fun for
| everyone.
| monocasa wrote:
| I mean, sure, for a lot of the same reasons you can't
| file a defamation claim in defense of someone who's dead.
| The idea of them is in the public domain in a lot of
| ways.
|
| So sure, pour one out to whoever's funeral is on the
| grocery store tabloids that week with your codenames.
| throw16180339 wrote:
| Are you referring to NetApp?
| baq wrote:
| A codename as good as any. Nvidia has Tesla, Turing etc.
| high_na_euv wrote:
| Cool name, easy to remember, ain't it?
| tdb7893 wrote:
| It's dorky but there isn't much else to say about it. Personal
| GPU enthusiasts are almost always video game enthusiasts so
| it's not really a particularly weird name in context.
| faefox wrote:
| It sounds cool and has actual personality. What would you
| prefer, Intel Vision Pro Max? :)
| babypuncher wrote:
| It's just the code name for this generation of their GPU
| architecture, not the name for its instruction set. Intel's are
| all fantasy themed. Nvidia names theirs after famous scientists
| and mathematicians (Alan Turing, Ada Lovelace, David Blackwell).
| userbinator wrote:
| It's very much normal "gamer" aesthetic.
| treve wrote:
| I wonder if these GPUs are good options for Linux rigs and if
| first-party drivers are made.
| baq wrote:
| Of all the god awful Linux GPU drivers Intel's are the least
| awful IME. Unless you're talking purely compute, then nvidia,
| have fun matching those cuda versions though...
| dralley wrote:
| AMD's Linux drivers are pretty good. I get better performance
| playing games through Proton on Linux than I do playing the
| same games on Windows, despite whatever overhead the
| translation adds.
|
| The only really annoying bug I've run into is the one where
| the system locks up if you go to sleep with more used swap
| space than free memory, but that one just got fixed.
| bradfa wrote:
| Yes, first party drivers are made. Upstream Linux and mesa
| project should have good support in their latest releases. If
| you're running a non-bleeding edge distro, you may need to wait
| or do a little leg work to get the newer versions of things,
| but this is not unusual for new hardware.
|
| If you're running Ubuntu, Intel has some exact steps you can
| follow: https://dgpu-docs.intel.com/driver/client/overview.html
| ThaDood wrote:
| Here are some benchmarks from a few months back. Seems
| promising. https://www.phoronix.com/review/intel-arc-b580-gpu-
| compute
|
| Whoops - included the wrong link!
| https://www.phoronix.com/review/intel-arc-b580-graphics-linu...
| ZeWaka wrote:
| I use an Alchemist series A380 on my nix media server, and it's
| absolutely fantastic for video encoding.
| VTimofeenko wrote:
| Same; recently built SFF with low profile A310. Goes through
| video streams like hot knife through butter.
|
| Do you have your config posted somewhere? I'd be interested
| to compare notes
| dingi wrote:
| In fact, Intel has been a stellar contributor to the Linux
| kernel and associated projects, compared to all other vendors.
| They usually have launch day Linux support provided that you
| are running a bleeding edge Linux kernel.
| everfrustrated wrote:
| Intel also have up-streamed their video encoding acceleration
| support into software like ffmpeg.
|
| Intel Arc gpus also support hardware video encoding for the AV1
| codec which even the just released Nvidia 50 series still
| doesn't support.
| lostmsu wrote:
| This is wrong. AV1 encoding is supported since Nvidia 40
| series.
| bee_rider wrote:
| I have always associated Intel iGPUs with good drivers but
| people seem to often complain about their Linux dGPU drivers in
| these threads. I hope it is just an issue of them trying to
| break into a new field, rather than a slipping of their GPU
| drivers in general...
| jorvi wrote:
| Intel switched over to a new driver for dGPUs and any iGPU
| newer than Skylake(?).
|
| The newest beta-ish driver is Xe, the main driver is Intel
| HD, and the old driver is i915.
|
| People complaining experienced the teething issues of early
| Xe builds.
| mtlmtlmtlmtl wrote:
| Been running Linux on the A770 for about 2 years now. Very
| happy with the driver situation. Was a bit rough very early on,
| but it's nice and stable now. Recommend at least Linux 6.4, but
| preferably newer. I use a rolling release distro(Artix) to get
| up to date kernels.
|
| ML stuff can be a pain sometimes because support in pytorch and
| various other libraries is not as prioritised as CUDA. But I've
| been able to get llama.cpp working via ollama, which has
| experimental intel gpu support. Worked fine when I tested it,
| though I haven't actually used it very much, so don't quote me
| on it.
|
| For image gen, your best bet is to use
| sdnext (https://github.com/vladmandic/sdnext), which officially
| supports Intel on Linux, will automagically install the right
| pytorch version, and does a bunch of trickery to get libraries
| that insist on CUDA working in many cases. Though some things
| are still unsupported due to various libraries still not
| supporting Intel on Linux. Some types of quantisation are
| unavailable, for instance. But at least if you have the A770,
| quantisation for image gen is not as important due to plentiful
| VRAM, unless you're trying to use the flux models.
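|
| If it helps anyone: the quickest sanity check I know of for a
| pytorch build with Intel GPU support is something like the
| below. Rough sketch - recent pytorch exposes Arc as the "xpu"
| device, but only if your build was actually compiled with that
| support, and the install steps vary:
|
|     import torch
|
|     if torch.xpu.is_available():
|         dev = torch.device("xpu")
|         x = torch.randn(4096, 4096, device=dev)
|         y = x @ x                  # run a matmul on the Arc GPU
|         torch.xpu.synchronize()
|         print("xpu OK:", y.shape)
|     else:
|         print("no xpu device found, falling back to CPU")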
| jcarrano wrote:
| Last year I was doing a livestream for a band. The NVidia
| encoder on my friend's computer (running Windows) just wouldn't
| work. We tried in vain to install drivers and random stuff from
| Nvidia. I pulled out my own machine with Linux and an Intel
| iGPU, and not only did it work flawlessly, it did so on battery
| and with charge to spare.
|
| On the other hand, I have to keep the driver for the secondary
| GPU (also Intel) blacklisted, because the last time I tried to
| use it, it was constantly drawing power.
| jorvi wrote:
| > Unfortunately, today's midrange cards like the RTX 4060 and RX
| 7600 only come with 8 GB of VRAM
|
| Just a nit: one step up (RX 7600 XT) comes with 16GB of memory,
| although in a clamshell configuration. With the B580 falling in
| between the 7600 and 7600 XT in terms of pricing, it seems a
| bit unfair to compare it only with the former.
|
| - RX 7600 (8GB) ~EUR300
|
| - RTX 4060 (8GB) ~EUR310
|
| - Intel B580 (12GB) ~EUR330
|
| - RX 7600 XT (16GB) ~EUR350
|
| - RTX 4060 Ti (8GB) ~EUR420
|
| - RTX 4060 Ti (16GB) ~EUR580*
|
| *Apparently this card is really rare plus a bad value
| proposition, so it is hard to find
| mananaysiempre wrote:
| All sources I've seen say the 4060 Ti 8GB is also really bad
| value. Here's GamersNexus for example:
| https://www.youtube.com/watch?v=Y2b0MWGwK_U.
| jandrese wrote:
| And that is also one of the most popular cards on prebuilt
| systems. Just search through Amazon listings and see which
| card shows up all the damn time.
| hassleblad23 wrote:
| > Intel takes advantage of this by launching the Arc B580 at
| $250, undercutting both competitors while offering 12 GB of
| VRAM.
|
| Not sure where you got that 350 EUR number for B580?
| xmodem wrote:
| 330 EUR is roughly reflective of the street price of the B580
| in Europe.
|
| For example:
|
| https://www.mindfactory.de/product_info.php/12GB-ASRock-
| Inte... (~327 EUR)
|
| https://www.overclockers.co.uk/sparkle-intel-
| arc-b580-guardi... (~330 EUR)
|
| https://www.inet.se/produkt/5414587/acer-arc-b580-12gb-
| nitro... (~336 EUR)
| qball wrote:
| _All_ RTX xx60 cards are really bad value propositions, though
| (especially in comparison to the xx80 series cards).
|
| If the 4060 was the 3080-for-400USD that everyone actually
| wants, that'd be a different story. Fortunately, its
| nonexistence is a major contributor to why the B580 can even be
| a viable GPU for Intel to produce in the first place.
| jorvi wrote:
| Not all of them. The 3060 Ti was great because it was
| actually built on the same underlying chip as the 3070 and
| 3070 Ti. Which ironically made those less valuable.
|
| But the release of those cards was during Covid pricing
| weirdness times. I scored a 3070 Ti at EUR650, whilst the
| 3060 Ti's that I actually wanted were being sold for EUR700+.
| Viva la Discord bots.
| mrbonner wrote:
| Let me know where you could find 4060Ti 16GB for under $1000
| USD
| hedgehog wrote:
| What's annoying is they were under $500 just a few months
| ago.
| clamchowder wrote:
| (author here) When I checked the 7600 XT was much more
| expensive. Right now it's still $360 on eBay, vs the B580's
| $250 MSRP, though yeah I guess it's hard to find the B580 in
| stock
| jorvi wrote:
| Yeah I guess regional availability really works into it..
| bummer
|
| I wonder if the B580 will drop to MSRP at all, or if
| retailers will just keep it slotted into the greater GPU
| line-up the way it is now and pocket the extra money.
| glitchc wrote:
| Double the memory for double the price and I would buy one in a
| heartbeat.
| talldayo wrote:
| If your application is video transcoding or AI inference, you
| could probably buy two and use them in a multi-GPU
| configuration.
| taurknaut wrote:
| I don't really care about how it performs so long as it's better
| than a CPU. I just want to target the GPU myself and remove the
| vendor from the software equation. Nvidia has taught me there
| isn't any value that can't be destroyed with sufficiently bad
| drivers.
| williamDafoe wrote:
| BattleMage B580 specs from TechPowerUp match the 4070 almost
| precisely - same RAM, same bus, same bus speed, same power
| rating, TSMC GPU at N4 node and nearly identical size (290 vs 272
| mm square) - $10 difference, tops.
|
| But it was released TWO YEARS LATER than the 4070 and it performs
| ONE GENERATION WORSE (4060 performance). 2+2 = 4 Years behind! I
| am not too impressed with the "chips and cheese ant's view
| article" as they don't uncover the reason why performance is SO
| PATHETIC!
|
| A weird thing goes on in the TSMC / GPU business. Intel prepaid
| for the N4 wafers and the design is so poor it's not profitable
| to make the GPU and sell it at LESS THAN HALF the 4070 price of
| $550. Normally a mistake like this would lead to product
| cancellation but the prepayment is a stranded cost so Intel MUST
| sell these at a loss to get back a sliver of their wasted TSMC
| prepayments!
|
| What's even worse is that the A770 was also 4 years behind, so
| Intel is not catching up - not one iota! The A770 was an
| attempt by Intel to make a 3070 clone and they failed badly -
| I'll let you look up the specs and the timelines to do the
| comparison on your own ...
| ksec wrote:
| > I am not too impressed with the "chips and cheese ant's view
| article" as they don't uncover the reason why performance is SO
| PATHETIC!
|
| Performance on GPUs has always been about drivers. Chips and
| Cheese is only here to show the uArch behind it. This isn't
| even new; we should have learned all about it during the 3dfx
| Voodoo era. And 9 years have passed since a (now retired)
| Intel engineer said that they would be competing against
| Nvidia by 2020, if not 2021. We are now in 2025 and they are
| not even close. But somehow Raja Koduri was supposed to save
| them, and now he's gone.
| rincebrain wrote:
| Intel seems to have deep-seated issues with their PR
| department writing checks their engineers can't cash on time.
|
| Not that Intel engineers are bad - on the contrary. But as
| you pointed out, they've been promising they'd be further
| along than they are now for over 5 years, and even 10+ years
| ago when I was working in HPC systems, they kept promising
| things you should build your systems on that would be "in the
| next gen" and that were not, in fact, there.
|
| It seems much like the Bioware Problem(tm) where Bioware got
| very comfortable promising the moon in 12 months and assuming
| 6 months of crunch would Magically produce a good outcome,
| and then discovered that Results May Vary.
| wqaatwt wrote:
| > 4060 performance
|
| That's really not true though. It's closer to the 4060 Ti, and
| somewhat ahead or behind depending on the specific game.
| adgjlsfhk1 wrote:
| I think this is a bad take because it assumes that NVidia is
| making rapid price/performance improvements in the consumer
| space. The RTX 4060 is roughly equivalent to a 2080 (similar
| performance, RAM, and transistor count). Intel isn't making
| much margin, but from what I've seen they're probably roughly
| breaking even, not taking a huge loss.
|
| Also, a ton of the work for Intel is in drivers, which are (as
| the A770 showed) very improvable after launch. Based on the
| hardware, it seems very possible that the B580 could get an
| extra 10% (especially in 1080p), which would bring it clearly
| above the 4060 Ti in perf.
| keyringlight wrote:
| The other major issue with regard to pricing is that intel need
| to pay one way or another to get market penetration; if no one
| buys their cards at all and they don't establish a beachhead,
| then it's even more wasted money.
|
| As I see it AMD get _potentially_ squeezed between intel and
| nvidia. Nvidia's majority marketshare seems pretty secure for
| the foreseeable future, intel undercutting AMD plus their
| connections to prebuilt system manufacturers would likely grab
| them a few more nibbles into AMD territory. If intel release a
| competent B770 versus AMD products priced a few hundred dollars
| more, even if Arc isn't as mature I'm not sure they have solid
| answers for why someone should buy Radeon.
|
| In my view AMD's issue is that they don't have any vision for
| what their GPUs can offer besides a slightly better version of
| the previous generation. Back in 2018 the RTX offering must
| have blindsided them, and years later they're not giving us any
| alternative vision for what comes next for graphics to make
| Radeon desirable besides catching up to nvidia (who I imagine
| will have something new to move the goalposts if anyone gets
| close) - and this is an AMD that is currently well resourced
| from Zen.
| wirybeige wrote:
| Strange to point out those comparisons but not the actual
| transistor difference between the two.
|
| B580 only has 19.6B transistors while the RTX 4070 has 35.8B
| transistors. So the RTX 4070 has nearly double (1.82x) the
| transistors of B580.
|
| The RTX 4060 ti has 22.9B and the RTX 4060 has 18.9B
| transistors
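|
| Dividing those counts by the die sizes quoted upthread gives a
| rough density comparison (the die sizes are approximate, so
| treat the results as ballpark):
|
|     # Transistor density from the figures in this thread.
|     gpus = {"B580": (19.6e9, 272), "RTX 4070": (35.8e9, 290)}
|
|     for name, (transistors, die_mm2) in gpus.items():
|         mtr_per_mm2 = transistors / die_mm2 / 1e6
|         print(f"{name}: ~{mtr_per_mm2:.0f} MTr/mm^2")
|
| That's roughly 72 MTr/mm^2 for the B580 vs ~123 MTr/mm^2 for
| the 4070 on the same node, so the density gap is real even
| though the dies are similar sizes.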
| throwawaythekey wrote:
| Would the difference in density be more likely due to a
| difference in design philosophy or the intel design team
| being less expert?
|
| As a customer do intel pay for mm2 or for transistors?
|
| Forgive me if you are not the right person for these
| questions.
| myrmidon wrote:
| Loosely related question:
|
| What prevents manufacturers from taking some existing
| mid/toprange consumer GPU design, and just slapping like 256GB
| VRAM onto it? (enabling consumers to run big-LLM inference
| locally).
|
| Would that be useless for some reason? What am I missing?
| patmorgan23 wrote:
| Because then they couldn't sell you the $10k enterprise GPU
| ksec wrote:
| Bandwidth. GDDR and HBM, both used by GPUs depending on the use
| case, are high bandwidth but low capacity, comparatively
| speaking. Modern GPUs try to put more VRAM behind more memory
| channels, up to a 512-bit bus, but that requires more die space
| and hence gets expensive.
|
| We will need a new memory design for both GDDR and HBM. And I
| won't be surprised if they are working on it already. But
| hardware takes time, so it will be a few more years down the
| road.
| reginald78 wrote:
| You'd need memory chips with double the memory capacity to slap
| the extra vram in, at least without altering the memory bus
| width. And indeed, some third party modded entries like that
| seem to have shown up: https://www.tomshardware.com/pc-
| components/gpus/nvidia-gamin...
|
| As far as official products, I think the real reason another
| commentator mentioned is that they don't want to cannibalize
| their more powerful card sales. I know I'd be interested in a
| lower-powered card with a lot of vram just to get my foot in
| the door; that is why I bought an RTX 3060 12GB, which is
| unimpressive for gaming but actually had the second most vram
| available in that generation. Nvidia seem to have noticed this
| mistake and later released a crappier 8GB version to replace
| it.
|
| I think if the market reacted to create a product like this to
| compete with nvidia, they'd pretty quickly release something to
| fit the need, but as it is they don't have to.
| SunlitCat wrote:
| The 3060 with 12GB was an outlier for its time of release
| because the crypto (currency) hype was raging at that moment
| and scalpers, miners and everyone in between were buying
| graphics cards left and right! Hard times those were! D:
| protimewaster wrote:
| You can actually get GPUs from the Chinese markets (e.g.,
| AliExpress) that have had their VRAM upgraded. Someone out
| there is doing aftermarket VRAM upgrades on cards to make them
| more usable for GPGPU tasks.
|
| Which also answers your question: The manufacturers aren't
| doing it because they're assholes.
| fulafel wrote:
| Seems some years away to get that into consumer price range.
| elabajaba wrote:
| The amount of memory you can put on a GPU is mainly constrained
| by the GPU's memory bus width (which is both expensive and
| power hungry to expand) and the available GDDR chips (which
| generally require 32 bits of the bus per chip). We've been
| using 16Gbit (2GB) chips for a while, and they're just starting
| to roll out 24Gbit (3GB) GDDR7 modules, but those are expensive
| and in short supply. You also have to account for VRAM being
| somewhat power hungry (~1.5-2.5W per module under load).
|
| Once you've filled all the slots your only real option is to do
| a clamshell setup that will double the VRAM capacity by putting
| chips on the back of the PCB in the same spot as the ones on
| the front (for timing reasons the traces all have to be the
| same length). Clamshell designs then need to figure out how to
| cool those chips on the back (~1.5-2.5w per module depending on
| speed and if it's GDDR6/6X/7, meaning you could have up to 40w
| on the back).
|
| Some basic math puts us at 16 modules for a 512 bit bus (only
| the 5090, have to go back a decade+ to get the last 512bit bus
| GPU), 12 with 384bit (4090, 7900xtx), or 8 with 256bit (5080,
| 4080, 7800xt).
|
| A clamshell 5090 with 2GB modules has a max limit of 64GB, or
| 96GB with (currently expensive and limited) 3GB modules (you'll
| be able to buy this at some point as the RTX 6000 Blackwell at
| stupid prices).
|
| HBM can get you higher amounts, but it's extremely expensive to
| buy (you're competing against H100s, MI300Xs, etc), supply
| limited (AI hardware companies are buying all of it and want
| even more), requires a different memory controller (meaning
| you'll still have to partially redesign the GPU), and requires
| expensive packaging to assemble it.
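|
| The capacity math above, spelled out (module sizes and bus
| widths as described in this comment, nothing vendor-confirmed):
|
|     # Max VRAM = (bus width / 32 bits per GDDR module) * module
|     # capacity, doubled for a clamshell layout.
|     def max_vram_gb(bus_bits, module_gb, clamshell=False):
|         modules = bus_bits // 32
|         return modules * module_gb * (2 if clamshell else 1)
|
|     print(max_vram_gb(512, 2, clamshell=True))  # 64 (5090-class)
|     print(max_vram_gb(512, 3, clamshell=True))  # 96 (3GB GDDR7)
|     print(max_vram_gb(192, 2))                  # 12 (B580-style)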
| lostmsu wrote:
| What of previous generations of HBM? Older consumer AMD GPUs
| (Vega) and Titan V had HBM2. According to https://en.wikipedi
| a.org/wiki/Radeon_RX_Vega_series#Radeon_V... you could get
| 16GB with 1TB/s for $700 at release. It is no longer used in
| data centers. I'd gladly pay $2800 for 48GB with 4TB/s.
| devit wrote:
| I wonder if a multiplexer would be feasible?
|
| Hardware-wise, instead of putting the chips on the PCB surface,
| one would mount a 16-gonal arrangement of perpendicular
| daughterboards, each containing 2-16 GDDR chips where there
| would normally be one, with external liquid cooling, power
| delivery and PCIe control connection.
|
| Then each of the daughterboards would feature a multiplexer
| with a dual-ported SRAM containing a table that, for each
| memory page, stores which chip to map it to; the multiplexer
| would use that table to route requests from the GPU, using the
| second port to change the mapping from the extra PCIe
| interface.
|
| API-wise, for each resource you would have N overlays and a new
| operation allowing you to switch the resource (which would
| require a custom driver that properly invalidates caches).
|
| This would depend on the GPU supporting the much higher
| latency of this setup and providing good enough support for
| cache flushing and invalidation, as well as deterministic
| mapping from physical addresses to chip addresses, and the
| ability to manufacture all this in a reasonably affordable
| fashion.
| Animats wrote:
| There are companies in China doing that, recycling older NVidia
| GPUs.[1]
|
| [1]
| https://www.reddit.com/r/hardware/comments/182nmmy/special_c...
| joelthelion wrote:
| That's cool and all but can you use it for deep learning?
| SG- wrote:
| it's a nice technical article but the charts are just terrible
| and seem blurry even when zoomed in.
| clamchowder wrote:
| Yea Wordpress was a terrible platform and Substack is also a
| terrible platform. I don't know why every platform wants to
| take a simple uploaded PNG and apply TAA to it. And don't get
| me started on how Substack has no native table support, when
| HTML has had tables since prehistoric times.
|
| If I had more time I'd roll my own site with basic HTML/CSS.
| It's not even hard, just time consuming.
| dark__paladin wrote:
| TAA is temporal anti-aliasing, correct? There is no time
| dimension here, isn't it just compression + bilinear
| filtering?
| daneel_w wrote:
| Missing detail: 190 watt TDP.
___________________________________________________________________
(page generated 2025-02-11 23:00 UTC)