[HN Gopher] Intel's Battlemage Architecture
___________________________________________________________________
Intel's Battlemage Architecture
Author : ksec
Score : 132 points
Date : 2025-02-11 16:00 UTC (6 hours ago)
(HTM) web link (chipsandcheese.com)
(TXT) w3m dump (chipsandcheese.com)
| netbioserror wrote:
| A lot of commenters have pointed out that Intel is getting
| nowhere near the performance/mm2 of Nvidia or AMD designs,
| though contrary to what I thought that might imply, power
| consumption seems very much under control on Battlemage. So the
| primary trade-off here seems to be die cost.
|
| Can anyone explain what might be going on here, especially as it
| relates to power consumption? I thought (bigger die ^ bigger
| wires -> more current -> higher consumption).
| tonetegeatinst wrote:
| It mainly seems to boil down to design choices and process
| technology.
|
| They might be targeting a lower power density per square mm
| compared to AMD or Nvidia, focusing on lower power levels.
|
| Instruction set architecture and the layout of the chips and
| PCB factor into this as well.
| MisterTea wrote:
| > I thought (bigger die ^ bigger wires -> more current ->
| higher consumption).
|
| I am not a semi expert, but a bigger die doesn't mean bigger
| wires. If you are referring to cross-section, the wires would
| be thinner, meaning less current. Power is consumed pushing and
| pulling electrons onto and off the transistor gates, which are
| all of the FET type (field effect transistor). The gate is a
| capacitor that needs to be charged to open the channel and
| allow current to flow through the transistor; discharging the
| gate closes it. That current draw then gets multiplied by a few
| billion gates, so you can see where the load comes from.
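|
| A back-of-the-envelope version of that in Python (every number
| below is a rough, made-up assumption for illustration, not a
| Battlemage figure):
|
|     # Dynamic (switching) power: P = alpha * C * V^2 * f * N
|     alpha   = 0.02      # fraction of gates switching per cycle
|     c_gate  = 0.2e-15   # switched capacitance per gate, farads
|     vdd     = 1.0       # supply voltage, volts
|     freq    = 2.5e9     # clock frequency, Hz
|     n_gates = 20e9      # ~20 billion transistors
|
|     power_w = alpha * c_gate * vdd**2 * freq * n_gates
|     print(f"~{power_w:.0f} W of dynamic power")  # ~200 W here
|
| Note the V^2 term - that's why the voltage/frequency point a
| chip runs at matters so much, as the replies below get into.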
| williamDafoe wrote:
| Actually the wires don't scale down like the transistors do.
| I remember taking VLSI circuit complexity theory in graduate
| school, and the conclusion was that for two-dimensional
| circuits the wires will end Moore's Law. However, I've seen
| articles about backside power delivery, and they are already
| using seven+ layers, so the wires are going through three
| dimensions now. Copper interconnects were a one-time bonus in
| the late 90s, and after that wires just don't scale down;
| signal delay would go up too fast. Imagine taking a city with
| all its streets and houses, where the houses shrink to the size
| of dog houses but you can't shrink the streets - they have to
| stay the same size to carry signals quickly!
| gruez wrote:
| >I thought (bigger die ^ bigger wires -> more current -> higher
| consumption).
|
| All things being equal, a bigger die would result in more power
| consumption, but the factor you're not considering is the
| voltage/frequency curve. As you increase the frequency, you
| also need to up the voltage. However, as you increase voltage,
| there's diminishing returns to how much you can increase the
| frequency, so you end up massively increasing power consumption
| to get minor performance gains.
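|
| A toy model of that curve in Python (the V/f relationship here
| is invented just to show the shape, not measured data):
|
|     # Dynamic power ~ V^2 * f, and the voltage needed to hit a
|     # given clock rises with frequency (assumed linear here).
|     def voltage_needed(freq_ghz):
|         return 0.70 + 0.15 * freq_ghz   # made-up coefficients
|
|     for f in (2.0, 2.5, 3.0):
|         v = voltage_needed(f)
|         p = v**2 * f                    # arbitrary power units
|         print(f"{f:.1f} GHz @ {v:.2f} V -> relative power {p:.2f}")
|
| With these toy numbers, going from 2.0 to 3.0 GHz (+50% clock)
| roughly doubles the power (2.00 -> 3.97), which is the
| diminishing-returns effect in action.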
| wmf wrote:
| If it's a similar number of transistors on a larger die then I
| can believe the power consumption is good. Less dense layout
| probably requires less design effort and may reduce hotspots.
|
| If Intel is getting similar performance from more transistors
| that could be caused by extra control logic from a 16-wide core
| instead of 32.
| kimixa wrote:
| Increasing clocks tends to have a greater-than-linear cost in
| power: you need transistors to switch quicker, so you often
| need a higher voltage, which causes more leakage and other
| losses on top of the switching cost itself (and it all turns
| into heat). Higher clock targets also have a cost for the
| design itself, often needing more transistors for things like
| extra redrivers to ensure fast switching speeds, or even things
| like more pipeline stages. Plus, not all area is "transistors"
| - it's often easier to place related units that need a _lot_ of
| interconnectivity with short interconnects if an adjacent, less
| interconnected unit isn't also trying to be packed into the
| same space. Routing on modern chips is _really_ difficult (and
| a place where companies can really differentiate by investing
| more).
|
| For tasks that tend to scale well with increased die area,
| which is often the case for GPUs as they're already focused on
| massively parallel tasks so laying down more parallel units is
| a realistic option, running a larger die at lower clocks is
| often notably more efficient in terms of performance per unit
| power.
|
| For GPUs generally that's just part of the pricing and cost
| balance: a larger, lower-clocked die would be more efficient,
| but would it really sell for as much as the same die clocked
| _even higher_ to get peak results?
| netbioserror wrote:
| >For tasks that tend to scale well with increased die area,
| which is often the case for GPUs as they're already focused
| on massively parallel tasks so laying down more parallel
| units is a realistic option, running a larger die at lower
| clocks is often notably more efficient in terms of
| performance per unit power.
|
| I should've considered this; I have an RTX A5000. It's a
| gigantic GA102 die (3090, 3080) that's underclocked to 230W,
| putting it at roughly 3070 throughput. That's ~15% less
| performance than a 3090 for a ~35% power reduction.
| Absolutely nonlinear savings there. Though some of that may
| have to do with the power savings of GDDR6 over GDDR6X.
|
| (I should mention that relative performance estimates are all
| over the place, by some metrics the A5000 is ~3070, by others
| it's ~3080.)
| bgnn wrote:
| Yeah, the power consumption scales, to first order, with
| Vdd^2 (the square of the power supply voltage), but performance
| scales with Vdd. Though you cannot simply reduce the Vdd and
| clock rate and then do more pipelining etc. to gain back the
| performance. If you are willing to back off on performance
| a bit, you can gain hugely on power. Plus the thermals are
| easier to manage.
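|
| To put rough numbers on that (a toy model assuming dynamic
| power dominates and the clock scales linearly with Vdd, so
| P ~ Vdd^2 * f and perf ~ f; nothing here is a measured
| A5000/3090 figure):
|
|     # Back off voltage and clock by ~10% each (assumed).
|     v_ratio, f_ratio = 0.90, 0.90
|
|     power_ratio = v_ratio**2 * f_ratio   # ~0.73
|     perf_ratio  = f_ratio                # 0.90
|     print(f"perf {perf_ratio:.0%} of stock, "
|           f"power {power_ratio:.0%} of stock")
|
| So ~10% less performance for ~27% less power in this toy model,
| which is the same kind of nonlinear trade as the A5000 example
| above.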
| bloomingkales wrote:
| They are holding back the higher vram models of this card. GPU
| makers always do some nerfing of their cards in the same
| product line. Oftentimes there's no good reason for this other
| than that they found specs they can market and sell simply by
| moving voltages around.
|
| Anyway, expecting good earnings throughout the year as they use
| Battlemage sales to hide the larger concerns about standing up
| their foundry (great earnings for the initial 12gb cards, and
| so on for the inevitable 16/24gb cards).
| elric wrote:
| I couldn't find any information regarding power consumption in
| the article. I'd love to upgrade my aging gaming rig, but all
| modern AMD/Nvidia graphics cards consume significantly more
| power than my current card.
| stoatstudios wrote:
| Is nobody going to talk about how the architecture is called
| "Battlemage"? Is that just normal to GPU enthusiasts?
| ZeWaka wrote:
| It's their 2nd generation, the 'B' series. The previous was
| their 'A' / Alchemist.
|
| > According to Intel, the brand is named after the concept of
| story arcs found in video games. Each generation of Arc is
| named after character classes sorted by each letter of the
| Latin alphabet in ascending order.
| (https://en.wikipedia.org/wiki/Intel_Arc)
| reginald78 wrote:
| The generations are all fantasy type names in alphabetical
| order. The first was Alchemist (and the cards were things like
| A310) and the next is Celestial. Actually when I think about
| product names for GPUs and CPUs these seem above average in
| clarity and only slightly dorkier than average. I'm sure
| they'll get more confusing and nonsensical with time as that
| seems to be a constant of the universe.
| Workaccount2 wrote:
| Can't wait for Dungeon architecture.
| meragrin_ wrote:
| Dungeon architecture? What's that?
| sevg wrote:
| Looks to have been a joke about the alphabetical naming:
| Alchemist, Battlemage, Celestial .. Dungeon
|
| (There's no name decided yet for the fourth in the
| series.)
| CodesInChaos wrote:
| Dragon and Druid sound like viable options.
| spiffytech wrote:
| Dorky, alphabetical codenames are a big step up from a bunch
| of lakes in no obvious order.
| PaulHoule wrote:
| Yeah, with the way Intel has been struggling I thought they
| should get it out of their system and name one of their
| chips "Shit Creek."
| ReptileMan wrote:
| It has been 20 years since Prescott, but the name is
| still suitable.
| dark-star wrote:
| A well-known commercial storage vendor gives their system
| releases codenames from beer brands. We had Becks, Guinness,
| Longboard, Voodoo Ranger, and many others. Presumably what the
| devs drank during that release cycle, or something ;-)
|
| It's fun for the developers and the end-users alike... So no,
| it's not limited to GPU enthusiasts at all. Everyone likes
| codenames :)
| B1FF_PSUVM wrote:
| > Everyone likes codenames :)
|
| Except butt-headed astronomers
| homarp wrote:
| https://www.engadget.com/2014-02-26-when-carl-sagan-sued-
| app... if you miss the ref
| monocasa wrote:
| I mean, living people seems like a dick move in general for
| codenames.
| wincy wrote:
| That's what we make sure our codenames are sensible
| things like Jimmy Carter and James Earl Jones
|
| We were actually told to change our internal names for
| our servers after someone named an AWS instance "stupid",
| and I rolled my eyes so hard - one dev ruined the fun for
| everyone.
| monocasa wrote:
| I mean, sure, for a lot of the same reasons you can't
| file a defamation claim in defense of someone who's dead.
| The idea of them is in the public domain in a lot of
| ways.
|
| So sure, pour one out to whoever's funeral is on the
| grocery store tabloids that week with your codenames.
| throw16180339 wrote:
| Are you referring to NetApp?
| baq wrote:
| A codename as good as any. Nvidia has Tesla, Turing etc.
| high_na_euv wrote:
| Cool name, easy to remember, ain't it?
| tdb7893 wrote:
| It's dorky but there isn't much else to say about it. Personal
| GPU enthusiasts are almost always video game enthusiasts so
| it's not really a particularly weird name in context.
| faefox wrote:
| It sounds cool and has actual personality. What would you
| prefer, Intel Vision Pro Max? :)
| babypuncher wrote:
| It's just the code name for this generation of their GPU
| architecture, not the name for its instruction set. Intel's are
| all fantasy themed. Nvidia names theirs after famous scientists
| and mathematicians (Alan Turing, Ada Lovelace, David Blackwell).
| userbinator wrote:
| It's very much normal "gamer" aesthetic.
| treve wrote:
| I wonder if these GPUs are good options for Linux rigs and if
| first-party drivers are made.
| baq wrote:
| Of all the god awful Linux GPU drivers Intel's are the least
| awful IME. Unless you're talking purely compute, then nvidia,
| have fun matching those cuda versions though...
| dralley wrote:
| AMD's Linux drivers are pretty good. I get better performance
| playing games through Proton on Linux than I do playing the
| same games on Windows, despite whatever overhead the
| translation adds.
|
| The only really annoying bug I've run into is the one where
| the system locks up if you go to sleep with more used swap
| space than free memory, but that one just got fixed.
| bradfa wrote:
| Yes, first party drivers are made. Upstream Linux and mesa
| project should have good support in their latest releases. If
| you're running a non-bleeding edge distro, you may need to wait
| or do a little leg work to get the newer versions of things,
| but this is not unusual for new hardware.
|
| If you're running Ubuntu, Intel has some exact steps you can
| follow: https://dgpu-docs.intel.com/driver/client/overview.html
| ThaDood wrote:
| Here are some benchmarks from a few months back. Seems
| promising. https://www.phoronix.com/review/intel-arc-b580-gpu-
| compute
|
| Whoops - included the wrong link!
| https://www.phoronix.com/review/intel-arc-b580-graphics-linu...
| ZeWaka wrote:
| I use an Alchemist series A380 on my nix media server, and it's
| absolutely fantastic for video encoding.
| VTimofeenko wrote:
| Same; recently built SFF with low profile A310. Goes through
| video streams like hot knife through butter.
|
| Do you have your config posted somewhere? I'd be interested
| to compare notes
| dingi wrote:
| In fact, Intel has been a stellar contributor to the Linux
| kernel and associated projects, compared to all other vendors.
| They usually have launch day Linux support provided that you
| are running a bleeding edge Linux kernel.
| everfrustrated wrote:
| Intel also have up-streamed their video encoding acceleration
| support into software like ffmpeg.
|
| Intel Arc gpus also support hardware video encoding for the AV1
| codec which even the just released Nvidia 50 series still
| doesn't support.
| lostmsu wrote:
| This is wrong. AV1 encoding is supported since Nvidia 40
| series.
| bee_rider wrote:
| I have always associated Intel iGPUs with good drivers but
| people seem to often complain about their Linux dGPU drivers in
| these threads. I hope it is just an issue of them trying to
| break into a new field, rather than a slipping of their GPU
| drivers in general...
| jorvi wrote:
| Intel switched over to a new driver for dGPUs and any iGPU
| newer than Skylake(?).
|
| The newest beta-ish driver is Xe, the main driver is Intel
| HD, and the old driver is i915.
|
| People complaining experienced the teething issues of early
| Xe builds.
| mtlmtlmtlmtl wrote:
| Been running Linux on the A770 for about 2 years now. Very
| happy with the driver situation. Was a bit rough very early on,
| but it's nice and stable now. Recommend at least Linux 6.4, but
| preferably newer. I use a rolling release distro(Artix) to get
| up to date kernels.
|
| ML stuff can be a pain sometimes because support in pytorch and
| various other libraries is not as prioritised as CUDA. But I've
| been able to get llama.cpp working via ollama, which has
| experimental intel gpu support. Worked fine when I tested it,
| though I haven't actually used it very much, so don't quote me
| on it.
|
| For image gen, your best bet is to use
| sdnext (https://github.com/vladmandic/sdnext), which officially
| supports Intel on Linux, will automagically install the right
| pytorch version, and does a bunch of trickery to get libraries
| that insist on CUDA working in many cases. Though some things
| are still unsupported due to various libraries still not
| supporting Intel on Linux. Some types of quantisation are
| unavailable, for instance. But at least if you have the A770,
| quantisation for image gen is not as important due to plentiful
| VRAM, unless you're trying to use the flux models.
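|
| If it helps anyone: the quickest sanity check I know of for a
| pytorch build with Intel GPU support is something like the
| below. Rough sketch - recent pytorch exposes Arc as the "xpu"
| device, but only if your build was actually compiled with that
| support, and the install steps vary:
|
|     import torch
|
|     if torch.xpu.is_available():
|         dev = torch.device("xpu")
|         x = torch.randn(4096, 4096, device=dev)
|         y = x @ x                  # run a matmul on the Arc GPU
|         torch.xpu.synchronize()
|         print("xpu OK:", y.shape)
|     else:
|         print("no xpu device found, falling back to CPU")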
| jcarrano wrote:
| Last year I was doing a livestream for a band. The NVidia
| encoder on my friend's computer (running Windows) just wouldn't
| work. We tried in vain to install drivers and random stuff from
| Nvidia. I pulled out my own machine with Linux and an Intel
| iGPU, and not only did it work flawlessly, it did so on battery
| and with charge to spare.
|
| On the other hand, I have to keep the driver for the secondary
| GPU (also Intel) blacklisted, because the last time I tried to
| use it, it was constantly drawing power.
| jorvi wrote:
| > Unfortunately, today's midrange cards like the RTX 4060 and RX
| 7600 only come with 8 GB of VRAM
|
| Just a nit: one step up (RX 7600 XT) comes with 16GB of memory,
| although in a clamshell configuration. With the B580 falling in
| between the 7600 and 7600 XT in terms of pricing, it seems a
| bit unfair to compare it only with the former.
|
| - RX 7600 (8GB) ~EUR300
|
| - RTX 4060 (8GB) ~EUR310
|
| - Intel B580 (12GB) ~EUR330
|
| - RX 7600 XT (16GB) ~EUR350
|
| - RTX 4060 Ti (8GB) ~EUR420
|
| - RTX 4060 Ti (16GB) ~EUR580*
|
| *Apparently this card is really rare plus a bad value
| proposition, so it is hard to find
| mananaysiempre wrote:
| All sources I've seen say the 4060 Ti 8GB is also really bad
| value. Here's GamersNexus for example:
| https://www.youtube.com/watch?v=Y2b0MWGwK_U.
| jandrese wrote:
| And that is also one of the most popular cards on prebuilt
| systems. Just search through Amazon listings and see which
| card shows up all the damn time.
| hassleblad23 wrote:
| > Intel takes advantage of this by launching the Arc B580 at
| $250, undercutting both competitors while offering 12 GB of
| VRAM.
|
| Not sure where you got that 350 EUR number for B580?
| xmodem wrote:
| 330 EUR is roughly reflective of the street price of the B580
| in Europe.
|
| For example:
|
| https://www.mindfactory.de/product_info.php/12GB-ASRock-
| Inte... (~327 EUR)
|
| https://www.overclockers.co.uk/sparkle-intel-
| arc-b580-guardi... (~330 EUR)
|
| https://www.inet.se/produkt/5414587/acer-arc-b580-12gb-
| nitro... (~336 EUR)
| qball wrote:
| _All_ RTX xx60 cards are really bad value propositions, though
| (especially in comparison to the xx80 series cards).
|
| If the 4060 was the 3080-for-400USD that everyone actually
| wants, that'd be a different story. Fortunately, its
| nonexistence is a major contributor to why the B580 can even be
| a viable GPU for Intel to produce in the first place.
| jorvi wrote:
| Not all of them. The 3060 Ti was great because it was
| actually built on the same underlying chip as the 3070 and
| 3070 Ti. Which ironically made those less valuable.
|
| But the release of those cards was during Covid pricing
| weirdness times. I scored a 3070 Ti at EUR650, whilst the
| 3060 Ti's that I actually wanted were being sold for EUR700+.
| Viva la Discord bots.
| mrbonner wrote:
| Let me know where you could find 4060Ti 16GB for under $1000
| USD
| hedgehog wrote:
| What's annoying is they were under $500 just a few months
| ago.
| clamchowder wrote:
| (author here) When I checked the 7600 XT was much more
| expensive. Right now it's still $360 on eBay, vs the B580's
| $250 MSRP, though yeah I guess it's hard to find the B580 in
| stock
| jorvi wrote:
| Yeah I guess regional availability really works into it..
| bummer
|
| I wonder if the B580 will drop to MSRP at all, or if
| retailers will just keep it slotted into the greater GPU
| line-up the way it is now and pocket the extra money.
| glitchc wrote:
| Double the memory for double the price and I would buy one in a
| heartbeat.
| talldayo wrote:
| If your application is video transcoding or AI inference, you
| could probably buy two and use them in a multi-GPU
| configuration.
| taurknaut wrote:
| I don't really care about how it performs so long as it's better
| than a CPU. I just want to target the GPU myself and remove the
| vendor from the software equation. Nvidia has taught me there
| isn't any value that can't be destroyed with sufficiently bad
| drivers.
| williamDafoe wrote:
| BattleMage B580 specs from TechPowerUp match the 4070 almost
| precisely - same RAM, same bus, same bus speed, same power
| rating, TSMC GPU at N4 node and nearly identical size (290 vs 272
| mm square) - $10 difference, tops.
|
| But it was released TWO YEARS LATER than the 4070 and it performs
| ONE GENERATION WORSE (4060 performance). 2+2 = 4 Years behind! I
| am not too impressed with the "chips and cheese ant's view
| article" as they don't uncover the reason why performance is SO
| PATHETIC!
|
| A weird thing goes on in the TSMC / GPU business. Intel prepaid
| for the N4 wafers and the design is so poor it's not profitable
| to make the GPU and sell it at LESS THAN HALF the 4070 price of
| $550. Normally a mistake like this would lead to product
| cancellation but the prepayment is a stranded cost so Intel MUST
| sell these at a loss to get back a sliver of their wasted TSMC
| prepayments!
|
| What's even worse is that the A770 was also 4 years behind, so
| Intel is not catching up - not one iota! The A770 was an
| attempt by Intel to make a 3070 clone and they failed badly -
| I'll let you look up the specs and the timelines to do the
| comparison on your own ...
| ksec wrote:
| > I am not too impressed with the "chips and cheese ant's view
| article" as they don't uncover the reason why performance is SO
| PATHETIC!
|
| Performance on GPUs has always been about drivers. Chips and
| Cheese is only here to show the uArch behind it. This isn't
| even new; we should have learned all about it during the 3dfx
| Voodoo era. And 9 years have passed since a (now retired)
| Intel engineer said that they would be competing against
| Nvidia by 2020, if not 2021. We are now in 2025 and they are
| not even close. But somehow Raja Koduri was supposed to save
| them, and now he's gone.
| rincebrain wrote:
| Intel seems to have deep-seated issues with their PR
| department writing checks their engineers can't cash on time.
|
| Not that Intel engineers are bad - on the contrary. But as
| you pointed out, they've been promising they'd be further
| along than they are now for over 5 years, and even 10+ years
| ago when I was working in HPC systems, they kept promising
| things you should build your systems on that would be "in the
| next gen" and that were not, in fact, there.
|
| It seems much like the Bioware Problem(tm) where Bioware got
| very comfortable promising the moon in 12 months and assuming
| 6 months of crunch would Magically produce a good outcome,
| and then discovered that Results May Vary.
| wqaatwt wrote:
| > 4060 performance
|
| That's really not true though. It's closer to the 4060 Ti, and
| somewhat ahead or behind depending on the specific game.
| adgjlsfhk1 wrote:
| I think this is a bad take because it assumes that NVidia is
| making rapid price/performance improvements in the consumer
| space. The RTX 4060 is roughly equivalent to a 2080 (similar
| performance, RAM, and transistor count). Intel isn't making
| much margin, but from what I've seen they're probably roughly
| breaking even, not taking a huge loss.
|
| Also, a ton of the work for Intel is in drivers, which are (as
| the A770 showed) very improvable after launch. Based on the
| hardware, it seems very possible that the B580 could get an
| extra 10% (especially in 1080p), which would bring it clearly
| above the 4060 Ti in perf.
| keyringlight wrote:
| The other major issue with regard to pricing is that intel need
| to pay one way or another to get market penetration; if no one
| buys their cards at all and they don't establish a beachhead,
| then it's even more wasted money.
|
| As I see it AMD get _potentially_ squeezed between intel and
| nvidia. Nvidia's majority marketshare seems pretty secure for
| the foreseeable future, intel undercutting AMD plus their
| connections to prebuilt system manufacturers would likely grab
| them a few more nibbles into AMD territory. If intel release a
| competent B770 versus AMD products priced a few hundred dollars
| more, even if Arc isn't as mature I'm not sure they have solid
| answers for why someone should buy Radeon.
|
| In my view AMD's issue is that they don't have any vision for
| what their GPUs can offer besides a slightly better version of
| the previous generation. Back in 2018 the RTX offering must
| have blindsided them, and years later they're not giving us any
| alternative vision for what comes next for graphics to make
| Radeon desirable besides catching up to nvidia (who I imagine
| will have something new to move the goalposts if anyone gets
| close) - and this is an AMD that is currently well resourced
| from Zen.
| wirybeige wrote:
| Strange to point out those comparisons but not the actual
| transistor difference between the two.
|
| B580 only has 19.6B transistors while the RTX 4070 has 35.8B
| transistors. So the RTX 4070 has nearly double (1.82x) the
| transistors of B580.
|
| The RTX 4060 ti has 22.9B and the RTX 4060 has 18.9B
| transistors
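|
| Dividing those counts by the die sizes quoted upthread gives a
| rough density comparison (the die sizes are approximate, so
| treat the results as ballpark):
|
|     # Transistor density from the figures in this thread.
|     gpus = {"B580": (19.6e9, 272), "RTX 4070": (35.8e9, 290)}
|
|     for name, (transistors, die_mm2) in gpus.items():
|         mtr_per_mm2 = transistors / die_mm2 / 1e6
|         print(f"{name}: ~{mtr_per_mm2:.0f} MTr/mm^2")
|
| That's roughly 72 MTr/mm^2 for the B580 vs ~123 MTr/mm^2 for
| the 4070 on the same node, so the density gap is real even
| though the dies are similar sizes.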
| throwawaythekey wrote:
| Would the difference in density be more likely due to a
| difference in design philosophy or the intel design team
| being less expert?
|
| As a customer do intel pay for mm2 or for transistors?
|
| Forgive me if you are not the right person for these
| questions.
| myrmidon wrote:
| Loosely related question:
|
| What prevents manufacturers from taking some existing
| mid/toprange consumer GPU design, and just slapping like 256GB
| VRAM onto it? (enabling consumers to run big-LLM inference
| locally).
|
| Would that be useless for some reason? What am I missing?
| patmorgan23 wrote:
| Because then they couldn't sell you the $10k enterprise GPU
| ksec wrote:
| Bandwidth. GDDR and HBM, both used by GPUs depending on the use
| case, are high bandwidth but low capacity, comparatively
| speaking. Modern GPUs try to put more VRAM behind more memory
| channels, up to a 512-bit bus, but that requires more die space
| and hence gets expensive.
|
| We will need a new memory design for both GDDR and HBM. And I
| won't be surprised if they are working on it already. But
| hardware takes time, so it will be a few more years down the
| road.
| reginald78 wrote:
| You'd need memory chips with double the memory capacity to slap
| the extra vram in, at least without altering the memory bus
| width. And indeed, some third party modded entries like that
| seem to have shown up: https://www.tomshardware.com/pc-
| components/gpus/nvidia-gamin...
|
| As far as official products, I think the real reason another
| commentator mentioned is that they don't want to cannibalize
| their more powerful card sales. I know I'd be interested in a
| lower-powered card with a lot of vram just to get my foot in
| the door; that is why I bought an RTX 3060 12GB, which is
| unimpressive for gaming but actually had the second most vram
| available in that generation. Nvidia seem to have noticed this
| mistake and later released a crappier 8GB version to replace
| it.
|
| I think if the market reacted to create a product like this to
| compete with nvidia, they'd pretty quickly release something to
| fit the need, but as it is they don't have to.
| SunlitCat wrote:
| The 3060 with 12GB was an outlier for its time of release
| because the crypto (currency) hype was raging at that moment
| and scalpers, miners and everyone in between were buying
| graphics cards left and right! Hard times those were! D:
| protimewaster wrote:
| You can actually get GPUs from the Chinese markets (e.g.,
| AliExpress) that have had their VRAM upgraded. Someone out
| there is doing aftermarket VRAM upgrades on cards to make them
| more usable for GPGPU tasks.
|
| Which also answers your question: The manufacturers aren't
| doing it because they're assholes.
| fulafel wrote:
| Seems some years away to get that into consumer price range.
| elabajaba wrote:
| The amount of memory you can put on a GPU is mainly constrained
| by the GPU's memory bus width (which is both expensive and
| power hungry to expand) and the available GDDR chips (which
| generally require 32 bits of the bus per chip). We've been
| using 16Gbit (2GB) chips for a while, and they're just starting
| to roll out 24Gbit (3GB) GDDR7 modules, but those are expensive
| and in short supply. You also have to account for VRAM being
| somewhat power hungry (~1.5-2.5W per module under load).
|
| Once you've filled all the slots your only real option is to do
| a clamshell setup that will double the VRAM capacity by putting
| chips on the back of the PCB in the same spot as the ones on
| the front (for timing reasons the traces all have to be the
| same length). Clamshell designs then need to figure out how to
| cool those chips on the back (~1.5-2.5w per module depending on
| speed and if it's GDDR6/6X/7, meaning you could have up to 40w
| on the back).
|
| Some basic math puts us at 16 modules for a 512 bit bus (only
| the 5090, have to go back a decade+ to get the last 512bit bus
| GPU), 12 with 384bit (4090, 7900xtx), or 8 with 256bit (5080,
| 4080, 7800xt).
|
| A clamshell 5090 with 2GB modules has a max limit of 64GB, or
| 96GB with (currently expensive and limited) 3GB modules (you'll
| be able to buy this at some point as the RTX 6000 Blackwell at
| stupid prices).
|
| HBM can get you higher amounts, but it's extremely expensive to
| buy (you're competing against H100s, MI300Xs, etc), supply
| limited (AI hardware companies are buying all of it and want
| even more), requires a different memory controller (meaning
| you'll still have to partially redesign the GPU), and requires
| expensive packaging to assemble it.
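|
| The capacity math above, spelled out (module sizes and bus
| widths as described in this comment, nothing vendor-confirmed):
|
|     # Max VRAM = (bus width / 32 bits per GDDR module) * module
|     # capacity, doubled for a clamshell layout.
|     def max_vram_gb(bus_bits, module_gb, clamshell=False):
|         modules = bus_bits // 32
|         return modules * module_gb * (2 if clamshell else 1)
|
|     print(max_vram_gb(512, 2, clamshell=True))  # 64 (5090-class)
|     print(max_vram_gb(512, 3, clamshell=True))  # 96 (3GB GDDR7)
|     print(max_vram_gb(192, 2))                  # 12 (B580-style)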
| lostmsu wrote:
| What of previous generations of HBM? Older consumer AMD GPUs
| (Vega) and Titan V had HBM2. According to https://en.wikipedi
| a.org/wiki/Radeon_RX_Vega_series#Radeon_V... you could get
| 16GB with 1TB/s for $700 at release. It is no longer used in
| data centers. I'd gladly pay $2800 for 48GB with 4TB/s.
| devit wrote:
| I wonder if a multiplexer would be feasible?
|
| Hardware-wise, instead of putting the chips on the PCB surface,
| one would mount a 16-gonal arrangement of perpendicular
| daughterboards, each containing 2-16 GDDR chips where there
| would normally be one, with external liquid cooling, power
| delivery and PCIe control connection.
|
| Then each of the daughterboards would feature a multiplexer
| with a dual-ported SRAM containing a table that, for each
| memory page, stores which chip to map it to; the multiplexer
| would use that table to route requests from the GPU, using the
| second port to change the mapping from the extra PCIe
| interface.
|
| API-wise, for each resource you would have N overlays and a new
| operation allowing you to switch the resource (which would
| require a custom driver that properly invalidates caches).
|
| This would depend on the GPU supporting the much higher
| latency of this setup and providing good enough support for
| cache flushing and invalidation, as well as deterministic
| mapping from physical addresses to chip addresses, and the
| ability to manufacture all this in a reasonably affordable
| fashion.
| Animats wrote:
| There are companies in China doing that, recycling older NVidia
| GPUs.[1]
|
| [1]
| https://www.reddit.com/r/hardware/comments/182nmmy/special_c...
| joelthelion wrote:
| That's cool and all but can you use it for deep learning?
| SG- wrote:
| it's a nice technical article but the charts are just terrible
| and seem blurry even when zoomed in.
| clamchowder wrote:
| Yea Wordpress was a terrible platform and Substack is also a
| terrible platform. I don't know why every platform wants to
| take a simple uploaded PNG and apply TAA to it. And don't get
| me started on how Substack has no native table support, when
| HTML has had tables since prehistoric times.
|
| If I had more time I'd roll my own site with basic HTML/CSS.
| It's not even hard, just time consuming.
| dark__paladin wrote:
| TAA is temporal anti-aliasing, correct? There is no time
| dimension here, isn't it just compression + bilinear
| filtering?
| daneel_w wrote:
| Missing detail: 190 watt TDP.
___________________________________________________________________
(page generated 2025-02-11 23:00 UTC)