[HN Gopher] Intel announces Arc B-series "Battlemage" discrete g...
       ___________________________________________________________________
        
       Intel announces Arc B-series "Battlemage" discrete graphics with
       Linux support
        
       Author : rbanffy
       Score  : 222 points
       Date   : 2024-12-03 17:19 UTC (5 hours ago)
        
 (HTM) web link (www.phoronix.com)
 (TXT) w3m dump (www.phoronix.com)
        
       | rbanffy wrote:
       | For me, the most important feature is Linux support. Even if I'm
       | not a gamer, I might want to use the GPU for compute and buggy
       | proprietary drivers are much more than just an inconvenience.
        
         | zokier wrote:
          | Sure, but open drivers have been AMD's selling point for a
          | decade, and even Nvidia is finally showing signs of opening
          | up. So it's a bit dubious whether these new Intel cards can
          | really compete on this front, at least for very long.
        
       | Night_Thastus wrote:
        | We'll have to wait for third-party benchmarks, but they seem
        | decent so far. A 4060 equivalent for $200-$250 isn't bad at
        | all. I'm curious if we'll get a B750 or B770 and how they'll
        | perform.
       | 
       | At the very least, it's nice to have some decent BUDGET cards
       | now. The ~$200 segment has been totally dead for years. I have a
       | feeling Intel is losing a fair chunk of $ on each card though,
       | just to enter the market.
        
         | rbanffy wrote:
         | I'd love to see their GPGPU software support under Linux.
        
           | zargon wrote:
           | The keywords you're looking for are Intel basekit, oneapi,
           | and ipex.
           | 
           | https://christianjmills.com/posts/intel-pytorch-extension-
           | tu...
           | 
           | https://chsasank.com/intel-arc-gpu-driver-oneapi-
           | installatio...
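            | 
            | A minimal PyTorch sketch of that path (assuming the oneAPI
            | runtime and intel-extension-for-pytorch are installed so
            | the "xpu" device shows up; the model here is just a
            | placeholder):
            | 
            |   import torch
            |   import torch.nn as nn
            |   import intel_extension_for_pytorch as ipex  # adds "xpu"
            | 
            |   assert torch.xpu.is_available()  # Arc GPU visible
            |   model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
            |   model = model.to("xpu").eval()
            |   model = ipex.optimize(model)  # inference optimizations
            |   x = torch.randn(64, 1024, device="xpu")
            |   with torch.no_grad():
            |       y = model(x)
            |   print(y.shape)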
        
       | jmclnx wrote:
       | >Battlemage is still treated to fully open-source graphics driver
       | support on Linux.
       | 
       | I am hoping these are open in such a manner that they can be used
       | in OpenBSD. Right now I avoid all hardware with a Nvidia GPU.
       | That makes for somewhat slim pickings.
       | 
        | If the firmware is acceptable to the OpenBSD folks, then I will
        | happily use these.
        
         | rbanffy wrote:
          | They are promising good Linux support, which at least implies
          | that everything but opaque firmware blobs is open.
        
       | rmm wrote:
        | I put an a360 card into an old machine I turned into a Plex
        | server. It turned it into a transcoding powerhouse. I can do
        | multiple independent streams now without it skipping a beat.
        | The price-performance ratio was off the charts.
        
         | jeffbee wrote:
         | Interesting application. Was this a machine lacking an iGPU, or
         | does the Intel GPU-on-a-stick have more quicksync power than
         | the iGPU?
        
           | 6SixTy wrote:
           | A not inconsequential possibility is that both the iGPU and
           | dGPU are sharing the transcoding workload, rather than the
           | dGPU replacing the iGPU. It's a fairly forgotten feature of
           | Intel Arc, but I don't blame anyone because the help articles
           | are dusty to say the least.
        
         | kridsdale1 wrote:
         | Any idea how that compares to Apple Silicon for that job? I
         | bought the $599 MacBook Air with M1 as my plex server for this
         | reason. Transcodes 4k HEVC and doesn't even need a fan. Sips
         | watts.
        
           | 2OEH8eoCRo0 wrote:
            | All Intel Arc cards, even the $99 A310, have hardware-
            | accelerated H.265 and AV1 encoding.
        
           | machinekob wrote:
            | Apple Silicon still doesn't support AV1 encoding, but it's
            | good enough for a simple Jellyfin server. I'm using one
            | myself.
        
         | ThatMedicIsASpy wrote:
          | My 7950X3D's iGPU does 4K HDR (33Mb/s) to 1080p at 40fps
          | (Proxmox, Jellyfin). If these GPUs supported SR-IOV I would
          | grab one for transcoding and GPU-accelerated remote desktop.
          | 
          | Untouched video (Star Wars 8), 4K HDR (60Mb/s) to 1080p at
          | 28fps.
        
           | c2h5oh wrote:
            | All first-gen Arc GPUs, including the sub-$100 A310, share
            | the same video encoder/decoder, which can handle four
            | (though I haven't tested more than two) simultaneous 4K HDR
            | -> 1080p AV1 transcodes at high bitrate with tone mapping
            | while using 12-15W of power.
           | 
           | No SR-IOV.
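            | 
            | For reference, a full-hardware tone-mapped 4K HDR -> 1080p
            | AV1 transcode on one of these looks roughly like the sketch
            | below (via Python's subprocess; the render node path and
            | the exact ffmpeg/VA-API filter options are assumptions that
            | vary by setup):
            | 
            |   import subprocess
            | 
            |   # Decode, tone-map HDR->SDR, scale to 1080p and encode
            |   # AV1 without leaving the GPU. /dev/dri/renderD128 is a
            |   # guess; check which render node belongs to the Arc card.
            |   cmd = [
            |       "ffmpeg",
            |       "-hwaccel", "vaapi",
            |       "-hwaccel_device", "/dev/dri/renderD128",
            |       "-hwaccel_output_format", "vaapi",
            |       "-i", "input_4k_hdr.mkv",
            |       "-vf", "tonemap_vaapi=format=nv12,"
            |              "scale_vaapi=w=1920:h=1080",
            |       "-c:v", "av1_vaapi", "-b:v", "8M",
            |       "-c:a", "copy",
            |       "output_1080p.mkv",
            |   ]
            |   subprocess.run(cmd, check=True)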
        
         | baq wrote:
          | Intel has been a beast at transcoding for years; it's a
          | relatively niche application, though.
        
         | 2OEH8eoCRo0 wrote:
         | How's the Linux compatibility? I was tempted to do the same for
         | my CentOS Stream Plex box.
        
       | Lapra wrote:
       | Unlabelled graphs are infuriating. Are the charts average
       | framerate? Mean framerate? Maximum framerate?
        
         | rbanffy wrote:
         | The two graphs on the page show FPS.
        
           | zamadatix wrote:
           | GP is asking what measure of FPS. The most likely value when
           | unspecified is usually "mean FPS" but, being a marketing
           | graph, it doesn't explicitly say.
        
       | confident_inept wrote:
        | I'm really curious to see if these still rely heavily on
        | Resizable BAR. Putting these in old computers running Linux
        | without ReBAR support makes the driver crash under literally
        | any load, rendering the cards completely unusable.
        | 
        | It's a real shame; the single-slot A380 is a great price-to-
        | performance card for light gaming and general use in small
        | machines.
        
         | jeffbee wrote:
         | What is the newest platform that lacks resizable BAR? It was
         | standardized in 2006. Is 4060-level graphics performance useful
         | in whatever old computer has that problem?
        
           | bryanlarsen wrote:
            | Sandy Bridge (2011) is still a very usable CPU with a modern
            | GPU. In theory Sandy Bridge supported Resizable BAR, but in
            | practice it didn't. I think the problem was the BIOSes.
        
             | stusmall wrote:
              | Oh wow. That's older than I thought. This is definitely
              | less of an issue than folks make it out to be.
              | 
              | I cling to my old hardware to limit e-waste where I can,
              | but I still gave up on my old Sandy Bridge machine once it
              | hit about a decade old. Not only would the CPU have
              | trouble keeping up, it's mostly only PCIe 2.0; a few
              | boards had 3.0. You wouldn't get the full potential even
              | out of the cheapest one of these Intel cards. If you're
              | putting a GPU in a system like that, I can't imagine even
              | buying new. Just get something used off eBay.
        
             | 6SixTy wrote:
              | On paper any PCIe 2.0 motherboard can receive a BIOS
              | update adding ReBAR support via PCIe 2.1, but the reality
              | is that you pretty much have to get a PCIe 3.0 motherboard
              | to have any chance of it shipping with support, or of
              | modding it in yourself.
              | 
              | Another issue is that not every GPU actually supports
              | ReBAR; I'm reasonably certain the Nvidia drivers turn it
              | off for some titles, and pretty much the only vendor that
              | reliably wants ReBAR on at all times is Intel Arc.
              | 
              | I also personally wouldn't say that Sandy Bridge is very
              | usable with a modern GPU without specifying what kind of
              | CPU or GPU, or the context in which it's being used.
        
             | vel0city wrote:
             | My old Ice Lake CPU was very much a bottleneck in lots of
             | games in 2018 when I finally replaced it. It was a
             | noticeable improvement across the board making the jump to
             | a Zen+ CPU at the time, even with the same GPU.
        
           | vel0city wrote:
           | Ryzen 2000 series processors don't support AMD's "Smart
           | Access Memory" which is pretty much resizable BAR. That's
           | 2018.
           | 
            | Coffee Lake didn't really support ReBAR either; that's also
            | 2018.
        
           | tremon wrote:
           | The newest platform is probably POWER10. ReBar is not
           | supported on any POWER platform, most likely including the
           | upcoming POWER11.
           | 
           | Also, I don't think you'll find many mainboards from 2006
           | supporting it. It may have been standardized in 2006, but a
           | quick online search leads me to think that even on x86
           | mainboards it didn't become commonly available until at least
           | 2020.
        
             | jeffbee wrote:
             | Congrats on a pretty niche reply. I wonder if literally
             | anyone has tried to put an ARC dGPU in a POWER system.
             | Maybe someone from Libre-SOC will chime in.
        
             | Palomides wrote:
             | do you have a reference for power rebar support? just
             | curious, I couldn't find anything with a quick look
        
             | IshKebab wrote:
             | Oh no... my POWER gaming rig... no..
        
           | babypuncher wrote:
           | ReBAR was standardized in 2006 but consumer motherboards
           | didn't start shipping with an option to enable it until much
           | later, and didn't start turning it on by default until a few
           | years ago.
        
       | mushufasa wrote:
        | What's the current status of using CUDA on non-NVIDIA chips?
        | 
        | IIRC that was one of the original goals of geohot's tinybox
        | project, though I'm not sure exactly how that evolved.
        
       | mtlmtlmtlmtl wrote:
        | Bit disappointed there's no 16GB (or more) version. But
        | absolutely thrilled the rumours of Intel discrete graphics'
        | demise were wildly exaggerated (looking at you, Moore's Law is
        | Dead...).
        | 
        | Very happy with my A770. It's a godsend for people like me who
        | want plenty of VRAM to play with neural nets but don't have the
        | money for workstation GPUs or massively overpriced Nvidia
        | flagships. It works painlessly with Linux, gaming performance
        | is fine, and the price was the first time I haven't felt
        | fleeced buying a GPU in many years. Not having CUDA does lead
        | to some friction, but I think Nvidia's CUDA moat is a temporary
        | situation.
        | 
        | I'll probably sit this one out unless they release another SKU
        | with 16GB or more of RAM. But if Intel survives long enough to
        | release Celestial, I'll happily buy one.
        
         | khimaros wrote:
          | Have you tested llama.cpp with this card on Linux? When I
          | tested it about a year ago, it was a nightmare.
        
           | mtlmtlmtlmtl wrote:
            | A few months ago, yeah. I had to set an environment variable
            | (added to the Ollama systemd unit file), but otherwise it
            | worked just fine.
        
       | gs17 wrote:
       | > Intel with their Windows benchmarks are promoting the Arc B580
       | as being 24% faster than the Intel Arc A750
       | 
       | Not a huge fan of the numbering system they've used. B > A
       | doesn't parse as easily as 5xxx > 4xxx to me.
        
         | vesrah wrote:
          | They're going in alphabetical order: A = Alchemist, B =
          | Battlemage, C = Celestial (future gen), D = Druid (future
          | gen).
        
           | bee_rider wrote:
           | Hey we complained about all the numbers in their product
           | names. Getting names from the D&D PHB is... actually very
           | cool, no complaints.
        
           | gs17 wrote:
           | Yes, I understand that. I'm saying it doesn't read as easily
           | IMO as (modern) NVIDIA/AMD model numbers. Most numbers I deal
           | with are base-10, not base-36.
        
             | BadHumans wrote:
             | The naming scheme they are using is easier to parse for me
             | so all in the eye of the beholder.
        
             | baq wrote:
             | You aren't using excel or sheets I see?
        
             | Ekaros wrote:
              | On the other hand, considering GeForce is on its third
              | loop through base 10, maybe it's not so bad... Radeon, on
              | the other hand, has been a pure absolute mess going back
              | the same 20 years.
              | 
              | I kinda like Intel's idea.
        
       | CoastalCoder wrote:
        | Given Intel's recent troubles, I'm trying to decide how risky it
        | is to invest in their platform, especially discrete GPUs for
        | Linux gaming.
       | 
       | Fortunately, having their Linux drivers be (mostly?) open source
       | makes a purchase seem less risky.
        
         | babypuncher wrote:
         | I can't speak from experience with their GPUs on Linux, but I
         | know on Windows most of their problems stem from supporting
         | pre-DX12 Direct3D titles. Nvidia and AMD have spent many years
         | polishing up their Direct3D support and putting in driver-side
         | hacks that paper over badly programmed Direct3D games.
         | 
         | These are obviously Windows-specific issues that don't come up
         | at all in Linux, where all that Direct3D headache is taken care
         | of by DXVK. Amusingly a big part of Intel's efforts to improve
         | D3D performance on Windows has been to use DXVK for many
         | titles.
        
         | beAbU wrote:
          | Intel isn't going anywhere for at least a couple of hardware
          | generations. Buying a GPU is also not "investing" in anything.
          | In two years' time you can replace it with whatever is the
          | best value for money at that time.
        
           | CoastalCoder wrote:
           | > Buying a GPU is also not "investing" in anything.
           | 
           | It is in the (minor) sense that I'd rely on Intel for
           | warranty support, driver updates (if closed source), and
           | firmware fixes.
           | 
           | But I agree with your main point that the worst-case downside
           | isn't that big of a deal.
        
             | throwaway48476 wrote:
             | There's no way you're going to maintain and develop the
             | intel linux driver as a solo dev.
        
               | CoastalCoder wrote:
               | > There's no way you're going to maintain and develop the
               | intel linux driver as a solo dev.
               | 
               | I agree entirely.
               | 
                | My point was that even if Intel disappeared tomorrow,
                | there's a good chance the Linux developer _community_
                | would take over maintenance of those drivers.
                | 
                | In contrast to, e.g., the Nvidia of 10 years ago, where
                | IIUC it was very difficult for outsiders to obtain the
                | documentation needed to write proper drivers for their
                | GPUs.
        
       | Scene_Cast2 wrote:
        | I wonder how many transistors it has and what the die size is.
        | 
        | For power, it's 190W compared to the 4060's 115W.
       | 
       | EDIT: from [1]: B580 has 21.7 billion transistors at 406 mm2 die
       | area, compared to 4060's 18.9 billion and 146 mm2. That's a big
       | die.
       | 
       | [1] https://www.techpowerup.com/gpu-specs/arc-b580.c4244
        
         | zokier wrote:
         | > Both the Arc B580 and B570 are based on the "BMG-G21" a new
         | monolithic silicon built on the TSMC 5 nm EUV process node. The
         | silicon has a die-area of 272 mm2, and a transistor count of
         | 19.6 billion
         | 
         | https://www.techpowerup.com/review/intel-arc-b580-battlemage...
         | 
          | These numbers seem a bit more believable.
        
       | Archit3ch wrote:
       | They say the best predictor for the future is the past.
       | 
       | How was driver support for their A-series?
        
         | Night_Thastus wrote:
          | Drivers were _very_ rough at launch. Some games didn't run at
          | all, some basic functionality and configuration either crashed
          | or failed to work, some things ran very poorly, etc. However,
          | it was essentially all ironed out over many months of work.
         | 
         | They likely won't need to do the same discovery and fixing for
         | B-series as they've already dealt with it.
        
       | treprinum wrote:
        | Why don't they just release a basic GPU with 128GB of RAM and
        | eat Nvidia's local generative AI lunch? The network effect of
        | all the devs porting their LLMs etc. to that card would
        | instantly make them a major CUDA threat. But the beancounters
        | running the company would never get such an idea...
        
         | 01HNNWZ0MV43FF wrote:
         | Judging by the number of 16 GB laptops I see around, 128 GB of
         | RAM would probably cost a bajillion dollars
        
           | gs17 wrote:
           | One of the great things about having a desktop is being able
           | to get that much for under $300 instead of the price of a
           | second laptop.
        
           | qwytw wrote:
            | Not because of the laptop RAM. It costs pennies; Apple is
            | just charging $200 for 12GB because they can. It's way too
            | slow, though.
            | 
            | And Nvidia doesn't want to cannibalize its high-end chips by
            | putting more memory into consumer ones.
        
         | gs17 wrote:
         | Even 24 or 32 GB for an accessible price would sell out fast.
         | NVIDIA wants $2000 for the 5090 to get 32.
        
         | Numerlor wrote:
          | 48GB is at the tail end of what's reasonable for normal GPUs.
          | The IO requires a lot of die space, and Intel's architecture
          | is not very space-efficient right now compared to Nvidia's.
        
           | jsheard wrote:
           | > The IO requires a lot of die space.
           | 
           | And even if you spend a lot of die space on memory
           | controllers, you can only fit so many GDDR chips around the
           | GPU core while maintaining signal integrity. HBM sidesteps
           | that issue but it's still too expensive for anything but the
           | highest end accelerators, and the ordinary LPDDR that Apple
           | uses is lacking in bandwidth compared to GDDR, so they have
           | to compensate with ginormous amounts of IO silicon. The M4
           | Ultra is expected to have similar bandwidth to a 4090 but the
           | former will need a 1024bit bus to get there while the latter
           | is only 384bit.
        
             | Numerlor wrote:
              | Going off how the 4090 and 7900 XTX are arranged, I think
              | you could maybe fit one or two more chips around the die
              | beyond their 12, but that's still a far cry from 128GB.
              | That would probably just need a shared bus like normal
              | DDR, as you're not fitting that much with 16Gbit density.
        
               | SmellTheGlove wrote:
               | What if we did what others suggested was the practical
               | limit - 48GB. Then just put 2-3 cards in the system and
               | maybe had a little bridge over a separate bus for them to
               | communicate?
        
         | rapsey wrote:
         | Who manufactures the type of RAM and can they buy enough
         | capacity? I know nVidia bought up the high bandwidth memory
         | supply for years to come.
        
         | wtallis wrote:
         | Just how "basic" do you think a GPU can be while having the
         | capability to interface with that much DRAM? Getting there with
         | GDDR6 would require a _really_ wide memory bus even if you
         | could get it to operate with multiple ranks. Getting to 128GB
         | with LPDDR5x would be possible with the 256-bit bus width they
         | used on the top parts of the last generation, but would result
         | in having half the bandwidth of an already mediocre card.
         | "Just add more RAM" doesn't work the way you wish it could.
        
           | treprinum wrote:
            | M3/M4 Max MacBooks with 128GB of RAM are already way better
            | than an A6000 for very large local LLMs. So even if the GPU
            | were as slow as the one in the M3/M4 Max (below a 3070) and
            | used some basic RAM like LPDDR5X, it would still be way
            | faster than anything from Nvidia.
        
             | kevingadd wrote:
             | Are you suggesting that Intel 'just' release a GPU at the
             | same price point as an M4 Max SOC? And that there would be
             | a large market for it if they did so? Seems like an
             | extremely niche product that would be demanding to
             | manufacture. The M4 Max makes sense because it's a complete
             | system they can sell to Apple's price-insensitive audience,
             | Intel doesn't have a captive market like that to sell
             | bespoke LLM accelerator cards to yet.
             | 
             | If this hypothetical 128GB LLM accelerator was also a
             | capable GPU that would be more interesting but Intel hasn't
             | proven an ability to execute on that level yet.
        
               | treprinum wrote:
                | Nothing in my comment said anything about pricing it at
                | the M4 Max level. Apple charges that much because they
                | can (typing this on an $8000 M3 Max). 128GB of LPDDR5 is
                | dirt cheap these days; Apple just adds its premium
                | because it likes to. Nothing prevents Intel from
                | releasing a basic GPU with that much RAM for under $1k.
        
               | wtallis wrote:
               | You're asking for a GPU die at least as large as NVIDIA's
               | TU102 that was $1k in 2018 when paired with only 11GB of
               | RAM (because $1k couldn't get you a fully-enabled die to
               | use 12GB of RAM). I think you're off by at least a factor
               | of two in your cost estimates.
        
               | treprinum wrote:
                | Intel had Xeon Phi, which was a spin-off of their first
                | attempt at a GPU, so they already have a lot of tech in
                | place they can reuse. They don't need to go with
                | GDDRx/HBMx designs that require large dies.
        
               | ksec wrote:
                | I don't want to prolong this discussion, but maybe you
                | don't realise that some of the people who replied to you
                | either design hardware for a living or have been in the
                | hardware industry for more than 20 years.
        
               | treprinum wrote:
                | For some reason Apple did it with the M3/M4 Max, likely
                | built by folks who are also on HN. The question is how
                | many of those years spent designing hardware were also
                | spent keeping up with the latest and best ways to do it.
        
               | ksec wrote:
               | >For some reason.....
               | 
               | They already replied with an answer.
        
               | wtallis wrote:
               | Even LPDDR requires a large die. It only takes things out
               | of the realm of technologically impossible to merely
               | economically impractical. A 512-bit bus is still very
               | inconveniently large for a single die.
        
               | m00x wrote:
               | It's also impossible and it would need to be a CPU.
               | 
               | CPUs and GPUs access memory very differently.
        
             | jsheard wrote:
             | The M4 Max needs an enormous 512bit memory bus to extract
             | enough bandwidth out of those LPDDR5x chips, while the GPUs
             | that Intel just launched are 192/160bit and even flagships
             | rarely exceed 384bit. They can't just slap more memory on
             | the board, they would need to dedicate significantly more
             | silicon area to memory IO and drive up the cost of the
             | part, assuming their architecture would even scale that
             | wide without hitting weird bottlenecks.
        
               | p1esk wrote:
               | Apple could do it. Why can't Intel?
        
               | jsheard wrote:
               | Because Apple isn't playing the same game as everyone
               | else. They have the money and clout to buy out TSMCs
               | bleeding-edge processes and leave everyone else with the
               | scraps, and their silicon is only sold in machines with
               | extremely fat margins that can easily absorb the BOM cost
               | of making huge chips on the most expensive processes
               | money can buy.
        
               | p1esk wrote:
                | Bleeding-edge processes are what Intel specializes in.
                | Unlike Apple, they don't need TSMC. This should have been
                | a huge advantage for Intel. Maybe that's why Gelsinger
                | got the boot.
        
               | AlotOfReading wrote:
               | Intel Arc hardware is manufactured by TSMC, specifically
               | on N6 and N5 for this latest announcement.
               | 
               | Intel doesn't currently have nodes competitive with TSMC
               | or excess capacity in their better processes.
        
               | jsheard wrote:
               | Intel's foundry side has been floundering so hard that
               | they've resorted to using TSMC themselves in an attempt
               | to keep up with AMD. Their recently launched CPUs are a
               | mix of Intel-made and TSMC-made chiplets, but the latter
               | accounts for most of the die area.
        
               | duskwuff wrote:
               | > Bleeding edge processes is what Intel specializes in.
               | Unlike Apple, they don't need TSMC.
               | 
               | Intel literally outsourced their Arrow Lake manufacturing
               | to TSMC because they couldn't fabricate the parts
               | themselves - their 20A (2nm) process node never reached a
               | production-ready state, and was eventually cancelled
               | about a month ago.
        
               | p1esk wrote:
               | OK, so the question becomes: TSMC could do it. Why can't
               | Intel?
        
               | BonoboIO wrote:
                | They've been trying... for like 10 years.
        
               | wtallis wrote:
               | These days, Intel merely specializes in bleeding
               | processes. They spent far too many years believing the
               | unrealistic promises from their fab division, and in the
               | past few years they've been suffering the consequences as
               | the problems are too big to be covered up by the cost
               | savings of vertical integration.
        
               | JBiserkov wrote:
                | > and their silicon is only sold in machines with
                | extremely fat margins
                | 
                | Like the brand new Mac mini that costs 600 USD and went
                | to 500 during Black Friday week.
        
               | dragontamer wrote:
                | Because LPDDR5X is soldered-on RAM.
                | 
                | Everyone else wants configurable RAM that scales both
                | down (to 16GB) and up (to 2TB), to cover smaller laptops
                | and bigger servers.
                | 
                | GPUs with soldered-on RAM have 500GB/sec bandwidths, far
                | in excess of Apple's chips. So the 8GB or 16GB offered
                | by Nvidia or AMD is just far superior for video game
                | graphics (where textures are the priority).
        
               | jsheard wrote:
               | > GPUs with soldered on RAM has 500GB/sec bandwidths, far
               | in excess of Apples chips.
               | 
               | Apple is doing 800GB/sec on the M2 Ultra and should reach
               | about 1TB/sec with the M4 Ultra, but that's still lagging
               | behind GPUs. The 4090 was already at the 1TB/sec mark two
               | years ago, the 5090 is supposedly aiming for 1.5TB/sec,
                | and the H200 is doing _5TB/sec_.
        
               | dragontamer wrote:
                | HBM is kind of not fair lol. But a 4096-bit bus is going
                | to have more bandwidth than any competitor.
                | 
                | It's pretty expensive though.
                | 
                | The 500GB/sec number is for a more ordinary GPU like the
                | B580 Battlemage in the $250-ish price range. Obviously
                | the $2000-ish 4090 will be better, but I don't expect
                | the typical consumer to be using those.
        
               | kimixa wrote:
               | But an on-package memory bus has some of the advantages
               | of HBM, just to a lesser extent, so it's arguably
               | comparable as an "intermediate stage" between RAM chips
               | and HBM. Distances are shorter (so voltage drop and
               | capacitance are lower, so can be driven at lower power),
               | routing is more complex but can be worked around by more
               | layers, which increases cost but on a _significantly_
               | smaller area than required for dimms, and the dimms
               | connections themselves can hurt performance (reflection
               | from poor contacts, optional termination makes things
               | more complex, and the expectations of mix-and-match for
               | dimm vendors and products likely reduce fine tuning
               | possibilities).
               | 
                | There's pretty much a direct inverse scaling between
                | flexibility and performance: DIMMs > soldered RAM > on-
                | package RAM > die interconnects.
        
               | Der_Einzige wrote:
                | It doesn't matter if the "cost is driven up". Nvidia has
                | proven that we're all lil pay pigs for them. The 5090
                | will be $3000 for 32GB of VRAM. Screenshot this now, it
                | will age well.
                | 
                | We'd be happy to pay $5000 for 128GB from Intel.
        
               | pixelpoet wrote:
                | You are absolutely correct, and even my non-prophetic ass
                | echoed exactly the first sentence of the top comment in
                | this HN thread ("Why don't they just release a basic GPU
                | with 128GB of RAM and eat Nvidia's local generative AI
                | lunch?").
                | 
                | Yes, yes, it's not trivial to have a GPU with 128GB of
                | memory, with cache tags and so on, but is that really in
                | the same universe of complexity as taking on Nvidia and
                | their CUDA / AI moat any other way? Did Intel ever give
                | the impression they don't know how to design a cache?
                | There really has to be a GOOD reason for this, otherwise
                | everyone involved with this launch is just plain stupid
                | or getting paid off not to pursue it.
                | 
                | Saying all this with infinite love and 100% commercial
                | support of OpenCL since version 1.0, as a great enjoyer
                | of the A770 with 16GB of memory. I live to laugh in the
                | face of people who have claimed for over 10 years that
                | OpenCL is deprecated on macOS (which I cannot stand and
                | will never use, yet the hardware it runs on...), where
                | it still routinely crushes powerful desktop GPUs, in
                | reality and practice, today.
        
               | timschmidt wrote:
               | Both Intel and AMD produce server chips with 12 channel
               | memory these days (that's 12x64bit for 768bit) which
               | combined with DDR5 can push effective socket bandwidth
               | beyond 800GB/s, which is well into the area occupied by
               | single GPUs these days.
               | 
                | You can even find some attractive deals on
                | motherboard/RAM/CPU bundles built around grey-market
                | engineering-sample CPUs on AliExpress, with good reports
                | about usability under Linux.
               | 
               | Building a whole new system like this is not exactly as
               | simple as just plugging a GPU into an existing system,
               | but you also benefit from upgradeability of the memory,
               | and not having to use anything like CUDA. llamafile, as
               | an example, really benefits from AVX-512 available in
               | recent CPUs. LLMs are memory bandwidth bound, so it
               | doesn't take many CPU cores to keep the memory bus full.
               | 
               | Another benefit is that you can get a large amount of
               | usable high bandwidth memory with a relatively low total
               | system power usage. Some of AMD's parts with 12 channel
               | memory can fit in a 200W system power budget. Less than a
               | single high end GPU.
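                | 
                | Rough math for that claim (assuming 64-bit channels and
                | either DDR5-6400 RDIMMs or the 8800 MT/s MRDIMMs the
                | newest Xeons are supposed to support):
                | 
                |   def socket_gb_s(channels, mt_per_s, bits=64):
                |       return channels * bits / 8 * mt_per_s / 1000
                | 
                |   print(socket_gb_s(12, 6400))  # ~614 GB/s
                |   print(socket_gb_s(12, 8800))  # ~845 GB/s (MRDIMM)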
        
               | pixelpoet wrote:
                | My desktop machine has had 128GB since 2018, but for the
                | AI workloads currently commanding almost infinite market
                | value, it really needs the 1TB/s bandwidth and teraflops
                | that only a bona fide GPU can provide. An early AMD GPU
                | with these characteristics is the Radeon VII with 16GB
                | of HBM, which I bought for 500 EUR back in 2019 (!!!).
                | 
                | I'm a rendering guy, not an AI guy, so I really just want
                | the teraflops, but all GPU users urgently need a third
                | market player.
        
               | timschmidt wrote:
                | That 128GB is hanging off a dual-channel memory bus
                | that's only 128 bits wide, which is why you need the GPU.
                | The Epyc and Xeon CPUs I'm discussing have 6x the memory
                | bandwidth and will trade blows with that GPU.
        
               | pixelpoet wrote:
                | At a mere 20x the cost or something, to say nothing of
                | the motherboard etc. :( 500 EUR for 16GB at 1TB/s with
                | tons of fp32 (and even fp64! The main reason I bought it)
                | back in 2019 is no joke.
                | 
                | Believe me, as a lifelong hobbyist-HPC kind of person, I
                | am absolutely dying for such an HBM/fp64 deal again.
        
               | timschmidt wrote:
               | $1,961.19: H13SSL-N Motherboard And EPYC 9334 QS CPU +
               | DDR5 4*128GB 2666MHZ REG ECC RAM Server motherboard kit
               | 
               | https://www.aliexpress.us/item/3256807766813460.html
               | 
               | Doesn't seem like 20x to me. I'm sure spending more than
               | 30 seconds searching could find even better deals.
        
               | pixelpoet wrote:
                | Isn't 2666MHz ECC RAM obscenely slow? 32 cores without
                | the fast AVX-512 of Zen 5 isn't what anyone is looking
                | for in terms of floating-point throughput (ask me about
                | electricity prices in Germany), and for that money I'd
                | rather just take a 4090 with 24GB of memory and do my own
                | software fixed-point or floating-point (which is exactly
                | what I do personally and professionally).
                | 
                | This is exactly what I meant about Intel's recent launch.
                | Imagine if they went full ALU-heavy on the latest TSMC
                | process and packaged 128GB with it, for like 2-3k EUR.
                | Nvidia would be whipping their lawyers to try to do
                | something about that, not just their engineers.
        
               | timschmidt wrote:
               | I don't think anyone's stopping you, buddy. Great chat. I
               | hope you have a nice evening.
        
               | mirekrusin wrote:
                | Me too, probably 2x. It'd sell like hot cakes.
        
               | jsheard wrote:
               | The question is whether there's enough overall demand for
               | a GPU architecture with 4x the VRAM of a 5090 but only
               | about 1/3rd of the bandwidth. At that point it would only
               | really be good for AI inferencing, so why not make
               | specialized inferencing silicon instead?
        
               | mandelken wrote:
               | I genuinely wonder why no one is doing this? Why can't I
               | buy this specialized AI inference silicon with plenty of
               | VRAM?
        
               | hughesjj wrote:
                | Man, I'm old enough to remember when 512-bit was a thing
                | for consumer cards, back when we had 4-8GB of memory.
                | 
                | Sure, that was only GDDR5 and not GDDR6 or LPDDR5, but I
                | would have bet we'd be up to 512-bit again 10 years down
                | the line.
                | 
                | (I mean, supposedly HBM3 has done 1024-2048-bit buses,
                | but that seems more like research or super-high-end
                | cards, not consumer.)
        
               | jsheard wrote:
               | Rumor is the 5090 will be bringing back the 512bit bus,
               | for a whopping 1.5TB/sec bandwidth.
        
               | CoastalCoder wrote:
               | > The M4 Max needs an enormous 512bit memory bus to
               | extract enough bandwidth out of those LPDDR5x chips
               | 
               | Does M4 Max have 64-byte cache lines?
               | 
               | If they can fetch or flush an entire cache line in a
               | single memory-bus transaction, I wonder if that opens up
               | any additional hardware / performance optimizations.
        
               | modeless wrote:
               | The memory controller would be bigger, and the cost would
               | be higher, but not radically higher. It would be an
               | attractive product for local inference even at triple the
               | current price and the development expense would be 100%
               | justified if it helped Intel get _any_ kind of foothold
               | in the ML market.
        
             | wtallis wrote:
             | That would basically mean Intel doubling the size of their
             | current GPU die, with a different memory PHY. They're
             | clearly not ready to make that an affordable card. Maybe
             | when they get around to making a chiplet-based GPU.
        
           | amelius wrote:
           | What if they put 8 identical GPUs in the package, each with
           | 1/8 the memory? Would that be a useful configuration for a
           | modern LLM?
        
             | keyboard_slap wrote:
             | It could work, but would it be cost-competitive?
        
               | rini17 wrote:
               | Also, cooling.
        
             | ben_w wrote:
             | Last I've heard, the architecture makes that difficult. But
             | my information may be outdated, and even if it isn't, I'm
             | not a hardware designer and may have just misunderstood the
             | limits I hear others discuss.
        
             | numeri wrote:
             | GPU inference is always a balancing act, trying to avoid
             | bottlenecks on memory bandwidth (loading data from the
             | GPU's global memory/VRAM to the much smaller internal
             | shared memory, where it can be used for calculations) and
             | compute (once the values are loaded).
             | 
             | Splitting the model up between several GPUs would add a
             | third much worse bottleneck - memory bandwidth between the
             | GPUs. No matter how well you connect them, it'll be slower
             | than transfer within a single GPU.
             | 
             | Still, the fact that you can fit an 8x larger GPU might be
             | worth it to you. It's a trade-off that's almost universally
             | made while training LLMs (sometimes even with the model
             | split down both its width and length), but is much less
             | attractive for inference.
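              | 
              | As a toy illustration of that split (a naive pipeline-style
              | partition in PyTorch, assuming two devices; every forward
              | pass ships activations across the inter-GPU link at the
              | boundary):
              | 
              |   import torch
              |   import torch.nn as nn
              | 
              |   layers = nn.ModuleList(
              |       [nn.Linear(4096, 4096) for _ in range(8)])
              |   for i, layer in enumerate(layers):
              |       layer.to(f"cuda:{i // 4}")  # half per GPU
              | 
              |   def forward(x):
              |       for i, layer in enumerate(layers):
              |           x = x.to(f"cuda:{i // 4}")  # slow hop at layer 4
              |           x = layer(x)
              |       return x
              | 
              |   y = forward(torch.randn(1, 4096, device="cuda:0"))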
        
               | amelius wrote:
               | > Splitting the model up between several GPUs would add a
               | third much worse bottleneck - memory bandwidth between
               | the GPUs.
               | 
               | What if you allowed the system to only have a shared
               | memory between every neighboring pair of GPUs?
               | 
               | Would that make sense for an LLM?
        
             | treprinum wrote:
              | The K80 was basically two K40s glued together, but their
              | interconnect was barely faster than PCIe, so it didn't have
              | much benefit; you had to move data between the two internal
              | GPUs anyway.
        
           | ksec wrote:
            | Thank you, wtallis. Somewhere along the line this basic
            | "knowledge" of hardware got completely lost. I wouldn't have
            | expected this to need explaining in any comment section on
            | the old AnandTech. It seems hardware enthusiasts have mostly
            | disappeared; I guess that is also why AnandTech closed. We
            | now live in a world where most sites are just BS rumours.
        
             | ethbr1 wrote:
             | That's because Anand Lal Shimpi is a CompE by training.
             | 
             | Not too many hardware enthusiast site editors have that
             | academic background.
             | 
             | And while fervor can sometimes substitute for education...
             | probably not in microprocessor / system design.
        
         | chessgecko wrote:
          | GDDR isn't like the RAM that connects to a CPU; it's much more
          | difficult and expensive to add more. You can get up to 48GB
          | with some expensive stacked GDDR, but if you wanted to add
          | more stacks you'd need to solve some serious signal-timing
          | headaches that most users wouldn't benefit from.
          | 
          | I think the high-memory local inference stuff is going to come
          | from "AI enabled" CPUs that share the memory in your computer.
          | Apple is doing this now, but cheaper options are on the way.
          | As a shape it's just suboptimal for graphics, so it doesn't
          | make sense for any of the GPU vendors to do it.
        
           | treprinum wrote:
            | They can use LPDDR5X; it would still massively accelerate
            | inference of large local LLMs that need more than 48GB of
            | RAM. Any tensor swapping between CPU RAM and GPU RAM kills
            | the performance.
        
             | chessgecko wrote:
              | I think we don't really disagree; I just think that at this
              | shape it isn't really a GPU anymore, it's just a CPU,
              | because it isn't very good for graphics at that point.
        
               | treprinum wrote:
                | That's why I said "basic GPU". It doesn't have to be too
                | fast, but it should still be way faster than a regular
                | CPU. Intel already built Xeon Phi, so a lot of the pieces
                | were developed already (memory controller, heavily
                | parallel dies, etc.).
        
               | chessgecko wrote:
                | I guess it's hard to know how well this would compete
                | with integrated GPUs, especially at a reasonable price
                | point. If you wanted to spend $4000+ on it, it could be
                | very competitive and might look something like Nvidia's
                | Grace Hopper superchip, but if you want the product to be
                | under $1k I think it might be better just to buy separate
                | cards for your graphics and AI stuff.
        
           | smcleod wrote:
            | As someone else said, I don't think you have to have GDDR;
            | surely there are other options. Apple does a great job of it
            | on their APUs with up to 192GB, and even an old AMD
            | Threadripper chip can do quite well with its DDR4/5
            | performance.
        
             | chessgecko wrote:
              | For AI inference you definitely have other options, but for
              | low-end graphics? The LPDDR that Apple (and Nvidia in
              | Grace) use would be super expensive to get comparable
              | bandwidth from (think $3+/GB, and to get 500GB/sec you need
              | at least 128GB).
              | 
              | And that 500GB/sec is pretty low for a GPU; it's like a
              | 4070, but the memory alone would add $500+ to the cost of
              | the inputs, not even counting the advanced packaging
              | (getting those bandwidths out of LPDDR needs an organic
              | substrate).
              | 
              | It's not that you can't, it's just that when you start
              | doing this it stops being like a graphics card and becomes
              | like a CPU.
        
         | beAbU wrote:
          | They are probably held back by the same thing that's
          | preventing AMD and Nvidia from doing it.
        
           | treprinum wrote:
            | Nvidia and AMD make $$$ on datacenter GPUs, so it makes sense
            | that they don't want to discount their own high end. Intel
            | has nothing there, so they can happily go for
            | commoditization of AI hardware, like what Meta did when
            | releasing LLaMA to the wild.
        
             | beAbU wrote:
                | Is Nvidia or AMD offering 128GB cards in any
                | configuration?
        
               | latchkey wrote:
               | They aren't "cards" but MI300x has 192GB and MI325x has
               | 256GB.
        
               | phkahler wrote:
               | You can run an AMD APU with 128GB of shared RAM.
        
               | treprinum wrote:
               | It's too slow and not very compatible.
        
           | bryanlarsen wrote:
            | The reason AMD and Nvidia don't is that they don't want to
            | cannibalize their high-end AI market. Intel doesn't have a
            | high-end AI market to protect.
        
             | fweimer wrote:
             | There are products like this one: https://www.intel.com/con
             | tent/www/us/en/products/sku/232592/...
             | 
             | As far as I understand it, it gives you 64 GiB of HBM per
             | socket.
        
         | Muskyinhere wrote:
          | Because if they could just do that and it would rival what
          | Nvidia has, they would just do it.
          | 
          | But obviously they don't.
          | 
          | And for a reason: Nvidia has worked on CUDA for ages; do you
          | believe they can just replace this whole thing in no time?
        
           | treprinum wrote:
           | llama.cpp and its derivatives say yes.
        
             | pjmlp wrote:
             | A fraction of CUDA capabilities.
        
               | treprinum wrote:
               | Sufficient for LLMs and image/video gen.
        
               | m00x wrote:
               | FLUX.1 D generation is about a minute at 20 steps on a
               | 4080, but takes 35 minutes on the CPU.
        
               | treprinum wrote:
                | A 4080 won't do video due to low RAM. The GPU doesn't
                | have to be as fast there; it can be 5x slower, which is
                | still way faster than a CPU. And Intel can iterate from
                | there.
        
               | m00x wrote:
                | It won't be 5x slower; it would be 20-50x slower if you
                | implemented it as you said.
               | 
               | You can't just "add more ram" to GPUs and have them work
               | the same way. Memory access is completely different than
               | on CPUs.
        
               | Der_Einzige wrote:
                | Not even close. llama.cpp isn't even close to a
                | production-ready LLM inference engine, and it runs
                | overwhelmingly faster when using CUDA.
        
               | pjmlp wrote:
               | A fraction of what a GPU is used for.
        
             | m00x wrote:
             | This is the most script kiddy comment I've seen in a while.
             | 
             | llama.cpp is just inference, not training, and the CUDA
             | backend is still the fastest one by far. No one is even
             | close to matching CUDA on either training or inference. The
             | closest is AMD with ROCm, but there's likely a decade of
             | work to be done to be competitive.
        
               | treprinum wrote:
                | Inference on very large LLMs, where the model plus its
                | working state exceeds 48GB, is already way faster on a
                | 128GB MacBook than on Nvidia unless you have one of those
                | monstrous Hx00s with lots of RAM, which most devs don't.
        
               | m00x wrote:
                | Because the CPU has to load the model in parts for every
                | cycle, you're spending a lot of time on IO and it offsets
                | the processing.
                | 
                | You're talking about completely different things here.
                | 
                | It's fine if you're doing a few requests at home, but if
                | you're actually serving AI models, CUDA is the only
                | reasonable choice other than ASICs.
        
               | treprinum wrote:
                | My comment was about Intel having a starter project,
                | getting an enthusiastic response from devs and network
                | effects, and iterating from there. They need a way to
                | threaten Nvidia, and just focusing on what they can't do
                | won't bring them there. There is one route by which they
                | can disturb Nvidia's high end over time, and that's a
                | cheap basic GPU with lots of RAM. Like first-gen Ryzen,
                | whose single-core performance was two generations behind
                | Intel's, yet which trashed Intel by providing twice as
                | many cores for cheap.
        
               | m00x wrote:
                | It would be a good idea to start with some basic
                | understanding of GPUs and realize why this can't easily
                | be done.
        
               | treprinum wrote:
                | That's a question the M3 Max with its integrated GPU has
                | already answered. It's not like I haven't done any HPC or
                | CUDA work in the past; I'm not completely clueless about
                | how GPUs work, though I haven't written those libraries
                | myself.
        
               | Muskyinhere wrote:
                | No one is running LLMs on consumer Nvidia GPUs or Apple
                | MacBooks.
                | 
                | A dev who wants to run local models probably runs
                | something that just fits on a proper GPU. For everything
                | else, everyone uses an API key from whatever provider,
                | because it's fundamentally faster.
                | 
                | Whether an affordable Intel GPU would be relevantly
                | faster for inferencing is not clear at all.
                | 
                | A 4090 is at least double the speed of Apple's GPU.
        
               | treprinum wrote:
                | A 4090 is 5x faster than an M3 Max 128GB according to my
                | tests, but it can't even run inference on LLaMA-30B. The
                | moment you hit that memory limit, inference is suddenly
                | 30x slower than the M3 Max. So a basic GPU with 128GB of
                | RAM would trash a 4090 on those larger LLMs.
        
               | m00x wrote:
               | Do you have the code for that test?
        
               | treprinum wrote:
                | I ran some variation of llama.cpp that could handle large
                | models by running a portion of them on the GPU and, if
                | too large, the rest on the CPU, and those were the
                | results. Maybe I can dig it up from some computer at
                | home, but it was almost a year ago, when I got the M3 Max
                | with 128GB of RAM.
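                | 
                | For anyone wanting to reproduce that kind of test, the
                | llama-cpp-python bindings expose the split directly (the
                | model path and layer count below are just placeholders):
                | 
                |   from llama_cpp import Llama
                | 
                |   # n_gpu_layers = how many transformer layers to offload
                |   # to the GPU; whatever doesn't fit stays on the CPU and
                |   # dominates the runtime.
                |   llm = Llama(
                |       model_path="llama-30b.Q4_K_M.gguf",  # placeholder
                |       n_gpu_layers=40,
                |       n_ctx=4096,
                |   )
                |   out = llm("Hello", max_tokens=32)
                |   print(out["choices"][0]["text"])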
        
               | yumraj wrote:
                | Yes, and inference is a huge market in itself, potentially
                | larger than training (gut feeling, haven't run numbers).
                | 
                | Keep NVIDIA for training and Intel/AMD/Cerebras/... for
                | inference.
        
               | Muskyinhere wrote:
                | Nvidia Blackwell is not just a GPU. It's a rack with an
                | interconnect over a custom Nvidia-based network.
                | 
                | And it needs liquid cooling.
                | 
                | You don't just plug in Intel cards 'out of the box'.
        
               | m00x wrote:
               | Inference is still a lot faster on CUDA than on CPU. It's
               | fine if you run it at home or on your laptop for privacy,
               | but if you're serving those models at any scale, you're
               | going to be using GPUs with CUDA.
               | 
                | Inference is also a much smaller market right now, but it
                | will likely overtake training later, as we'll have more
                | people using the models than competing to train the best
                | one.
        
               | latchkey wrote:
               | The funny thing about Cerebras is that it doesn't scale
               | well at all for inference and if you talk to them in
               | person, they are currently making all their money on
               | training workloads.
        
           | Wytwwww wrote:
            | Does CUDA even matter that much for LLMs, especially
            | inference? I don't think software would be the limiting
            | factor for this hypothetical GPU. After all, it would be
            | competing with Apple's M chips, not with the 4090 or
            | Nvidia's enterprise GPUs.
        
             | Der_Einzige wrote:
              | It's the only thing that matters. Folks act like AMD
              | support is there because suddenly you can run the most
              | basic LLM workload. Try doing anything actually interesting
              | (i.e., try running anything cool in the mechanistic
              | interpretability or representation/attention engineering
              | world) with AMD and suddenly everything is broken, nothing
              | works, and you have to spend millions of dollars' worth of
              | AI engineer time trying to salvage a working solution.
              | 
              | Or you can just buy Nvidia.
        
         | heraldgeezer wrote:
         | This is a gaming card. Look at benchmarks.
        
         | whatudb wrote:
          | Meta comment: the phrase "why don't they just" usually
          | indicates significant ignorance about a subject; it's better
          | to learn a little before dispensing criticism about
          | beancounters or whatnot.
          | 
          | In this case, the die's I/O limits preclude more than a
          | reasonable number of DDR channels.
        
         | FuriouslyAdrift wrote:
          | HBM3E memory is at least 3x the price of DDR5 (it requires 3x
          | the wafer area of DDR5), and capacity is already sold out for
          | all of 2025... that's the price and production bottleneck.
          | 
          | High-speed, low-latency server-grade DDR5 runs around
          | $800-$1,600 for 128GB. Triple that and you're at $2,400-$4,800
          | just for the memory. You still need the GPUs/APUs, card, VRMs,
          | etc.
         | 
         | Even the nVidia H100 with "only" 94GB starts at $30k...
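          | 
          | Rough napkin math with the figures above (ballpark estimates,
          | not quotes):
          | 
          |   ddr5_128gb_usd = (800, 1600)   # server-grade DDR5, 128GB
          |   hbm_premium = 3                # HBM-class memory ~3x DDR5
          | 
          |   for price in ddr5_128gb_usd:
          |       print(price * hbm_premium)  # 2400 and 4800 USD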
        
           | adventured wrote:
           | Nvidia's $30,000 is a 90% margin product at scale. They could
           | charge 1/3 that and still be very profitable. There has
           | rarely been such a profitable large corporation in terms of
           | the combo of profit & margin.
           | 
           | Their last quarter was $35b in sales and $26b in gross profit
           | ($21.8b op income; 62% op income margin vs sales).
           | 
           | Visa is notorious for their extreme margin (66% op income
           | margin vs sales) due to being basically a brand + transaction
           | network. So the fact that a hardware manufacturer is hitting
           | those levels is truly remarkable.
           | 
           | It's very clear that either AMD or Intel could accept far
           | lower margins to go after them. And indeed that's exactly
           | what will be required for any serious attempt to cut into
           | their monopoly position.
        
             | talldayo wrote:
             | > And indeed that's exactly what will be required for any
             | serious attempt to cut into their monopoly position.
             | 
             | You misunderstand why and how Nvidia is a monopoly. Many
             | companies make GPUs, and all those GPUs _can_ be used for
             | computation if you develop compute shaders for them. This
             | part is not the problem, _you can already_ go buy cheaper
             | hardware that outperforms Nvidia if price is your only
             | concern.
             | 
             | Software is the issue. That's it - it's CUDA and nothing
             | else. You cannot assail Nvidia's position, and moreover
             | their hardware's value, without a really solid reason for
             | datacenters to own them. Datacenters do not want to own
             | GPUs because once the AI bubble pops they'll be bagholders
              | for Intel and AMD's depreciated hardware. Nvidia hardware
             | can at least crypto mine, or be leased out to industrial
             | customers that have their own remote CUDA applications. The
             | demand for generic GPU compute is basically nonexistent,
             | the reason this market exists at all is because CUDA
             | exists, and you cannot turn over Nvidia's foothold without
             | accepting that fact.
             | 
             | The only way the entire industry can fuck over Nvidia is if
             | they choose to invest in a complete CUDA replacement like
             | OpenCL. That is the only way that Nvidia's value can be
             | actually deposed without any path of recourse for their
             | business, and it will never happen because every single one
             | of Nvidia's competitors hate each other's guts and would
             | rather watch each other die in gladiatorial combat than
             | help each other fight the monster. And Jensen Huang
             | probably revels in it, CUDA is a hedged bet against the
             | industry ever working together for common good.
        
               | adventured wrote:
                | I do not misunderstand why Nvidia has a monopoly. You
                | jumped drastically beyond anything I was discussing and
                | incorrectly assumed ignorance on my part. I never said
                | why I thought they had one. I never brought up matters of
                | performance or software or moats at all. I simply stated,
                | as a matter of fact, that they have a monopoly; you
                | assumed the rest.
               | 
               | It's impossible to assail their monopoly without
               | utilizing far lower prices, coming up under their extreme
               | margin products. It's how it is almost always done
               | competitively in tech (see: ARM, or Office (dramatically
               | undercut Lotus with a cheaper inferior product), or
               | Linux, or Huawei, or Chromebooks, or Internet Explorer,
               | or just about anything).
               | 
                | Note: I never said lower prices are all you'd need. Who
                | would think that? The implication that I'm ignorant of
                | the entire history of tech is a poor approach to a
                | discussion with another person on HN, frankly.
        
               | talldayo wrote:
               | Nvidia's monopoly is pretty much detached from price at
               | this point. That's the entire reason _why_ they can
               | charge insane margins - nobody cares! There is not a
               | single business squaring Nvidia up with serious intent to
                | take down CUDA. It's been this way for nearly two
               | decades at this point, with not a single spark of hope to
               | show for it.
               | 
               | In the case of ARM, Office, Linux, Huawei, and ChromeOS,
               | these were all _actual_ alternatives to the incumbent
               | tools people were familiar with. You can directly compare
               | Office and Lotus because they are fundamentally similar
                | products - ARM had a real chance against x86 because it
                | wasn't a complex ISA to unseat. Nvidia is not analogous
                | to these businesses because they occupy a league of
                | their own as the provider of CUDA. It's not an
                | exaggeration to say that they have completely seceded
                | from the market of GPUs and can sustain themselves on
                | demand from crypto miners and AI pundits alone.
               | 
               | AMD, Intel and even Apple have bigger things to worry
               | about than hitting an arbitrary price point, if they want
               | Nvidia in their crosshairs. All of them have already
               | solved the "sell consumer tech at attractive prices"
               | problem but not the "make it complex, standardize it and
               | scale it up" problem.
        
               | DSingularity wrote:
               | I feel people are exaggerating the impossibility of
               | replacing CUDA. Adopting CUDA is convenient right now
               | because yes it is difficult to replace it. Barrier to
               | entry for orgs that can do that is very high. But it has
               | been done. Google has the TPU for example.
        
               | Der_Einzige wrote:
                | They're not exaggerating it. The more things change, the
                | more they stay the same. Nvidia and AMD had the exact
                | same relationship 15 years ago that they do today: the
                | AMD crowd clutching at their better efficiencies, and
                | the Nvidia crowd having grossly superior
                | drivers/firmware/hardware, including unique PhysX stuff
                | that STILL has not been matched since 2012 (remember
                | Planetside 2 or Borderlands 2 physics? Pepperidge Farm
                | Remembers...)
                | 
                | So many billions of dollars and no one is even 1% close
                | to displacing CUDA in any meaningful way. ZLUDA is dead.
                | ROCm is a meme, Scale is a meme. Either you use CUDA or
                | you don't do meaningful AI work.
        
               | talldayo wrote:
               | The TPU is not a GPU nor is it commercially available. It
               | is a chip optimized around a limited featureset with a
               | limited software layer on top of it. It's an impressive
               | demonstration on Google's behalf to be sure, but it's
               | also not a shot across the bow at Nvidia's business.
               | Nvidia has the TSMC relations, a refined and complex
               | streaming multiprocessor architecture and _actual_
                | software support their customers can go use today. TPUs
                | haven't quite taken over like people anticipated anyways.
               | 
               | I don't personally think CUDA is impossible to replace -
               | but I do think that everyone capable of replacing CUDA
               | has been ignoring it recently. Nvidia's role as the GPGPU
               | compute people is secure for the foreseeable future.
               | Apple wants to design _simpler_ GPUs, AMD wants to design
               | cheaper GPUs, and Intel wants to pretend like they can
                | compete with AMD. Every stakeholder with the capacity to
                | turn this ship around is pretending like Nvidia doesn't
                | exist and whistling until they go away.
        
             | Der_Einzige wrote:
              | Thank you for laying it out. It's so silly to see people in
              | the comments act like Intel or Nvidia can't EASILY add more
              | VRAM to their cards. Every single argument against it is
              | hogwash.
        
             | arcticbull wrote:
             | Visa doesn't actually make a ton of money off each
             | transaction, if you divide out their revenue against their
             | payment volume (napkin math)...
             | 
             | They processed $12T in payments last year (almost a billion
             | payments per day), with a net revenue of $32B. That's a
             | gross transaction margin of 0.26% and their GAAP net income
             | was half that, about 0.14%. [1]
             | 
             | They're just a transaction network, unlike say Amex which
             | is both an issuer and a network. Being just the network is
             | more operationally efficient.
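              | 
              | Spelled out (same figures; net income taken as roughly
              | half of revenue, per the above):
              | 
              |   payment_volume = 12e12   # ~$12T processed
              |   net_revenue = 32e9       # ~$32B net revenue
              |   net_income = 17e9        # ~half of revenue (GAAP)
              | 
              |   print(net_revenue / payment_volume)  # ~0.0027 -> ~0.26%
              |   print(net_income / payment_volume)   # ~0.0014 -> ~0.14%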
             | 
             | [1] https://annualreport.visa.com/financials/default.aspx
        
               | oivey wrote:
               | That's a weird way to account for their business size.
               | There isn't a significant marginal cost per transaction.
               | They didn't sell $12T in products. They facilitated that
               | much in payments. Their profits are fantastic.
        
         | elorant wrote:
         | AMD has a 192GB GPU. I don't see them eating NVidia's lunch
         | with it.
        
           | treprinum wrote:
           | They are charging as much as Nvidia for it. Now imagine they
           | offered such a card for $2k. Would that allow them to eat
           | Nvidia's lunch?
        
             | p1esk wrote:
             | We would also need to imagine AMD fixing their software.
        
               | treprinum wrote:
               | I think plenty of enthusiastic open source devs would
               | jump at it and fix their software if the software was
               | reasonably open. The same effect as what happened when
               | Meta released LLaMA.
        
               | jjmarr wrote:
               | It is open and they regularly merge PRs.
               | 
               | https://github.com/ROCm/ROCm/pulls?q=is%3Apr+is%3Aclosed
        
               | treprinum wrote:
                | AMD GPUs aren't very attractive to ML folks because they
                | don't outshine Nvidia in any single aspect. Blasting lots
                | of RAM onto a GPU would make it attractive immediately
                | and would draw lots of attention from devs currently
                | occupied with more interesting things.
        
             | latchkey wrote:
              | If you want to load a 405B model at FP16 into a single H100
              | box, how do you do it? You don't - you get two boxes. 2x
              | the price.
              | 
              | Models are getting larger, not smaller. This is why the
              | H200 has more memory but the same exact compute, and why
              | MI300x vs. MI325x is more memory, same compute.
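              | 
              | Quick back-of-the-envelope on why it doesn't fit in one
              | box (weights only, ignoring KV cache and activations):
              | 
              |   params = 405e9
              |   bytes_per_param = 2                # FP16
              |   weights_gb = params * bytes_per_param / 1e9
              | 
              |   h100_box_gb = 8 * 80               # 8x 80GB per box
              |   print(weights_gb)                  # ~810 GB of weights
              |   print(weights_gb / h100_box_gb)    # ~1.27 -> two boxes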
        
             | elorant wrote:
              | Let's say for the sake of argument that you could build
              | such a card and sell it for less than $5k. Why would you do
              | it? You know there's huge demand, in the tens of billions
              | per quarter, for high-end cards. Why undercut that market
              | so heavily? To overthrow Nvidia? You'd end up with a far
              | lower profit margin, and then your shareholders would eat
              | you alive.
        
         | daft_pink wrote:
          | Totally agree. Someone needs to exploit the lack of available
          | GPU memory in graphics cards for model runners. Even training
          | tends to run into memory issues with the current cards.
        
         | zamalek wrote:
         | I think a better idea would be an NPU with slower memory, or
         | tie it to the system DDR. I don't think consumer inference
         | (possibly even training) applications would need the memory
         | bandwidth offered by GDDR/HBM. Inference on my 7950x is already
         | stupid fast (all things considered).
         | 
         | The deeper problem is that the market for this is probably
         | incredibly niche.
        
         | m3kw9 wrote:
         | Because they can't
        
         | Sparkyte wrote:
          | Because you can't stack that much RAM on a GPU without
          | sufficient channels to do so. You could probably do 64GB on
          | GDDR6, but you can't do 128GB on GDDR6 without more memory
          | channels. 2GB per chip per channel is the current limit for
          | GDDR6; this is why HBM was invented.
          | 
          | It is why you only see GPUs with 24GB of memory at the
          | moment.
          | 
          | HBM2 can handle 64GB (4 x 8GB stacks) (total capacity 128GB)
          | 
          | HBM3 can handle 192GB (4 x 24GB stacks) (total capacity
          | 384GB)
          | 
          | You cannot do this with GDDR6.
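          | 
          | A rough sketch of that capacity math (assuming, for
          | illustration, a 384-bit memory bus and 32-bit-wide GDDR6
          | devices, one 2GB chip per channel):
          | 
          |   bus_width_bits = 384
          |   channel_bits = 32
          |   gb_per_chip = 2
          | 
          |   channels = bus_width_bits // channel_bits  # 12 channels
          |   print(channels * gb_per_chip)              # 24 GB of VRAM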
        
         | bayindirh wrote:
          | Disclosure: HPC admin who works with NVIDIA cards here.
         | 
         | Because, no. It's not as simple as that.
         | 
          | NVIDIA has a complete ecosystem now. They have cards. They have
          | cards of cards (platforms), which they produce, validate and
          | sell. They have NVLink crossbars and switches which connect
          | these cards on their cards of cards with very high speed and
          | low latency.
         | 
         | For inter-server communication they have libraries which
         | coordinate cards, workloads and computations.
         | 
         | They bought Mellanox, but that can be used by anyone, so
         | there's no lock-in for now.
         | 
          | As a tangent, NVIDIA has a whole set of standards for pumping
          | tremendous amounts of data in and out of this mesh of cards,
          | be it GPU-Direct storage or specialized daemons which handle
          | data transfers on and off the cards.
         | 
          | If you think that you can connect n cards to a PCIe bus, just
          | send workloads to them and solve problems magically, you'll
          | hurt yourself a lot, both performance- and psychology-wise.
         | 
          | You have to build a stack which can perform these things with
          | the maximum possible performance to be able to compete with
          | NVIDIA. It's not just about emulating CUDA now, especially on
          | the high end of the AI spectrum (GenAI, multi-card,
          | multi-system, etc.).
          | 
          | For the lower-end, multi-tenant scenarios, they have card
          | virtualization, MIG, etc. for card sharing. You have to
          | compete on that, too, for cloud and smaller applications.
        
           | dgfitz wrote:
           | How does any of this make money?
        
             | arcticbull wrote:
             | Having the complete ecosystem affords them significant
             | margins.
        
               | dgfitz wrote:
               | Against what?
        
               | arcticbull wrote:
               | As of today they have SaaS company margins as a hardware
               | company which is practically unheard of.
        
             | lyime wrote:
             | What?
             | 
             | It's like the most profitable set of products in tech. You
             | have companies like Meta, MSFT, Amazon, Google etc spending
             | $5B every few years buying this hardware.
        
               | dgfitz wrote:
                | Stale money is moving around. Nothing changed.
        
               | HeatrayEnjoyer wrote:
               | What is stale money?
        
               | dgfitz wrote:
               | Hmm. There is a lot of money that exists, doing nothing.
               | I consider that stale money.
               | 
               | Edit: I can't sort this out. Where did all the money go?
        
             | bayindirh wrote:
              | When this walled garden is the only way to use GPUs with
              | high efficiency, everybody is using this stack, and NVIDIA
              | controls the supply of these "platform boards" to OEMs,
              | they don't just make money, they literally print it.
              | 
              | However, AMD is coming for them, because a couple of high-
              | profile supercomputer centers (LUMI, Livermore, etc.) are
              | using Instinct cards and pouring money into AMD to improve
              | their cards and stack.
              | 
              | I have not used their (Instinct) cards yet, but their
              | Linux driver architecture is way better than NVIDIA's.
        
           | throwaway48476 wrote:
           | All of that is highly relevant for training but what the
           | poster was asking for is a desktop inference card.
        
             | bayindirh wrote:
             | You use at least half of this stack for desktop setups. You
             | need copying daemons, the ecosystem support (docker-nvidia,
             | etc.), some of the libraries, etc. even when you're on a
             | single system.
             | 
              | If you're doing inference on a server, MIG comes into play.
              | If you're doing inference on a larger cloud, GPU-Direct
              | storage comes into play.
             | 
             | It's all modular.
        
               | WanderPanda wrote:
               | No you don't need much bandwidth between cards for
               | inference
        
               | bayindirh wrote:
                | Copying daemons (gdrcopy) are about pumping data in and
                | out of a single card. docker-nvidia and the rest of the
                | stack are enablement for using the cards.
               | 
               | GPU-Direct is about pumping data from storage devices to
               | cards, esp. from high speed storage systems across
               | networks.
               | 
                | MIG actually splits a single card into multiple
                | instances, so many processes or VMs can use a single
                | card for smaller tasks.
                | 
                | Nothing I have written in my previous comment is related
                | to inter-card or inter-server communication; it is all
                | related to disk-GPU, CPU-GPU or RAM-CPU communication.
               | 
                | Edit: I mean, it's not OK to talk about downvoting, so
                | downvote as you like, but I install and enable these
                | cards for researchers. I know what I'm installing and
                | what it does. C'mon now. :D
        
               | mikhael wrote:
               | Mostly, I think, we don't really understand your argument
               | that Intel couldn't easily replicate the parts needed
               | only for inference.
        
               | landryraccoon wrote:
               | It's possible you're underestimating the open source
               | community.
               | 
               | If there's a competing platform that hobbyists can tinker
               | with, the ecosystem can improve quite rapidly, especially
               | when the competing platform is completely closed and
               | hobbyists basically are locked out and have no
               | alternative.
        
               | throwaway48476 wrote:
               | Innovation is a bottom up process. If they sell the
               | hardware the community will spring up to take advantage.
        
               | bayindirh wrote:
               | > It's possible you're underestimating the open source
               | community.
               | 
                | On the contrary. You really don't know how much I love
                | and prefer open source and a more level playing field.
               | 
               | > If there's a competing platform that hobbyists can
               | tinker with...
               | 
                | AMD's cards are better from a hardware and software
                | architecture standpoint, but the performance is not
                | there yet. Plus, ROCm libraries are not that mature, but
                | they're getting there. Developing high-performance,
                | high-quality code is deceptively expensive, because it's
                | very heavy in theory, and you fly _very close_ to the
                | metal. I did that in my Ph.D., so I know what it
                | entails. It requires more than a couple (hundred)
                | hobbyists to pull off (see the development of the Eigen
                | linear algebra library, or any high-end math library).
               | 
               | Some big guns are pouring money into AMD to implement
               | good ROCm libraries, and it started paying off (Debian
               | has a ton of ROCm packages now, too). However, you need
               | to be able to pull it off in the datacenter to be able to
               | pull it off on the desktop.
               | 
                | AMD also needs to enable ROCm properly on the desktop,
                | so people can start hacking on it at home.
               | 
               | > especially when the competing platform is completely
               | closed...
               | 
               | NVIDIA gives a lot of support to universities,
               | researchers and institutions who play with their cards.
               | Big cards may not be free, but know-how, support and
               | first steps are always within reach. Plus, their
               | researchers dogfood their own cards, and write papers
               | with them.
               | 
                | So, as long as papers get published, researchers do
                | their research, and something gets invented, many people
                | don't care how open source the ecosystem is. This upsets
                | me a ton. Closed-source AI companies, and researchers
                | who leave crucial details out of their papers so that
                | what they did can't be reproduced, don't care about open
                | source, because they think like NVIDIA: "My research,
                | my secrets, my fame, my money".
               | 
               | It's not about sharing. It's about winning, and it's ugly
               | in some aspects.
        
               | phkahler wrote:
                | No. I've been reading up. I'm planning to run Flux 12b on
                | my AMD 5700G with 64GB RAM. The CPU will take 5-10
                | minutes per image, which will be fine for me tinkering
                | while writing code. Maybe I'll be able to get the GPU
                | going on it too.
                | 
                | The point of the OP is that this is entirely possible
                | with even an iGPU, if only we have the RAM. Nvidia
                | _should be_ irrelevant for local inference.
        
           | postalrat wrote:
            | Let's see how quickly that changes if Intel releases cards
            | with massive amounts of RAM for a fraction of the cost.
        
           | jmward01 wrote:
            | Most of the above infra is predicated on limited RAM, which
            | is why you need so much communication between cards. Bump
            | the RAM up and you could do single-card inference, and all
            | those interconnects become overhead that could have gone to
            | more RAM. For training there is still an argument, but even
            | there, the more RAM you have, the less all that connectivity
            | gains you. RAM has been used to sell cards and servers for a
            | long time now; it is time to open the floodgates.
        
             | foobiekr wrote:
             | Correct for inference - the main use of the interconnect is
             | RDMA requests between GPUs to fit models that wouldn't
             | otherwise fit.
             | 
             | Not really correct for training - training has a lot of
             | all-to-all problems, so hierarchical reduction is useful
             | but doesn't really solve the incast problem - Nvlink
             | _bandwidth_ is less of an issue than perhaps the SHARP
             | functions in the NVLink switch ASICs.
        
           | epistasis wrote:
           | Rather than tackling the entire market at once, they could
           | start with one section and build from there. NVIDIA didn't
           | get to where it was in a year, it took many strategic
           | acquisitions. (All the networking and other HPC-specialized
           | stuff I was buying a decade ago has seemingly been bought by
           | NVIDIA).
           | 
           | Start by being a "second vendor" for huge customers of NVIDIA
           | that want to foster competition, as well as a few others
           | willing to take risks, and build from there.
        
           | teekert wrote:
           | I have a question for you, since I'm somewhat entering the
           | HPC world. In the EU the EuroHPC-JU is building what they
           | call AI factories, afaict these are just batch processing
           | (Slurm I think) clusters with GPUs in the nodes. So I wonder
            | where you'd place those cards of cards. Are you saying there
            | is another, perhaps better, way to use massive amounts of
            | these cards? Or is that still in the "super powerful
            | workstation" domain? Thanks in advance.
        
             | treprinum wrote:
              | View it as a Raspberry Pi for AI workloads. The initial
              | stage is for enthusiasts who would develop the infra,
              | figure out what is possible and spread the word. Then the
              | next phase will be SME industry adoption, making it
              | commercially interesting, while bypassing Nvidia
              | completely. At some point it would take on a life of its
              | own and the big players would jump in. A classic
              | disruption strategy via low-cost, unique offerings.
        
         | segmondy wrote:
          | They don't need to do 128GB; 48GB+ would eat their lunch.
          | Intel and AMD are sleeping.
        
       | ThatMedicIsASpy wrote:
        | SR-IOV is supported on their iGPUs; outside of that it's
        | exclusive to their enterprise offerings. Give it to me on the
        | desktop and I'll buy.
        
         | throwaway48476 wrote:
         | Intel is allergic to competition.
        
       | karmakaze wrote:
        | I wanted to have alternatives to Nvidia for high-power GPUs.
        | Then the more I thought about it, the more it made sense to
       | rent cloud services for AI/ML workloads and lesser powered ones
       | for gaming. The only use cases I could come up with for wanting
       | high-end cards are 4k gaming (a luxury I can't justify for
       | infrequent use) or for PC VR which may still be valid if/when a
       | decent OLED (or mini-OLED) headset is available--the Sony PSVR2
       | with PC adapter is pretty close. The Bigscreen Beyond is also a
       | milestone/benchmark.
        
         | oidar wrote:
         | Which video card are you using for PSVR?
        
           | karmakaze wrote:
            | I haven't decided/pulled the trigger, but the Intel Arc
            | series is giving the AMD parts a good run for the money.
            | 
            | The only concern is how well the new Intel drivers (full
            | support for DX12) work with older titles; support is
            | continuously being improved (for DX11, 10, and 9, some via
            | emulation).
           | 
           | There's likely some deep discounting of Intel cards because
           | of how bad the drivers were at launch and the prices may not
           | stay so low once things are working much better.
        
         | gigaflop wrote:
         | Don't rent a GPU for gaming, unless you're doing something like
         | a full-on game streaming service. +10ms isn't much for some
         | games, but would be noticeable on plenty.
         | 
         | IMO you want those frames getting rendered as close to the
         | monitor as possible, and you'd probably have a better time with
         | lower fidelity graphics rendered locally. You'd also get to
         | keep gaming during a network outage.
        
           | babypuncher wrote:
           | I don't even think network latency is the real problem, it's
           | all the buffering needed to encode a game's output to a video
           | stream and keep it v-synced with a network-attached display.
           | 
           | I've tried game streaming under the best possible conditions
           | (<1ms network latency) and it still feels a little off.
           | Especially shooters and 2D platformers.
        
             | oidar wrote:
              | Yeah - there's no way to play something like
              | Overwatch/Fortnite on a streaming service and have a good
              | time. The only things that seem to be OK are turn-based
              | games or platformers.
        
           | karmakaze wrote:
           | Absolutely. By "and lesser powered ones for gaming" I meant
           | purchase.
        
       | BadHumans wrote:
       | I'm considering getting one to replace my 8 year old NVIDIA card
       | but why are there 2 SKUs almost identical in price?
        
         | layer8 wrote:
         | Binning.
         | 
         | https://en.wikipedia.org/wiki/Product_binning#Core_unlocking
        
       | tcdent wrote:
        | If they were serious about AI they would have published TOPS
        | stats for at least float32 and bfloat16.
       | 
       | The lack of quantified stats on the marketing pages tells me
       | Intel is way behind.
        
       | andrewstuart wrote:
       | Intel can't compete head to head with Nvidia on performance.
       | 
       | But surely it's easy enough to compete on video ram - why not
       | load their GPUs to the max with video ram?
       | 
       | And also video encoder cores - Intel has a great video encoder
       | core and these vary little across high end to low end GPUs - so
       | they could make it a standout feature to have, for example, 8
       | video encoder cores instead of 2.
       | 
       | It's no wonder Nvidia is the king because AMD and Intel just
       | don't seem willing to fight.
        
         | AndrewDucker wrote:
         | Which market segment wants to encode 8 streams at once for
         | cheap, and how big is it?
        
       | hx8 wrote:
       | I like Intel's aggressive pricing against entry/mid level GPUs,
       | which hopefully puts downward pressure on all GPUs. Overall,
       | their biggest concern is software support. We've had reports of
       | certain DX11/12 games failing to run properly on Proton, and the
       | actual performance of the A series varied greatly between games
       | even on Windows. I suspect we'll see the same issues when the
       | B580 gets proper third party benchmarking.
       | 
        | Their dedication to Linux support, combined with their good
        | pricing, makes this a potential buy for me in future versions. To
       | be frank, I won't be replacing my 7900 XTX with this. Intel needs
       | to provide more raw power in their cards and third parties need
       | to improve their software support before this captures my
       | business.
        
       | Sparkyte wrote:
       | Intel over there with two spears in the knees looking puzzled and
       | in pain.
        
       | smcleod wrote:
       | 12GB of vRAM? What a wasted opportunity.
        
         | machinekob wrote:
          | For the lowest-end GPU (and 2k gaming)? It is plenty even for
          | most 4k games.
        
           | smcleod wrote:
           | Gaming sure, but not for GPU compute
        
             | machinekob wrote:
              | You would most likely buy the 700x series for compute.
        
       | declan_roberts wrote:
       | I think a graphics card tailored for 2k gaming is actually great.
       | 2k really is the goldilocks zone between 4k and 1080p graphics
       | before you start creeping into diminishing returns.
        
         | icegreentea2 wrote:
         | 2k usually refers to 1080p no? The k is the approximate
         | horizontal resolution, so 1920x1080 is definitely 2k enough.
        
           | antisthenes wrote:
            | 2K usually refers to 2560x1440.
            | 
            | 1920x1080 is 1080p.
           | 
           | It doesn't make a whole lot of sense, but that's how it is.
        
             | ortusdux wrote:
             | https://en.wikipedia.org/wiki/2K_resolution
        
               | nightski wrote:
               | That's amusing because I think almost everyone I know
               | confuses it with 1440p. I've never heard of 2k being used
               | for 1080p before.
        
               | Retric wrote:
               | "In consumer products, 2560 x 1440 (1440p) is sometimes
               | referred to as 2K,[13] but it and similar formats are
               | more traditionally categorized as 2.5K resolutions."
        
             | seritools wrote:
             | 1440p is colloquially referred to as 2.5K, not 2K.
        
               | vundercind wrote:
                | It'd be pretty weird if it were called 2k. 1080p is, in
                | an absolute sense or as a relative "distance" to the
                | next-lowest thousand, _closer_ to 2k pixels of width
                | than 4k is to 4k (both are under, of course, but one is
                | under by 80 pixels, the other by 160). It's got a much
                | better claim to the label 2k than 1440p does, and
                | arguably a somewhat better claim to 2k than 4k has to
                | 4k.
               | 
               | [EDIT] I mean, of course, 1080p's also not typically
               | called that, yet another resolution is, but labeling
               | 1440p 2k is especially far off.
        
               | mkl wrote:
               | You are misunderstanding. 1080p, 1440p, 2160p refer to
               | the number of _rows_ of pixels, and those terms come from
               | broadcast television and computing (the p is progressive,
               | vs i for interlaced). 4k, 2k refer to the number of
               | _columns_ of pixels, and those terms come from cinema and
               | visual effects (and originally means 4096 and 2048 pixels
               | wide). That means 1920x1080 is both 2k _and_ 1080p,
               | 2560x1440 is both 2.5k and 1440p, and 3840x2160 is both
               | 4k and 2160p.
        
               | vundercind wrote:
               | > You are misunderstanding. 1080p, 1440p, 2160p refer to
               | the number of rows of pixels
               | 
               | > (the p is progressive, vs i for interlaced)
               | 
               | > 4k, 2k refer to the number of columns of pixels
               | 
               | > 2560x1440 is both 2.5k and 1440p, and 3840x2160 is both
               | 4k and 2160p.
               | 
               | These parts I did not misunderstand.
               | 
               | > and those terms come from cinema and visual effects
               | (and originally means 4096 and 2048 pixels wide)
               | 
               | OK that part I didn't know, or at least had forgotten--
               | which are effectively the same thing, either way.
               | 
               | > 1920x1080 is both 2k and 1080p
               | 
               | Wikipedia suggests that in this particular case (unlike
               | with 4k) application of "2k" to resolutions other than
               | the original cinema resolution (2048x1080) is unusual;
               | moreover, I was responding to a commenter's usage of "2k"
               | as synonymous with "1440p", which seemed especially odd
               | to me.
        
               | nemomarx wrote:
               | I have never seen 2.5k used in the wild (gamer forums
               | etc) so it can't be that colloquial.
        
           | layer8 wrote:
           | Actual use is inconsistent. From
           | https://en.wikipedia.org/wiki/2K_resolution: " _In consumer
           | products, 2560 x 1440 (1440p) is sometimes referred to as 2K,
           | but it and similar formats are more traditionally categorized
           | as 2.5K resolutions._ "
           | 
           | "2K" is used to denote WQHD often enough, whereas 1080p is
           | usually called that, if not "FHD".
           | 
           | "2K" being used to denote resolutions lower than WQHD is
           | really only a thing for the 2048 cinema resolutions, not for
           | FHD.
        
             | declan_roberts wrote:
             | TIL
        
         | giobox wrote:
         | For sure its been a sweet spot for a very long time for budget
         | conscious gamers looking for best balance of price and frame
         | rates, but 1440p optimized parts are nothing new. Both NVidia
         | and AMD make parts that target 1440p display users too, and
         | have done for years. Even previous Intel parts you can argue
         | were tailored for 1080p/1440p use, given their comparative
         | performance deficit at 4k etc.
         | 
         | Assuming they retail at prices Intel are suggesting in the
         | press releases, you maybe here save 40-50 bucks over an
         | ~equivalent NVidia 4060.
         | 
         | I would also argue like others here that with tech like frame
         | gen, DLSS etc, even the cheapest discrete NVidia 40xx parts are
         | arguably 1440p optimized now, it doesn't even need to be said
         | in their marketing materials. Im not as familiar with AMD's
         | range right now, but I suspect virtually every discrete
         | graphics card they sell is "2k optmized" by the standard Intel
         | used here, and also doesn't really warrant explicit mention.
        
           | philistine wrote:
           | I'm baffled that PC gamers have decided that 1440p is the
           | endgame for graphics. When I look at a 27-inch 1440p display,
           | I see pixel edges everywhere. It's right at the edge of
           | losing the visibility of individual pixels, since I can't
           | perceive them at 27-inch 2160p, but not quite there yet for
           | desktop distances.
           | 
           | Time marches on, and I become ever more separated from gaming
           | PC enthusiasts.
        
             | wing-_-nuts wrote:
             | I used to be in the '4k or bust' camp, but then I realized
             | that I needed 1.5x scaling on a 27" display to have my UI
             | at a comfy size. That put me right back at 1440p screen
             | real estate _and_ you had to deal with fractional scaling
             | issues.
             | 
             | Instead, I bought a good 27" 1440p monitor, and you know
             | what? I am not the discerning connoisseur of pixels that I
             | thought I was. Honestly, it's _fine_.
             | 
             | I will hold out with this setup until I can get a 8k 144hz
             | monitor and a gpu to drive it for a reasonable price. I
             | expect that will take another decade or so.
        
               | doubled112 wrote:
               | I have a 4K 43" TV on my desk and it is about perfect for
               | me for desktop use without scaling. For gaming, I tend to
               | turn it down to 1080p because I like frames and don't
               | want to pay up.
               | 
               | At 4K, it's like having 4 21" 1080p monitors. Haven't
               | maximized or minimized a window in years. The sprawl is
               | real.
        
             | layer8 wrote:
             | This is a trade-off with frame rates and rendering quality.
             | When having to choose, most gamers prefer higher frame rate
             | and rendering quality. With 4K, that becomes very
             | expensive, if not impossible. 4K is 2.25 times the pixels
             | of 1440p, which for example means you can get double the
             | frame rate with 1440p using the same processing power and
             | bandwidth.
             | 
             | In other words, the current tech just isn't quite there
             | yet, or not cheap enough.
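                | 
                | The pixel math behind that 2.25x figure:
                | 
                |   uhd = 3840 * 2160   # 8,294,400 pixels
                |   qhd = 2560 * 1440   # 3,686,400 pixels
                |   print(uhd / qhd)    # 2.25x the pixels per frame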
        
               | gdwatson wrote:
               | Arguably 1440p is the sweet spot for gaming, but I love
               | 4k monitors for the extra text sharpness. Fortunately
                | DLSS and FSR upscaling are pretty good these days. At 4k,
                | quality-mode upscaling gives you a native render
                | resolution of about 1440p, with image quality a little
                | better and performance a little worse than native 1440p.
               | 
               | It's a great way to have my cake and eat it too.
        
             | Novosell wrote:
             | Gaming at 2160p is just too expensive still, imo. You gotta
             | pay more for your monitor, GPU and PSU. Then if you want
             | side monitors that match in resolution, you're paying more
             | for those as well.
             | 
             | You say PC gamers at the start of your comment and gaming
             | PC enthusiasts at the end. These groups are not the same
             | and I'd say the latter is largely doing ultrawide, 4k
             | monitor or even 4k TV.
             | 
             | According to steam, 56% are on 1080p, 20% on 1440p and 4%
             | on 2160p.
             | 
             | So gamers as a whole are still settled on 1080p, actually.
             | Not everyone is rich.
        
               | semi-extrinsic wrote:
               | I'm still using a 50" 1080p (plasma!) television in my
               | living room. It's close to 15 years old now. I've seen
               | newer and bigger TVs many times at my friends house, but
               | it's just not _better enough_ that I can be bothered to
               | upgrade.
        
               | dmonitor wrote:
               | Doesn't plasma have deep blacks and color reproduction
               | similar to OLED? They're still very good displays, and
               | being 15 years old means it probably pre-dates the
               | SmartTV era.
        
               | philistine wrote:
               | > You say PC gamers at the start of your comment and
               | gaming PC enthusiasts at the end. These groups are not
               | the same
               | 
               | Prove to me those aren't synonyms.
        
               | Novosell wrote:
               | Prove to me they are.
        
               | dmonitor wrote:
                | The major drawback for PC gaming at 4k that I never see
                | mentioned is how much _heat_ the panels generate. Many of
                | them generate so much heat that they rely on active
                | cooling! I bought a pair of high-refresh 4k displays and,
                | combined with the PC, they raised my room to an
                | uncomfortable temperature. I returned them for other
                | reasons (hard to justify not returning them when I got
                | laid off a week after purchasing them), but I've since
                | made note of the wattage when scouting monitors.
        
               | evantbyrne wrote:
                | Not rich. Well within reach for Americans with disposable
                | income. Mid-range 16" MacBook Pros are in the same price
                | ballpark as 4k gaming rigs. Or, put another way, it costs
                | less than a vacation for two to a popular destination.
        
             | wlesieutre wrote:
              | I don't think it's seen as the end game, it's that if you
              | want 120 fps (or 144, 165, or 240) without turning down
              | your graphics settings you're talking $1000+ GPUs, plus a
              | huge case and a couple hundred watts more from your power
              | supply.
             | 
             | 1440p hits a popular balance where it's more pixels than
             | 1080p but not so absurdly expensive or power hungry.
             | 
             | Eventually 4K might be reasonably affordable, but we'll
             | settle at 1440p for a while in the meantime like we did at
             | 1080p (which is still plenty popular too).
        
             | dingnuts wrote:
             | if you can see the pixels on a 27 inch 1440p display,
             | you're just sitting too close to the screen lol
        
               | philistine wrote:
               | I don't directly see the pixels per se like on 1080p at
               | 27-inch at desktop distances. But I see harsh edges in
               | corners and text is not flawless like on 2160p.
               | 
               | Like I said, it's on the cusp of invisible pixels.
        
             | Lanolderen wrote:
             | It's a nice compromise for semi competitive play. On 4k
             | it'd be very expensive and most likely finicky to maintain
             | high FPS.
             | 
              | Tbh, now that I think about it, I only really _need_
              | resolution for general usage. For gaming I'm running
              | everything but textures on low, with min or max FOV
              | depending on the game, so it's not exactly aesthetic
              | anyway. What I need more is physical screen size, so the
              | heads are physically larger without shoving my face into
              | the monitor, and refresh rate.
        
           | goosedragons wrote:
            | Nvidia markets the 4060 as a 1080p card. Its design makes it
            | worse at 1440p than past X060 cards too. Intel has XeSS to
            | compete with DLSS and is reportedly coming out with its own
            | frame gen competitor. $40-50 is a decent savings in the
            | budget market, especially if Intel's claims are to be
            | believed and it's actually faster than the 4060.
        
         | leetharris wrote:
         | I see what you're saying, but I also feel like ALL Nvidia cards
         | are "2K" oriented cards because of DLSS, frame gen, etc.
         | Resolution is less important now in general thanks to their
         | upscaling tech.
        
         | laweijfmvo wrote:
         | Can it compete with the massive used GPU market though? Why buy
         | a new Intel card when I can get a used Nvidia card that I know
         | will work well?
        
           | teaearlgraycold wrote:
           | To some, buying used never crosses their mind.
        
         | teaearlgraycold wrote:
         | Please say 1440p and not 2k. Ignoring arguments about what 2k
         | _should_ mean, there's enough use either way that it's
         | confusing.
        
       | Implicated wrote:
       | 12GB memory
       | 
       | -.-
       | 
        | I feel like _anyone_ who can pump out GPUs with 24GB+ of memory
        | that are usable for py-stuff would benefit greatly.
       | 
       | Even if it's not as performant as the NVIDIA options - just to be
       | able to get the models to run, at whatever speed.
       | 
       | They would fly off the shelves.
        
         | cowmix wrote:
          | 100% - _this_ could be Intel's ticket to capture the hearts of
          | developers and then everything else that flows downstream. They
          | have nothing to lose here -- just do it, Intel!
        
           | bagels wrote:
           | They could lose a lot of money?
        
             | flockonus wrote:
              | They already do... google $INTC and stare in disbelief at
              | the "Financials" on the right side.
              | 
              | At some point they should make a stand; that's the whole
              | meta-topic of this thread.
        
         | evanjrowley wrote:
         | Maybe that's not too bad for someone who wants to use pre-
         | existing models. Their AI Playground examples require at
         | minimum an Intel Core Ultra H CPU, which is quite low-powered
         | compared to even these dedicated GPUs:
         | https://github.com/intel/AI-Playground
        
         | elorant wrote:
         | Would it though? How many people are running inference at home?
         | Outside of enthusiasts I don't know anyone. Even companies
         | don't self-host models and prefer to use APIs. Not that I
         | wouldn't like a consumer GPU with tons of VRAM, but I think
          | that the market for it is too small for companies to invest in
          | building it. If you bother to look at Steam's hardware stats
         | you'll notice that only a small percentage is using high-end
         | cards.
        
           | ModernMech wrote:
           | It's a chicken and egg scenario. The main problem with
           | running inference at home is the lack of hardware. If the
           | hardware was there more people would do it. And it's not a
           | problem if "enthusiasts" are the only ones using it because
           | that's to be expected at this stage of the tech cycle. If the
           | market is small just charge more, the enthusiasts will pay
           | it. Once more enthusiasts are running inference at home, then
           | the late adopters will eventually come along.
        
             | m00x wrote:
              | Mac minis are great for this. They're cheap-ish and they
              | can run quite large models at a decent speed if you run
              | them with an MLX backend.
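              | 
              | For example, a minimal sketch with the mlx-lm package
              | (the model name is just an illustration; pick whatever
              | fits in your unified memory):
              | 
              |   from mlx_lm import load, generate
              | 
              |   # Runs a quantized model on the Apple GPU via MLX's
              |   # unified-memory backend.
              |   model, tokenizer = load(
              |       "mlx-community/Mistral-7B-Instruct-v0.3-4bit")
              |   text = generate(model, tokenizer,
              |                   prompt="Why is unified memory handy?",
              |                   max_tokens=128)
              |   print(text)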
        
               | alganet wrote:
                | The mini _Pro_ models are great for this, the ones with
                | large RAM upgrades.
               | 
               | If you get the base 16GB mini, it will have more or less
               | the same VRAM but way worse performance than an Arc.
               | 
               | If you already have a PC, it makes sense to go for the
               | cheapest 12GB card instead of a base mac mini.
        
           | tokioyoyo wrote:
           | This is the weird part, I saw the same comments in other
           | threads. People keep saying how everyone yearns for local
           | LLMs... but other than hardcore enthusiasts it just sounds
           | like a bad investment? Like it's a smaller market than gaming
           | GPUs. And by the time anyone runs them locally, you'll have
           | bigger/better models and GPUs coming out, so you won't even
            | be able to make use of them. Maybe the whole "indoctrinate
            | users to be part of the Intel ecosystem, so when they go
            | work for big companies they'll vouch for it" approach would
            | have merit... if others weren't innovating and making their
            | products better (like NVIDIA).
        
             | throwaway48476 wrote:
              | Intel sold their GPUs at negative margin, which is part of
              | why the stock fell off a cliff. If they could double the
              | VRAM they could raise the price into the green; even
              | selling thousands of units, likely closer to 100k, would
              | be far better than what they're doing now. The problem is
              | Intel is run by incompetent people who guard their market
              | segments as tribal fiefs instead of solving for the
              | customer.
        
               | refulgentis wrote:
               | By subsidizing it more they'll lose less money?
        
               | throwaway48476 wrote:
                | Increasing VRAM would differentiate Intel GPUs and allow
                | them to drive ASPs higher, into the green.
        
         | m00x wrote:
         | You can just use a CPU in that case, no? You can run most ML
         | inference on vectorized operations on modern CPUs at a fraction
         | of the price.
        
           | marcyb5st wrote:
            | My 7800x says not really. Compared to my 3070 it feels so
            | incredibly slow that it gets in the way of productivity.
           | 
           | Specifically, waiting ~2 seconds vs ~20 for a code snippet is
           | much more detrimental to my productivity than the time
           | difference would suggest. In ~2 seconds I don't get
           | distracted, in ~20 seconds my mind starts wandering and then
           | I have to spend time refocusing.
           | 
            | Make a GPU that is 50% slower than a two-generations-older
            | mid-range GPU (in tokens/s) but runs bigger models, and I
            | would gladly shell out $1000+.
            | 
            | So much so that I am considering getting a 5090, if Nvidia
            | actually fixes the connector mess they made with the 4090s,
            | or even a used V100.
        
             | refulgentis wrote:
             | I don't understand, make it slower so it's faster?
        
             | m00x wrote:
              | I'm running a codeseeker 13B model on my MacBook with no
              | perf issues and I get a response within a few seconds.
             | 
             | Running a specialist model makes more sense on small
             | devices.
        
         | bongodongobob wrote:
         | I don't know a single person in real life that has any desire
         | to run local LLMs. Even amongst my colleagues and tech friends,
         | not very many use LLMs period. It's still very niche outside AI
         | enthusiasts. GPT is better than anything I can run locally
         | anyway. It's not as popular as you think it is.
        
           | dimensi0nal wrote:
           | The only consumer demand for local AI models is for
           | generating pornography
        
             | treprinum wrote:
                | How about running your smart home with a voice
                | assistant on your own computer? In privacy-oriented
                | countries (Germany) that would be massive.
        
               | magicalhippo wrote:
               | This is what I'm fiddling with. My 2080Ti is not quite
                | enough to make it viable. I find the small models fail
                | too often, so I need larger Whisper and LLM models.
               | 
               | Like the 4060 Ti would have been a nice fit if it hadn't
               | been for the narrow memory bus, which makes it slower
               | than my 2080 Ti for LLM inference.
               | 
               | A more expensive card has the downside of not being cheap
               | enough to justify idling in my server, and my gaming card
               | is at times busy gaming.
        
             | serf wrote:
             | absolutely wrong -- if you're not clever enough to think of
             | any other reason to run an LLM locally then don't condemn
             | the rest of the world to "well they're just using it for
             | porno!"
        
           | throwaway48476 wrote:
           | I want local copilot. I would pay for this.
        
         | rafaelmn wrote:
          | You can get that on a Mac mini and it will probably cost you
          | less than an equivalent PC setup. It should also perform
          | better than a low-end Intel GPU and be better supported. It
          | will use less power as well.
        
       | jmward01 wrote:
        | 12GB max is a non-starter for ML work now. Why not come out
        | with a reasonably priced 24GB card, even if it isn't the
        | fastest, and target it at the ML dev world? Am I missing
        | something here?
        
         | Implicated wrote:
          | I was wondering the same thing. Seems crazy to keep pumping
          | out 12GB cards in 2025.
        
         | shrewduser wrote:
          | These are the entry-level cards; I imagine the coming
          | higher-end variants will offer much more RAM.
        
         | tofuziggy wrote:
         | Yes exactly!!
        
         | enragedcacti wrote:
         | > Am I missing something here?
         | 
         | Video games
        
           | rs_rs_rs_rs_rs wrote:
           | It's insane how out of touch people can be here, lol
        
             | heraldgeezer wrote:
             | I have been trying to hold my slurs in reading this thread.
             | 
             | These ML AI Macbook people are legit insane.
             | 
             | Desktops and gaming are ugly and complex to them (because
             | Lego is hard and MacBooks look nice, unga bunga), yet it
             | is a mass market Intel wants to move in on.
             | 
             | People here complain because Intel is not making a cheap
             | GPU to "make AI" on when that's a market of maybe 1000
             | people.
             | 
             | This Intel card is perfect for an esports gaming machine
             | running CS2, Valorant, Rocket League and casual or older
             | games like The Sims, GOG games, etc. That's a market of a
             | million plus right there; CS2 alone has a million people
             | playing every day. Not people grinding leetcode on their
             | Macs. Every real developer has a desktop, an EPYC CPU,
             | giga RAM and a nice GPU for downtime, and runs a real OS
             | like Linux or even Windows (yes, the majority of devs run
             | Windows)
        
               | throwaway48476 wrote:
               | Intel GPUs don't sell well to gamers. They've been on the
               | market for years now.
               | 
               | >market of maybe 1000 people
               | 
               | The market of people interested in local ai inference is
               | in the millions. If it's cheap enough the data center
               | market is at least 10 million.
        
               | heraldgeezer wrote:
               | Yes, Intel cards have sucked. But they are trying again!
        
               | terhechte wrote:
               | Most devs use Windows
               | (https://www.statista.com/statistics/869211/worldwide-
               | softwar...). Reddit's r/LocalLLaMA alone has 250k users.
               | Clearly the market is bigger than 1000 people. Why are
               | gamers and Linux people always so aggressively
               | dismissive of other people's interests?
        
               | heraldgeezer wrote:
             | >Why are gamers and Linux people always so aggressively
             | dismissive of other people's interests?
               | 
               | Both groups have a high autism %
               | 
               | We love to be "technically correct" and we often are. So
               | we get frustrated when people claim things that are
               | wrong.
        
             | jmward01 wrote:
             | How big is NVIDIA now? You don't think breaking into that
             | market is a good strategy? And, yes, I understand that this
             | is targeted at gamers and not ML. That was the point of the
             | comment I made. Maybe if they did target ML they would make
             | money and open a path to the massive server market out
             | there.
        
         | bryanlarsen wrote:
         | These are $200 low end cards, the B5X0 cards. Presumably they
         | have B7X0 and perhaps even B9X0 cards in the pipeline as well.
        
           | zamadatix wrote:
           | There has been no hint or evidence (beyond hope) Intel will
           | add a 900 class this generation.
           | 
           | B770 was rumoured to match the 16 GB of the A770 (and to be
           | the top end offering for Battlemage) but it is said to not
           | have even been taped out yet with rumour it may end up having
           | been cancelled completely.
           | 
            | I.e. don't hold your breath for anything consumer from
            | Intel this generation better for AI than the A770 you could
            | have bought 2 years ago. Even if something slightly better
            | is coming at all, there is no hint it will be soon.
        
           | hulitu wrote:
           | > These are $200 low end cards
           | 
            | Hm, I wouldn't consider $200 low end.
        
         | dgfitz wrote:
          | ML is about to hit another winter. Maybe Intel is ahead of
          | the industry.
         | 
          | Or we can keep asking high-powered computers questions about
          | programming.
        
           | PittleyDunkin wrote:
           | > ML is about hit another winter.
           | 
           | I agree ML is about to hit (or has likely already hit) some
           | serious constraints compared to breathless predictions of two
           | years ago. I don't think there's anything equivalent to the
           | AI winter on the horizon, though--LLMs even operated by
           | people who have no clue how the underlying mechanism
           | functions are still far more empowered than anything like the
           | primitives of the 80s enabled.
        
             | klodolph wrote:
             | Yeah... I want to think of it like mining, where you've
             | found an ore vein. You have to switch from prospecting to
             | mining. There's a lot of work to be done by integrating our
             | LLMs and other tools with other systems, and I think the
             | cost/benefit of making models bigger, Bigger, BIGGER is
             | reaching a plateau.
        
             | kimixa wrote:
              | I think there'll be a "financial" winter - or, put
              | another way, a bubble burst - the investment right now is
              | simply unsustainable. How are these products going to be
              | monetized?
              | 
              | Nvidia had a revenue of $27 billion in 2023 - that's
              | about $160 per person per year [0] for _every working age
              | person_ in the USA. And it's predicted to more than
              | double in 2024. If you reduce that to office workers (you
              | know, the people who might _actually_ get some benefit,
              | as no AI is going to milk a cow or serve you Starbucks)
              | that's more like $1450/year. Or again more than double
              | that for 2024.
             | 
             | How much value add is the current set of AI products going
             | to give us? It's still mostly promise too.
             | 
             | Sure, like most bubbles there'll probably still be some
             | winners, but there's no way the current market as a whole
             | is sustainable.
             | 
             | The only way the "maximal AI" dream income is actually
             | going to happen is if they functionally replace a
             | significant proportion of the working population
             | completely. And that probably would have large enough
             | impacts to society that things like "Dollars In A Bank" or
             | similar may not be so important.
             | 
             | [0] Using the stat of "169.8 million people worked at some
             | point in 2022"
             | https://www.bls.gov/news.release/pdf/work.pdf
             | 
             | [1] 18.5 million office workers according to
             | https://www.bls.gov/news.release/ocwage.nr0.htm
        
               | BenjiWiebe wrote:
               | Well, "AI" is milking cows. Not LLM's though. Our milking
               | robot uses image recognition to find the cow's teats to
               | put the milking cup on.
        
               | semi-extrinsic wrote:
               | Yeah, but automated milking robots like that have been in
               | the market for more than a decade now IIRC?
               | 
               | Seems like a lot of CV solutions have seen fairly steady
               | but small incremental advances over the past 10-15 years,
               | quite unrelated to the current AI hype.
        
               | kimixa wrote:
                | Improving capabilities of AI isn't at odds with
                | expecting an "AI winter" - just that the current drive
                | is more hype than sustainable, provable progress.
                | 
                | We've been through multiple AI winters: each time a new
                | technique is developed, it _does_ increase
                | capabilities, just not as much as the hype suggested.
                | 
                | To say there won't be a bust implies this boom will
                | last forever, into whatever singularity that implies.
        
               | choilive wrote:
               | I think the more accurate denominator would be the world
               | population. People are seeing benefits to LLMs even
               | outside of the office.
        
               | dgfitz wrote:
               | How do LLMs make money though?
        
               | hulitu wrote:
               | > I think the more accurate denominator would be the
               | world population. People are seeing benefits to LLMs even
               | outside of the office.
               | 
               | For example ?
               | 
               | (besides deep fakes)
        
               | ben_w wrote:
               | While I'd agree monetisation seems to be a challenge in
               | the long term (analogy: spreadsheets are used everywhere,
               | but are so easy to make they're not themselves a revenue
               | stream, only as part of a bigger package)...
               | 
               | > Nvidia had a revenue of $27billion in 2023 - that's
               | about $160 per person per year [0] for every working age
               | person in the USA
               | 
               | As a non-American, I'd like to point out we also earn
               | money.
               | 
               | > as no AI is going to milk a cow or serve you starbucks
               | 
               | Cows have been getting the robots for a while now, here's
               | a recent article: https://modernfarmer.com/2023/05/for-
               | years-farmers-milked-co...
               | 
               | Robots serve coffee as well as the office parts of the
               | coffee business: https://www.techopedia.com/ai-coffee-
               | makers-robot-baristas-a...
               | 
               | Some of the malls around here have food courts where
               | robots bring out the meals. I assume they're no more
               | sophisticated than robot vacuum cleaners, but they get
               | the job done.
               | 
               | Transformer models seem to be generally pretty good at
               | high-level robot control, though IIRC a different
               | architecture is needed down at the level of actuators and
               | stepper motors.
        
               | kimixa wrote:
               | Sure, robotics help many jobs, and some level of the
               | current deep learning boom seems to have crossover in
               | improving that - but how many of them are running LLMs
               | that affect Nvidia's bottom line right now? There's some
               | interesting research in that area, but it's certainly not
                | the primary driving force. And is the control system
                | even the limiting factor for many systems? It's
                | probably relatively easy to get a machine today that
                | makes a Starbucks coffee "as good as" a decently
                | trained human.
               | But the market doesn't seem to want that.
               | 
               | And I know restricting it to the US is a simplification,
               | but so is restricting it to Nvidia, it's just to give a
               | ballpark back-of-the-envelope "does this even make
               | sense?" level calculation. And that's what I'm failing to
               | see.
        
               | amluto wrote:
                | Machines that will make espresso automatically, and
                | that I personally prefer to what Starbucks serves, are
                | widely available. No AI needed, and they aren't even
                | "robotic".
               | These can use ordinary coffee beans, and you can get them
               | for home use or for commercial use. You can also go to a
               | mall and get a robot to make you coffee.
               | 
               | Nonetheless, Starbucks does not use these machines, and I
               | don't see any reason that AI, on its current trajectory,
               | will change that calculation any time soon.
        
               | lm28469 wrote:
               | I love how the fact that we might not want AI/robots
               | everywhere in our lives isn't even discussed.
               | 
               | They could serve us a plate of shit and we'd debate if
               | pepper or salt is better to complement it
        
               | ben_w wrote:
               | It's pretty often discussed, it's just hard to put
               | everything into a single comment (or thread).
               | 
               | I mean, Yudkowsky has basically spent the last decade
               | screaming into the void about how AI will with high
               | probability literally kill everyone, and even people like
               | me who think that danger is much less likely still look
               | at the industrial revolution and how slow we were to
               | react to the harms of climate change and think "speed-
               | running another one of these may be unwise, we should
               | probably be careful".
        
             | ben_w wrote:
              | What we had in the 80s was barely able to perform spell-
              | check; freely downloadable LLMs today are mind-blowing
              | even in comparison to GPT-2.
        
               | dgfitz wrote:
               | I think the only good thing that came out of the 80s was
               | the 90s. I'd leave that decade alone so we can forget
               | about it.
        
             | lm28469 wrote:
             | > even operated by people who have no clue how the
             | underlying mechanism functions are still far more empowered
             | than anything like the primitives of the 80s enabled.
             | 
              | I'm still not convinced about that. All the """studies"""
              | show a 30-60% boost in productivity, but clearly this
              | doesn't translate to anything meaningful in real life,
              | because no industry has laid off 30-60% of its workforce
              | and no industry has progressed anywhere close to 30%
              | since ChatGPT was released.
              | 
              | It was released a whole 24 months ago; remember the talks
              | about freeing us from work and curing cancer... Even
              | investment funds, which are the biggest suckers for
              | anything profitable, are more and more doubtful.
        
           | seanmcdirmid wrote:
           | Haven't people been saying that for the last decade? I mean,
           | eventually they will be right, maybe "about" means next year,
           | or maybe a decade later? They just have to stop making huge
           | improvements for a few years and the investment will dry up.
           | 
           | I really wasn't interested in computer hardware anymore (they
           | are fast enough!) until I discovered the world of running
           | LLMs and other AI locally. Now I actually care about computer
           | hardware again. It is weird, I wouldn't have even opened this
           | HN thread a year ago.
        
             | vlovich123 wrote:
             | What makes local AI interesting to you vs larger remote
             | models like ChatGPT and Claude?
        
               | adriancr wrote:
               | Not OP, but for me a big thing is privacy: I can feed it
               | personal documents and expect those not to leak.
               | 
               | It has essentially zero marginal cost; the hardware is
               | already there. I'm not captive to some remote company.
               | 
               | I can fiddle and integrate with other home sensors /
               | automation as I want.
        
               | hentrep wrote:
               | Curious as I'm of the same mind - what's your local AI
               | setup? I'm looking to implement a local system that would
               | ideally accommodate voice chat. I know the answer depends
               | on my use case - mostly searching and analysis of
               | personal documents - but would love to hear how you've
               | implemented.
        
               | dgfitz wrote:
               | llama.cpp and time seems to be the general answer to
               | this question.
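               | 
               | A minimal sketch of that, using the llama-cpp-python
               | bindings (the GGUF file name is a placeholder; any
               | quantized model you have lying around works the same
               | way):
               | 
               |     # pip install llama-cpp-python
               |     from llama_cpp import Llama
               | 
               |     # Load a quantized model; n_gpu_layers=-1 offloads
               |     # every layer to the GPU if one is available.
               |     llm = Llama(model_path="some-7b-instruct.Q4.gguf",
               |                 n_ctx=8192, n_gpu_layers=-1)
               | 
               |     resp = llm.create_chat_completion(messages=[
               |         {"role": "user",
               |          "content": "Summarize my meeting notes: ..."},
               |     ])
               |     print(resp["choices"][0]["message"]["content"])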
        
               | epicureanideal wrote:
               | Lack of ideological capture of the public models.
        
               | seanmcdirmid wrote:
               | Control and freedom. You can use unharmonious models and
               | hacks to existing models. Also latency: you can actually
               | use AI for a lot more applications when it is running
               | locally.
        
           | HDThoreaun wrote:
           | Selling cheap products that are worse than the competition is
           | a valid strategy during downturns as businesses look to cut
           | costs
        
           | throwaway48476 wrote:
           | The survivors of the AI winter are not the dinosaurs but the
           | small mammals that can profit by dramatically reducing the
           | cost of AI inference in a minimum Capex environment.
        
         | layer8 wrote:
         | The ML dev world isn't a consumer mass market like PC gaming
         | is.
        
           | hajile wrote:
            | Launching a new SKU for $500-1000 with 48GB of RAM seems
            | like a profitable idea. The GPU isn't top-of-the-line, but
            | the RAM would be unmatched for running a lot of models
            | locally.
        
             | layer8 wrote:
             | You can't just throw in more RAM without having the rest of
             | the GPU architected for it. So there's an R&D cost involved
             | for such a design, and there may even be trade-offs on
             | performance for the mass-market lower-tier models. I'm
             | doubtful that the LLM enthusiast/tinkerer market is large
             | enough for that to be obviously profitable.
        
               | hajile wrote:
               | That would depend on how they designed the memory
               | controllers. GDDR6 only supports 1-2GB modules at
               | present (I believe GDDR6W supports 4GB modules). If they
               | were using twelve 1GB modules, then increasing to 24GB
               | shouldn't be a very large change.
               | 
               | Honestly, Apple seems to be on the right track here. DDR5
               | is slower than GDDR6, but you can scale the amount of RAM
               | far higher simply by swapping out the density.
        
               | KeplerBoy wrote:
               | It's a 192-bit interface, so six 16Gbit chips.
        
               | KeplerBoy wrote:
               | Of course you can just add more RAM. Double the capacity
               | of every chip and you get twice the RAM without ever
               | asking an engineer.
               | 
               | People did it with the RTX3070.
               | https://www.tomshardware.com/news/3070-16gb-mod
        
               | Tuna-Fish wrote:
               | Can you find me a 32Gbit GDDR6 chip?
        
             | jmward01 wrote:
             | Give me 48GB with reasonable power consumption so I can
             | dev locally and I will buy it in a heartbeat. Anyone that
             | is fine-tuning would want a setup like that to test things
             | before pushing to real GPUs. And in reality, even if
             | fine-tuning on a card like that takes two days instead of
             | a few hours, it would totally be worth it.
        
               | justsomehnguy wrote:
               | I would love that too, but you can't just add the chips,
               | you need the bus too.
        
               | jmward01 wrote:
               | The bigger point here is to ask why they aren't designing
               | that in from the start. Same with AMD. RAM has been
               | stalled and is critical. Start focusing on allowing a lot
               | more of it, even at the cost of performance, and you have
               | a real product. I have a 12GB 3060 as my dev box and the
               | big limiter for it is RAM, not CUDA cores. If it had 48GB
               | but the same number of cores then I would be very happy
               | with it, especially if it was power efficient.
        
             | Tuna-Fish wrote:
             | It's not technically possible to just slap on more RAM.
             | GDDR6 is point-to-point with option for clamshell, and the
             | largest chips in mass production are 16Gbit/32 bit. So, for
             | a 192bit card, the best you can get is 192/32x16Gbitx2 =
             | 24GB.
             | 
             | To have more memory, you have to design a new die with a
             | wider interface. The design+test+masks on leading edge
             | silicon is tens of millions of NRE, and has to be paid well
             | over a year before product launch. No-one is going to do
             | that for a low-priced product with an unknown market.
             | 
             | The savior of home inference is probably going to be AMD's
             | Strix Halo. It's a laptop APU built to be a fairly low end
             | gaming chip, but it has a 256-bit LPDDR5X interface. There
             | are larger LPDDR5X packages available (thanks to the
             | smartphone market), and Strix Halo should be eventually
             | available with 128GB of unified ram, performance probably
             | somewhere around a 4060.
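             | 
             | Spelling the arithmetic out (back-of-the-envelope, using
             | the chip sizes above):
             | 
             |     # GDDR6 uses 32-bit channels; the largest chips in
             |     # mass production are 16 Gbit (2 GB), and clamshell
             |     # puts two chips on each channel.
             |     bus_width_bits = 192
             |     channels = bus_width_bits // 32    # 6 channels
             |     gb_per_chip = 16 / 8               # 16 Gbit = 2 GB
             |     chips_per_channel = 2              # clamshell
             |     print(channels * chips_per_channel * gb_per_chip)
             |     # -> 24.0 GB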
        
         | ggregoire wrote:
         | > 12GB max is a non-starter for ML work now.
         | 
         | Can you even do ML work with a GPU not compatible with CUDA?
         | (genuine question)
         | 
          | A quick search showed me the equivalent of CUDA in the Intel
          | world is oneAPI, but in practice, are the major Python
          | libraries used for ML compatible with oneAPI? (Was also gonna
          | ask if oneAPI can run inside Docker, but apparently it does
          | [1])
         | 
         | [1] https://hub.docker.com/r/intel/oneapi
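          | 
          | The same quick search suggests PyTorch can target Intel GPUs
          | through an "xpu" device (natively in recent releases, or via
          | the intel-extension-for-pytorch package), roughly like this,
          | though I haven't tried it myself:
          | 
          |     import torch
          | 
          |     # "xpu" is the Intel GPU device, analogous to "cuda";
          |     # it needs a PyTorch build with XPU support.
          |     has_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
          |     device = "xpu" if has_xpu else "cpu"
          | 
          |     model = torch.nn.Linear(1024, 1024).to(device)
          |     x = torch.randn(8, 1024, device=device)
          |     y = model(x)  # runs on the Arc GPU when available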
        
           | suprjami wrote:
           | There is ROCm and Vulkan compute.
           | 
           | Vulkan is especially appealing because you don't need any
           | special GPGPU drivers and it runs on any card which supports
           | Vulkan.
        
         | PhasmaFelis wrote:
         | > Am I missing something here?
         | 
         | This is a graphics card.
        
         | heraldgeezer wrote:
         | This is not an ML card... this is a gaming card... Why are you
         | people like this?
        
         | whalesalad wrote:
          | I still don't understand why graphics cards haven't evolved
          | to include SO-DIMM slots so that the VRAM can be upgraded by
          | the end user. At this point memory requirements vary so much
          | from gamer to scientist that it would make more sense to
          | offer compute packages with user-supplied memory.
          | 
          | tl;dr GPUs need to transition from being add-in cards to
          | being a sibling motherboard. A sisterboard? Not a daughter
          | board.
        
       | stracer wrote:
        | Too late, and it has a bad rep. This effort from Intel to sell
        | discrete GPUs is just inertia from old aspirations; it won't
        | really help noticeably to save the company, as there is not
        | much money in it. Most probably the whole Intel Arc effort will
        | be mothballed, and probably many others will be too.
        
         | undersuit wrote:
         | No reviews and when you click on the reseller links in the
         | press announcement they're still selling A750s with no B-Series
         | in sight. Strong paper launch.
        
           | sangnoir wrote:
           | The fine article states reviews are still embargoed, and
           | sales start next week.
        
             | undersuit wrote:
             | The mods have thankfully changed this to a Phoronix article
             | instead of the Intel page and the title has been reworked
             | to not include 'launch'.
        
         | ksd482 wrote:
         | What's the alternative?
         | 
         | I think it's the right call since there isn't much competition
         | in GPU industry anyway. Sure, Intel is far behind. But they
         | need to start somewhere in order to break ground.
         | 
          | Strictly speaking strategically, my intuition is that they
          | will learn from this, course correct, and then start making
          | progress.
        
           | stracer wrote:
           | The idea of another competitive GPU manufacturer is nice. But
           | it is hard to bring into existence. Intel is not in a
           | position to invest lots of money and sustained effort into
           | products for which the market is captured and controlled by a
           | much bigger and more competent company on top of its game.
           | Not even AMD can get more market share, and they are much
           | more competent in the GPU technology. Unless NVIDIA and AMD
           | make serious mistakes, Intel GPUs will remain a 3rd rate
           | product.
           | 
           | > "They need to start somewhere in order to break ground"
           | 
            | Intel has big problems and it's not clear they should
            | occupy themselves with this. They should stabilize, and the
            | most plausible way to do that is to cut the weak parts and
            | get back to what they were good at - performant, secure
            | x86_64 CPUs, maybe some new innovative low-power CPUs,
            | maybe memory/solid-state drives.
        
       | jvanderbot wrote:
       | Seems to feature ray tracing (kind of obvious), but also
       | upscaling.
       | 
       | My experience on WH40K DT has taught me that upscaling is
       | absolutely vital for a reasonable experience on some games.
        
         | 1propionyl wrote:
         | > upscaling is absolutely vital for a reasonable experience on
         | some games
         | 
          | This strikes me as a bit of a sad state of affairs. We've
          | moved beyond a Parkinson's law of computational resources
          | (usage by games expands to fill the available resources) to
          | resource usage expanding to fill the resources available on
          | the highest-end machines, unavailable for less than a few
          | thousand dollars... and then using that to train a model
          | that simulates higher quality or performance, via upscaling,
          | on lower-end machines.
          | 
          | A counterargument would be that this makes high-end
          | experiences available to more people, and while that may be
          | true in the individual case, I don't buy that that's where
          | the incentives it creates are driving the entire industry.
         | 
         | To put a finer point on it: at what percentage of budget is too
         | much money being spent on producing assets?
        
           | jvanderbot wrote:
           | Isn't it insane to think that rendering triangles for the
           | visuals in games has gotten so demanding that we need an
           | artificially intelligent system embedded in our graphics
           | cards to paint pixels that look like high definition
           | geometry?
           | 
           | What a time to be alive. Our most advanced technology is used
           | to cheat on homework and play video games.
        
             | 1propionyl wrote:
             | It is. And it strikes me as evidence that we've lost the
             | plot, and that the measure has ceased to be a good measure
             | upon becoming a target.
             | 
             | It used to be that more computational power was desirable
             | because it would allow for developers to more fully realize
             | creative visions that weren't previously possible.
             | 
             | Now, it seems that the goal is simply visual fidelity and
             | asset complexity... and the rest of the experience is not
             | only secondary, but compromised in pursuit of the former.
             | 
             | Thinking back on recent games that felt like something
             | _new_ and painstakingly crafted... they're almost all 2D
             | (or look like it), lean on excellent art/music (and even
             | haptics!) direction, have a well-crafted core gameplay
             | loop or set of systems, and have relatively low actual
             | system requirements (which in turn means they are
             | exceptionally smooth without any AI tricks).
             | 
             | Off the top of my head, from the past few years: Hades,
             | Balatro, Animal Well, Cruelty Squad [0], Spelunky, Pizza
             | Tower, Papers Please, etc. Most of these could just as
             | easily have been made a decade ago.
             | 
             | That's not to say we haven't had many games that are
             | gorgeous and fun. But while the latter is necessary and
             | sufficient, the former is neither.
             | 
             | It's just icing: it doesn't matter if the cake tastes like
             | crap.
             | 
             | [0] a mission statement if there ever was one for how much
             | fun something can be while not just being ugly but being
             | actively antagonistic to the senses and any notion of good
             | taste.
        
             | jms55 wrote:
             | > Isn't it insane to think that rendering triangles for the
             | visuals in games has gotten so demanding that we need an
             | artificially intelligent system embedded in our graphics
             | cards to paint pixels that look like high definition
             | geometry?
             | 
             | That's not _quite_ how temporal upscaling works in
             | practice. It's more of a blend between existing pixels,
             | not generating entire pixels from scratch.
             | 
             | The technique has existed since before ML upscalers became
             | common. It's just turned out that ML is really good at
             | determining how much to blend by each frame, compared to
             | hand written and tweaked per-game heuristics.
             | 
             | ---
             | 
             | For some history, DLSS 1 _did_ try to generate pixels
             | entirely from scratch each frame. Needless to say, the
             | quality was crap, and that was after a very expensive and
             | time-consuming process to train the model for each
             | individual game (and forget about using it as you develop
             | the game; imagine having to retrain the AI model as you
             | implement the graphics).
             | 
             | DLSS 2 moved to having the model predict blend weights fed
             | into an existing TAAU pipeline, which is much more
             | generalizable and has way better quality.
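             | 
             | Stripped of reprojection, clamping and jitter, the
             | accumulation step is basically just a weighted blend
             | (simplified sketch, not any vendor's actual code):
             | 
             |     def taa_resolve(current, history, alpha):
             |         # current: this frame's jittered samples
             |         # history: accumulated result reprojected to
             |         #          this frame
             |         # alpha:   per-pixel blend weight; the part that
             |         #          DLSS 2-style upscalers predict with ML
             |         return [a * c + (1.0 - a) * h
             |                 for c, h, a in zip(current, history, alpha)]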
        
       | crowcroft wrote:
        | Anyone using Intel graphics cards? Aside from specs, drivers
        | and support can make or break the value prop of a graphics
        | card. Would be curious what actually using these is like.
        
         | GiorgioG wrote:
         | I put an Arc card in my daughter's machine last month. Seems to
         | work fine.
        
           | Scramblejams wrote:
           | What OS?
        
         | jamesgeck0 wrote:
         | I use an A770 LE for PC gaming. Windows drivers have improved
         | substantially in the last two years. There's a driver update
         | every month or so, although the Intel Arc control GUI hasn't
         | improved in a while. Popular newer titles have generally run
         | well; I've played some Metaphor, Final Fantasy 16, Elden Ring,
         | Spider-Man Remastered, Horizon Zero Dawn, Overwatch, Jedi
         | Survivor, Forza Horizon 4, Monster Hunter Sunbreak, etc.
         | without major issues. Older games sometimes struggle; a 6 year
         | old Need for Speed doesn't display terrain, some 10+ year old
         | indie games crash. Usually fixed by dropping dxvk.dll in the
         | game directory. This fix cannot be used with older Windows
         | Store games. One problematic newer title was Starfield, which
         | at launch had massive frame pacing and hard crashing issues
         | exclusive to Intel Arc.
         | 
         | I've had a small sound latency issue forever; most visible with
         | YouTube videos, the first half-second of every video is silent.
         | 
          | I picked this card up for about $120 less than the RTX 4060.
          | Wasn't a terrible decision.
        
       | imbusy111 wrote:
       | None of the store links work. Weird. Is this not supposed to be a
       | public page yet?
        
         | SirMaster wrote:
         | Must be an announcement rather than a launch I guess?
        
       | kookamamie wrote:
       | Why, though? Intel's strategy seems puzzling, to say the least.
        
         | tokioyoyo wrote:
         | Hard to get subsidies if you're not releasing new lines of
         | products.
        
       | ChrisArchitect wrote:
       | Official page:
       | https://www.intel.com/content/www/us/en/products/docs/discre...
        
       | SeqDesign wrote:
        | The new Intel Battlemage cards look sweet. If they can extend
        | displays on Linux, then I'll definitely be buying one.
        
       | greenavocado wrote:
        | I'm not a gamer, and there is not enough memory in this thing
        | for me to care to use it for AI applications, so that leaves
        | just one thing I care about: hardware-accelerated video
        | encoding and decoding. Let's see some performance metrics,
        | both in speed and visual quality.
        
         | bjoli wrote:
          | From what I have gathered, the Alchemist AV1 encoder is about
          | the same as or sliiiightly worse than current NVENC. My A750
          | does about 1400fps for DVD encoding on the quality preset. I
          | haven't had the opportunity to try 1080p or 4K though.
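          | 
          | For anyone who wants to test on their own clips, hardware AV1
          | encoding on Arc is exposed through ffmpeg's av1_qsv encoder,
          | e.g. (wrapped in Python here; paths are placeholders and
          | options beyond the encoder name may need tuning):
          | 
          |     import subprocess
          | 
          |     # Transcode using Intel's hardware AV1 encoder (QSV),
          |     # copying the audio stream untouched.
          |     subprocess.run([
          |         "ffmpeg", "-i", "input.mkv",
          |         "-c:v", "av1_qsv",
          |         "-c:a", "copy",
          |         "output.mkv",
          |     ], check=True)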
        
       | bloodyplonker22 wrote:
       | I wanted Intel to do well so I purchased an ARC card. The problem
       | is not the hardware. For some games, it worked fine, but in
       | others, it kept crashing left and right. After updates to
       | drivers, crashing was reduced, but it still happened. Driver
       | software is not easy to develop thoroughly. Even AMD had problems
       | when compared to Nvidia when AMD really started to enter the GPU
       | game after buying ATI. AMD has long since solved their driver
       | woes, but years after ARC's launch, Intel still has not.
        
         | shmerl wrote:
          | Do you mean on Linux, and are those problems with ANV? RADV
          | seems to be developed faster these days, with ANV lagging
          | slightly behind.
        
         | jamesgeck0 wrote:
         | I haven't experienced many crashing issues on Windows 11. What
         | games are you seeing this in?
        
       | bjoli wrote:
        | I love my A750. Works fantastically out of the box on Linux.
        | HW encoding and decoding for every format I use. Flawless
        | support for different screens.
        | 
        | I haven't regretted the purchase at all.
        
       | maxfurman wrote:
       | How does this connect to Gelsinger's retirement, announced
       | yesterday? The comments on that news were all doom and gloom, so
       | I had expected more negative news today. Not a product launch.
       | But I'm just some guy on HN, what do I know?
        
         | wmf wrote:
         | I don't see any connection. This is a very minor product for
         | Intel.
        
       | Havoc wrote:
       | Who is the target audience for this?
       | 
        | Well-informed gamers know Intel's discrete GPU is hanging by a
        | thread, so they're not hopping on that bandwagon.
       | 
       | Too small for ML.
       | 
       | The only people really happy seem to be the ones buying it for
       | transcoding and I can't imagine there is a huge market of people
       | going "I need to go buy a card for AV1 encoding".
        
         | epolanski wrote:
         | Cheap gaming rigs.
         | 
         | They do well compared to AMD/Nvidia at that price point.
         | 
         | Is it a market worth chasing at all?
         | 
         | Doubt.
        
         | spookie wrote:
         | It's cheap, plenty of market when the others have forgotten the
         | segment.
        
         | zamalek wrote:
         | If it works well on Linux there's a market for that. AMD are
         | hinting that they will be focusing on iGPUs going forward (all
         | power to them, their iGPUs are unmatched and NVIDIA is
         | dominating dGPU). Intel might be the savior we need. Well,
         | Intel and possibly NVK.
         | 
         | Had this been available a few weeks ago I would have gone
         | through the pain of early adoption. Sadly it wasn't just an
         | upgrade build for me, so I didn't have the luxury of waiting.
        
           | sosodev wrote:
           | AMD has some great iGPUs but it seems like they're still
           | planning to compete in the dGPU space just not at the high
           | end of the market.
        
         | sangnoir wrote:
         | > Too small for ML.
         | 
         | What do you mean by this - I assume you mean too small for SoTA
         | LLMs? There are many ML applications where 12GB is more than
         | enough.
         | 
         | Even w.r.t. LLMs, not everyone requires the latest & biggest
         | LLM models. Some "small", distilled and/or quantized LLMs are
         | perfectly usable with <24GB
        
         | screye wrote:
          | All-in-one machines.
          | 
          | Intel's customers are third-party PC assemblers like Dell &
          | HP. Many corporate bulk buyers only care whether the one or
          | two apps they use are supported. The lack of wider support
          | isn't a concern.
        
         | ddtaylor wrote:
         | Intel has earned a lot of credit in the Linux space.
         | 
         | Nvidia is trash tier in terms of support and only recently
         | making serious steps to actually support the platform.
         | 
          | AMD went all in nearly a decade ago and it's working pretty
          | well for them. They have mostly caught up to Intel-grade
          | support in the kernel.
         | 
         | Meanwhile, Intel has been doing this since I was in college. I
         | was running the i915 driver in Ubuntu 20 years ago. Sure their
         | chips are super low power stuff, but what you can do with them
         | and the level of software support you get is unmatched. Years
         | before these other vendors were taking the platform seriously
         | Intel was supporting and funding Mesa development.
        
         | marshray wrote:
         | I'm using an Intel card right now. With Wayland. It just works.
         | 
         | Ubuntu 24.04 couldn't even boot to a tty with the Nvidia Quadro
         | thing that came with this major-brand PC workstation, still
         | under warranty.
        
         | mappu wrote:
         | _> Intel 's discrete GPU is hanging by a thread, so they're not
         | hoping on that bandwagon_
         | 
         | Why would that matter? You buy one GPU, in a few years you buy
         | another GPU. It's not a life decision.
        
         | qudat wrote:
          | If you go on the Intel Arc subreddit, people are hyped about
          | Intel GPUs. Not sure what the price is, but the previous gen
          | was cheap and the extra competition is welcome.
          | 
          | In particular, Intel just needs to support VFIO and it'll be
          | huge for homelabs.
        
       | zenethian wrote:
       | These are pretty interesting, but I'm curious about the side-by-
       | side screenshot with the slider: why does ray tracing need to be
       | enabled to see the yellow stoplight? That seems like a weird
       | oversight.
        
         | zamalek wrote:
         | It's possible that the capture wasn't taken at the exact same
         | frame, or that the state of the light isn't deterministic in
         | the benchmark.
        
       | tommica wrote:
       | Probably would jump to Intel once my 3060 gets too old
        
       | headgasket wrote:
        | My hunch is the path forward for Intel on both the CPU and the
        | GPU end is to release a series of consumer chipsets with a
        | large number of PCIe 5.0 lanes, and keep iterating on this.
        | This would cannibalize some of the datacenter server-side
        | revenue, but that's a reboot... get the hackers raving about
        | Intel value for the money instead of EPYC. Or do a skunkworks
        | ARM64 M1-like processor; there's a market for this as a
        | datacenter part...
        
       | pizzaknife wrote:
        | Tell it to my INTC stock price.
        
       ___________________________________________________________________
       (page generated 2024-12-03 23:00 UTC)