[HN Gopher] Intel announces Arc B-series "Battlemage" discrete g...
___________________________________________________________________
Intel announces Arc B-series "Battlemage" discrete graphics with
Linux support
Author : rbanffy
Score : 222 points
Date : 2024-12-03 17:19 UTC (5 hours ago)
(HTM) web link (www.phoronix.com)
(TXT) w3m dump (www.phoronix.com)
| rbanffy wrote:
| For me, the most important feature is Linux support. Even if I'm
| not a gamer, I might want to use the GPU for compute and buggy
| proprietary drivers are much more than just an inconvenience.
| zokier wrote:
| Sure, but open drivers have been AMDs selling point for a
| decade, and even nVidia is finally showing signs of opening up.
| So it's bit dubious if these new Intels really can compete on
| this front, at least for very long.
| Night_Thastus wrote:
| We'll have to wait for first-party benchmarks, but they seem
| decent so far. A 4060 equivalent $200-$250 isn't bad at all. for
| I'm curious if we'll get a B750 or B770 and how they'll perform.
|
| At the very least, it's nice to have some decent BUDGET cards
| now. The ~$200 segment has been totally dead for years. I have a
| feeling Intel is losing a fair chunk of $ on each card though,
| just to enter the market.
| rbanffy wrote:
| I'd love to see their GPGPU software support under Linux.
| zargon wrote:
| The keywords you're looking for are Intel basekit, oneapi,
| and ipex.
|
| https://christianjmills.com/posts/intel-pytorch-extension-
| tu...
|
| https://chsasank.com/intel-arc-gpu-driver-oneapi-
| installatio...
| jmclnx wrote:
| >Battlemage is still treated to fully open-source graphics driver
| support on Linux.
|
| I am hoping these are open in such a manner that they can be used
| in OpenBSD. Right now I avoid all hardware with a Nvidia GPU.
| That makes for somewhat slim pickings.
|
| If the firmware is acceptable to the OpenBSD folks, then I will
| happly use these.
| rbanffy wrote:
| They are promising good Linux support, which kind of implies,
| at least, that everything but opaque blobs are open.
| rmm wrote:
| I put an a360 Card into an old machine I turned into a plex
| server. It turned it into a transcoding powerhouse. I can do
| multiple indepdent streams now without it skipping a beat. Price-
| performance ratio was off the chart
| jeffbee wrote:
| Interesting application. Was this a machine lacking an iGPU, or
| does the Intel GPU-on-a-stick have more quicksync power than
| the iGPU?
| 6SixTy wrote:
| A not inconsequential possibility is that both the iGPU and
| dGPU are sharing the transcoding workload, rather than the
| dGPU replacing the iGPU. It's a fairly forgotten feature of
| Intel Arc, but I don't blame anyone because the help articles
| are dusty to say the least.
| kridsdale1 wrote:
| Any idea how that compares to Apple Silicon for that job? I
| bought the $599 MacBook Air with M1 as my plex server for this
| reason. Transcodes 4k HEVC and doesn't even need a fan. Sips
| watts.
| 2OEH8eoCRo0 wrote:
| All Intel arc even the $99 A310 has HW accel h265 and AV1
| encoding.
| machinekob wrote:
| Apple Silicon still don't support AV1 encoding but it is good
| enough for simple Jellyfin server i'm using one myself
| ThatMedicIsASpy wrote:
| My 7950X3Ds GPU does 4k HDR (33Mb/s) to 1080p at 40fps
| (proxmox, jellyfin). If these GPUs would support SR-IOV I would
| grab one for transcoding and GPU accelerated remote desktop.
|
| Untouched video (star wars 8) 4k HDR (60Mb/s) to 1080p at 28fps
| c2h5oh wrote:
| All first gen arc gpus share the same video encoder/decoder,
| including the sub-$100 A310, that can handle four (I haven't
| tested more than two) simultaneous 4k HDR -> 1080p AV1
| transcodes at high bitrate with tone mapping while using
| 12-15W of power.
|
| No SR-IOV.
| baq wrote:
| Intel has been a beast at transcoding for years, it's a
| relatively niche application though.
| 2OEH8eoCRo0 wrote:
| How's the Linux compatibility? I was tempted to do the same for
| my CentOS Stream Plex box.
| Lapra wrote:
| Unlabelled graphs are infuriating. Are the charts average
| framerate? Mean framerate? Maximum framerate?
| rbanffy wrote:
| The two graphs on the page show FPS.
| zamadatix wrote:
| GP is asking what measure of FPS. The most likely value when
| unspecified is usually "mean FPS" but, being a marketing
| graph, it doesn't explicitly say.
| confident_inept wrote:
| I'm really curious to see if these still rely heavily on
| resizable BAR. Putting these in old computers in linux without
| reBAR support makes the driver crash with literally any load
| rendering the cards completely unusable.
|
| It's a real shame, the single slot a380 is a great performance
| for price light gaming and general use card for small machines.
| jeffbee wrote:
| What is the newest platform that lacks resizable BAR? It was
| standardized in 2006. Is 4060-level graphics performance useful
| in whatever old computer has that problem?
| bryanlarsen wrote:
| Sandy Bridge (2009) is still a very usable CPU with a modern
| GPU. In theory Sandy Bridge supported resizable BAR but in
| practice they didn't. I think the problem was BIOS's.
| stusmall wrote:
| Oh wow. That's older than I thought. This is definitely
| less of an issue than folks make out of it.
|
| I cling onto my old hardware to limit ewaste where I can. I
| still gave up on my old sandybridge machine once it hit
| about a decade old. Not only would the CPU have trouble
| keeping up, its mostly only PCIe 2.0. A few had 3.0. You
| wouldn't get the full potential even out of the cheapest
| one of these intel cards. If you are putting a GPU in a
| system like that I can't imagine even buying new. Just get
| something used off ebay.
| 6SixTy wrote:
| On paper any PCIe 2.0 motherboard can receive a BIOS update
| adding ReBAR support with 2.1, but reality is that you
| pretty much have to get a PCIe 3.0 motherboard to have any
| chance of having it or modding it in yourself.
|
| Another issue is that not every GPU actually supports
| ReBAR, I'm reasonably certain the Nvidia drivers turn it
| off for some titles, and pretty much the only vendor that
| reliably wants ReBAR on at all times is Intel Arc.
|
| I also personally wouldn't say that Sandy Bridge is very
| usable with a modern GPU without also specifying what kind
| of CPU or GPU. Or context in how it's being used.
| vel0city wrote:
| My old Ice Lake CPU was very much a bottleneck in lots of
| games in 2018 when I finally replaced it. It was a
| noticeable improvement across the board making the jump to
| a Zen+ CPU at the time, even with the same GPU.
| vel0city wrote:
| Ryzen 2000 series processors don't support AMD's "Smart
| Access Memory" which is pretty much resizable BAR. That's
| 2018.
|
| Coffee Lake also didn't really support ReBAR either, also
| 2018.
| tremon wrote:
| The newest platform is probably POWER10. ReBar is not
| supported on any POWER platform, most likely including the
| upcoming POWER11.
|
| Also, I don't think you'll find many mainboards from 2006
| supporting it. It may have been standardized in 2006, but a
| quick online search leads me to think that even on x86
| mainboards it didn't become commonly available until at least
| 2020.
| jeffbee wrote:
| Congrats on a pretty niche reply. I wonder if literally
| anyone has tried to put an ARC dGPU in a POWER system.
| Maybe someone from Libre-SOC will chime in.
| Palomides wrote:
| do you have a reference for power rebar support? just
| curious, I couldn't find anything with a quick look
| IshKebab wrote:
| Oh no... my POWER gaming rig... no..
| babypuncher wrote:
| ReBAR was standardized in 2006 but consumer motherboards
| didn't start shipping with an option to enable it until much
| later, and didn't start turning it on by default until a few
| years ago.
| mushufasa wrote:
| what's the current status of using cuda on non-gpu chips?
|
| IIRC that was one of the original goals of geohot's tinybox
| project, though I'm not sure exactly where that evolved
| mtlmtlmtlmtl wrote:
| Bit disappointed there's no 16gig(or more) version. But
| absolutely thrilled the rumours of Intel discrete graphics'
| demise were wildly exaggerated(looking at you, Moore's Law is
| Dead...).
|
| Very happy with my A770. Godsend for people like me who want
| plenty VRAM to play with neural nets, but don't have the money
| for workstation GPUs or massively overpriced Nvidia flagships.
| Works painlessly with linux, gaming performance is fine, price
| was the first time I haven't felt fleeced buying a GPU in many
| years. Not having CUDA does lead to some friction, but I think
| nVidia's CUDA moat is a temporary situation.
|
| Prolly sit this one out unless they release another SKU with 16G
| or more ram. But if Intel survives long enough to release
| Celestial, I'll happily buy one.
| khimaros wrote:
| have you tested llama.cpp with this card on Linux? when i
| tested about a year ago, it was a nightmare.
| mtlmtlmtlmtl wrote:
| A few months ago, yeah. Had to set an environment
| variable(added to the ollama systemd unit file), but
| otherwise it worked just fine.
| gs17 wrote:
| > Intel with their Windows benchmarks are promoting the Arc B580
| as being 24% faster than the Intel Arc A750
|
| Not a huge fan of the numbering system they've used. B > A
| doesn't parse as easily as 5xxx > 4xxx to me.
| vesrah wrote:
| They're going in alphabetical order: A - Alchemist B -
| Battlemage C - Celestial (Future gen) D - Druid (Future gen)
| bee_rider wrote:
| Hey we complained about all the numbers in their product
| names. Getting names from the D&D PHB is... actually very
| cool, no complaints.
| gs17 wrote:
| Yes, I understand that. I'm saying it doesn't read as easily
| IMO as (modern) NVIDIA/AMD model numbers. Most numbers I deal
| with are base-10, not base-36.
| BadHumans wrote:
| The naming scheme they are using is easier to parse for me
| so all in the eye of the beholder.
| baq wrote:
| You aren't using excel or sheets I see?
| Ekaros wrote:
| On other hand considering Geforce is 3rd loop of base 10
| maybe it is not so bad... Radeon is on other hand a pure
| absolute mess... Going back same 20 years.
|
| I kinda like the idea of Intel.
| CoastalCoder wrote:
| Given Intel's recent troubles, I'm trying to decide how risky it
| is to invest in their platform. Especially discrete GPUs for
| Linux gaming
|
| Fortunately, having their Linux drivers be (mostly?) open source
| makes a purchase seem less risky.
| babypuncher wrote:
| I can't speak from experience with their GPUs on Linux, but I
| know on Windows most of their problems stem from supporting
| pre-DX12 Direct3D titles. Nvidia and AMD have spent many years
| polishing up their Direct3D support and putting in driver-side
| hacks that paper over badly programmed Direct3D games.
|
| These are obviously Windows-specific issues that don't come up
| at all in Linux, where all that Direct3D headache is taken care
| of by DXVK. Amusingly a big part of Intel's efforts to improve
| D3D performance on Windows has been to use DXVK for many
| titles.
| beAbU wrote:
| Intel isn't going anywhere for at least a couple of hardware
| genrations. Buying a GPU is also not "investing" in anything.
| In 2 years' time you can replace it whith whatever is best
| value for money at that time.
| CoastalCoder wrote:
| > Buying a GPU is also not "investing" in anything.
|
| It is in the (minor) sense that I'd rely on Intel for
| warranty support, driver updates (if closed source), and
| firmware fixes.
|
| But I agree with your main point that the worst-case downside
| isn't that big of a deal.
| throwaway48476 wrote:
| There's no way you're going to maintain and develop the
| intel linux driver as a solo dev.
| CoastalCoder wrote:
| > There's no way you're going to maintain and develop the
| intel linux driver as a solo dev.
|
| I agree entirely.
|
| My point was that even if Intel disappeared tomorrow,
| there's a good chance that Linux developer _community_
| would take over maintenance of those drivers.
|
| In contrast to, e.g., 10-years-ago nvidia, where IIUC it
| was very difficult for outsiders to obtain the
| documentation needed to write proper drivers for their
| GPUs.
| Scene_Cast2 wrote:
| I wonder how many transistors it has and what the chip size it
| is.
|
| For power, it's 190W compared to 4060's 115 W.
|
| EDIT: from [1]: B580 has 21.7 billion transistors at 406 mm2 die
| area, compared to 4060's 18.9 billion and 146 mm2. That's a big
| die.
|
| [1] https://www.techpowerup.com/gpu-specs/arc-b580.c4244
| zokier wrote:
| > Both the Arc B580 and B570 are based on the "BMG-G21" a new
| monolithic silicon built on the TSMC 5 nm EUV process node. The
| silicon has a die-area of 272 mm2, and a transistor count of
| 19.6 billion
|
| https://www.techpowerup.com/review/intel-arc-b580-battlemage...
|
| These numbers seem bit more believable
| Archit3ch wrote:
| They say the best predictor for the future is the past.
|
| How was driver support for their A-series?
| Night_Thastus wrote:
| Drivers were _very_ rough at launch. Some games didn 't run at
| all, some basic functionality and configuration either crashes
| or failed to work, some things ran very poorly, etc. However,
| it was essentially all ironed out over many months of work.
|
| They likely won't need to do the same discovery and fixing for
| B-series as they've already dealt with it.
| treprinum wrote:
| Why don't they just release a basic GPU with 128GB RAM and eat
| NVidia's local generative AI lunch? The networking effect of all
| devs porting their LLMs etc. to that card would instantly put
| them as a major CUDA threat. But beancounters running the company
| would never get such an idea...
| 01HNNWZ0MV43FF wrote:
| Judging by the number of 16 GB laptops I see around, 128 GB of
| RAM would probably cost a bajillion dollars
| gs17 wrote:
| One of the great things about having a desktop is being able
| to get that much for under $300 instead of the price of a
| second laptop.
| qwytw wrote:
| Not the laptop RAM. It costs pennies, Apple's is just
| charging $200 for 12GB because they can. It's way too slow
| though..
|
| And Nvidia doesn't want to cannibalize its high end chips but
| putting more memory into consumer ones.
| gs17 wrote:
| Even 24 or 32 GB for an accessible price would sell out fast.
| NVIDIA wants $2000 for the 5090 to get 32.
| Numerlor wrote:
| 48 GB is at the tail end of what's reasonale for normal GPUs.
| The IO requires a lot of die space. And intel's architecture is
| not very space efficient right now compared to nvidia's
| jsheard wrote:
| > The IO requires a lot of die space.
|
| And even if you spend a lot of die space on memory
| controllers, you can only fit so many GDDR chips around the
| GPU core while maintaining signal integrity. HBM sidesteps
| that issue but it's still too expensive for anything but the
| highest end accelerators, and the ordinary LPDDR that Apple
| uses is lacking in bandwidth compared to GDDR, so they have
| to compensate with ginormous amounts of IO silicon. The M4
| Ultra is expected to have similar bandwidth to a 4090 but the
| former will need a 1024bit bus to get there while the latter
| is only 384bit.
| Numerlor wrote:
| Going off of how the 4090 and 7900 xtx is arranged I think
| you could maybe fit on or two chips more around the die
| over their 12, but that's still a far cry from 128. That
| would probably just need a shared bus like normal DDR as
| you're not fitting that much with 16 gbit density
| SmellTheGlove wrote:
| What if we did what others suggested was the practical
| limit - 48GB. Then just put 2-3 cards in the system and
| maybe had a little bridge over a separate bus for them to
| communicate?
| rapsey wrote:
| Who manufactures the type of RAM and can they buy enough
| capacity? I know nVidia bought up the high bandwidth memory
| supply for years to come.
| wtallis wrote:
| Just how "basic" do you think a GPU can be while having the
| capability to interface with that much DRAM? Getting there with
| GDDR6 would require a _really_ wide memory bus even if you
| could get it to operate with multiple ranks. Getting to 128GB
| with LPDDR5x would be possible with the 256-bit bus width they
| used on the top parts of the last generation, but would result
| in having half the bandwidth of an already mediocre card.
| "Just add more RAM" doesn't work the way you wish it could.
| treprinum wrote:
| M3/M4 Max MacBooks with 128GB RAM are already way better than
| an A6000 for very large local LLMs. So even if the GPU is as
| slow as the one in M3/M4 Max (<3070), and using some basic
| RAM like LPDDR5x it would still be way faster than anything
| from NVidia.
| kevingadd wrote:
| Are you suggesting that Intel 'just' release a GPU at the
| same price point as an M4 Max SOC? And that there would be
| a large market for it if they did so? Seems like an
| extremely niche product that would be demanding to
| manufacture. The M4 Max makes sense because it's a complete
| system they can sell to Apple's price-insensitive audience,
| Intel doesn't have a captive market like that to sell
| bespoke LLM accelerator cards to yet.
|
| If this hypothetical 128GB LLM accelerator was also a
| capable GPU that would be more interesting but Intel hasn't
| proven an ability to execute on that level yet.
| treprinum wrote:
| Nothing in my comment says about pricing it at the M4 Max
| level. Apple charges as much because they can (typing
| this on an $8000 M3 Max). 128GB LPDDR5 is dirt cheap
| these days just Apple adds its premium because they like
| to. Nothing prevents Intel from releasing a basic GPU
| with that much RAM for under $1k.
| wtallis wrote:
| You're asking for a GPU die at least as large as NVIDIA's
| TU102 that was $1k in 2018 when paired with only 11GB of
| RAM (because $1k couldn't get you a fully-enabled die to
| use 12GB of RAM). I think you're off by at least a factor
| of two in your cost estimates.
| treprinum wrote:
| Intel has Xeon Phi which was a spin-off of their first
| attempt at GPU so they have a lot of tech in place they
| can reuse already. They don't need to go with GDDRx/HBMx
| designs that require large dies.
| ksec wrote:
| I don't want to further this discussions but may be you
| dont realise some of the people who replied to you either
| design hardware for a living or has been in the hardware
| industry for longer than 20 years.
| treprinum wrote:
| For some reason Apple did it with M3/M4 Max likely by
| folks that are also on HN. The question is how many of
| the years spent designing HW were spent also by educating
| oneselves on the latest best ways to do it.
| ksec wrote:
| >For some reason.....
|
| They already replied with an answer.
| wtallis wrote:
| Even LPDDR requires a large die. It only takes things out
| of the realm of technologically impossible to merely
| economically impractical. A 512-bit bus is still very
| inconveniently large for a single die.
| m00x wrote:
| It's also impossible and it would need to be a CPU.
|
| CPUs and GPUs access memory very differently.
| jsheard wrote:
| The M4 Max needs an enormous 512bit memory bus to extract
| enough bandwidth out of those LPDDR5x chips, while the GPUs
| that Intel just launched are 192/160bit and even flagships
| rarely exceed 384bit. They can't just slap more memory on
| the board, they would need to dedicate significantly more
| silicon area to memory IO and drive up the cost of the
| part, assuming their architecture would even scale that
| wide without hitting weird bottlenecks.
| p1esk wrote:
| Apple could do it. Why can't Intel?
| jsheard wrote:
| Because Apple isn't playing the same game as everyone
| else. They have the money and clout to buy out TSMCs
| bleeding-edge processes and leave everyone else with the
| scraps, and their silicon is only sold in machines with
| extremely fat margins that can easily absorb the BOM cost
| of making huge chips on the most expensive processes
| money can buy.
| p1esk wrote:
| Bleeding edge processes is what Intel specializes in.
| Unlike Apple, they don't need TSMC. This should have been
| a huge advantage for Intel. Maybe that's why Gelsinger
| got the boot.
| AlotOfReading wrote:
| Intel Arc hardware is manufactured by TSMC, specifically
| on N6 and N5 for this latest announcement.
|
| Intel doesn't currently have nodes competitive with TSMC
| or excess capacity in their better processes.
| jsheard wrote:
| Intel's foundry side has been floundering so hard that
| they've resorted to using TSMC themselves in an attempt
| to keep up with AMD. Their recently launched CPUs are a
| mix of Intel-made and TSMC-made chiplets, but the latter
| accounts for most of the die area.
| duskwuff wrote:
| > Bleeding edge processes is what Intel specializes in.
| Unlike Apple, they don't need TSMC.
|
| Intel literally outsourced their Arrow Lake manufacturing
| to TSMC because they couldn't fabricate the parts
| themselves - their 20A (2nm) process node never reached a
| production-ready state, and was eventually cancelled
| about a month ago.
| p1esk wrote:
| OK, so the question becomes: TSMC could do it. Why can't
| Intel?
| BonoboIO wrote:
| They are trying ... for like 10 years
| wtallis wrote:
| These days, Intel merely specializes in bleeding
| processes. They spent far too many years believing the
| unrealistic promises from their fab division, and in the
| past few years they've been suffering the consequences as
| the problems are too big to be covered up by the cost
| savings of vertical integration.
| JBiserkov wrote:
| > and their silicon is only sold in machines with
| extremely fat margins
|
| Like the brand new Mini that cost 600 USD and went to 500
| during Black week.
| dragontamer wrote:
| Because LPDDR5x is soldered on RAM.
|
| Everyone else wants configurable RAM that scales both
| down (to 16GB) and up (to 2TB), to cover smaller laptops
| and bigger servers.
|
| GPUs with soldered on RAM has 500GB/sec bandwidths, far
| in excess of Apples chips. So the 8GB or 16GB offered by
| NVidia or AMD is just far superior at vid o game graphics
| (where textures are the priority)
| jsheard wrote:
| > GPUs with soldered on RAM has 500GB/sec bandwidths, far
| in excess of Apples chips.
|
| Apple is doing 800GB/sec on the M2 Ultra and should reach
| about 1TB/sec with the M4 Ultra, but that's still lagging
| behind GPUs. The 4090 was already at the 1TB/sec mark two
| years ago, the 5090 is supposedly aiming for 1.5TB/sec,
| and the H200 is doing _5TB /sec._
| dragontamer wrote:
| HBM is kind of not fair lol. But 4096-line bus is gonna
| have more bandwidth than any competitor.
|
| It's pretty expensive though.
|
| The 500GB/sec number is for a more ordinary GPU like the
| B580 Battlemage in the $250ish price range. Obviously the
| $2000ish 4090 will be better, but I don't expect the
| typical consumer to be using those.
| kimixa wrote:
| But an on-package memory bus has some of the advantages
| of HBM, just to a lesser extent, so it's arguably
| comparable as an "intermediate stage" between RAM chips
| and HBM. Distances are shorter (so voltage drop and
| capacitance are lower, so can be driven at lower power),
| routing is more complex but can be worked around by more
| layers, which increases cost but on a _significantly_
| smaller area than required for dimms, and the dimms
| connections themselves can hurt performance (reflection
| from poor contacts, optional termination makes things
| more complex, and the expectations of mix-and-match for
| dimm vendors and products likely reduce fine tuning
| possibilities).
|
| There's pretty much a direct opposite scaling between
| flexibility and performance - dimms > soldered ram > on-
| package ram > die-interconnects.
| Der_Einzige wrote:
| It doesn't matter if the "cost is driven up". Nvidia has
| proven that we're all lil pay pigs for them. 5090 will be
| 3000$ for 32gb of VRAM. Screenshot this now, it will age
| well.
|
| We'd be happy to pay 5000 for 128gb from Intel.
| pixelpoet wrote:
| You are absolutely correct, and even my non-prophetic ass
| echoed exactly the first sentence of the top comment in
| this HN thread ("Why don't they just release a basic GPU
| with 128GB RAM and eat NVidia's local generative AI
| lunch?").
|
| Yes, yes, it's not trivial to have a GPU with 128gb of
| memory with cache tags and so on, but is that really in
| the same universe of complexity of taking on Nvidia and
| their CUDA / AI moat any other way? Did Intel ever give
| the impression they don't know how to design a cache?
| There really has to be a GOOD reason for this, otherwise
| everyone involved with this launch is just plain stupid
| or getting paid off to not pursue this.
|
| Saying all this with infinite love and 100% commercial
| support of OpenCL since version 1.0, a great enjoyer of
| A770 with 16GB of memory, I live to laugh in the face of
| people who claimed for over 10 years that OpenCL is
| deprecated on MacOS (which I cannot stand and will never
| use, yet the hardware it runs on...) and still routinely
| crushes powerful desktop GPUs, in reality and practice
| today.
| timschmidt wrote:
| Both Intel and AMD produce server chips with 12 channel
| memory these days (that's 12x64bit for 768bit) which
| combined with DDR5 can push effective socket bandwidth
| beyond 800GB/s, which is well into the area occupied by
| single GPUs these days.
|
| You can even find some attractive deals on
| motherboard/ram/cpu bundles built around grey market
| engineering sample CPUs on aliexpress with good reports
| about usability under Linux.
|
| Building a whole new system like this is not exactly as
| simple as just plugging a GPU into an existing system,
| but you also benefit from upgradeability of the memory,
| and not having to use anything like CUDA. llamafile, as
| an example, really benefits from AVX-512 available in
| recent CPUs. LLMs are memory bandwidth bound, so it
| doesn't take many CPU cores to keep the memory bus full.
|
| Another benefit is that you can get a large amount of
| usable high bandwidth memory with a relatively low total
| system power usage. Some of AMD's parts with 12 channel
| memory can fit in a 200W system power budget. Less than a
| single high end GPU.
| pixelpoet wrote:
| My desktop machine has had 128gb since 2018, but for the
| AI workloads currently commanding almost infinite market
| value, it really needs the 1TB/s bandwidth and teraflops
| that only a bona fide GPU can provide. An early AMD GPU
| with these characteristics is the Radeon VII with 16gb
| HBM, which I bought for 500 eur back in 2019 (!!!).
|
| I'm a rendering guy, not an AI guy, so I really just want
| the teraflops, but all GPU users urgently need a 3rd
| market player.
| timschmidt wrote:
| That 128gb is hanging off a dual channel memory bus with
| only 128 total bits of bandwidth. Which is why you need
| the GPU. The Epyc and Xeon CPUs I'm discussing have 6x
| the memory bandwidth, and will trade blows with that GPU.
| pixelpoet wrote:
| At a mere 20x the cost or something, to say nothing about
| the motherboard etc :( 500 eur for 16GB of 1TB/s with
| tons of fp32 (and even fp64! The main reason I bought it)
| back in 2019 is no joke.
|
| Believe me, as a lifelong hobbyist-HPC kind of person, I
| am absolutely dying for such a HBM/fp64 deal again.
| timschmidt wrote:
| $1,961.19: H13SSL-N Motherboard And EPYC 9334 QS CPU +
| DDR5 4*128GB 2666MHZ REG ECC RAM Server motherboard kit
|
| https://www.aliexpress.us/item/3256807766813460.html
|
| Doesn't seem like 20x to me. I'm sure spending more than
| 30 seconds searching could find even better deals.
| pixelpoet wrote:
| Isn't 2666 MHz ECC RAM obscenely slow? 32 cores without
| the fast AVX-512 of Zen5 isn't what anyone is looking for
| in terms of floating point throughput (ask me about
| electricity prices in Germany), and for that money I'd
| rather just take a 4090 with 24GB memory and do my own
| software fixed point or floating point (which is exactly
| what I do personally and professionally).
|
| This is exactly what I meant about Intel's recent launch.
| Imagine if they went full ALU-heavy on latest TSMC
| process and packaged 128GB with it, for like, 2-3k Eur.
| Nvidia would be whipping their lawyers to try to do
| something about that, not just their engineers.
| timschmidt wrote:
| I don't think anyone's stopping you, buddy. Great chat. I
| hope you have a nice evening.
| mirekrusin wrote:
| Me too, probably 2x. I'd sell like hot cakes.
| jsheard wrote:
| The question is whether there's enough overall demand for
| a GPU architecture with 4x the VRAM of a 5090 but only
| about 1/3rd of the bandwidth. At that point it would only
| really be good for AI inferencing, so why not make
| specialized inferencing silicon instead?
| mandelken wrote:
| I genuinely wonder why no one is doing this? Why can't I
| buy this specialized AI inference silicon with plenty of
| VRAM?
| hughesjj wrote:
| Man, I'm old enough to remember when 512 was a thing for
| consumer cards back when we had 4-8gb memory
|
| Sure that was only gddr5 and not gddr6 or lpddr5, but I
| would have bet we'd be up to 512bit again 10 years down
| the line..
|
| (I mean supposedly hbm3 has done 1024-2048bit busses but
| that seems more research or super high end cards, not
| consumer)
| jsheard wrote:
| Rumor is the 5090 will be bringing back the 512bit bus,
| for a whopping 1.5TB/sec bandwidth.
| CoastalCoder wrote:
| > The M4 Max needs an enormous 512bit memory bus to
| extract enough bandwidth out of those LPDDR5x chips
|
| Does M4 Max have 64-byte cache lines?
|
| If they can fetch or flush an entire cache line in a
| single memory-bus transaction, I wonder if that opens up
| any additional hardware / performance optimizations.
| modeless wrote:
| The memory controller would be bigger, and the cost would
| be higher, but not radically higher. It would be an
| attractive product for local inference even at triple the
| current price and the development expense would be 100%
| justified if it helped Intel get _any_ kind of foothold
| in the ML market.
| wtallis wrote:
| That would basically mean Intel doubling the size of their
| current GPU die, with a different memory PHY. They're
| clearly not ready to make that an affordable card. Maybe
| when they get around to making a chiplet-based GPU.
| amelius wrote:
| What if they put 8 identical GPUs in the package, each with
| 1/8 the memory? Would that be a useful configuration for a
| modern LLM?
| keyboard_slap wrote:
| It could work, but would it be cost-competitive?
| rini17 wrote:
| Also, cooling.
| ben_w wrote:
| Last I've heard, the architecture makes that difficult. But
| my information may be outdated, and even if it isn't, I'm
| not a hardware designer and may have just misunderstood the
| limits I hear others discuss.
| numeri wrote:
| GPU inference is always a balancing act, trying to avoid
| bottlenecks on memory bandwidth (loading data from the
| GPU's global memory/VRAM to the much smaller internal
| shared memory, where it can be used for calculations) and
| compute (once the values are loaded).
|
| Splitting the model up between several GPUs would add a
| third much worse bottleneck - memory bandwidth between the
| GPUs. No matter how well you connect them, it'll be slower
| than transfer within a single GPU.
|
| Still, the fact that you can fit an 8x larger GPU might be
| worth it to you. It's a trade-off that's almost universally
| made while training LLMs (sometimes even with the model
| split down both its width and length), but is much less
| attractive for inference.
| amelius wrote:
| > Splitting the model up between several GPUs would add a
| third much worse bottleneck - memory bandwidth between
| the GPUs.
|
| What if you allowed the system to only have a shared
| memory between every neighboring pair of GPUs?
|
| Would that make sense for an LLM?
| treprinum wrote:
| K80 used to be two glued K40 but their interconnect was
| barely faster than PCIe so it didn't have much benefit as
| one had to move stuff between two internal GPUs anyway.
| ksec wrote:
| Thank You Wtallis. Somewhere along the line, this basic
| "knowledge" of hardware is completely lost. I dont expect
| this to be explained in any comment section on old Anandtech.
| It seems hardware enthusiast has mostly disappeared, I guess
| that is also why Anandtech closed. We now live in a world
| where most site are just BS rumours.
| ethbr1 wrote:
| That's because Anand Lal Shimpi is a CompE by training.
|
| Not too many hardware enthusiast site editors have that
| academic background.
|
| And while fervor can sometimes substitute for education...
| probably not in microprocessor / system design.
| chessgecko wrote:
| GDDR isnt like the ram that connects to cpu, it's much more
| difficult and expensive to add more. You can get up to 48GB
| with some expensive stacked gddr, but if you wanted to add more
| stacks you'd need to solve some serious signal timing related
| headaches that most users wouldn't benefit from.
|
| I think the high memory local inference stuff is going to come
| from "AI enabled" cpus that share the memory in your computer.
| Apple is doing this now, but cheaper options are on the way. As
| a shape its just suboptimal for graphics, so it doesn't make
| sense for any of the gpu vendors to do it.
| treprinum wrote:
| They can use LPDDR5x, it would still massively accelerate
| inference of large local LLMs that need more than 48GB RAM.
| Any tensor swapping between CPU RAM and GPU RAM kills the
| performance.
| chessgecko wrote:
| I think we don't really disagree, I just think that this
| shape isn't really a gpu its just a cpu because it isn't
| very good for graphics at that point.
| treprinum wrote:
| That's why I said "basic GPU". It doesn't have to be too
| fast but it should still be way faster than a regular
| CPU. Intel already has Xeon Phi so a lot of things were
| developed already (like memory controller, heavy parallel
| dies etc.)
| chessgecko wrote:
| I guess it's hard to know how well this would compete
| with integrated gpus, especially at a reasonable
| pricepoint. If you wanted to spend $4000+ on it, it could
| be very competitive and might look something like nvidias
| grace-hopper superchip, but if you want the product to be
| under $1k I think it might be better just to buy separate
| cards for your graphics and ai stuff.
| smcleod wrote:
| As someone else said - I don't think you have to have GDDR,
| surely there are other options. Apple does a great job of it
| on their APUs with up to 192GB, even an old AMD Threadripper
| chip can do quite well with its DDR4/5 performance
| chessgecko wrote:
| For ai inference you definitely have other options, but for
| low end graphics? the lpddr that apple (and nvidia in
| grace) use would be super expensive to get a comparable
| bandwidth (think $3+/gb and to get 500GB/sec you need at
| least 128GB).
|
| And that 500GB/sec is pretty low for a gpu, its like a 4070
| but the memory alone would add $500+ to the cost of the
| inputs, not even counting the advanced packaging (getting
| those bandwidths out of lpddr needs organic substrate).
|
| It's not that you can't, just when you start doing this it
| stops being like a graphics card and becomes like a cpu.
| beAbU wrote:
| They are probably held back by same reason thats preventing AMD
| and nVidia from doing it either.
| treprinum wrote:
| NVidia and AMD make $$$ on datacenter GPUs so it makes sense
| they don't want to discount their own high-end. Intel has
| nothing there so they can happily go for commodization of AI
| hardware like what Meta did when releasing LLaMA to the wild.
| beAbU wrote:
| Is nVidia or AmD offering 128gb cards in any configuration?
| latchkey wrote:
| They aren't "cards" but MI300x has 192GB and MI325x has
| 256GB.
| phkahler wrote:
| You can run an AMD APU with 128GB of shared RAM.
| treprinum wrote:
| It's too slow and not very compatible.
| bryanlarsen wrote:
| The reason is AMD and Nvidia don't is that they don't want to
| cannibalize their high end AI market. Intel doesn't have a
| high end AI market to protect.
| fweimer wrote:
| There are products like this one: https://www.intel.com/con
| tent/www/us/en/products/sku/232592/...
|
| As far as I understand it, it gives you 64 GiB of HBM per
| socket.
| Muskyinhere wrote:
| Because if they could just do that and it would rival what
| NVidia has, they would just do it.
|
| But obvoiusly they don't.
|
| And for reasons: NVidia has worked on CUDA for ages, do you
| believe they just replace this whole thing in no time?
| treprinum wrote:
| llama.cpp and its derivatives say yes.
| pjmlp wrote:
| A fraction of CUDA capabilities.
| treprinum wrote:
| Sufficient for LLMs and image/video gen.
| m00x wrote:
| FLUX.1 D generation is about a minute at 20 steps on a
| 4080, but takes 35 minutes on the CPU.
| treprinum wrote:
| 4080 won't do video due to low RAM. The GPU doesn't have
| to be as fast there, it can be 5x slower which is still
| way faster than a CPU. And Intel can iterate from there.
| m00x wrote:
| It won't be 5x slower, it would be 20-50x slower if you
| would implement it as you said.
|
| You can't just "add more ram" to GPUs and have them work
| the same way. Memory access is completely different than
| on CPUs.
| Der_Einzige wrote:
| Not even close. Llama.cpp isn't even close to a
| production ready LLM inference engine, and it runs
| overwhelmingly faster when using CUDA
| pjmlp wrote:
| A fraction of what a GPU is used for.
| m00x wrote:
| This is the most script kiddy comment I've seen in a while.
|
| llama.cpp is just inference, not training, and the CUDA
| backend is still the fastest one by far. No one is even
| close to matching CUDA on either training or inference. The
| closest is AMD with ROCm, but there's likely a decade of
| work to be done to be competitive.
| treprinum wrote:
| Inference on very large LLMs where model + backprop
| exceed 48GB is already way faster on a 128GB MacBook than
| on NVidia unless you have one of those monstrous Hx00s
| with lots of RAM which most devs don't.
| m00x wrote:
| Because the CPU has to load the model in parts for every
| cycle so you're spending a lot of time on IO and it
| offsets processing.
|
| You're talking about completely different things here.
|
| It's fine if you're doing a few requests at home, but if
| you're actually serving AI models, CUDA is the only
| reasonable choice other than ASICs.
| treprinum wrote:
| My comment was about Intel having a starter project,
| getting enthusiastic response from devs, network effects
| and iterate from there. They need a way to threaten
| Nvidia and just focusing on what they can't do won't
| bring them there. There is one route where they can
| disturb Nvidia's high end over time and that's a cheap
| basic GPU with lots of RAM. Like Ryzen 1st gen whose
| single core performance was two generations behind Intel
| trashed Intel by providing 2x as many cores for cheap.
| m00x wrote:
| It would be a good idea to start with some basic
| understanding of GPU, and realizing why this can't easily
| be done.
| treprinum wrote:
| That's a question M3 Max with its internal GPU already
| answered. It's not like I didn't do any HPC or CUDA work
| in the past to be completely clueless about how GPUs work
| though I haven't created those libraries myself.
| Muskyinhere wrote:
| No one is running LLMs on consumer NVidia GPUs or apple
| MacBooks.
|
| A dev, if they want to run local models, probably run
| something which just fits on a proper GPU. For everything
| else, everyone uses an API key from whatever because its
| fundamentaly faster.
|
| IF a affordable intel GPU would be relevant faster for
| inferencing, is not clear at all.
|
| A 4090 is at least double the speed of Apples GPU.
| treprinum wrote:
| 4090 is 5x faster than M3 Max 128GB according to my tests
| but it can't even inference LLaMA-30B. The moment you hit
| that memory limit the inference is suddenly 30x slower
| than M3 Max. So a basic GPU with 128GB RAM would trash
| 4090 on those larger LLMs.
| m00x wrote:
| Do you have the code for that test?
| treprinum wrote:
| I ran some variation of llama.cpp that could handle large
| models by running portion of them on GPU and if too
| large, the rest on CPU and those were the results. Maybe
| I can dig it from some computer at home but it was almost
| like a year ago when I got M3 Max with 128GB RAM.
| yumraj wrote:
| Yes, and inference is a huge market in itself and
| potentially larger than training (gut feeling haven't run
| numbers)
|
| Keep NVIDIA for training and Intel/AMD/Cerebras/... for
| interference.
| Muskyinhere wrote:
| NVidia Blackwell is not just a GPU. Its a Rack with a
| interconnect through a custom Nvidia based Network.
|
| And it needs liquid cooling.
|
| You don't just plugin intel cards 'out of the box'.
| m00x wrote:
| Inference is still a lot faster on CUDA than on CPU. It's
| fine if you run it at home or on your laptop for privacy,
| but if you're serving those models at any scale, you're
| going to be using GPUs with CUDA.
|
| Inference is also a much smaller market right now, but
| will likely be overtaken later as we have more people
| using the models than competing to train the best one.
| latchkey wrote:
| The funny thing about Cerebras is that it doesn't scale
| well at all for inference and if you talk to them in
| person, they are currently making all their money on
| training workloads.
| Wytwwww wrote:
| Does CUDA even matter than much for LLMs? Especially
| inference? I don't think software would be the limiting
| factor for this hypothetical GPU. Afterall it would be
| competing with Apple's M chips not with the 4090 or Nvidia's
| enterprise GPUs.
| Der_Einzige wrote:
| It's the only thing that matters. Folks act like AMD
| support is there because suddenly you can run the most
| basic LLM workload. Try doing anything actually interesting
| (i.e, try running anything cool in the mechanistic
| interoperability or representation/attention engineer
| world) with AMD and suddenly everything broken, nothing
| works, and you have to spend millions worth of AI engineer
| developer time to try to salvage a working solution.
|
| Or you can just buy Nvidia.
| heraldgeezer wrote:
| This is a gaming card. Look at benchmarks.
| whatudb wrote:
| Meta comment: "why don't they just" phrase usually indicates
| significant ignorance about a subject, it's better to learn a
| little bit before dispensing criticism about beancounters or
| whatnot.
|
| In this case, the die I/O limits precludes more than a
| reasonable number of DDR channels.
| FuriouslyAdrift wrote:
| HBM3E memory is at least 3x the price of DDR5 (it requires 3x
| the wafer as DDR5) and capacity is sold out for all of 2025
| already... that's the price and production bottleneck.
|
| High speed, low latency server grade DDR5 is around $800-$1600
| for 128GB. Triple that for $2400 - $4800 just for the memory.
| Still need the GPUs/APUs, card, VRMs, etc.
|
| Even the nVidia H100 with "only" 94GB starts at $30k...
| adventured wrote:
| Nvidia's $30,000 is a 90% margin product at scale. They could
| charge 1/3 that and still be very profitable. There has
| rarely been such a profitable large corporation in terms of
| the combo of profit & margin.
|
| Their last quarter was $35b in sales and $26b in gross profit
| ($21.8b op income; 62% op income margin vs sales).
|
| Visa is notorious for their extreme margin (66% op income
| margin vs sales) due to being basically a brand + transaction
| network. So the fact that a hardware manufacturer is hitting
| those levels is truly remarkable.
|
| It's very clear that either AMD or Intel could accept far
| lower margins to go after them. And indeed that's exactly
| what will be required for any serious attempt to cut into
| their monopoly position.
| talldayo wrote:
| > And indeed that's exactly what will be required for any
| serious attempt to cut into their monopoly position.
|
| You misunderstand why and how Nvidia is a monopoly. Many
| companies make GPUs, and all those GPUs _can_ be used for
| computation if you develop compute shaders for them. This
| part is not the problem, _you can already_ go buy cheaper
| hardware that outperforms Nvidia if price is your only
| concern.
|
| Software is the issue. That's it - it's CUDA and nothing
| else. You cannot assail Nvidia's position, and moreover
| their hardware's value, without a really solid reason for
| datacenters to own them. Datacenters do not want to own
| GPUs because once the AI bubble pops they'll be bagholders
| for Intel and AMD's depreciated software. Nvidia hardware
| can at least crypto mine, or be leased out to industrial
| customers that have their own remote CUDA applications. The
| demand for generic GPU compute is basically nonexistent,
| the reason this market exists at all is because CUDA
| exists, and you cannot turn over Nvidia's foothold without
| accepting that fact.
|
| The only way the entire industry can fuck over Nvidia is if
| they choose to invest in a complete CUDA replacement like
| OpenCL. That is the only way that Nvidia's value can be
| actually deposed without any path of recourse for their
| business, and it will never happen because every single one
| of Nvidia's competitors hate each other's guts and would
| rather watch each other die in gladiatorial combat than
| help each other fight the monster. And Jensen Huang
| probably revels in it, CUDA is a hedged bet against the
| industry ever working together for common good.
| adventured wrote:
| I do not misunderstand why Nvidia has a monopoly. You
| jumped drastically beyond anything I was discussing and
| incorrectly assumed ignorance on my part. I never said
| why I thought they had one. I never brought up matters of
| performance or software or moats at all. I matter of fact
| stated they had a monopoly, you assumed the rest.
|
| It's impossible to assail their monopoly without
| utilizing far lower prices, coming up under their extreme
| margin products. It's how it is almost always done
| competitively in tech (see: ARM, or Office (dramatically
| undercut Lotus with a cheaper inferior product), or
| Linux, or Huawei, or Chromebooks, or Internet Explorer,
| or just about anything).
|
| Note: I never said lower prices is all you'd need. Who
| would think that? The implication is that I'm ignorant of
| the entire history of tech, it's a poor approach to
| discussion with another person on HN frankly.
| talldayo wrote:
| Nvidia's monopoly is pretty much detached from price at
| this point. That's the entire reason _why_ they can
| charge insane margins - nobody cares! There is not a
| single business squaring Nvidia up with serious intent to
| take down CUDA. It 's been this way for nearly two
| decades at this point, with not a single spark of hope to
| show for it.
|
| In the case of ARM, Office, Linux, Huawei, and ChromeOS,
| these were all _actual_ alternatives to the incumbent
| tools people were familiar with. You can directly compare
| Office and Lotus because they are fundamentally similar
| products - ARM had a real chance against x86 because wasn
| 't a complex ISA to unseat. Nvidia is not analogous to
| these businesses because they occupy a league of their
| own as the provider of CUDA. It's not exaggeration to say
| that they have completely seceded from the market of GPUs
| and can sustain themselves on demand from crypto miners
| and AI pundits alone.
|
| AMD, Intel and even Apple have bigger things to worry
| about than hitting an arbitrary price point, if they want
| Nvidia in their crosshairs. All of them have already
| solved the "sell consumer tech at attractive prices"
| problem but not the "make it complex, standardize it and
| scale it up" problem.
| DSingularity wrote:
| I feel people are exaggerating the impossibility of
| replacing CUDA. Adopting CUDA is convenient right now
| because yes it is difficult to replace it. Barrier to
| entry for orgs that can do that is very high. But it has
| been done. Google has the TPU for example.
| Der_Einzige wrote:
| They're not exaggerating it. The more things change, the
| more they stay the same. Nvidia and AMD had the exact
| same relationship 15 years ago that they do today. The
| AMD crowd clutching about their better efficiencies, and
| the Nvidia crowd having grossly superior
| drivers/firmware/hardware, including unique PhysX stuff
| that STILL has not been matched since 2012 (remember
| Planetside 2 or Broderlands 2 physics? Pepperidge Farm
| Remembers...)
|
| So many billions of dollars and no one is even 1% close
| to displacing CUDA in any meaningful way. ZULDA is dead.
| ROCM is a meme, Scale is a meme. Either you use CUDA or
| you don't do meaningful AI work.
| talldayo wrote:
| The TPU is not a GPU nor is it commercially available. It
| is a chip optimized around a limited featureset with a
| limited software layer on top of it. It's an impressive
| demonstration on Google's behalf to be sure, but it's
| also not a shot across the bow at Nvidia's business.
| Nvidia has the TSMC relations, a refined and complex
| streaming multiprocessor architecture and _actual_
| software support their customers can go use today. TPUs
| haven 't quite taken over like people anticipated
| anyways.
|
| I don't personally think CUDA is impossible to replace -
| but I do think that everyone capable of replacing CUDA
| has been ignoring it recently. Nvidia's role as the GPGPU
| compute people is secure for the foreseeable future.
| Apple wants to design _simpler_ GPUs, AMD wants to design
| cheaper GPUs, and Intel wants to pretend like they can
| compete with AMD. Every stakeholder with the capacity to
| turn this ship around is pretending like Nvidia doesn 't
| exist and whistling until they go away.
| Der_Einzige wrote:
| Thank you for laying it out. It's so silly to see people in
| the comments act like Intel or Nvidia can't EASILY add more
| VRAM to their cards. Every single argument against it is
| all hogwash.
| arcticbull wrote:
| Visa doesn't actually make a ton of money off each
| transaction, if you divide out their revenue against their
| payment volume (napkin math)...
|
| They processed $12T in payments last year (almost a billion
| payments per day), with a net revenue of $32B. That's a
| gross transaction margin of 0.26% and their GAAP net income
| was half that, about 0.14%. [1]
|
| They're just a transaction network, unlike say Amex which
| is both an issuer and a network. Being just the network is
| more operationally efficient.
|
| [1] https://annualreport.visa.com/financials/default.aspx
| oivey wrote:
| That's a weird way to account for their business size.
| There isn't a significant marginal cost per transaction.
| They didn't sell $12T in products. They facilitated that
| much in payments. Their profits are fantastic.
| elorant wrote:
| AMD has a 192GB GPU. I don't see them eating NVidia's lunch
| with it.
| treprinum wrote:
| They are charging as much as Nvidia for it. Now imagine they
| offered such a card for $2k. Would that allow them to eat
| Nvidia's lunch?
| p1esk wrote:
| We would also need to imagine AMD fixing their software.
| treprinum wrote:
| I think plenty of enthusiastic open source devs would
| jump at it and fix their software if the software was
| reasonably open. The same effect as what happened when
| Meta released LLaMA.
| jjmarr wrote:
| It is open and they regularly merge PRs.
|
| https://github.com/ROCm/ROCm/pulls?q=is%3Apr+is%3Aclosed
| treprinum wrote:
| AMD GPUs aren't very attractive to ML folks because they
| don't outshine Nvidia in any single aspect. Blasting lots
| of RAM onto a GPU would make it attractive immediately
| with lots of attention from devs occupied with more
| interesting things.
| latchkey wrote:
| If you want to load up 405B @ FP_16 into a single H100 box,
| how do you do it? You get two boxes. 2x the price.
|
| Models are getting larger, not smaller. This is why H200
| has more memory, but the same exact compute. MI300x vs.
| MI325x... more memory, same compute.
| elorant wrote:
| Let's say for the sake of argument that you could build
| such a card and sell it for less than $5k. Why would you do
| it? You know there's huge demand in the tens of billions
| per quarter for high end cards. Why undercut so heavily
| that market? To overthrow NVidia? So you'll end up with a
| profit margin way low and then your shareholders will eat
| you alive.
| daft_pink wrote:
| Totally agree. Someone needs to exploit the lack of available
| gpu memory in graphics cards for model runners. Even training
| tensors tends to run against memory issues with the current
| cards.
| zamalek wrote:
| I think a better idea would be an NPU with slower memory, or
| tie it to the system DDR. I don't think consumer inference
| (possibly even training) applications would need the memory
| bandwidth offered by GDDR/HBM. Inference on my 7950x is already
| stupid fast (all things considered).
|
| The deeper problem is that the market for this is probably
| incredibly niche.
| m3kw9 wrote:
| Because they can't
| Sparkyte wrote:
| Because you can't stack that much ram on a GPU without
| sufficient channels to do so. You could probably do 64GB on
| GDDR6 but you can't do 128GB on GDDR6 without more memory
| channels. 2GB per chip per channel is the current limit for
| GDDR6 this is why HBM was invented.
|
| It is why you can only see GPUs with 24GB of memory at the
| moment.
|
| HBM2 can handle 64GB ( 4 x 8GB Stack ) ( Total capacity 128GB )
|
| HBM3 can handle 192GB ( 4 x 24GB Stack ) ( Total capacity 384GB
| )
|
| You can not do this with GDDR6.
| bayindirh wrote:
| Disclosure: HPC admin who works with NIVIDA cards here.
|
| Because, no. It's not as simple as that.
|
| NVIDIA has a complete ecosystem now. They have cards. They have
| cards of cards (platforms), which they produce, validate and
| sell. They have NVLink crossbars and switches which connects
| these cards on their card of cards with very high speeds and
| low latency.
|
| For inter-server communication they have libraries which
| coordinate cards, workloads and computations.
|
| They bought Mellanox, but that can be used by anyone, so
| there's no lock-in for now.
|
| As a tangent, NVIDIA has a whole set of standards for pumping
| tremendous amount of data in and out of these mesh of cards.
| Let it be GPU-Direct storage or specialized daemons which
| handle data transfers on and off cards.
|
| If you think that you can connect n cards on PCIe bus and just
| send workloads to them and solve problems magically, you'll
| hurt yourself a lot, both performance and psychology wise.
|
| You have to build a stack which can perform these things with
| maximum possible performance to be able to compute with NVIDIA.
| It's not just emulating CUDA, now. Esp., on the high end of the
| AI spectrum (GenAI, MultiCard, MultiSystem, etc.).
|
| For other lower end, multi-tenant scenarios, they have card
| virtualization, MIG, etc. for card sharing. You have to
| complete on that, too, for cloud and smaller applications.
| dgfitz wrote:
| How does any of this make money?
| arcticbull wrote:
| Having the complete ecosystem affords them significant
| margins.
| dgfitz wrote:
| Against what?
| arcticbull wrote:
| As of today they have SaaS company margins as a hardware
| company which is practically unheard of.
| lyime wrote:
| What?
|
| It's like the most profitable set of products in tech. You
| have companies like Meta, MSFT, Amazon, Google etc spending
| $5B every few years buying this hardware.
| dgfitz wrote:
| Stale money is moving around. Nothing changed .
| HeatrayEnjoyer wrote:
| What is stale money?
| dgfitz wrote:
| Hmm. There is a lot of money that exists, doing nothing.
| I consider that stale money.
|
| Edit: I can't sort this out. Where did all the money go?
| bayindirh wrote:
| When this walled garden is the only way to use GPUs with
| high efficiency and everybody is using this stack, and
| NVIDIA controlling the supply of these "platform boards" to
| OEMs, they don't make money, but they literally print it.
|
| However, AMD is coming for them because a couple of high
| profile supercomputer centers (LUMI, Livermore, etc.) are
| using Instinct cards and pouring money to AMD to improve
| their cards and stack.
|
| I have not used their (Instinct) cards, yet, but their
| Linux driver architecture is way better than NVIDIA.
| throwaway48476 wrote:
| All of that is highly relevant for training but what the
| poster was asking for is a desktop inference card.
| bayindirh wrote:
| You use at least half of this stack for desktop setups. You
| need copying daemons, the ecosystem support (docker-nvidia,
| etc.), some of the libraries, etc. even when you're on a
| single system.
|
| If you're doing inference on a server; MIG comes into play.
| If you're doing inference on a larger cloud, GPU-direct
| storage comes into play.
|
| It's all modular.
| WanderPanda wrote:
| No you don't need much bandwidth between cards for
| inference
| bayindirh wrote:
| Copying daemons (gdrcopy) are about pumping data in and
| out of a single card. docker-nvidia and the rest of the
| stack are enablement for using the cards.
|
| GPU-Direct is about pumping data from storage devices to
| cards, esp. from high speed storage systems across
| networks.
|
| MIG actually shares a single card to multiple instances,
| so many processes or VMs can use a single card for
| smaller tasks.
|
| Nothing I have written in my previous comment is related
| to inter-card, inter-server communication, but all are
| related to disk-GPU, CPU-GPU or RAM-CPU communication.
|
| Edit: I mean, it's not OK to talk about downvoting, and
| downvote as you like, but I install and enable these
| cards for researchers. I know what I'm installing and
| what it does. C'mon now. :D
| mikhael wrote:
| Mostly, I think, we don't really understand your argument
| that Intel couldn't easily replicate the parts needed
| only for inference.
| landryraccoon wrote:
| It's possible you're underestimating the open source
| community.
|
| If there's a competing platform that hobbyists can tinker
| with, the ecosystem can improve quite rapidly, especially
| when the competing platform is completely closed and
| hobbyists basically are locked out and have no
| alternative.
| throwaway48476 wrote:
| Innovation is a bottom up process. If they sell the
| hardware the community will spring up to take advantage.
| bayindirh wrote:
| > It's possible you're underestimating the open source
| community.
|
| On the contrary. You really don't know how much I love and
| prefer open source and a more level playing field.
|
| > If there's a competing platform that hobbyists can
| tinker with...
|
| AMD's cards are better from a hardware and software
| architecture standpoint, but the performance is not there
| yet. Plus, ROCm libraries are not that mature, but
| they're getting there. Developing high-performance, high-
| quality code is deceptively expensive, because it's very
| heavy in theory and you fly _very close_ to the metal. I
| did that in my Ph.D., so I know what it entails. It
| requires more than a couple (hundred) hobbyists to pull
| off (see the development of the Eigen linear algebra
| library, or any high-end math library).
|
| Some big guns are pouring money into AMD to implement
| good ROCm libraries, and it started paying off (Debian
| has a ton of ROCm packages now, too). However, you need
| to be able to pull it off in the datacenter to be able to
| pull it off on the desktop.
|
| AMD also needs to be able to enable ROCm on desktop
| properly, so people can start hacking it at home.
|
| > especially when the competing platform is completely
| closed...
|
| NVIDIA gives a lot of support to universities,
| researchers and institutions who play with their cards.
| Big cards may not be free, but know-how, support and
| first steps are always within reach. Plus, their
| researchers dogfood their own cards, and write papers
| with them.
|
| So, as long as papers get published, researchers do their
| research, and something gets invented, many people don't
| care about how open source the ecosystem is. This upsets
| me a ton, but closed-source AI companies, and researchers
| who leave crucial details out of their papers so that what
| they did can't be reproduced, don't care about open
| source, because they think like NVIDIA: "My research, my
| secrets, my fame, my money."
|
| It's not about sharing. It's about winning, and it's ugly
| in some aspects.
| phkahler wrote:
| No. I've been reading up. I'm planning to run Flux 12b on
| my AMD 5700G with 64GB RAM. The CPU will take 5-10 minutes
| per image, which will be fine for me tinkering while
| writing code. Maybe I'll be able to get the GPU going on
| it too.
|
| The point of the OP is that this is entirely possible even
| with an iGPU if only we had the RAM. Nvidia _should be_
| irrelevant for local inference.
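|
| For reference, a CPU-only run like that is only a few lines with
| the Hugging Face diffusers library; a rough sketch (the model id,
| dtype and step count below are my assumptions, not from the post,
| and a recent diffusers release with Flux support is needed):
|
|   # Hypothetical CPU-only Flux run: slow, but needs no VRAM,
|   # only lots of system RAM. Settings are illustrative.
|   import torch
|   from diffusers import FluxPipeline
|
|   pipe = FluxPipeline.from_pretrained(
|       "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
|   )
|   pipe.to("cpu")  # everything stays in system RAM
|
|   image = pipe(
|       "a watercolor painting of a graphics card",
|       num_inference_steps=4,   # schnell is tuned for few steps
|       guidance_scale=0.0,
|   ).images[0]
|   image.save("out.png")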
| postalrat wrote:
| Let's see how quickly that changes if Intel releases cards
| with massive amounts of RAM for a fraction of the cost.
| jmward01 wrote:
| Most of the above infra is predicated on limited RAM, which is
| why you need so much communication between cards. Bump the RAM
| up and you could do single-card inference, and all those
| connections become overhead that could have gone to more RAM.
| For training there is still an argument, but even there, the
| more RAM you have, the less all that connectivity gains you.
| RAM has been used to sell cards and servers for a long time
| now; it is time to open the floodgates.
| foobiekr wrote:
| Correct for inference - the main use of the interconnect is
| RDMA requests between GPUs to fit models that wouldn't
| otherwise fit.
|
| Not really correct for training - training has a lot of
| all-to-all problems, so hierarchical reduction is useful
| but doesn't really solve the incast problem - Nvlink
| _bandwidth_ is less of an issue than perhaps the SHARP
| functions in the NVLink switch ASICs.
| epistasis wrote:
| Rather than tackling the entire market at once, they could
| start with one section and build from there. NVIDIA didn't
| get to where it was in a year, it took many strategic
| acquisitions. (All the networking and other HPC-specialized
| stuff I was buying a decade ago has seemingly been bought by
| NVIDIA).
|
| Start by being a "second vendor" for huge customers of NVIDIA
| that want to foster competition, as well as a few others
| willing to take risks, and build from there.
| teekert wrote:
| I have a question for you, since I'm somewhat entering the
| HPC world. In the EU the EuroHPC-JU is building what they
| call AI factories; afaict these are just batch-processing
| (Slurm, I think) clusters with GPUs in the nodes. So I wonder
| where you'd place those cards of cards. Are you saying there
| are other, perhaps better ways to use massive amounts of
| these cards? Or is that still in the "super powerful
| workstation" domain? Thanks in advance.
| treprinum wrote:
| View it as Raspberry Pi for AI workloads. Initial stage is
| for enthusiasts that would develop the infra, figure out
| what is possible and spread the word. Then the next phase
| will be SME industry adoption, making it commercially
| interesting, while bypassing Nvidia completely. At some
| point it would live its own life and big players jump in.
| Classical disrupt strategy via low cost unique offerings.
| segmondy wrote:
| They don't need to do 128GB; 48GB+ would eat their lunch.
| Intel and AMD are sleeping.
| ThatMedicIsASpy wrote:
| SR-IOV is supported on their iGPUs; beyond that it's exclusive to
| their enterprise offering. Give it to me on desktop and I'll buy.
| throwaway48476 wrote:
| Intel is allergic to competition.
| karmakaze wrote:
| I wanted alternatives to Nvidia for high-powered GPUs. Then
| the more I thought about it, the more it made sense to
| rent cloud services for AI/ML workloads and lesser powered ones
| for gaming. The only use cases I could come up with for wanting
| high-end cards are 4k gaming (a luxury I can't justify for
| infrequent use) or for PC VR which may still be valid if/when a
| decent OLED (or mini-OLED) headset is available--the Sony PSVR2
| with PC adapter is pretty close. The Bigscreen Beyond is also a
| milestone/benchmark.
| oidar wrote:
| Which video card are you using for PSVR?
| karmakaze wrote:
| I haven't decided/pulled-the-trigger but the Intel ARC series
| are giving the AMD parts a good run for the money.
|
| The only concern is how well the new Intel drivers work (full
| support for DX12) with older titles, where support is
| continuously being improved (DX11 and 10, and some DX9 titles
| via emulation).
|
| There's likely some deep discounting of Intel cards because
| of how bad the drivers were at launch and the prices may not
| stay so low once things are working much better.
| gigaflop wrote:
| Don't rent a GPU for gaming, unless you're doing something like
| a full-on game streaming service. +10ms isn't much for some
| games, but would be noticeable on plenty.
|
| IMO you want those frames getting rendered as close to the
| monitor as possible, and you'd probably have a better time with
| lower fidelity graphics rendered locally. You'd also get to
| keep gaming during a network outage.
| babypuncher wrote:
| I don't even think network latency is the real problem, it's
| all the buffering needed to encode a game's output to a video
| stream and keep it v-synced with a network-attached display.
|
| I've tried game streaming under the best possible conditions
| (<1ms network latency) and it still feels a little off.
| Especially shooters and 2D platformers.
| oidar wrote:
| Yeah - there's no way to play something like
| Overwatch/Fortnite on a streaming service and have a good
| time. The only things that seem to be OK are turn-based
| games or platformers.
| karmakaze wrote:
| Absolutely. By "and lesser powered ones for gaming" I meant
| purchase.
| BadHumans wrote:
| I'm considering getting one to replace my 8 year old NVIDIA card
| but why are there 2 SKUs almost identical in price?
| layer8 wrote:
| Binning.
|
| https://en.wikipedia.org/wiki/Product_binning#Core_unlocking
| tcdent wrote:
| If they were serious about AI they would have published TOPS
| stats for at least float32 and bfloat16.
|
| The lack of quantified stats on the marketing pages tells me
| Intel is way behind.
| andrewstuart wrote:
| Intel can't compete head to head with Nvidia on performance.
|
| But surely it's easy enough to compete on video ram - why not
| load their GPUs to the max with video ram?
|
| And also video encoder cores - Intel has a great video encoder
| core and these vary little across high end to low end GPUs - so
| they could make it a standout feature to have, for example, 8
| video encoder cores instead of 2.
|
| It's no wonder Nvidia is the king because AMD and Intel just
| don't seem willing to fight.
| AndrewDucker wrote:
| Which market segment wants to encode 8 streams at once for
| cheap, and how big is it?
| hx8 wrote:
| I like Intel's aggressive pricing against entry/mid level GPUs,
| which hopefully puts downward pressure on all GPUs. Overall,
| their biggest concern is software support. We've had reports of
| certain DX11/12 games failing to run properly on Proton, and the
| actual performance of the A series varied greatly between games
| even on Windows. I suspect we'll see the same issues when the
| B580 gets proper third party benchmarking.
|
| Their dedication to Linux Support, combined with their good
| pricing makes this a potential buy for me in future versions. To
| be frank, I won't be replacing my 7900 XTX with this. Intel needs
| to provide more raw power in their cards and third parties need
| to improve their software support before this captures my
| business.
| Sparkyte wrote:
| Intel over there with two spears in the knees looking puzzled and
| in pain.
| smcleod wrote:
| 12GB of vRAM? What a wasted opportunity.
| machinekob wrote:
| For lowest end GPU? (and 2k gaming?) It is plenty even for most
| 4k games.
| smcleod wrote:
| Gaming sure, but not for GPU compute
| machinekob wrote:
| You most likely would buy 700x series for compute
| declan_roberts wrote:
| I think a graphics card tailored for 2k gaming is actually great.
| 2k really is the goldilocks zone between 4k and 1080p graphics
| before you start creeping into diminishing returns.
| icegreentea2 wrote:
| 2k usually refers to 1080p no? The k is the approximate
| horizontal resolution, so 1920x1080 is definitely 2k enough.
| antisthenes wrote:
| 2k Usually refers to 2560x1440.
|
| 1920x1080 is 1080p.
|
| It doesn't make a whole lot of sense, but that's how it is.
| ortusdux wrote:
| https://en.wikipedia.org/wiki/2K_resolution
| nightski wrote:
| That's amusing because I think almost everyone I know
| confuses it with 1440p. I've never heard of 2k being used
| for 1080p before.
| Retric wrote:
| "In consumer products, 2560 x 1440 (1440p) is sometimes
| referred to as 2K,[13] but it and similar formats are
| more traditionally categorized as 2.5K resolutions."
| seritools wrote:
| 1440p is colloquially referred to as 2.5K, not 2K.
| vundercind wrote:
| It'd be pretty weird if it were called 2k. 1080p is in an
| absolute sense or as a relative "distance" to the next-
| lowest thousand _closer_ to 2k pixels of width than 4k is
| to 4k (both are under, of course, but one's under by 80
| pixels, one by 160). It's got a much better claim to the
| label 2k than 1440p does, and arguably a somewhat better
| claim to 2k than 4k has to 4k.
|
| [EDIT] I mean, of course, 1080p's also not typically
| called that, yet another resolution is, but labeling
| 1440p 2k is especially far off.
| mkl wrote:
| You are misunderstanding. 1080p, 1440p, 2160p refer to
| the number of _rows_ of pixels, and those terms come from
| broadcast television and computing (the p is progressive,
| vs i for interlaced). 4k, 2k refer to the number of
| _columns_ of pixels, and those terms come from cinema and
| visual effects (and originally means 4096 and 2048 pixels
| wide). That means 1920x1080 is both 2k _and_ 1080p,
| 2560x1440 is both 2.5k and 1440p, and 3840x2160 is both
| 4k and 2160p.
| vundercind wrote:
| > You are misunderstanding. 1080p, 1440p, 2160p refer to
| the number of rows of pixels
|
| > (the p is progressive, vs i for interlaced)
|
| > 4k, 2k refer to the number of columns of pixels
|
| > 2560x1440 is both 2.5k and 1440p, and 3840x2160 is both
| 4k and 2160p.
|
| These parts I did not misunderstand.
|
| > and those terms come from cinema and visual effects
| (and originally means 4096 and 2048 pixels wide)
|
| OK that part I didn't know, or at least had forgotten--
| which are effectively the same thing, either way.
|
| > 1920x1080 is both 2k and 1080p
|
| Wikipedia suggests that in this particular case (unlike
| with 4k) application of "2k" to resolutions other than
| the original cinema resolution (2048x1080) is unusual;
| moreover, I was responding to a commenter's usage of "2k"
| as synonymous with "1440p", which seemed especially odd
| to me.
| nemomarx wrote:
| I have never seen 2.5k used in the wild (gamer forums
| etc) so it can't be that colloquial.
| layer8 wrote:
| Actual use is inconsistent. From
| https://en.wikipedia.org/wiki/2K_resolution: " _In consumer
| products, 2560 x 1440 (1440p) is sometimes referred to as 2K,
| but it and similar formats are more traditionally categorized
| as 2.5K resolutions._ "
|
| "2K" is used to denote WQHD often enough, whereas 1080p is
| usually called that, if not "FHD".
|
| "2K" being used to denote resolutions lower than WQHD is
| really only a thing for the 2048 cinema resolutions, not for
| FHD.
| declan_roberts wrote:
| TIL
| giobox wrote:
| For sure, it's been a sweet spot for a very long time for
| budget-conscious gamers looking for the best balance of price
| and frame rates, but 1440p-optimized parts are nothing new.
| Both NVidia
| and AMD make parts that target 1440p display users too, and
| have done for years. Even previous Intel parts you can argue
| were tailored for 1080p/1440p use, given their comparative
| performance deficit at 4k etc.
|
| Assuming they retail at the prices Intel is suggesting in the
| press releases, you maybe save 40-50 bucks here over an
| ~equivalent NVidia 4060.
|
| I would also argue, like others here, that with tech like frame
| gen, DLSS etc., even the cheapest discrete NVidia 40xx parts are
| arguably 1440p-optimized now; it doesn't even need to be said
| in their marketing materials. I'm not as familiar with AMD's
| range right now, but I suspect virtually every discrete
| graphics card they sell is "2k optimized" by the standard Intel
| used here, and it also doesn't really warrant explicit mention.
| philistine wrote:
| I'm baffled that PC gamers have decided that 1440p is the
| endgame for graphics. When I look at a 27-inch 1440p display,
| I see pixel edges everywhere. It's right at the edge of
| losing the visibility of individual pixels, since I can't
| perceive them at 27-inch 2160p, but not quite there yet for
| desktop distances.
|
| Time marches on, and I become ever more separated from gaming
| PC enthusiasts.
| wing-_-nuts wrote:
| I used to be in the '4k or bust' camp, but then I realized
| that I needed 1.5x scaling on a 27" display to have my UI
| at a comfy size. That put me right back at 1440p screen
| real estate _and_ you had to deal with fractional scaling
| issues.
|
| Instead, I bought a good 27" 1440p monitor, and you know
| what? I am not the discerning connoisseur of pixels that I
| thought I was. Honestly, it's _fine_.
|
| I will hold out with this setup until I can get a 8k 144hz
| monitor and a gpu to drive it for a reasonable price. I
| expect that will take another decade or so.
| doubled112 wrote:
| I have a 4K 43" TV on my desk and it is about perfect for
| me for desktop use without scaling. For gaming, I tend to
| turn it down to 1080p because I like frames and don't
| want to pay up.
|
| At 4K, it's like having 4 21" 1080p monitors. Haven't
| maximized or minimized a window in years. The sprawl is
| real.
| layer8 wrote:
| This is a trade-off with frame rates and rendering quality.
| When having to choose, most gamers prefer higher frame rate
| and rendering quality. With 4K, that becomes very
| expensive, if not impossible. 4K is 2.25 times the pixels
| of 1440p, which for example means you can get double the
| frame rate with 1440p using the same processing power and
| bandwidth.
|
| In other words, the current tech just isn't quite there
| yet, or not cheap enough.
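|
| The pixel math behind that 2.25x figure, for the record:
|
|   uhd = 3840 * 2160   # 8,294,400 pixels (4K / 2160p)
|   qhd = 2560 * 1440   # 3,686,400 pixels (1440p)
|   print(uhd / qhd)    # 2.25, i.e. roughly half the frame rate
|                       # at the same per-pixel cost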
| gdwatson wrote:
| Arguably 1440p is the sweet spot for gaming, but I love
| 4k monitors for the extra text sharpness. Fortunately
| DLSS and FSR upscaling are pretty good these days. At 4k,
| quality-mode upscaling gives you a native render
| resolution of about 1440p, with image quality a little
| better and performance a little worse.
|
| It's a great way to have my cake and eat it too.
| Novosell wrote:
| Gaming at 2160p is just too expensive still, imo. You gotta
| pay more for your monitor, GPU and PSU. Then if you want
| side monitors that match in resolution, you're paying more
| for those as well.
|
| You say PC gamers at the start of your comment and gaming
| PC enthusiasts at the end. These groups are not the same
| and I'd say the latter is largely doing ultrawide, 4k
| monitor or even 4k TV.
|
| According to steam, 56% are on 1080p, 20% on 1440p and 4%
| on 2160p.
|
| So gamers as a whole are still settled on 1080p, actually.
| Not everyone is rich.
| semi-extrinsic wrote:
| I'm still using a 50" 1080p (plasma!) television in my
| living room. It's close to 15 years old now. I've seen
| newer and bigger TVs many times at my friends house, but
| it's just not _better enough_ that I can be bothered to
| upgrade.
| dmonitor wrote:
| Doesn't plasma have deep blacks and color reproduction
| similar to OLED? They're still very good displays, and
| being 15 years old means it probably pre-dates the
| SmartTV era.
| philistine wrote:
| > You say PC gamers at the start of your comment and
| gaming PC enthusiasts at the end. These groups are not
| the same
|
| Prove to me those aren't synonyms.
| Novosell wrote:
| Prove to me they are.
| dmonitor wrote:
| The major drawback for PC gaming at 4k that I never see
| mentioned is how much _heat_ the panels generate. Many of
| them generate so much heat that they rely on active cooling! I
| bought a pair of high refresh 4k displays and combined
| with the PC, they raised my room to an uncomfortable
| temperature. I returned them for other reasons (hard to
| justify not returning them when I got laid off a week
| after purchasing them), but I've since made note of the
| wattage when scouting monitors.
| evantbyrne wrote:
| Not rich. Well within reach for Americans with disposable
| income. Mid-range 16" MacBook Pros are in the same price
| ballpark as 4k gaming rigs. Or put another way costs less
| than a vacation for two to a popular destination.
| wlesieutre wrote:
| I don't think it's seen as the end game, it's that if you
| want 120 fps (or 144, 165, or 240) without turning down
| your graphics settings you're talking $1000+ GPUs plus a
| huge case and a couple hundreds watts higher on your power
| supply.
|
| 1440p hits a popular balance where it's more pixels than
| 1080p but not so absurdly expensive or power hungry.
|
| Eventually 4K might be reasonably affordable, but we'll
| settle at 1440p for a while in the meantime like we did at
| 1080p (which is still plenty popular too).
| dingnuts wrote:
| if you can see the pixels on a 27 inch 1440p display,
| you're just sitting too close to the screen lol
| philistine wrote:
| I don't directly see the pixels per se like on 1080p at
| 27-inch at desktop distances. But I see harsh edges in
| corners and text is not flawless like on 2160p.
|
| Like I said, it's on the cusp of invisible pixels.
| Lanolderen wrote:
| It's a nice compromise for semi-competitive play. On 4k
| it'd be very expensive and most likely finicky to maintain
| high FPS.
|
| Tbh now that I think about it I only really _need_
| resolution for general usage. For gaming I'm running
| everything but textures on low with min or max FOV
| depending on the game so it's not exactly aesthetic anyway.
| I more so need physical screen size so the heads are
| physically larger without shoving my face in it and refresh
| rate.
| goosedragons wrote:
| Nvidia markets the 4060 as a 1080p card. Its design makes it
| worse at 1440p than past x060 cards too. Intel has XeSS to
| compete with DLSS and is reportedly coming out with its
| own frame-gen competitor. $40-50 is a decent savings in the
| budget market, especially if Intel's claims are to be believed
| and it's actually faster than the 4060.
| leetharris wrote:
| I see what you're saying, but I also feel like ALL Nvidia cards
| are "2K" oriented cards because of DLSS, frame gen, etc.
| Resolution is less important now in general thanks to their
| upscaling tech.
| laweijfmvo wrote:
| Can it compete with the massive used GPU market though? Why buy
| a new Intel card when I can get a used Nvidia card that I know
| will work well?
| teaearlgraycold wrote:
| For some, buying used never crosses their mind.
| teaearlgraycold wrote:
| Please say 1440p and not 2k. Ignoring arguments about what 2k
| _should_ mean, there's enough use either way that it's
| confusing.
| Implicated wrote:
| 12GB memory
|
| -.-
|
| I feel like _anyone_ who can pump out GPUs with 24GB+ of memory
| that are usable for py-stuff would benefit greatly.
|
| Even if it's not as performant as the NVIDIA options - just to be
| able to get the models to run, at whatever speed.
|
| They would fly off the shelves.
| cowmix wrote:
| 100% - _this_ could be Intel's ticket to capture the hearts of
| developers and then everything else that flows downstream. They
| have nothing to lose here -- just do it Intel!
| bagels wrote:
| They could lose a lot of money?
| flockonus wrote:
| They already do... google $INTC and stare in disbelief at the
| "Financials" on the right side.
|
| At some point they should make a stand; that's the whole
| meta-topic of this thread.
| evanjrowley wrote:
| Maybe that's not too bad for someone who wants to use pre-
| existing models. Their AI Playground examples require at
| minimum an Intel Core Ultra H CPU, which is quite low-powered
| compared to even these dedicated GPUs:
| https://github.com/intel/AI-Playground
| elorant wrote:
| Would it though? How many people are running inference at home?
| Outside of enthusiasts I don't know anyone. Even companies
| don't self-host models and prefer to use APIs. Not that I
| wouldn't like a consumer GPU with tons of VRAM, but I think
| that the market for it is too small for companies to invest in
| building it. If you bother to look at Steam's hardware stats
| you'll notice that only a small percentage is using high-end
| cards.
| ModernMech wrote:
| It's a chicken and egg scenario. The main problem with
| running inference at home is the lack of hardware. If the
| hardware was there more people would do it. And it's not a
| problem if "enthusiasts" are the only ones using it because
| that's to be expected at this stage of the tech cycle. If the
| market is small just charge more, the enthusiasts will pay
| it. Once more enthusiasts are running inference at home, then
| the late adopters will eventually come along.
| m00x wrote:
| Mac minis are great for this. They're cheap-ish and they
| can run quite large models at a decent speed if you run it
| with an MLX backend.
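|
| As a hedged illustration, running a quantized model through the
| mlx-lm package looks roughly like this (the model id and prompt
| are placeholders, not a recommendation):
|
|   # Sketch of Apple-silicon inference with mlx-lm
|   # (pip install mlx-lm). Model id is an example from the
|   # mlx-community hub.
|   from mlx_lm import load, generate
|
|   model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
|   text = generate(model, tokenizer,
|                   prompt="Explain GDDR6 clamshell mode briefly.",
|                   max_tokens=200)
|   print(text)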
| alganet wrote:
| mini _Pro_ are great for this, ones with large RAM
| upgrades.
|
| If you get the base 16GB mini, it will have more or less
| the same VRAM but way worse performance than an Arc.
|
| If you already have a PC, it makes sense to go for the
| cheapest 12GB card instead of a base mac mini.
| tokioyoyo wrote:
| This is the weird part, I saw the same comments in other
| threads. People keep saying how everyone yearns for local
| LLMs... but other than hardcore enthusiasts it just sounds
| like a bad investment? Like it's a smaller market than gaming
| GPUs. And by the time anyone runs them locally, you'll have
| bigger/better models and GPUs coming out, so you won't even
| be able to make use of them. Maybe the whole "indoctrinate
| users to be a part of Intel ecosystem, so when they go work
| for big companies they would vouch for it" would have
| merit... if others weren't innovating and making their
| products better (like NVIDIA).
| throwaway48476 wrote:
| Intel sold their GPUs at a negative margin, which is part of
| why the stock fell off a cliff. If they could double the
| VRAM they could raise the price into the green; even selling
| thousands, likely closer to 100k, would be far better than
| what they're doing now. The problem is Intel is run by
| incompetent people who guard their market segments as
| tribal fiefs instead of solving for the customer.
| refulgentis wrote:
| By subsidizing it more they'll lose less money?
| throwaway48476 wrote:
| Increasing VRAM would differentiate intel GPUs and allow
| driving higher ASPs, into the green.
| m00x wrote:
| You can just use a CPU in that case, no? You can run most ML
| inference on vectorized operations on modern CPUs at a fraction
| of the price.
| marcyb5st wrote:
| My 7800x says not really. Compared to my 3070 it feels so
| incredibly slow that gets in the way of productivity.
|
| Specifically, waiting ~2 seconds vs ~20 for a code snippet is
| much more detrimental to my productivity than the time
| difference would suggest. In ~2 seconds I don't get
| distracted, in ~20 seconds my mind starts wandering and then
| I have to spend time refocusing.
|
| Make a GPU that is 50% slower than a two-generations-older
| mid-range GPU (in tokens/s) but runs bigger models and I
| would gladly shell out $1000+.
|
| So much so that I am considering getting a 5090 if Nvidia
| actually fixes the connector mess they made with the 4090s,
| or even a used V100.
| refulgentis wrote:
| I don't understand, make it slower so it's faster?
| m00x wrote:
| I'm running codeseeker 13B model on my macbook with no perf
| issues and I get a response within a few seconds.
|
| Running a specialist model makes more sense on small
| devices.
| bongodongobob wrote:
| I don't know a single person in real life that has any desire
| to run local LLMs. Even amongst my colleagues and tech friends,
| not very many use LLMs period. It's still very niche outside AI
| enthusiasts. GPT is better than anything I can run locally
| anyway. It's not as popular as you think it is.
| dimensi0nal wrote:
| The only consumer demand for local AI models is for
| generating pornography
| treprinum wrote:
| How about running your intelligent home with a voice
| assistant on your own computer? In privacy-oriented
| countries (Germany) that would be massive.
| magicalhippo wrote:
| This is what I'm fiddling with. My 2080Ti is not quite
| enough to make it viable. I find the small models fail
| too often, so need larger Whisper and LLM models.
|
| Like the 4060 Ti would have been a nice fit if it hadn't
| been for the narrow memory bus, which makes it slower
| than my 2080 Ti for LLM inference.
|
| A more expensive card has the downside of not being cheap
| enough to justify idling in my server, and my gaming card
| is at times busy gaming.
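|
| For what it's worth, the speech-to-text half of that pipeline is
| only a few lines with the open Whisper models; a minimal sketch
| (file name and model size are placeholders):
|
|   # Local speech-to-text step for a voice assistant, using
|   # openai-whisper (pip install openai-whisper). Bigger models
|   # are more accurate but need correspondingly more VRAM.
|   import whisper
|
|   model = whisper.load_model("medium")     # "small" fits less VRAM
|   result = model.transcribe("kitchen_command.wav")
|   print(result["text"])                    # hand this to the LLM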
| serf wrote:
| absolutely wrong -- if you're not clever enough to think of
| any other reason to run an LLM locally then don't condemn
| the rest of the world to "well they're just using it for
| porno!"
| throwaway48476 wrote:
| I want local copilot. I would pay for this.
| rafaelmn wrote:
| You can get that on mac mini and it will probably cost you less
| than equivalent PC setup. Should also perform better than low
| end Intel GPU and be better supported. Will use less power as
| well.
| jmward01 wrote:
| 12GB max is a non-starter for ML work now. Why not come out with
| a reasonably priced 24gb card even if it isn't the fastest and
| target it at the ML dev world? Am I missing something here?
| Implicated wrote:
| I was wondering the same thing. Seems crazy to keep pumping out
| 12gb cards in 2025.
| shrewduser wrote:
| these are the entry level cards, i imagine the coming higher
| end variants will have the option of much more ram.
| tofuziggy wrote:
| Yes exactly!!
| enragedcacti wrote:
| > Am I missing something here?
|
| Video games
| rs_rs_rs_rs_rs wrote:
| It's insane how out of touch people can be here, lol
| heraldgeezer wrote:
| I have been trying to hold my slurs in reading this thread.
|
| These ML AI Macbook people are legit insane.
|
| Desktops and gaming is ugly and complex to them (because
| lego is hard and macbook look nice unga bunga), yet it is a
| mass market Intel wants to move in on.
|
| People here complain because Intel is not making a cheap
| GPU to "make AI" on when that's a market of maybe 1000
| people.
|
| This Intel card is perfect for an esports gaming machine
| running CS2, Valorant, Rocket League and casual or older
| games like The Sims, GoG games etc. Market of 1 million +
| right there, CS2 alone is 1mil people playing everyday. Not
| people grinding leetcode on their macs. Every real
| developer has a desktop, epyc cpu, giga ram and a nice GPU
| for downtime and run a real OS like Linux or even Windows
| (yes majority of devs run Windows)
| throwaway48476 wrote:
| Intel GPUs don't sell well to gamers. They've been on the
| market for years now.
|
| >market of maybe 1000 people
|
| The market of people interested in local ai inference is
| in the millions. If it's cheap enough the data center
| market is at least 10 million.
| heraldgeezer wrote:
| Yes, Intel cards have sucked. But they are trying again!
| terhechte wrote:
| Most devs use windows
| (https://www.statista.com/statistics/869211/worldwide-
| softwar...). Reddit localllama alone has 250k users.
| Clearly the market is bigger than 1000 people. Why are
| gamers and Linux people always so aggressively dismissive
| of other people's interests?
| heraldgeezer wrote:
| >Why are gamers and Linux people always so aggressive
| diminutive of other people's interests?
|
| Both groups have a high autism %
|
| We love to be "technically correct" and we often are. So
| we get frustrated when people claim things that are
| wrong.
| jmward01 wrote:
| How big is NVIDIA now? You don't think breaking into that
| market is a good strategy? And, yes, I understand that this
| is targeted at gamers and not ML. That was the point of the
| comment I made. Maybe if they did target ML they would make
| money and open a path to the massive server market out
| there.
| bryanlarsen wrote:
| These are $200 low end cards, the B5X0 cards. Presumably they
| have B7X0 and perhaps even B9X0 cards in the pipeline as well.
| zamadatix wrote:
| There has been no hint or evidence (beyond hope) Intel will
| add a 900 class this generation.
|
| B770 was rumoured to match the 16 GB of the A770 (and to be
| the top end offering for Battlemage) but it is said to not
| have even been taped out yet with rumour it may end up having
| been cancelled completely.
|
| I.e. don't hold your breath for anything consumer from Intel
| this generation better for AI than the A770 you could have
| bought 2 years ago. Even if something slightly better is
| coming at all, there is no hint it will be soon.
| hulitu wrote:
| > These are $200 low end cards
|
| Hm, i wouldn't consider 200$ low end.
| dgfitz wrote:
| ML is about to hit another winter. Maybe Intel is ahead of the
| industry.
|
| Or we can keep asking high computers questions about
| programming.
| PittleyDunkin wrote:
| > ML is about to hit another winter.
|
| I agree ML is about to hit (or has likely already hit) some
| serious constraints compared to breathless predictions of two
| years ago. I don't think there's anything equivalent to the
| AI winter on the horizon, though--LLMs even operated by
| people who have no clue how the underlying mechanism
| functions are still far more empowered than anything like the
| primitives of the 80s enabled.
| klodolph wrote:
| Yeah... I want to think of it like mining, where you've
| found an ore vein. You have to switch from prospecting to
| mining. There's a lot of work to be done by integrating our
| LLMs and other tools with other systems, and I think the
| cost/benefit of making models bigger, Bigger, BIGGER is
| reaching a plateau.
| kimixa wrote:
| I think there'll be a "financial" winter - or another way a
| bubble burst - the investment right now is simply
| unsustainable, how are these products going to be
| monetized?
|
| Nvidia had a revenue of $27 billion in 2023 - that's about
| $160 per person per year [0] for _every working age person_
| in the USA. And it's predicted to more than double in
| 2024. If you reduce that to office workers (you know, the
| people who might _actually_ get some benefit, as no AI is
| going to milk a cow or serve you Starbucks) that's more
| like $1450/year. Or again more than double that for 2024.
|
| How much value add is the current set of AI products going
| to give us? It's still mostly promise too.
|
| Sure, like most bubbles there'll probably still be some
| winners, but there's no way the current market as a whole
| is sustainable.
|
| The only way the "maximal AI" dream income is actually
| going to happen is if they functionally replace a
| significant proportion of the working population
| completely. And that probably would have large enough
| impacts to society that things like "Dollars In A Bank" or
| similar may not be so important.
|
| [0] Using the stat of "169.8 million people worked at some
| point in 2022"
| https://www.bls.gov/news.release/pdf/work.pdf
|
| [1] 18.5 million office workers according to
| https://www.bls.gov/news.release/ocwage.nr0.htm
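|
| The arithmetic behind those per-person figures, using the cited
| counts:
|
|   revenue = 27e9          # ~$27B Nvidia revenue in 2023
|   workers = 169.8e6       # worked at some point in 2022 [0]
|   office_workers = 18.5e6 # office workers [1]
|
|   print(revenue / workers)         # ~$159 per worker per year
|   print(revenue / office_workers)  # ~$1459 per office worker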
| BenjiWiebe wrote:
| Well, "AI" is milking cows. Not LLM's though. Our milking
| robot uses image recognition to find the cow's teats to
| put the milking cup on.
| semi-extrinsic wrote:
| Yeah, but automated milking robots like that have been in
| the market for more than a decade now IIRC?
|
| Seems like a lot of CV solutions have seen fairly steady
| but small incremental advances over the past 10-15 years,
| quite unrelated to the current AI hype.
| kimixa wrote:
| Improving capabilities of AI isn't at odds with expecting
| an "AI Winter" - just the current drive is more hype than
| sustainable, provable progress.
|
| We've been through multiple AI Winters, as a new
| technique is developed, it _does_ increase the
| capabilities. Just not as much as the hype suggested.
|
| To say there won't be a bust implies this boom will last
| forever, into whatever singularity that implies.
| choilive wrote:
| I think the more accurate denominator would be the world
| population. People are seeing benefits to LLMs even
| outside of the office.
| dgfitz wrote:
| How do LLMs make money though?
| hulitu wrote:
| > I think the more accurate denominator would be the
| world population. People are seeing benefits to LLMs even
| outside of the office.
|
| For example ?
|
| (besides deep fakes)
| ben_w wrote:
| While I'd agree monetisation seems to be a challenge in
| the long term (analogy: spreadsheets are used everywhere,
| but are so easy to make they're not themselves a revenue
| stream, only as part of a bigger package)...
|
| > Nvidia had a revenue of $27billion in 2023 - that's
| about $160 per person per year [0] for every working age
| person in the USA
|
| As a non-American, I'd like to point out we also earn
| money.
|
| > as no AI is going to milk a cow or serve you starbucks
|
| Cows have been getting the robots for a while now, here's
| a recent article: https://modernfarmer.com/2023/05/for-
| years-farmers-milked-co...
|
| Robots serve coffee as well as the office parts of the
| coffee business: https://www.techopedia.com/ai-coffee-
| makers-robot-baristas-a...
|
| Some of the malls around here have food courts where
| robots bring out the meals. I assume they're no more
| sophisticated than robot vacuum cleaners, but they get
| the job done.
|
| Transformer models seem to be generally pretty good at
| high-level robot control, though IIRC a different
| architecture is needed down at the level of actuators and
| stepper motors.
| kimixa wrote:
| Sure, robotics help many jobs, and some level of the
| current deep learning boom seems to have crossover in
| improving that - but how many of them are running LLMs
| that affect Nvidia's bottom line right now? There's some
| interesting research in that area, but it's certainly not
| the primary driving force. And then, is the control system
| even the limiting factor for many systems? It's probably
| relatively easy to get a machine today that makes a
| Starbucks coffee "as good as" a decently trained human.
| But the market doesn't seem to want that.
|
| And I know restricting it to the US is a simplification,
| but so is restricting it to Nvidia, it's just to give a
| ballpark back-of-the-envelope "does this even make
| sense?" level calculation. And that's what I'm failing to
| see.
| amluto wrote:
| Machines that will make espresso, automatically, that I
| personally prefer to what Starbucks serves are widely
| available. No AI needed, and they aren't even "robotic".
| These can use ordinary coffee beans, and you can get them
| for home use or for commercial use. You can also go to a
| mall and get a robot to make you coffee.
|
| Nonetheless, Starbucks does not use these machines, and I
| don't see any reason that AI, on its current trajectory,
| will change that calculation any time soon.
| lm28469 wrote:
| I love how the fact that we might not want AI/robots
| everywhere in our lives isn't even discussed.
|
| They could serve us a plate of shit and we'd debate if
| pepper or salt is better to complement it
| ben_w wrote:
| It's pretty often discussed, it's just hard to put
| everything into a single comment (or thread).
|
| I mean, Yudkowsky has basically spent the last decade
| screaming into the void about how AI will with high
| probability literally kill everyone, and even people like
| me who think that danger is much less likely still look
| at the industrial revolution and how slow we were to
| react to the harms of climate change and think "speed-
| running another one of these may be unwise, we should
| probably be careful".
| ben_w wrote:
| What we had in the 80s was barely able to perform spell-
| check; free downloadable LLMs today are mind-blowing even
| in comparison to GPT-2.
| dgfitz wrote:
| I think the only good thing that came out of the 80s was
| the 90s. I'd leave that decade alone so we can forget
| about it.
| lm28469 wrote:
| > even operated by people who have no clue how the
| underlying mechanism functions are still far more empowered
| than anything like the primitives of the 80s enabled.
|
| I'm still not convinced about that. All the """studies"""
| show 30-60% boost in productivity but clearly this doesn't
| translate to anything meaningful in real life because no
| industry laid off 30-60% of their workforce and no industry
| progressed anywhere close to 30% since ChatGPT was
| released.
|
| It was released a whole 24 months ago; remember the
| talks about freeing us from work and curing cancer... Even
| investment funds, which are the biggest suckers for
| anything profitable, are more and more doubtful.
| seanmcdirmid wrote:
| Haven't people been saying that for the last decade? I mean,
| eventually they will be right, maybe "about" means next year,
| or maybe a decade later? They just have to stop making huge
| improvements for a few years and the investment will dry up.
|
| I really wasn't interested in computer hardware anymore (they
| are fast enough!) until I discovered the world of running
| LLMs and other AI locally. Now I actually care about computer
| hardware again. It is weird, I wouldn't have even opened this
| HN thread a year ago.
| vlovich123 wrote:
| What makes local AI interesting to you vs larger remote
| models like ChatGPT and Claude?
| adriancr wrote:
| Not OP but for me a big thing is privacy, I can feed it
| personal documents and expect those to not leak.
|
| It has zero cost, hardware is already there. I'm not
| captive to some remote company.
|
| I can fiddle and integrate with other home sensors /
| automation as I want.
| hentrep wrote:
| Curious as I'm of the same mind - what's your local AI
| setup? I'm looking to implement a local system that would
| ideally accommodate voice chat. I know the answer depends
| on my use case - mostly searching and analysis of
| personal documents - but would love to hear how you've
| implemented.
| dgfitz wrote:
| llama.cpp and time seems to be the general answer to this
| question.
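|
| A minimal fully-local sketch of that with llama-cpp-python (the
| GGUF path, document and question are placeholders; voice would
| need a separate speech-to-text layer on top):
|
|   # Local document Q&A with llama-cpp-python
|   # (pip install llama-cpp-python). Nothing leaves the machine.
|   from llama_cpp import Llama
|
|   llm = Llama(model_path="models/some-instruct.Q4_K_M.gguf",
|               n_ctx=4096)
|
|   doc = open("notes/insurance_policy.txt").read()[:6000]
|   out = llm("Document:\n" + doc +
|             "\n\nQuestion: What is the deductible?\nAnswer:",
|             max_tokens=128, stop=["\n\n"])
|   print(out["choices"][0]["text"])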
| epicureanideal wrote:
| Lack of ideological capture of the public models.
| seanmcdirmid wrote:
| Control and freedom. You can use unharmonious models and
| hacks to existing models, also latency, you can actually
| use AI for a lot more applications when it is running
| locally.
| HDThoreaun wrote:
| Selling cheap products that are worse than the competition is
| a valid strategy during downturns as businesses look to cut
| costs
| throwaway48476 wrote:
| The survivors of the AI winter are not the dinosaurs but the
| small mammals that can profit by dramatically reducing the
| cost of AI inference in a minimum Capex environment.
| layer8 wrote:
| The ML dev world isn't a consumer mass market like PC gaming
| is.
| hajile wrote:
| Launching a new SKU for $500-1000 with 48gb of RAM seems like
| a profitable idea. The GPU isn't top-of-the-line, but the RAM
| would be unmatched for running a lot of models locally.
| layer8 wrote:
| You can't just throw in more RAM without having the rest of
| the GPU architected for it. So there's an R&D cost involved
| for such a design, and there may even be trade-offs on
| performance for the mass-market lower-tier models. I'm
| doubtful that the LLM enthusiast/tinkerer market is large
| enough for that to be obviously profitable.
| hajile wrote:
| That would depend on how they designed the memory
| controllers. GDDR6 only supports 1-2GB modules at
| present (I believe GDDR6W supports 4GB modules). If they
| were using 12 1GB modules, then increasing to 24GB
| shouldn't be a very large change.
|
| Honestly, Apple seems to be on the right track here. DDR5
| is slower than GDDR6, but you can scale the amount of RAM
| far higher simply by swapping out the density.
| KeplerBoy wrote:
| It's a 192-bit interface, so six 16Gbit chips.
| KeplerBoy wrote:
| Of course you can just add more RAM. Double the capacity
| of every chip and you get twice the RAM without ever
| asking an engineer.
|
| People did it with the RTX3070.
| https://www.tomshardware.com/news/3070-16gb-mod
| Tuna-Fish wrote:
| Can you find me a 32Gbit GDDR6 chip?
| jmward01 wrote:
| give me 48gb with reasonable power consumption so I can dev
| locally and I will buy it in a heartbeat. Anyone that is
| fine-tuning would want a setup like that to test things
| before pushing to real GPUs. And in reality if you can
| fine-tune on a card like that in two days instead of a few
| hours it would totally be worth it.
| justsomehnguy wrote:
| I would love that too, but you can't just add the chips;
| you need the bus too.
| jmward01 wrote:
| The bigger point here is to ask why they aren't designing
| that in from the start. Same with AMD. RAM has been
| stalled and is critical. Start focusing on allowing a lot
| more of it, even at the cost of performance, and you have
| a real product. I have a 12GB 3060 as my dev box and the
| big limiter for it is RAM, not cuda cores. If it had 48GB
| but the same number of cores then I would be very happy
| with it, especially if it was power efficient.
| Tuna-Fish wrote:
| It's not technically possible to just slap on more RAM.
| GDDR6 is point-to-point with option for clamshell, and the
| largest chips in mass production are 16Gbit/32 bit. So, for
| a 192bit card, the best you can get is 192/32x16Gbitx2 =
| 24GB.
|
| To have more memory, you have to design a new die with a
| wider interface. The design+test+masks on leading edge
| silicon is tens of millions of NRE, and has to be paid well
| over a year before product launch. No-one is going to do
| that for a low-priced product with an unknown market.
|
| The savior of home inference is probably going to be AMD's
| Strix Halo. It's a laptop APU built to be a fairly low end
| gaming chip, but it has a 256-bit LPDDR5X interface. There
| are larger LPDDR5X packages available (thanks to the
| smartphone market), and Strix Halo should be eventually
| available with 128GB of unified ram, performance probably
| somewhere around a 4060.
| ggregoire wrote:
| > 12GB max is a non-starter for ML work now.
|
| Can you even do ML work with a GPU not compatible with CUDA?
| (genuine question)
|
| A quick search showed me the equivalent of CUDA in the Intel
| world is oneAPI, but in practice, are the major Python
| libraries used for ML compatible with oneAPI? (Was also gonna
| ask if oneAPI can run inside Docker but apparently it does [1])
|
| [1] https://hub.docker.com/r/intel/oneapi
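|
| From what I understand, the PyTorch route is the
| intel-extension-for-pytorch ("ipex") package, which exposes Arc
| GPUs as an "xpu" device. A rough, untested sketch of what that
| looks like (treat the details as an approximation):
|
|   # PyTorch on an Intel Arc GPU via the IPEX "xpu" backend.
|   # Requires the oneAPI runtime plus intel-extension-for-pytorch.
|   import torch
|   import intel_extension_for_pytorch as ipex
|
|   device = "xpu" if torch.xpu.is_available() else "cpu"
|
|   model = torch.nn.Linear(1024, 1024).to(device).eval()
|   model = ipex.optimize(model)   # IPEX kernel/graph optimizations
|
|   x = torch.randn(8, 1024, device=device)
|   with torch.no_grad():
|       y = model(x)
|   print(y.shape, y.device)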
| suprjami wrote:
| There is ROCm and Vulkan compute.
|
| Vulkan is especially appealing because you don't need any
| special GPGPU drivers and it runs on any card which supports
| Vulkan.
| PhasmaFelis wrote:
| > Am I missing something here?
|
| This is a graphics card.
| heraldgeezer wrote:
| This is not an ML card... this is a gaming card... Why are you
| people like this?
| whalesalad wrote:
| I still don't understand why graphics cards haven't evolved to
| include SODIMM slots so that the VRAM can be upgraded by the
| end user. At this point memory requirements vary so much from
| gamer to scientist that it would make more sense to offer
| compute packages with user-supplied memory.
|
| tl;dr GPUs need to transition from being add-in cards to being
| a sibling motherboard. A sisterboard? Not a daughter board.
| stracer wrote:
| Too late, and it has a bad rep. This effort from Intel to sell
| discrete GPUs is just inertia from old aspirations; it won't
| noticeably help save the company, as there is not much money in
| it. Most probably the whole Intel ARC effort will be mothballed,
| and probably many other efforts will be too.
| undersuit wrote:
| No reviews and when you click on the reseller links in the
| press announcement they're still selling A750s with no B-Series
| in sight. Strong paper launch.
| sangnoir wrote:
| The fine article states reviews are still embargoed, and
| sales start next week.
| undersuit wrote:
| The mods have thankfully changed this to a Phoronix article
| instead of the Intel page and the title has been reworked
| to not include 'launch'.
| ksd482 wrote:
| What's the alternative?
|
| I think it's the right call since there isn't much competition
| in GPU industry anyway. Sure, Intel is far behind. But they
| need to start somewhere in order to break ground.
|
| Strictly speaking strategically, my intuition is that they will
| learn from this, course correct and then would start making
| progress.
| stracer wrote:
| The idea of another competitive GPU manufacturer is nice. But
| it is hard to bring into existence. Intel is not in a
| position to invest lots of money and sustained effort into
| products for which the market is captured and controlled by a
| much bigger and more competent company on top of its game.
| Not even AMD can get more market share, and they are much
| more competent in the GPU technology. Unless NVIDIA and AMD
| make serious mistakes, Intel GPUs will remain a 3rd rate
| product.
|
| > "They need to start somewhere in order to break ground"
|
| Intel has big problems and it's not clear they should occupy
| themselves with this. They should stabilize, and the most
| plausible way to do that is to cut the weak parts, and get
| back to what they were good at - performant secure x86_64
| CPUs, maybe some new innovative CPUs with low consumption,
| maybe memory/solid state drives.
| jvanderbot wrote:
| Seems to feature ray tracing (kind of obvious), but also
| upscaling.
|
| My experience on WH40K DT has taught me that upscaling is
| absolutely vital for a reasonable experience on some games.
| 1propionyl wrote:
| > upscaling is absolutely vital for a reasonable experience on
| some games
|
| This strikes me as a bit of a sad state of affairs. We've moved
| beyond a Parkinson's law of computational resources -usage by
| games expands to fill the available resources- to resource
| usage expanding to fill the available resources on the highest
| end machines unavailable for less than a few thousand
| dollars... and then using that to train a model that simulates,
| via upscaling, higher quality or performance on lower-end
| machines.
|
| A counterargument would be that this makes high-end experiences
| available to more people, and while that may be true in the
| individual case, I don't buy that that's where the incentives
| it creates are driving the entire industry.
|
| To put a finer point on it: at what percentage of budget is too
| much money being spent on producing assets?
| jvanderbot wrote:
| Isn't it insane to think that rendering triangles for the
| visuals in games has gotten so demanding that we need an
| artificially intelligent system embedded in our graphics
| cards to paint pixels that look like high definition
| geometry?
|
| What a time to be alive. Our most advanced technology is used
| to cheat on homework and play video games.
| 1propionyl wrote:
| It is. And it strikes me as evidence we've lost the plot
| and a measure has ceased to be a good measure upon being a
| target.
|
| It used to be that more computational power was desirable
| because it would allow for developers to more fully realize
| creative visions that weren't previously possible.
|
| Now, it seems that the goal is simply visual fidelity and
| asset complexity... and the rest of the experience is not
| only secondary, but compromised in pursuit of the former.
|
| Thinking back on recent games that felt like something
| _new_ and painstakingly crafted... they're almost all 2D
| (or look like it), lean on excellent art/music (and even
| haptics!) direction, have a well-crafted core gameplay loop
| or set of systems, and have relatively low actual system
| requirements (which in turn means they are exceptionally
| smooth without any AI tricks).
|
| Off the top of my head from recent years: Hades, Balatro, Animal
| Well, Cruelty Squad[0], Spelunky, Pizza Tower, Papers
| Please, etc. Most of these could just as easily have been
| made a decade ago.
|
| That's not to say we haven't had many games that are
| gorgeous and fun. But while the latter is necessary and
| sufficient, the former is neither.
|
| It's just icing: it doesn't matter if the cake tastes like
| crap.
|
| [0] a mission statement if there ever was one for how much
| fun something can be while not just being ugly but being
| actively antagonistic to the senses and any notion of good
| taste.
| jms55 wrote:
| > Isn't it insane to think that rendering triangles for the
| visuals in games has gotten so demanding that we need an
| artificially intelligent system embedded in our graphics
| cards to paint pixels that look like high definition
| geometry?
|
| That's not _quite_ how temporal upscaling works in practice.
| It's more of a blend between existing pixels, not
| generating entire pixels from scratch.
|
| The technique has existed since before ML upscalers became
| common. It's just turned out that ML is really good at
| determining how much to blend by each frame, compared to
| hand written and tweaked per-game heuristics.
|
| ---
|
| For some history, DLSS 1 _did_ try and generate pixels
| entirely from scratch each frame. Needless to say, the
| quality was crap, and that was after a very expensive and
| time consuming process to train the model for each
| individual game (and forget about using it as you develop
| the game; imagine having to retrain the AI model as you
| implement the graphics).
|
| DLSS 2 moved to having the model predict blend weights fed
| into an existing TAAU pipeline, which is much more
| generalizable and has way better quality.
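|
| To illustrate the non-ML core of that idea: temporal
| accumulation is essentially an exponential blend of reprojected
| history with the new frame. A toy sketch (single channel, no
| disocclusion handling, fixed blend weight where DLSS2-style
| models would predict it per pixel):
|
|   # Toy temporal accumulation: blend reprojected history with the
|   # current low-sample frame. Real TAAU/DLSS2 adds jitter,
|   # history rejection and learned per-pixel blend weights.
|   import numpy as np
|
|   def reproject(history, motion):
|       # motion: per-pixel (dy, dx) offsets into the previous frame
|       h, w = history.shape
|       ys, xs = np.indices((h, w))
|       sy = np.clip(ys + motion[..., 0], 0, h - 1).astype(int)
|       sx = np.clip(xs + motion[..., 1], 0, w - 1).astype(int)
|       return history[sy, sx]
|
|   def accumulate(current, history, motion, alpha=0.1):
|       # alpha = how much of the new frame to trust each step
|       return alpha * current + (1 - alpha) * reproject(history, motion)
|
|   h, w = 4, 4
|   frame = np.random.rand(h, w)
|   history = np.random.rand(h, w)
|   motion = np.zeros((h, w, 2))   # static scene for the toy case
|   print(accumulate(frame, history, motion))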
| crowcroft wrote:
| Anyone using Intel graphics cards? Aside from specs, drivers and
| support can make or break the value prop of a gfx card. Would be
| curious what actually using these is like.
| GiorgioG wrote:
| I put an Arc card in my daughter's machine last month. Seems to
| work fine.
| Scramblejams wrote:
| What OS?
| jamesgeck0 wrote:
| I use an A770 LE for PC gaming. Windows drivers have improved
| substantially in the last two years. There's a driver update
| every month or so, although the Intel Arc control GUI hasn't
| improved in a while. Popular newer titles have generally run
| well; I've played some Metaphor, Final Fantasy 16, Elden Ring,
| Spider-Man Remastered, Horizon Zero Dawn, Overwatch, Jedi
| Survivor, Forza Horizon 4, Monster Hunter Sunbreak, etc.
| without major issues. Older games sometimes struggle; a 6 year
| old Need for Speed doesn't display terrain, some 10+ year old
| indie games crash. Usually fixed by dropping dxvk.dll in the
| game directory. This fix cannot be used with older Windows
| Store games. One problematic newer title was Starfield, which
| at launch had massive frame pacing and hard crashing issues
| exclusive to Intel Arc.
|
| I've had a small sound latency issue forever; it's most visible
| with YouTube videos, where the first half-second of every video
| is silent.
|
| I picked this card up for about $120 less than the RTX 4060.
| Wasn't a terrible decision.
| imbusy111 wrote:
| None of the store links work. Weird. Is this not supposed to be a
| public page yet?
| SirMaster wrote:
| Must be an announcement rather than a launch I guess?
| kookamamie wrote:
| Why, though? Intel's strategy seems puzzling, to say the least.
| tokioyoyo wrote:
| Hard to get subsidies if you're not releasing new lines of
| products.
| ChrisArchitect wrote:
| Official page:
| https://www.intel.com/content/www/us/en/products/docs/discre...
| SeqDesign wrote:
| The new Intel Battlemage cards look sweet. If they can extend
| displays on Linux, then I'll definitely be buying one.
| greenavocado wrote:
| I'm not a gamer and there is not enough memory in this thing for
| me to care to use it for AI applications so that leaves just one
| thing I care about: hardware accelerated video encoding and
| decoding. Let's see some performance metrics both in speed and
| visual quality
| bjoli wrote:
| From what I have gathered, the alchemist av1 is about the same
| or sliiiightly worse than current nvenc. My a750 does about
| 1400fps for DVD encoding on the quality preset. I haven't had
| the opportunity to try 1080p or 4k though.
| bloodyplonker22 wrote:
| I wanted Intel to do well so I purchased an ARC card. The problem
| is not the hardware. For some games, it worked fine, but in
| others, it kept crashing left and right. After updates to
| drivers, crashing was reduced, but it still happened. Driver
| software is not easy to develop thoroughly. Even AMD had problems
| when compared to Nvidia when AMD really started to enter the GPU
| game after buying ATI. AMD has long since solved their driver
| woes, but years after ARC's launch, Intel still has not.
| shmerl wrote:
| Do you mean on Linux, and are those problems with ANV? RADV seems
| to be developed faster these days, with ANV lagging slightly
| behind.
| jamesgeck0 wrote:
| I haven't experienced many crashing issues on Windows 11. What
| games are you seeing this in?
| bjoli wrote:
| I love my A750. Works fantastically out of the box on Linux. HW
| encoding and decoding for every format I use. Flawless support
| for different screens.
|
| I haven't regretted the purchase at all.
| maxfurman wrote:
| How does this connect to Gelsinger's retirement, announced
| yesterday? The comments on that news were all doom and gloom, so
| I had expected more negative news today, not a product launch.
| But I'm just some guy on HN; what do I know?
| wmf wrote:
| I don't see any connection. This is a very minor product for
| Intel.
| Havoc wrote:
| Who is the target audience for this?
|
| Well-informed gamers know Intel's discrete GPU is hanging by a
| thread, so they're not hopping on that bandwagon.
|
| Too small for ML.
|
| The only people who seem really happy are the ones buying it for
| transcoding, and I can't imagine there is a huge market of people
| going "I need to go buy a card for AV1 encoding".
| epolanski wrote:
| Cheap gaming rigs.
|
| They do well compared to AMD/Nvidia at that price point.
|
| Is it a market worth chasing at all?
|
| Doubt.
| spookie wrote:
| It's cheap; there's plenty of market when the others have
| forgotten the segment.
| zamalek wrote:
| If it works well on Linux there's a market for that. AMD are
| hinting that they will be focusing on iGPUs going forward (all
| power to them, their iGPUs are unmatched and NVIDIA is
| dominating dGPU). Intel might be the savior we need. Well,
| Intel and possibly NVK.
|
| Had this been available a few weeks ago I would have gone
| through the pain of early adoption. Sadly it wasn't just an
| upgrade build for me, so I didn't have the luxury of waiting.
| sosodev wrote:
| AMD has some great iGPUs, but it seems like they're still
| planning to compete in the dGPU space, just not at the high end
| of the market.
| sangnoir wrote:
| > Too small for ML.
|
| What do you mean by this - too small for SoTA LLMs? There are
| many ML applications where 12GB is more than enough.
|
| Even w.r.t. LLMs, not everyone requires the latest and biggest
| models. Some "small", distilled and/or quantized LLMs are
| perfectly usable with <24GB of VRAM.
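|
| As a concrete example of the "fits in 12GB" point: a 7B-13B model
| quantized to ~4 bits runs comfortably through llama-cpp-python,
| assuming a llama.cpp build with a GPU backend for the card (e.g.
| SYCL or Vulkan for Arc); the model file name is a placeholder:
|
|     from llama_cpp import Llama
|
|     llm = Llama(
|         model_path="model-q4_k_m.gguf",  # placeholder GGUF file
|         n_gpu_layers=-1,                 # offload all layers
|         n_ctx=4096,
|     )
|     out = llm("How much VRAM does a 7B model need at 4 bits?",
|               max_tokens=128)
|     print(out["choices"][0]["text"])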
| screye wrote:
| all-in-1 machines.
|
| Intel's customers are 3rd-party PC assemblers like Dell & HP.
| Many corporate bulk buyers only care whether the 1-2 apps they
| use are supported; the lack of wider support isn't a concern.
| ddtaylor wrote:
| Intel has earned a lot of credit in the Linux space.
|
| Nvidia is trash tier in terms of support and has only recently
| made serious steps to actually support the platform.
|
| AMD went all in nearly a decade ago and it's working pretty
| well for them. They have mostly caught up to Intel-grade support
| in the kernel.
|
| Meanwhile, Intel has been doing this since I was in college. I
| was running the i915 driver in Ubuntu 20 years ago. Sure, their
| chips are super-low-power stuff, but what you can do with them
| and the level of software support you get is unmatched. Years
| before these other vendors were taking the platform seriously,
| Intel was supporting and funding Mesa development.
| marshray wrote:
| I'm using an Intel card right now. With Wayland. It just works.
|
| Ubuntu 24.04 couldn't even boot to a tty with the Nvidia Quadro
| thing that came with this major-brand PC workstation, still
| under warranty.
| mappu wrote:
| _> Intel's discrete GPU is hanging by a thread, so they're not
| hopping on that bandwagon_
|
| Why would that matter? You buy one GPU, in a few years you buy
| another GPU. It's not a life decision.
| qudat wrote:
| If you go on the Intel Arc subreddit, people are hyped about
| Intel GPUs. Not sure what the price is, but the previous gen was
| cheap and the extra competition is welcome.
|
| In particular, Intel just needs to support VFIO and it'll be huge
| for homelabs.
| zenethian wrote:
| These are pretty interesting, but I'm curious about the side-by-
| side screenshot with the slider: why does ray tracing need to be
| enabled to see the yellow stoplight? That seems like a weird
| oversight.
| zamalek wrote:
| It's possible that the capture wasn't taken at the exact same
| frame, or that the state of the light isn't deterministic in
| the benchmark.
| tommica wrote:
| Probably would jump to Intel once my 3060 gets too old
| headgasket wrote:
| My hunch is that the path forward for Intel on both the CPU and
| the GPU end is to release a series of consumer chipsets with a
| large number of PCIe 5.0 lanes, and keep iterating on that. It
| would cannibalize some of the datacenter server-side revenue, but
| that's a reboot... get the hackers raving about Intel value for
| the money instead of EPYC. Or do a skunkworks ARM64, M1-like
| processor; there's a market for that as a datacenter part...
| pizzaknife wrote:
| Tell it to my INTC stock price.
___________________________________________________________________
(page generated 2024-12-03 23:00 UTC)