[HN Gopher] Intel Xeon Max 9480 Deep-Dive 64GB HBM2e Onboard Lik...
___________________________________________________________________
Intel Xeon Max 9480 Deep-Dive 64GB HBM2e Onboard Like a GPU or AI
Accelerator
Author : PaulHoule
Score : 110 points
Date : 2023-09-19 15:00 UTC (2 days ago)
(HTM) web link (www.servethehome.com)
(TXT) w3m dump (www.servethehome.com)
| Aissen wrote:
| In caching mode, this is effectively a 64GB L4. Very impressive!
| (AMD's biggest offering Genoa-X has 1.15GB of L3)
| undersuit wrote:
  | Point of Order! It's not an L4, it's a RAM cache. Data from
  | L1-L3 isn't stored there, only what you have written to or
  | read from RAM.
|
| Your working set of data won't spill out from the L3 to "the
| L4" when it grows too large.
| Aissen wrote:
| I'm not sure I understand the difference. Are we both talking
| about the "HBM Caching Mode" on this slide:
| https://www.servethehome.com/intel-xeon-max-9480-deep-
| dive-i... ?
| undersuit wrote:
  | The memory caching, AFAIK, exists on a different side of the
  | explicit load and store instructions than CPU caching. It
| introduces subtle issues. You can see on page 4 a number of
| cache friendly benchmarks show little benefit for the HBM
| caching: https://www.servethehome.com/wp-
| content/uploads/2023/09/Inte...
|
| 64GB of HBM2e as ram is more performant than 128GB of DDR5
| with HBM2e cache, and often the cached variant has no
| speedup compared to a standard Intel configuration.
|
| Also OpenFOAM loves cache: https://www.phoronix.com/benchma
| rk/result/amd_ryzen_7_5800x3...
| Aissen wrote:
| I'd be curious to know more about the subtle issues
| (which I don't doubt there might be!).
|
  | IMHO those results don't contradict what I said. Of
| course if the workload entirely fits in the 64GB HBM,
| there's no point in using it as a cache, just use it
| directly. But if you need to address more RAM(any big DB,
| fs, etc.), and you don't want to manage the tier
| manually, then the caching mode could shine.
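The working-set argument above can be made concrete with a toy LRU simulation. This is illustrative only: the sizes are arbitrary and this is not the HBM controller's actual replacement policy. The point it shows is the cliff: a cache pays off only once the hot data fits.

```python
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Simulate an LRU cache of `capacity` lines; return fraction of hits."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)  # refresh recency
            hits += 1
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)

# A working set of 100 lines, cycled 50 times.
trace = list(range(100)) * 50

print(hit_rate(trace, 128))  # 0.98 -- only the first pass misses
print(hit_rate(trace, 64))   # 0.0  -- the cyclic trace thrashes LRU
```

A cache slightly smaller than a cyclic working set is the LRU worst case, which is why "fits in HBM" vs "doesn't fit" behaves so differently in the benchmarks discussed above.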
| bluedino wrote:
| Our Dell rep really talked this up a few months ago, but didn't
| have any benchmarks.
|
| Impressed with OpenFOAM results, as that's a typical workload for
| our users. However, the AMD system is basically equal.
| mikeInAlaska wrote:
| What a beast. It doesn't apply much to my life, but I did notice
| one thing that would have been nice on several builds in the
  | past. The torque for the heatsink screws is specified as an
  | exact figure in pound-feet.
|
  | I have made builds in the past where you had to judge: tight
  | enough? I found that unsettling. I believe my last couple of
  | builds had a more cam-lock feel to the heatsink, though, where
  | it tightened to a point with an obvious force-threshold stop.
| mrob wrote:
| Noctua heatsinks include maximum torque specifications in the
| instruction manuals, in Nm. I expected this to be standard, but
| I checked the manuals of popular high-end models from other
| manufacturers (Cooler Master, DeepCool, Be Quiet, Akasa), and I
| was unable to find torque specifications.
| omneity wrote:
| How can you make use of this info? Like which tool allows you
| to tighten down to a specific torque?
| bogdanstanciu wrote:
  | Torque wrenches let you set a max torque; they spin freely
  | and no longer tighten after they reach it.
| mikeInAlaska wrote:
| I have a Klein Torque screwdriver. Holy cow have they become
| expensive in the decade since I bought it.
| mrob wrote:
| Torque screwdriver. You can get digital motor-driven ones,
| and ones that work like torque wrenches, with a clutch that
| releases once you exceed the torque setting. Torque
| screwdrivers have a limited range of supported torques, and
| computer heatsink fasteners generally need fairly low torque,
| so be sure to get one that goes low enough.
| mkaic wrote:
| I just completed a few AMD Threadripper-based builds at work
| and was pleasantly surprised to see that each CPU actually
| shipped with its own torque-screwdriver tuned to the specific
| torque needed to install the chip.
| yieldcrv wrote:
| we just saying AI accelerator now?
| sp332 wrote:
| If your Graphics Processing Unit isn't actually processing any
| graphics, it seems like a better name. No one is gaming on an
| A100 (although now I want to see that!)
| circuit10 wrote:
| I think Linus Tech Tips tried
| aquir wrote:
| ESET Endpoint Security is blocking the site: JS/Agent.RAN threat
| found.
| [deleted]
| danielovichdk wrote:
| Funny enough, no matter how big these things get, they never seem
| to make me a lot more productive.
|
| If my CI build time at least would go down.
|
| Software is so slow compared to hardware. It's embarrassing that
  | we haven't moved even a hundredth of what hardware has in the
  | last 30 years.
|
| Why get this?
| redox99 wrote:
| > Funny enough, no matter how big these things get, they never
| seem to make me a lot more productive.
|
| Maybe because the stuff you do isn't bottlenecked by compute?
| In my case every hardware upgrade resulted in a big
| productivity improvement.
|
| Better CPU: Cut my C++ build times in half (from 10 to 5
| minutes if I change an important .h)
|
| Better GPUs: Cut my AI training time by a few X, massively
| improving iteration times. Also allow me to run bigger models
| that more easily reach my target accuracy.
| switchbak wrote:
| I get your take here, but as someone who's worked very hard at
| times to optimize builds (amongst other things), the business
| just generally doesn't respect those efforts and certainly
  | doesn't reward them. Oftentimes they're actively punished with
| a reflexive assumption that they're not "serious" efforts worth
| the time of the business. (There's the odd exception, but this
| is very widespread in my experience)
|
| Sure, there's a balance to be made between cutting wood and
| sharpening the saw. Who do we blame when the boss-man won't
| allow anyone to sharpen the tools even though we're obviously
| wasting outrageous amounts of time? You blame the people that
| won't allow those investments to be made.
|
| When you multiply that across an entire industry, add some
| trendy fashionable tech (that's also just fast-enough to be
| tolerable), and this is how we end up in the shitty
| circumstance you describe.
|
| And yet I still wouldn't trade my fancy IDE and slow CI
| pipelines for a copy of Turbo Pascal 7, as fast as it would be!
| jiggawatts wrote:
| People get set in their ways. Sometimes entire industries need
| a shake-up.
|
| This never occurs voluntarily, and people will wail and thrash
| about even as you try to help them get out of their rut.
|
| Builds being slow is one of my pet peeves also. Modern "best
| practices" are absurdly wasteful of the available computer
| power, but because everyone does it the same way, nobody seems
| to accept that it can be done differently.
|
  | A typical modern CI/CD pipeline is like a smorgasbord of
| worst-case scenarios for performance. Let me list just _some_
| of them:
|
| - Everything is typically done from scratch, with minimal or no
| caching.
|
| - Synchronous I/O from a single thread, often to a remote
| cloud-hosted replicated disk... for an ephemeral build job.
|
| - Tens of thousands of tiny files, often smaller than the
| physical sector size.
|
| - Layers upon layers of virtualisation.
|
| - Many small HTTP downloads, from a single thread with no
| pipelining. Often un-cached despite being identified by stable
| identifiers -- and hence infinitely cacheable safely.
|
| - Spinning up giant, complicated, multi-process workflows for
| trivial tasks such as file copies. (CD agent -> shell -> cp
| command) Bonus points for generating more kilobytes of logs
| than the kilobytes of files processed.
|
| - Repeating the same work over and over (C++ header
| compilation).
|
| - Generating reams of code only to shrink it again through
| expensive processes or just throw it away (Rust macros).
|
| I could go on, but it's too painful...
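The "stable identifiers, hence infinitely cacheable" item in the list above can be sketched as a content-addressed fetch cache. Everything here (the `FetchCache` class, the fake registry dict) is hypothetical illustration, not any particular CI tool's API:

```python
import hashlib

class FetchCache:
    """Cache downloads keyed by a stable identifier, so an artifact
    pinned by name+version (or digest) is only fetched once."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: identifier -> bytes
        self._store = {}      # cache key -> payload
        self.network_calls = 0

    def get(self, identifier):
        key = hashlib.sha256(identifier.encode()).hexdigest()
        if key not in self._store:
            self.network_calls += 1
            self._store[key] = self._fetch(identifier)
        return self._store[key]

# A fake "registry" standing in for the remote server.
remote = {"left-pad-1.3.0.tgz": b"left-pad bytes"}
cache = FetchCache(lambda ident: remote[ident])

for _ in range(100):  # a hundred builds request the same pinned artifact
    cache.get("left-pad-1.3.0.tgz")

print(cache.network_calls)  # 1 -- every build after the first is a hit
```

The safety argument is exactly the one in the comment: if the identifier is stable (a pinned version or a digest), the cached copy can never be stale, so there is no correctness reason to re-download.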
| jrockway wrote:
| A few weeks ago, I decided to lock myself in my apartment and
| write a build system. I have always felt like developers wait
| ages for Docker, and I wanted to see why. (Cycle times on the
| app I develop are a minute for my crazy combination of shell
| scripts that I use, or up to 5 minutes for what most of my
| teammates do. This is incomprehensibly unproductive.)
|
| It turns out, it's all super crazy at every level. Things
| like Docker use _incredibly_ slow algorithms like SHA256 and
| gzip by default. For example, it takes 6 seconds to gzip a
  | 150MB binary, while zstd --fast=3 achieves the same ratio and
| does it in 100 milliseconds! The OCI image spec allows
| Zstandard compression, so this is something you can just do
| to save build time and container startup time. (gzip,
| unsurprisingly, is not a speed demon when decompressing
| either.) SHA256, used everywhere in the OCI ecosystem, is
| also glacial; a significant amount of CPU used by starting or
| building containers is just running this algorithm. Blake3 is
| 17 times faster! (Blake2b, a fast and more-trusted hash than
  | blake3, is about 6x faster.) But unfortunately, Docker/OCI
| only support SHA256, so you are stuck waiting every time you
| build or pull a container. (On the building side, you
  | actually have to compute layer SHA256s twice: once for the
  | compressed data and once for the uncompressed data. I don't
| know what happens if you don't do this, I just filled in
| every field the way the standard mandated and things worked.)
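For context, the two digests come from the OCI image spec: the manifest's layer descriptor carries the digest of the compressed blob, while the image config's rootfs.diff_ids carry the digest of the uncompressed tar. A minimal sketch of that double hashing (the layer bytes are fake):

```python
import gzip, hashlib

layer_tar = b"fake layer tar bytes " * 1000  # stand-in for a real tarball

# Uncompressed digest -> goes into the image config's rootfs.diff_ids.
diff_id = "sha256:" + hashlib.sha256(layer_tar).hexdigest()

# Compressed digest -> goes into the manifest's layer descriptor.
# mtime=0 pins the gzip header so the blob digest is reproducible.
compressed = gzip.compress(layer_tar, mtime=0)
blob_digest = "sha256:" + hashlib.sha256(compressed).hexdigest()

print(diff_id != blob_digest)  # True: two SHA256 passes per layer
```

Pinning the gzip mtime matters in practice: without it, the same layer content produces a different blob digest on every build, which silently defeats registry-side deduplication.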
|
| This was on HN a couple years ago and was a real eye opener
| for me:
| https://jolynch.github.io/posts/use_fast_data_algorithms/
|
| There are also things that Dockerfiles preclude, like
| building each layer in parallel. I don't use layers or shell
| commands for anything; I just put binaries into a layer. With
| a builder that doesn't use Dockerfiles, you can build all the
| layers in parallel and push some of the earlier layers while
| the later ones are building. (One of the reasons I wrote my
| own image assembler is because we produce builds for each
| architecture. The build machine has to run an arm64 qemu
| emulator so that a Dockerfile-based build can run `[` to
| select the right third-party binary to extract. This is crazy
| to me; the decision is static and unchanging, so no code
| needs to be run. But I know that it's designed for stuff like
| "FROM debian; RUN apt-get update; RUN apt-get upgrade" which
| is ... not needed for anything I do.)
|
| The other thing that surprises me about pushing images is how
| slow a localhost->localhost container push is. I haven't
| looked into why because the standard registry code makes me
| cry, but I plan to just write a registry that stores blobs on
| disk, share the disk between the build environment and the
| k8s cluster (hostpath provisioner or whatever), and have the
| build system just write the artifacts it's building into that
| directory; thus there is no push step required. When the
| build is complete, the artifacts are available for k8s to
| "pull".
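The planned blob-on-disk registry maps naturally onto the content-addressed layout OCI images already use (blobs/sha256/&lt;digest&gt;). A minimal sketch, with hypothetical helper names:

```python
import hashlib, pathlib, tempfile

def put_blob(root: pathlib.Path, data: bytes) -> str:
    """Store a blob content-addressed, OCI-layout style."""
    digest = hashlib.sha256(data).hexdigest()
    path = root / "blobs" / "sha256" / digest
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return "sha256:" + digest

def get_blob(root: pathlib.Path, ref: str) -> bytes:
    algo, digest = ref.split(":")
    data = (root / "blobs" / algo / digest).read_bytes()
    # Verify on read, as a registry would before serving the blob.
    assert hashlib.sha256(data).hexdigest() == digest
    return data

root = pathlib.Path(tempfile.mkdtemp())
ref = put_blob(root, b"layer contents")
print(get_blob(root, ref))  # b'layer contents'
```

With the build output and the cluster sharing this directory, "push" and "pull" both reduce to writing and reading files whose names are their own checksums.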
|
| The whole thing is a work in progress, but with a week of
| hacking I got the cycle time down from 1 minute to about 5
| seconds, and many more improvements are available.
| (Eventually I plan to build everything with Bazel and remote
| execution, but I needed the container builder piece for
| multi-architecture releases; Bazel will have to be invoked
| independently for each architecture because of its design,
| and then the various artifacts have to be assembled into the
| final image list.)
| oconnor663 wrote:
| > Blake3 is 17 times faster! (Blake2b, a fast and more-
| trusted hash than blake3, is about 6x faster.)
|
| I'm a little surprised to see that 6x figure. Just going
| off the red bar chart at blake2.net, I wouldn't expect to
| see much more than a 2x difference, unless you're measuring
| a sub-optimal SHA256 implementation. And recent x86 CPUs
| have hardware acceleration for SHA256, which makes it
| faster than BLAKE2b. But those CPUs also have wide vector
| registers and lots of cores, so BLAKE3's relative advantage
| tends to grow even as BLAKE2b falls behind.
|
| But in any case, yes, builds and containers tend to be
| great use cases for BLAKE3. You've got big files that are
| getting hashed over and over, and they're likely to be in
| cache. An expensive AWS machine can hit crazy numbers like
| 100 GB/s on that sort of workload, where the bottleneck
| ends up being memory bandwidth rather than CPU speed.
| jrockway wrote:
| Yeah, I'm not using any special implementation of sha256.
| Just crypto/sha256 from the standard library. It's also
| worth noting that I'm testing on a zen2 chip. I have read
| a lot of papers saying that sha2 is much faster because
| CPUs have special support; didn't see it in practice. (No
| AVX-512 here, but I believe those algorithms predate
| AVX-512 anyway.)
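The relative speeds being discussed are easy to check with nothing but the standard library; BLAKE3 needs a third-party package, but BLAKE2b ships in hashlib. Exact ratios depend heavily on the CPU and the implementation (as the parent comments note), so no numbers are promised here:

```python
import hashlib, time

data = bytes(32 * 1024 * 1024)  # 32 MiB of zeroes to hash

def throughput_mb_s(name):
    """Hash `data` once with the named algorithm; return MB/s."""
    start = time.perf_counter()
    hashlib.new(name, data).hexdigest()
    return len(data) / (time.perf_counter() - start) / 1e6

for algo in ("sha256", "blake2b"):
    print(algo, round(throughput_mb_s(algo)), "MB/s")
```

On machines without SHA extensions (like the Zen 2 part mentioned above), blake2b typically comes out well ahead; on CPUs with hardware SHA256, the gap narrows or reverses.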
| api wrote:
| We've gotten very good at using containerization and
| virtualization to isolate and abstract away ugliness, and as
| a result we have been able to build unspeakable towers of
| horror that would have been impossible without these
| innovations.
| mschuster91 wrote:
| > Builds being slow is one of my pet peeves also. Modern
| "best practices" are absurdly wasteful of the available
| computer power, but because everyone does it the same way,
| nobody seems to accept that it can be done differently.
|
| A lot of what you describe originates from lessons learned in
| "classic" build environments:
|
| - broken (or in some cases, regular) build attempts leaving
| files behind that confuse later build attempts (e.g. because
| someone forgot to do a git clean before checkout step)
|
| - someone hot-fixing something on a build server which never
| got documented and/or impacted other builds, leading to long
| and weird debugging efforts when setting up more build
| servers
|
| - its worse counterpart, someone setting up the environment
| on a build server and never documenting it, leading to
| serious issues when that person inevitably left and then
| something broke
|
| - OS package upgrades breaking things (e.g. Chrome/FF
  | upgrades and puppeteer using it), and a resulting reluctance
  | to upgrade build servers' software
|
| - attackers hacking build systems because of vulnerabilities,
| in the worst case embedding malware into deliverables or
| stealing (powerful) credentials
|
| - colliding versions of software stacks / libraries / other
| dependencies leading to issues when new projects are to be
| built on old build servers
|
| In contrast to that, my current favourite system of running
| GitLab CI on AWS EKS with ephemeral runner pods is orders of
| magnitude better:
|
| - every build gets its own fresh checkout of everything, so
| no chance of leftover build files or an attacker persisting
| malware without being noticed (remember, everything comes out
| of git)
|
| - no SSH or other access to the k8s nodes possible
|
| - every build gets a _reproducible_ environment, so when
  | something fails in a build, it's trivial to replicate
| locally, and all changes are documented
| gpderetta wrote:
  | You forgot: you can get lunch while you wait for your
  | build to finish.
| mschuster91 wrote:
| Not if your pipeline is decent. Parallelize and cache as
| much as you can.
|
| The only stack that routinely throws wrenches into
| pipeline optimization is Maven. I'd love to run, say,
| Sonarqube, OWASP Dependency Checker, regular unit tests
| and end-to-end tests in parallel in different containers,
| but Maven - even if you pass the entire target / */target
| folders through - insists on running all steps prior to
| the goal you attempt to run. It's not just dumb and slow,
| it makes the runner images large AF because they have to
| carry _everything_ needed for all steps in one image,
| including resources like RAM and CPU.
| ilyt wrote:
  | Right, but build environments are awfully stupid about it.
  | Re-downloading deps _when they did not change_ is utterly
  | wasteful and achieves nothing, same as re-compiling stuff
  | that did not change.
|
| > broken (or in some cases, regular) build attempts leaving
| files behind that confuse later build attempts (e.g.
| because someone forgot to do a git clean before checkout
| step)
|
  | But thanks to CI you will never fix such a broken build
  | system!
| justinclift wrote:
| > OS package upgrades breaking things
|
| Heh. docker.io package on Ubuntu did this recently, whereby
| it stopped honouring the "USER someuser" clause in
| Dockerfiles. Completely breaks docker builds.
|
| No idea if it's fixed yet, we just updated our systems to
| not pull in docker.io 20.10.25-0ubuntu1~20.04. or newer.
| ilyt wrote:
| Docker developers being clueless, what else is new...
| slowmovintarget wrote:
| Indeed: Optimize to reduce developer time spent on bad
| builds first.
|
| One of the rules of this approach is to filter all
| extraneous variable input likely to disrupt the build
| results, especially and including artifacts from previous
| failed builds.
| jiggawatts wrote:
| You're the perfect example of the cranky experienced
| developer stuck in their ways and fighting against better
| solutions.
|
| Most of the issues you've described are consequences of
| _yet more issues_ such as not caching with the correct
| cache key.
|
| I argue that all of the problems are eminently solvable.
| You're arguing for leaving massive issues in the system
| because of... other massive issues. Only one of these two
| approaches to problem solving gets to a solution without
| massive problems.
| jayd16 wrote:
| Yeah, I remember compressing 4k120hz video of my AI upscaled
  | ray-traced RPG playthrough in real time and streaming it to
| all my friends' phones in the 90s. Times never change.
|
| But really, it's easy to forget the massive leaps we make every
| year.
| josephg wrote:
| Maybe I'm just getting old, but diablo 4 doesn't look that
| much better to my eyes than diablo 3 did. I'm sure the
| textures are higher resolution and so on, but 5 minutes in I
  | don't notice or care. It's how the game plays that matters,
| and that has almost never been hardware limited.
|
| That said, I'm really looking forward to the day we can embed
| LLMs and generative AI into video games for like world
| generation & dialog. I can't wait to play games where on
| startup, the game generates truly unique cities for me to
| explore filled with fictional characters. I want to wander
| around unique worlds and talk - using natural language - with
| the people who populate them.
|
| I'm giddy with excitement over the amazing things video games
| can bring in the next few years. I feel like a kid again
  | looking forward to Christmas.
| hypercube33 wrote:
  | There was a video posted to Hacker News about core Windows
  | apps taking longer to launch on newer hardware: stuff like
  | Calc, Word, and Paint starting slower on Win11 than on
  | Windows 2000, even though machines have much faster
  | everything. So I think the point stands: why is everything
  | slower now?
| moffkalast wrote:
| https://en.wikipedia.org/wiki/Wirth%27s_law
| joseph_grobbles wrote:
| It's a variation of Parkinson's Law -- we just keep expanding
| what we are doing to fit the hardware available to us, then
| claiming that nothing has changed.
|
| CI is a fairly new thing. The idea of constantly doing all that
| compute work again and again was unfathomable not that long ago
| for most teams. We layer on and load in loads of ancillary
| processes, checks, lints, and so on, because we have the
| headroom. And then we reminisce about the days when we did a
| bi-monthly build on a "build box", forgetting how minimalist it
| actually was.
| jebarker wrote:
| This is true, but there's still choice in how we expand and
| the default seems to be to do it as wastefully as possible.
| pbjtime wrote:
| It's what the economy rewards. Simple as that
| jebarker wrote:
| I don't totally agree. It's what a short-term view of the
| economy rewards for sure. But even if that was the only
| view of the economy I've seen plenty of low-performance
| software written purely out of cargo culting and/or
| inability or lack of will to do anything better.
| Aurornis wrote:
| > Software is so slow compared to hardware. It's embarrassing
  | that we haven't moved even a hundredth of what hardware has in
  | the last 30 years
|
| I don't understand this mentality. What, exactly, did you
| expect to get faster? If you run the same software on older
| hardware it's going to be much slower. We're just doing more
| because we can now.
|
| From my perspective, things are pretty darn fast on modern
| hardware compared to what I was dealing with 5-10 years ago.
|
| I had embedded systems builds that would take hours and hours
| on my local machine years ago. Now they're done in tens of
| minutes on my machine that uses a relatively cheap consumer
| CPU. I can clean build a complete kernel in a couple minutes.
|
| In my text editor I can do a RegEx search across large projects
| and get results nearly instantly! Having NVMe SSDs and high
| core count consumer CPUs makes amazing things possible.
|
| Software is improving, too. Have you seen how fast the new Bun
| package manager is? I can pick from dozens of open source
| database options for different jobs that easily enable high
| performance, large scale operations that would have been
| unthinkable or required expensive enterprise software a decade
| ago (even with today's hardware).
|
| > Why get this?
|
| If you really think nothing has improved, you might be
| experiencing a sort of hedonic adaptation: Every advancement
| gets internalized as the new baseline and you quickly forget
| how slow things were previously.
|
| I remember the same thing happened when SSDs came out: They
| made an amazing improvement in desktop responsiveness over
| mechanical HDDs, but many people almost immediately forgot how
| slow HDDs were. It's only when you go back and use a slow HDD-
| based desktop that you realize just how slow things were in the
| past.
| josephg wrote:
| > We're just doing more because we can now.
|
| Are we though?
|
| Our computers are orders of magnitude faster than they were.
| What new features justify consuming 100x or more CPU, RAM,
| network and disk space?
|
| Is my email software doing orders of magnitude more work to
| render an email compared to the 90s? Does discord have _that_
| many more features compared to mIRC that it makes sense for
| it to take several seconds to open on my 8 core M1 laptop?
  | For reference, mIRC was a 2MB binary and I swear it opened
| faster on my pentium 2 than discord takes to open on my 2023
| laptop. By the standards of 1995 we all walk around with
  | supercomputers in our pockets. But you wouldn't know it,
  | because the best hardware in the world still can't keep pace
  | with badly written software. As the old line from the 1990s
  | goes, "what Andy giveth, Bill taketh away."[1] (Andy Grove
  | was Intel's CEO at the time.)
|
| My instinct is that as more and more engineers work "up the
| stack" we're collectively forgetting how to write efficient
| code. Or just not bothering. Why optimize your react web app
| when everyone will have new phones in a few years with more
| RAM? If the users complain, blame them for having old
| hardware.
|
| I find this process deeply disrespectful to our users. Our
| users pay thousands of dollars for good computer hardware
| because they want their computer to run fast and well. But
| all of that capacity is instead chewed through by developers
| the world over trying to save a buck during development time.
| Every hardware upgrade our users make just becomes the new
| baseline for how lazy we can be.
|
| Slow CI/CD pipelines are a completely artificial problem.
| There's absolutely no technical reason that they need to run
| so slowly.
|
| [1] https://en.wikipedia.org/wiki/Andy_and_Bill%27s_law
| rwmj wrote:
| And yet, just a few minutes ago (the time it took to reboot
| my laptop), I clicked open the "Insert" menu in a Google doc
| and the machine hung. Even a 128k Macintosh with a 68000 CPU
| could handle that.
| sp332 wrote:
| ClarisWorks would regularly hang my Mac Classic II. And
| with cooperative multitasking, I had to reboot the machine.
| Almondsetat wrote:
  | Why is opening Word or Excel or PowerPoint not instantaneous?
  | Plenty of people, me included, have to sift through vast
  | numbers of documents and constantly open/close them.
| JohnBooty wrote:
| Application "bloat" is one obvious thing, but operating
| systems are also doing a lot more "work" to open an app
| these days -- address space randomization, checking file
| integrity, checking for developer signing certs,
| mitigations against hardware side channel attacks like
| rowhammer or whatever, etc.
|
| Those things aren't free. But, I don't know the relative
| performance hit there compared to good old software bloat.
| soulbadguy wrote:
| > address space randomization, checking file integrity,
| checking for developer signing certs, mitigations against
| hardware side channel attacks like rowhammer or whatever,
| etc.
|
  | OSes and applications were getting slower well before any
  | of those were a thing.
| mejutoco wrote:
| Maybe they are waiting to make them right before making
| them fast? /s
|
| (Make it work, make it right, make it fast)
| [deleted]
| joseph_grobbles wrote:
| This is a really terribly written article. Sorry for the
  | negativity, but it was tough to dig through.
|
| For those confused by the title, Intel released a Xeon that
| includes 64GB of high speed RAM on the chip itself, configurable
| as either primary or pooled memory, or a memory subsystem caching
| layer.
| mutagen wrote:
| I suspect it was the video transcript or a lightly edited
| version of the transcript.
| CobaltFire wrote:
| I came here to say the same thing; usually StH is very
| readable. This has to either be a video transcript or some AI
| pruning (maybe a combination) because it reads absolutely
| horribly.
| neolefty wrote:
| Just as gaming hardware has taken over compute, gaming
| journalism must take over reporting!
| denton-scratch wrote:
| I also found it hard to read; I wanted to know how one might
| use this thing, but instead I learned all about channels,
| "wings", backplanes, and saw a lot of tables and photos that
| seem to be largely duplicates. An entire page (out of 5 pages)
| was dedicated to examining the development system.
| ipython wrote:
  | I couldn't figure out what the _actual_ measured memory bandwidth
  | to the onboard HBM2e memory chiplets is - how does it compare to,
  | say, Apple's M2 Ultra?
| opencl wrote:
| According to this paper[0] ~1600GB/s raw memory bandwidth,
| ~590GB/s in practice. I haven't seen any actual benchmarks of
| M2 Ultra CPU bandwidth, but the M1 Ultra has been benchmarked
| at ~240GB/s[1].
|
| [0]
| https://www.ixpug.org/images/docs/ISC23/McCalpin_SPR_BW_limi...
|
| [1]
| https://macperformanceguide.com/blog/2022/20220618_1934-Appl...
| doctorpangloss wrote:
| > The cores used in the Xeon Max processors do not support
| enough concurrency to reach full HBM bandwidth with the
| current core counts. 2x more concurrency or 2x more cores.
|
| is a good punchline from the report. The Apple cores have
| that problem but a lot worse. They are slow. It goes back to
| this flawed idea in the community that you can "just" "add"
| "more memory," when parts like the H100 have their memory
| size matched to the architecture (physical and software) of
| the dozens of CPUs on them.
|
| I'm not sure why the conception persists that a 60W laptop
| part would be comparable to a 300W server part of the same
| process generations, let alone this particular part.
| [deleted]
| foota wrote:
| That's insane.
| JacobiX wrote:
  | The M1 Ultra has 800GB/s of memory bandwidth; in contrast, HBM2E
  | has 204.8 Gbps x 2 = 409.6 Gb/s
| consp wrote:
  | Is that available to any single core, or just a sum across
  | all of them, with latency penalties when crossing over?
| hypercube33 wrote:
  | Depends on NUMA node config, so I believe this is combined
| on the whole chip if all cores are working on threads with
| their local 16GB HBM, theoretically.
| wmf wrote:
| GPUs use 4-6 stacks of HBM2 which is 1,840-2,760 GB/s. It's
| 2x-3x the bandwidth of M1/M2 Ultra.
| Brian-Puccio wrote:
| So 6,400 Gb/s compared to 409.6 Gb/s once you convert units?
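The unit tangle above is mostly bits versus bytes (a factor of 8), plus the fact that HBM2E figures are usually quoted per stack in GB/s. A quick check, with the per-pin rate as an assumption (HBM2E is commonly specified around 3.2 Gb/s per pin):

```python
# Gigabits per second vs gigabytes per second: divide by 8.
gb_per_s = 6400 / 8
print(gb_per_s)  # 800.0 GB/s

# One HBM2E stack: 1024-bit interface at an assumed 3.2 Gb/s per pin.
stack_gb_per_s = 1024 * 3.2 / 8
print(round(stack_gb_per_s, 1))  # 409.6 GB/s per stack
```

Notably, 409.6 is the per-stack figure in GB/s, not Gb/s, which may be where the number upthread came from.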
| ftxbro wrote:
| according to google it costs thirteen thousand dollars
| Aissen wrote:
  | It's less expensive than the public price of the biggest
  | Genoa-X with 1.15GB of L3: the 9684X at $14k+.
| yvdriess wrote:
| That's high end Xeon for you. At least here you can argue it
| can save you from buying a GPU.
| varelse wrote:
| NUMAAF I guess, but it doesn't seem like they did anything
| interesting with it yet.
| RobotToaster wrote:
| Ouch, that's about the same as two A100s, any idea how it would
| compare?
| Aissen wrote:
| If you buy 40GB A100s, maybe, but LLMs have pushed the price
| of the 80GB A100s in the same range (they're more expensive
| now than at launch(!)).
| llm_nerd wrote:
| While the title bizarrely references GPUs, this is a normal
| general computation CPU that has 64GB of very high speed
| memory on the chip itself.
| luma wrote:
| That's datacenter compute for you, and they'll sell a bunch of
| them. One simple reason (of many) - a lot of enterprise
| licenses are tied to CPU count and those licenses can be
| extremely expensive. You can save money on the licenses by
| buying fewer, higher speed CPUs, and in many cases it can
| offset the increase in CPU cost.
| ftxbro wrote:
| > a lot of enterprise licenses are tied to CPU count and
| those licenses can be extremely expensive.
|
| if this pricing system is making an inefficient market for
| cpus then maybe it could be disrupted somehow
| bhouston wrote:
| Intel took on Apple's Max and Ultra nomenclature I see.
| usrusr wrote:
| I wonder if it might be a deliberate tip of the hat smuggled
  | into the creative process by engineering: if I were at Intel I'd
  | _desperately_ hope that much of the Apple Silicon performance
  | comes from its on-package memory, and this HBM2 monster looks
| exactly like something you might come up with if that hope got
| you thinking.
| tubs wrote:
  | Apple uses the same memory packaging every mobile SoC uses
  | (and has used flip-chip and package-on-package arrangements).
  | There is nothing super special about the RAM or how it's
  | wired. This myth has to cease.
|
| The only thing you can say about the larger chips is they
| have much wider channels, but they are still just regular
| DDRs.
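Arithmetically, "much wider channels" is indeed the whole story: peak bandwidth is just bus width times transfer rate. Assuming the commonly reported 1024-bit LPDDR5-6400 interface on the M1 Ultra:

```python
bus_bits = 1024          # reported M1 Ultra LPDDR5 bus width (assumption)
transfers_per_s = 6.4e9  # LPDDR5-6400 runs at 6400 MT/s

peak_gb_s = bus_bits / 8 * transfers_per_s / 1e9
print(peak_gb_s)  # 819.2 -- the headline ~800 GB/s figure
```

The same formula with a phone's 64-bit bus gives ~51 GB/s, which is the whole difference between the "larger chips" and ordinary mobile parts.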
___________________________________________________________________
(page generated 2023-09-21 23:01 UTC)