[HN Gopher] El Capitan: New Supercomputer Is the Fastest in the ...
       ___________________________________________________________________
        
       El Capitan: New Supercomputer Is the Fastest in the World
        
       Author : rbanffy
       Score  : 47 points
       Date   : 2024-11-19 20:52 UTC (2 hours ago)
        
 (HTM) web link (spectrum.ieee.org)
 (TXT) w3m dump (spectrum.ieee.org)
        
       | Melatonic wrote:
       | Anybody know what "Inertial Confinement Fusion" is in the
       | referenced article?
        
         | JumpCrisscross wrote:
         | > _what "Inertial Confinement Fusion" is_
         | 
         | The experimental fusion approach used by the NIF [1][2].
         | 
         | It's conveniently simultaneously an approach to fusion power, a
         | way to study fusion plasmas and a tiny nuclear explosion.
         | 
         | [1] https://en.wikipedia.org/wiki/Inertial_confinement_fusion
         | 
         | [2] https://en.wikipedia.org/wiki/National_Ignition_Facility
        
       | MobiusHorizons wrote:
       | > El Capitan, housed at Lawrence Livermore National Laboratory in
       | Livermore, Calif., can perform over 2700 quadrillion operations
       | per second at its peak. The previous record holder, Frontier,
       | could do just over 2000 quadrillion peak operations per second.
       | 
       | > El Capitan uses AMD's MI300a chip, dubbed an accelerated
       | processing unit, which combines a CPU and GPU in one package. In
       | total, the system boasts 44,544 MI300As, connected together by
       | HPE's Slingshot interconnects.
       | 
       | Seems like a nice win for AMD.
        
         | alephnerd wrote:
         | > Seems like a nice win for AMD
         | 
         | Yep! They've been part of the Exascale project for a long time,
          | and it's good to see their commitment to HPC actually succeed,
          | unlike Intel's over the same period.
        
       | einpoklum wrote:
       | So, they built this supercomputer to test new and more deadly
       | nuclear weapons. That makes me so "happy". I am absolutely not
       | worried about two nuclear powers being close to the brink of
        | direct war, even as we speak; nor about the abandonment of the
        | course of nuclear disarmament treaties; nor about the repeated talk
       | of a coming war against certain Asian powers. Everything is great
       | and I'll just fawn over the colorful livery and the petaflops
       | figure.
        
         | comboy wrote:
         | I'd guess it's unlikely to be the real use case. The real one
         | is classified. Plus it's not like more deadly nuclear weapons
          | would change anything; we can do bad enough with what we
         | already have.
        
           | JumpCrisscross wrote:
            | > _it's unlikely to be the real use case. The real one is
           | classified._
           | 
           | What are you basing this on?
           | 
            | > _it's not like more deadly nuclear weapons would change
           | anything_
           | 
           | We haven't been chasing yield in nuclear weapons since the
           | 60s.
           | 
           | Our oldest warheads date from the 60s [1]. For obvious
           | reasons, the experimental track record on half-century old
           | pits is scarce. We don't know if novel physics or chemistry
           | is going on in there, and we don't want to be the second ones
           | to find out.
           | 
           | [1] https://en.wikipedia.org/wiki/B61_nuclear_bomb
        
           | alephnerd wrote:
           | > I'd guess it's unlikely to be the real use case
           | 
           | I can safely say that nuclear simulations are one of the
           | major drivers for HPC research globally.
           | 
           | It is not the only one (genomics, simulations, fundamental
           | research are also major drivers) but it is a fairly prominent
           | one.
        
           | realo wrote:
           | Maybe there is research not on bigger bangs, but on smaller
           | packages?
           | 
           | Think about a baseball-size device able to take out a city
           | block.
           | 
            | Then think about a squadron of drones able to transport
           | those baseballs to very precise city blocks...
        
         | JumpCrisscross wrote:
         | > _they built this supercomputer to test new and more deadly
         | nuclear weapons_
         | 
         | If you are afraid of nuclear war, the thing to fear is a
         | nuclear state's capacity to retaliate being questioned. These
         | supercomputers are the alternative to live tests. Taking them
          | away doesn't poof nuclear weapons; it means you are left with a
         | half-assed deterrent or must resume live tests.
         | 
          | > _the abandonment of the course of nuclear disarmament treaties_
         | 
         | North Korea, the American interventions in the Middle East and
         | Ukraine set the precedent that nuclear sovereignty is in a
         | separate category from the treaty-enforced kind. Non-
         | proliferation won't be made or broken on the back of aging,
         | degrading weapons.
         | 
         | > _repeated talk of a coming war against certain Asian powers_
         | 
         | One invites war by refusing to prepare for it.
        
         | rbanffy wrote:
         | The whole point of testing (and making) deadly nuclear weapons
         | is to ensure they are never used again. The Mutually Assured
          | Destruction doctrine has kept us alive through the darkest days
          | of the Cold War (also keeping the Cold War cold). The only way to
          | credibly threaten anyone who tries to annihilate you with
          | certain annihilation is with lots of such doomsday weapons. We
          | have lived in this Mexican standoff for longer than most of us
          | can remember.
        
           | postalrat wrote:
            | We are living in the darkest days of the Cold War right now.
        
         | shagie wrote:
          | I would reference an older article on supercomputers and the
         | nuclear weapon arsenal.
         | 
         | https://www.techtarget.com/searchdatacenter/news/252468294/C...
         | 
         | > "The Russians are fielding brand new nuclear weapons and
         | bombs," said Lisa Gordon-Hagerty, undersecretary for nuclear
         | security at the DOE. She said "a very large portion of their
         | military is focused on their nuclear weapons complex."
         | 
         | > It's the same for China, which is building new nuclear
         | weapons, Gordon-Hagerty said, "as opposed to the United States,
         | where we are not fielding or designing new nuclear weapons. We
         | are actually extending the life of our current nuclear weapons
         | systems." She made the remarks yesterday in a webcast press
         | conference.
         | 
         | > ...
         | 
         | > Businesses use 3D simulation to design and test new products
         | in high performance computing. That is not a unique capability.
         | But nuclear weapon development, particularly when it involves
         | maintaining older weapons, is extraordinarily complex,
         | Goldstein said.
         | 
         | > The DOE is redesigning both the warhead and nuclear delivery
         | system, which requires researchers to simulate the interaction
         | between the physics of the nuclear system and the engineering
         | features of the delivery system, Goldstein said. He
         | characterized the interaction as a new kind of problem for
         | researchers and said 2D development doesn't go far enough. "We
         | simply can't rely on two-dimensional simulations -- 3D is
         | required," he said.
         | 
         | > Nuclear weapons require investigation of physics and
         | chemistry problems in a multidimensional space, Goldstein said.
         | The work is a very complex statistical problem, and Cray's El
         | Capitan system, which can couple this computation with machine
         | learning, is ideally suited for it, he said.
         | 
         | ---
         | 
         | This isn't designing new ones. Or blowing things up (
         | https://www.reuters.com/article/us-usa-china-nuclear/china-m...
         | ) to see if they still work. It is simulating them to have the
         | confidence that they still work - and that the adversaries of
         | the US know that the scientists are confident that they still
         | work without having to blow things up.
        
           | JumpCrisscross wrote:
           | > _to see if they still work. It is simulating them to have
           | the confidence that they still work_
           | 
            | The Armageddon scenario is that some nuclear states conduct
           | stockpile stewardship, some don't, and those who do discover
           | that warheads come with a use-by date.
        
         | freeone3000 wrote:
         | Eh, we have all the nukes we need and we already know how to
         | build them. This is going to help more with fusion _power_ than
         | fusion _explosives_.
        
         | theideaofcoffee wrote:
          | I'd rather have a few supercomputers doing stockpile
          | stewardship than have the weapons tested live. As much as I hate it
         | personally, these weapons are a part of our society for better
         | or for worse until we (as in the people) decide they won't be
         | by electing those that will help dismantle the programs. They
         | should be maintained and these tools help in that.
        
       | olao99 wrote:
       | I fail to understand how these nuclear bomb simulations require
       | so much compute power.
       | 
       | Are they trying to model every single atom?
       | 
       | Is this a case where the physicists in charge get away with
       | programming the most inefficient models possible and then the
       | administration simply replies "oh I guess we'll need a bigger
       | supercomputer"
        
         | TeMPOraL wrote:
         | Pot, meet kettle? It's usually the industry that's leading with
         | "write inefficient code, hardware is cheaper than dev time"
         | approach. If anything, I'd expect a long-running physics
         | research project to have well-optimized code. After all, that's
         | where all the optimized math routines come from.
        
         | bongodongobob wrote:
         | My brother in Christ, it's a supercomputer. What an odd
         | question.
        
         | CapitalistCartr wrote:
         | It's because of the way the weapons are designed, which
         | requires a CNWDI clearance to know, so your curiosity is not
         | likely to be sated.
        
           | nordsieck wrote:
           | > It's because of the way the weapons are designed, which
           | requires a CNWDI clearance to know, so your curiosity is not
           | likely to be sated.
           | 
           | While that's true, the information that is online is
           | surprisingly detailed.
           | 
           | For example, this series "Nuclear 101: How Nuclear Bombs
           | Work"
           | 
           | https://www.youtube.com/watch?v=zVhQOhxb1Mc
           | 
           | https://www.youtube.com/watch?v=MnW7DxsJth0
        
             | CapitalistCartr wrote:
             | Having once had said clearance limits my answers.
        
         | p_l wrote:
         | It literally requires simulating each subatomic particle,
          | individually. The increases in compute power have been used for
          | the twin goals of reducing simulation time (letting you run more
          | simulations) and increasing the size and resolution.
         | 
         | The alternative is to literally build and detonate a bomb to
          | get empirical data on a given design, which might have problems
          | with replicability (important when applying the results to the
          | rest of the stockpile) or with how exact the data is.
         | 
         | And remember that there is more than one user of every
         | supercomputer deployed at such labs, whether it be multiple
         | "paying" jobs like research simulations, smaller jobs run to
         | educate, test, and optimize before running full scale work,
         | etc.
         | 
          | AFAIK, for a considerable amount of time now, supercomputers have run more
         | than one job at a time, too.
        
           | pkaye wrote:
           | Are they always designing new nuclear bombs? Why the ongoing
           | work to simulate?
        
             | danhon wrote:
             | It's also to check that the ones they have will still work,
             | now that there are test bans.
        
             | dekhn wrote:
             | The euphemistic term used in the field is "stockpile
             | stewardship", which is a catch-all term involving a wide
             | range of activities, some of them forward-looking.
        
             | p_l wrote:
              | Because even normal explosives degrade over time, and the
              | fissile material in nuclear devices is even worse about it
              | - remember that unstable elements undergo constant fission
              | events; critical mass is just the point where they trigger
              | each other's fission fast enough for a runaway process.
             | 
             | So in order to verify that the weapons are still useful and
             | won't fail in random ways, you have to test them.
             | 
             | Which either involves actually exploding them (banned by
              | various treaties that have enough weight that even the USA
             | doesn't break them), or numerical simulations.
        
             | AlotOfReading wrote:
             | Multiple birds with one stone.
             | 
             | * It's a jobs program to avoid the knowledge loss created
             | by the end of the cold war. The US government poured a lot
             | of money into recreating the institutional knowledge needed
             | to build weapons (e.g. materials like FOGBANK) and it's
             | preferred to maintain that knowledge by having people work
             | on nuclear programs that aren't quite so objectionable as
             | weapon design.
             | 
             | * It helps you better understand the existing weapons
             | stockpiles and how they're aging.
             | 
             | * It's an obvious demonstration of your capabilities and
             | funding for deterrence purposes.
             | 
             | * It's political posturing to have a big supercomputer and
             | the DoE is one of the few agencies with both the means and
             | the motivation to do so publicly. This has supposedly been
             | a major motivator for the Chinese supercomputers.
             | 
             | There's all sorts of minor ancillary benefits that come out
             | of these efforts too.
        
             | colonCapitalDee wrote:
             | Basically yes, we are always designing new nuclear bombs.
             | This isn't done to increase yield, we've actually been
             | moving towards lower yield nuclear bombs ever since the mid
             | Cold War. In the 60s the US deployed the B41 bomb with a
             | maximum yield of 25 megatons, making it the most powerful
             | bomb ever deployed by the US. When the B41 was retired in
             | the late 70s, the most powerful bomb in the US arsenal was
             | the B53 with a yield of 9 megatons. The B53 was retired in
             | 2011, leaving the B83 as the most powerful bomb in the US
             | arsenal with a yield of only 1.2 megatons.
             | 
             | There are two kinds of targeting that can be employed in a
             | nuclear war: counterforce and countervalue. Counterforce is
             | targeting enemy military installations, and especially
             | enemy nuclear installations. Countervalue is targeting
             | civilian targets like cities and infrastructure. In an all
             | out nuclear war counterforce targets are saturated with
             | nuclear weapons, with each target receiving multiple
             | strikes to hedge against the risks of weapon failure,
             | weapon interception, and general target survival due to
              | being in fortified underground positions. Any weapons
             | that are not needed for counterforce saturation strike
             | countervalue targets. It turns out that having a yield
             | greater than a megaton is basically just overkill for both
             | counterforce and countervalue. If you're striking an
             | underground military target (like a missile silo) protected
             | by air defenses, your odds of destroying that target are
             | higher if you use three one megaton yield weapons than if
             | you use a single 20 megaton yield weapon. If you're
             | striking a countervalue target, the devastation caused by a
             | single nuclear detonation will be catastrophic enough to
             | make optimizing for maximum damage pointless.
             | 
             | Thus, weapons designers started to optimize for things
              | other than yield. Safety is a big one: an American nuclear
              | weapon going off on US soil would have far-reaching
             | political effects and would likely cause the president to
             | resign. Weapons must fail safely when the bomber carrying
             | them bursts into flames on the tarmac, or when the rail
             | carrying the bomb breaks unexpectedly. They must be
             | resilient against both operator error and malicious
             | sabotage. Oh, and none of these safety considerations are
             | allowed to get in the way of the weapon detonating when it
             | is supposed to. This is really hard to get right!
             | 
             | Another consideration is cost. Nuclear weapons are
             | expensive to make, so a design that can get a high yield
             | out of a small amount of fissile material is preferred.
             | Maintenance, and the cost of maintenance, is also relevant.
             | Will the weapon still work in 30 years, and how much money
             | is required to ensure that?
             | 
             | The final consideration is flexibility and effectiveness.
             | Using a megaton yield weapon on the battlefield to destroy
             | enemy troop concentrations is not a viable tactic because
             | your own troops would likely get caught in the strike. But
             | lower yield weapons suitable for battlefield use (often
             | referred to as tactical nuclear weapons) aren't useful for
             | striking counterforce targets like missile silos. Thus,
             | modern weapon designs are variable yield. The B83 mentioned
             | above can be configured to detonate with a yield in the low
             | kilotons, or up to 1.2 megatons. Thus a single B83 weapon
              | in the US arsenal can cover multiple contingencies, making
             | it cheaper and more effective than maintaining a larger
             | arsenal of single yield weapons. This is in addition to
             | special purpose weapons designed to penetrate underground
             | bunkers or destroy satellites via EMP, which have their own
             | design considerations.
        
           | Jabbles wrote:
           | > It literally requires simulating each subatomic particle,
           | individually.
           | 
           | Citation needed.
           | 
           | 1 gram of Uranium 235 contains 2e21 atoms, which would take
           | 15 minutes for this supercomputer to count.
           | 
           | "nuclear bomb simulations" do not need to simulate every
           | atom.
           | 
           | I speculate that there will be _some_ simulations at the
           | subatomic scale, and they will be used to inform other
           | simulations of larger quantities at lower resolutions.
           | 
           | https://www.wolframalpha.com/input?i=atoms+in+1+gram+of+uran.
           | ..
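            | 
            | For scale, a rough back-of-the-envelope sketch of that claim
            | (assuming one counting operation per atom and the ~2700
            | quadrillion ops/s peak figure from the article):
            | 
            |   # Atoms in 1 g of U-235 vs. El Capitan's peak rate (rough sketch)
            |   AVOGADRO = 6.022e23        # atoms per mole
            |   MOLAR_MASS_U235 = 235.0    # grams per mole
            |   PEAK_OPS = 2.7e18          # ~2700 quadrillion operations per second
            | 
            |   atoms = AVOGADRO / MOLAR_MASS_U235   # ~2.56e21 atoms in one gram
            |   minutes = atoms / PEAK_OPS / 60      # one "count" op per atom
            |   print(f"{atoms:.2e} atoms, ~{minutes:.0f} minutes to count")
            |   # -> on the order of 15 minutes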
        
             | p_l wrote:
             | Subatomic scale is the perfect option, but we tend to not
             | have time for that, so we sample and average and do other
             | things. At least that's the situation within aerospace's
              | hunger for CFD; I figure nuclear has similar approaches.
        
               | Jabbles wrote:
               | I would like a citation for anyone in aerospace using (or
               | even realistically proposing) subatomic fluid dynamics.
        
         | JumpCrisscross wrote:
         | > _Are they trying to model every single atom?_
         | 
          | Given all nuclear physics happens _inside_ atoms, I'd hope
         | they're being more precise.
         | 
         | Note that a frontier of fusion physics is characterising plasma
         | flows. So even at the atom-by-atom level, we're nowhere close
         | to a solved problem.
        
           | amelius wrote:
           | Or maybe it suffices to model the whole thing as a gas. It
           | all depends on what they're trying to compute.
        
             | JumpCrisscross wrote:
             | > _maybe it suffices to model the whole thing as a gas_
             | 
             | What are you basing this on? Plasmas don't flow like gases
             | even absent a magnetic field. They're self interacting,
             | even in supersonic modes. This is like saying you can just
             | model gases like liquids when trying to describe a plane--
             | they're different states of matter.
        
         | alephnerd wrote:
         | > I fail to understand how these nuclear bomb simulations
         | require so much compute power
         | 
         | I wrote a previous HN comment explaining this:
         | 
         | Tl;dr - Monte Carlo Simulations are hard and the NPT prevents
         | live testing similar to Bikini Atoll or Semipalatinsk-21
         | 
         | https://news.ycombinator.com/item?id=39515697
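          | 
          | As a toy illustration of why Monte Carlo transport eats compute
          | (nothing like the actual lab codes): the statistical error
          | shrinks only as 1/sqrt(N) particle histories, so each extra
          | digit of confidence costs roughly 100x more work.
          | 
          |   import random
          | 
          |   # Toy 1D slab transport: follow one neutron until it is
          |   # absorbed, leaks out the far side, or is reflected back.
          |   def one_history(mfp=1.0, thickness=5.0, absorb_prob=0.3):
          |       x = 0.0
          |       while 0.0 <= x < thickness:
          |           if random.random() < absorb_prob:
          |               return "absorbed"
          |           x += random.expovariate(1.0 / mfp) * random.choice((-1, 1))
          |       return "leaked" if x >= thickness else "reflected"
          | 
          |   N = 100_000  # error bars scale as 1/sqrt(N)
          |   results = [one_history() for _ in range(N)]
          |   for k in ("absorbed", "leaked", "reflected"):
          |       print(k, results.count(k) / N)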
        
       | cryptozeus wrote:
        | This is great, but I absolutely love that poster of El Capitan on
        | the supercomputer racks! Also TIL there is a Top500 list at
       | https://www.top500.org/lists/top500/2024/11/
        
         | theideaofcoffee wrote:
         | That's a pretty standard Cray feature for systems larger than a
         | few cabinets. El Capitan has the landscape, Hopper at NERSC had
         | a photo of Grace Hopper, Aurora at ANL has a creamy gradient
          | reminiscent of the aurora borealis, and on and on. Gives them a bit of
         | character beyond the bad-ass Cray label on the doors.
        
       | pama wrote:
       | Noting here that 2700 quadrillion operations per second is less
       | than the estimated sustained throughput of productive bfloat16
       | compute during the training of the large llama3 models, which
       | IIRC was about 45% of 16,000 quadrillion operations per second,
        | i.e. 16k H100s in parallel at about 0.45 MFU. The compute power of
       | national labs has fallen far behind industry in recent years.
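        | 
        | The rough arithmetic behind that comparison (numbers as given
        | above; note the two figures are at different precisions, as
        | discussed in the replies):
        | 
        |   el_capitan_peak = 2.7e18      # FP64 ops/s, ~2700 quadrillion
        |   llama3_cluster_peak = 16e18   # ~16,000 quadrillion bf16 ops/s (16k H100s)
        |   mfu = 0.45                    # model FLOPs utilization during training
        | 
        |   sustained_bf16 = llama3_cluster_peak * mfu    # ~7.2e18 ops/s sustained
        |   print(sustained_bf16 / el_capitan_peak)       # ~2.7x El Capitan's peak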
        
         | handfuloflight wrote:
         | Any idea how that stacks up with GPT-4?
        
         | alephnerd wrote:
          | Training an LLM (basically Transformers) is a different workflow
          | from nuclear simulations (basically Monte Carlo simulations).
          | 
          | There are a lot of intricacies, but at a high level they require
          | different compute approaches.
        
           | handfuloflight wrote:
           | Can you expand on why the operations per second is not an apt
           | comparison?
        
             | pertymcpert wrote:
             | When you're doing scientific simulations, you're generally
             | a lot more sensitive to FP precision than ML training which
             | is very, very tolerant of reduced precision. So while FP8
             | might be fine for transformer networks, it would likely be
             | unacceptably inaccurate/unusable for simulations.
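              | 
              | A minimal illustration of that sensitivity (hypothetical
              | numbers, just showing accumulation error): a long
              | time-stepping simulation constantly adds small increments to
              | a running total, and in half precision the total stalls once
              | the increments fall below the rounding step.
              | 
              |   import numpy as np
              | 
              |   increment = np.float16(1e-4)
              |   total16 = np.float16(0.0)
              |   for _ in range(100_000):
              |       total16 += increment   # rounds to float16 every step
              | 
              |   total64 = np.float64(1e-4) * 100_000
              |   print(total16, total64)    # float16 stalls far below the true 10.0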
        
           | pama wrote:
            | Absolutely. Though the performance of El Capitan is only
            | measured by a LINPACK benchmark, not the actual application.
        
             | pertymcpert wrote:
             | I thought modern supercomputers use benchmarks like HPCG
             | instead of LINPACK?
        
               | fancyfredbot wrote:
                | The Top500 list includes both. There is no HPCG result for El
               | Capitan yet:
               | 
               | https://top500.org/lists/hpcg/2024/11/
        
           | Koshkin wrote:
           | This is about the raw compute, no matter the workflow.
        
         | bryanlarsen wrote:
         | A 64 bit float operation is >4X as expensive as a 16 bit float
         | operation.
        
           | Koshkin wrote:
           | In terms of heat dissipation, maybe, yes. But not necessarily
           | in time.
        
           | pama wrote:
            | Agreed. However, also note that if it were only matrix
            | multiplies and not full transformer training, the performance
            | of that Meta cluster would be closer to 16k PFlop/s, still
            | much faster than the El Capitan performance measured on
            | LINPACK and multiplied by 4. Other companies have presumably
            | cabled 100k H100s together, but they don't yet publish
            | training data for their LLMs. It is good to have competition;
            | I just didn't expect the tables to flip so dramatically over
            | the last two decades, from a time when governments easily held
            | the top spots in computing centers to today, when the
            | assumption is that at least ten companies have larger clusters
            | than the most powerful governments.
        
       | declan_roberts wrote:
        | Do supercomputers need proximity to other compute nodes in order
        | to perform these kinds of computations?
       | 
       | I wonder what would happen if Apple offered people something like
        | iCloud+ in exchange for using their idle M4 compute at night
        | for a distributed supercomputer.
        
         | conception wrote:
         | If you weren't aware -
         | https://en.m.wikipedia.org/wiki/Folding@home
        
           | declan_roberts wrote:
           | More of a SETI@home man myself.
        
         | theideaofcoffee wrote:
          | The thing that sets these machines apart from something that
          | you could set up in AWS (to some degree), or in a distributed
          | sense like you're suggesting, is the interconnect: how the
          | compute nodes communicate. For a large system like El Capitan,
          | you're paying a large chunk of the cost in connecting the nodes
          | together with low latency and interesting topologies that
          | Ethernet, and even InfiniBand, can't get close to. Code that
          | requires a lot of DMA or message passing really will take up all
          | of the bandwidth that's available; that becomes the primary
          | bottleneck in these systems.
         | 
         | The interconnect has been Cray's bread and butter for multiple
          | decades: Slingshot, Dragonfly, Aries, Gemini, SeaStar, NUMAlink
          | via SGI, etc., and others for the less massively parallel systems
          | before those.
        
         | philipkglass wrote:
         | Yes, supercomputers need low-latency communication between
         | nodes. If a problem is "embarrassingly parallel" (like
         | folding@home, mentioned by sibling comment) then you can use
         | loosely coordinated nodes. Those sorts of problems usually
         | don't get run on supercomputers in the first place, since there
         | are cheaper ways to solve them.
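          | 
          | A rough sketch of why (illustrative numbers only): a tightly
          | coupled code exchanges neighbor data every timestep before it
          | can continue, so per-step time is compute plus messages, and at
          | internet-like latencies the messages dominate.
          | 
          |   def step_time(compute_s, n_msgs, msg_bytes, latency_s, bw_bytes_per_s):
          |       # one timestep: local compute, then blocking neighbor exchanges
          |       return compute_s + n_msgs * (latency_s + msg_bytes / bw_bytes_per_s)
          | 
          |   compute = 200e-6            # 200 us of local work per step
          |   msgs, size = 6, 1_000_000   # six 1 MB halo exchanges
          | 
          |   fast_interconnect = step_time(compute, msgs, size, 2e-6, 25e9)
          |   home_internet     = step_time(compute, msgs, size, 30e-3, 12.5e6)
          |   print(fast_interconnect, home_internet)   # ~0.45 ms vs ~660 ms per step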
        
       | balia wrote:
        | Some may not want to hear this, but this "fastest supercomputer"
        | list is now meaningless because the Chinese labs have started
        | obfuscating their progress.
        | 
        | A while ago there were a few labs in China in the top 10, and they
        | all attracted sanctions / bad attention. Now no Chinese lab
        | reports any data.
        
         | pknomad wrote:
         | I wouldn't say meaningless... just incomplete.
        
         | leptons wrote:
         | I doubt the US Government is telling everyone about their
         | fastest computer.
        
       | sandworm101 wrote:
       | This new upstart to the name may win in search results today, but
       | in a few years the first and true El Cap will reclaim its place.
       | It will outlast all of us.
       | 
       | https://en.wikipedia.org/wiki/El_Capitan
        
       ___________________________________________________________________
       (page generated 2024-11-19 23:01 UTC)