[HN Gopher] AMD powers the most powerful supercomputer
       ___________________________________________________________________
        
       AMD powers the most powerful supercomputer
        
       Author : lelf
       Score  : 172 points
       Date   : 2022-05-31 13:42 UTC (9 hours ago)
        
 (HTM) web link (venturebeat.com)
 (TXT) w3m dump (venturebeat.com)
        
       | curiousgal wrote:
       | How much of that performance will get undone by the software
       | though? Either through AMD's lack of effort or Intel's compiler
       | "sabotage".
        
         | ghc wrote:
         | It probably won't be a factor. The likelihood of the system
         | using standard compilers or drivers is quite low. It's non-
         | trivial to optimize a compiler and drivers for a supercomputer,
         | so companies like Cray make their own.
        
       | mrb wrote:
       | The HN crowd would probably prefer reading the many technical
       | details at the ORNL press release:
       | https://www.ornl.gov/news/frontier-supercomputer-debuts-worl...
       | which I just submitted here:
       | https://news.ycombinator.com/item?id=31573066
       | 
       | Also, yesterday Tom's hardware had a detailed article:
       | https://www.tomshardware.com/news/amd-powered-frontier-super...
       | 29 MW total, 400 kW per rack(!)
       | 
        | Also, is anyone else like me and wants to see actual pictures or
        | videos of the supercomputer, instead of a rendering like in the
        | venturebeat article? Well, head here; ORNL has a very short
       | video: https://www.youtube.com/watch?v=etVzy1z_Ptg We can see
       | among other things: that it's water-cooled (the blue and red
       | tubing), at 0m3s we see a PCB labelled "Cray Inc Proprietary ...
       | Sawtooth NIC Mezzanine Card"
        
         | pvg wrote:
         | Not much point submitting a dupe with the discussion already on
          | the front page, but you can email your better links to the mods,
          | who are looking for a better link:
         | 
         | https://news.ycombinator.com/item?id=31571551
        
       | mihaic wrote:
       | The more powerful processors become, the less I feel there's a
       | need to build supercomputers.
       | 
       | Thinking about it, the most powerful supercomputer in the world
       | is pretty much a million consumer processors, working in
       | parallel. That's going to stay pretty constant, since cost scales
       | roughly linearly.
       | 
        | If X is the processing power of $1k of consumer hardware, the
        | bigger X gets, the smaller the difference in the class of
        | problems you can solve with X versus X * 1e6 processing power.
        
         | uniqueuid wrote:
         | Sure, but consumer hardware does not have infiniband or other
         | high-bandwidth interconnects. That means you can have at most
         | ~1-2TB of ram accessible at any point. Some problems need
         | coordination, and when you're back at OpenMP etc., a
         | supercomputer suddenly makes sense.
        
           | mihaic wrote:
            | I agree for right now; I'm thinking maybe in 15 years you can
            | have >1PB on a single machine, and then the problems that
            | don't fit in that space but do fit in a supercomputer
            | become fewer. 2050 will be within our lifetime.
           | 
           | Basically I'm estimating the benefit ratio to be (log
           | SupercomputerSize - log ConsumerSize)/log ConsumerSize, and
           | that keeps decreasing.
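            | 
            | For illustration, with made-up sizes, base-10 logs, and the
            | supercomputer assumed to stay ~1000x bigger:
            | 
            |   (log 1e15 - log 1e12) / log 1e12 = (15 - 12) / 12 = 0.25
            |   (log 1e18 - log 1e15) / log 1e15 = (18 - 15) / 15 = 0.20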
        
             | uniqueuid wrote:
             | You're not wrong.
             | 
             | The set of problems that fit into a single node is growing.
             | At least in some fields where the added benefit of more
             | data is less important than, say, more precise
             | measurements.
        
         | hdjjhhvvhga wrote:
         | By the way, while cost may scale linearly, the number of cores
         | doesn't[0]. We have more powerful computers in our pockets than
         | Cray supercomputers from the 80s. And I feel we still haven't
         | learned how to use these cores in an efficient way.
         | 
         | [0] https://i.imgur.com/Gad4cKk.png
        
         | mastax wrote:
          | The coherent memory interconnects between nodes are typically
          | what makes supercomputers different from just a bunch of
          | consumer hardware. They allow different types of programming, or
          | at least make them easier.
        
           | jabl wrote:
           | It's a very fast, very low latency network fabric. But it's
           | not coherent in the sense of cache coherent multiprocessors,
           | and it doesn't offer shared memory style programming where
           | you'd just load/store to addresses that happen to be mapped
           | to another compute node somewhere in the system.
        
             | l33t2328 wrote:
             | I thought DMI allowed for exactly those kinds of load/store
             | operations
        
         | xhkkffbf wrote:
         | If you think of it this way, aren't some botnets truly the most
         | powerful computing systems?
        
       | uniqueuid wrote:
       | Since they are using AMD's accelerators as well [1], I do wonder
       | whether any usage of these will trickle down and give us
       | improvements in ROCm.
       | 
       | Surely the people at these labs will want to run ordinary DL
       | frameworks at some point - or do they have the money and time to
       | always build entirely custom stacks?
       | 
       | [1] AMD Instinct MI250x in this case.
        
         | mastax wrote:
         | These supercomputer contracts typically have a large amount
         | dedicated to software support. I remember reading on AnandTech
         | (?) that AMD was explicitly putting a bunch of engineers on
          | ROCm for this project. It's one of the reasons companies like
         | these contracts so much.
        
         | pinhead wrote:
          | Surprisingly, ROCm support has been getting a lot better over
          | the last few years. In my experience the PyTorch support is
          | essentially seamless between CUDA and ROCm. Also, I know some
         | popular frameworks like DeepSpeed have announced support and
         | benchmarks on it as well:
         | https://cloudblogs.microsoft.com/opensource/2022/03/21/suppo...
        
         | dragontamer wrote:
         | > Surely the people at these labs will want to run ordinary DL
         | frameworks at some point
         | 
         | I don't know about that. A lot of these labs are doing physics
         | simulations and are probably happy to stick with their dense-
         | matrix multiply / BLAS routines.
         | 
         | Deep learning is a newer thing. These national labs can run
         | them of course, but these national labs have existed for many
         | decades and have plenty of work to do without deep learning.
         | 
         | > or do they have the money and time to always build entirely
         | custom stacks?
         | 
         | Given all the talk about OpenMP compatibility and Fortran... my
         | guess is that they're largely running legacy code in Fortran.
         | 
         | Perhaps some new researchers will come in and try to get some
         | deep-learning cycles in the lab and try something new.
        
           | marcosdumay wrote:
           | > Given all the talk about OpenMP compatibility and
           | Fortran... my guess is that they're largely running legacy
           | code in Fortran.
           | 
            | The most used linear algebra library is written in Fortran.
           | There's nothing "legacy" about it, it's just that nobody was
           | able to replicate its speed in C.
        
             | dragontamer wrote:
             | BLAS itself has been rewritten in Nvidia CUDA and AMD HIP,
             | and is likely the workhorse in this case. (Remember that
             | Frontier is mostly GPUs and the bulk of code should be GPU
             | compatible)
             | 
             | Presumably that old Fortran code has survived many
             | generations of ports: Connection Machine, DEC Alpha, Intel
             | Itanium, SPARC and finally today's GPU heavy systems. The
             | BLAS layer keeps getting rewritten but otherwise the bulk
             | of the simulators still works.
        
             | nspattak wrote:
             | If you are talking about netlib blas/lapack I am very
             | confused by what you are saying because the fastest
             | blas/lapack implementations are in c/c++.
        
             | jcranmer wrote:
              | > The most used linear algebra library is written in
             | Fortran.
             | 
             | My understanding is that most supercomputers have the
             | vendor provide their implementation of BLAS (e.g., if it's
             | Intel-based, you're getting MKL) that's specifically tuned
             | for that hardware. And these implementations stand a decent
              | chance of being written in _assembly_, not Fortran.
        
               | bee_rider wrote:
               | Usually C or Fortran superstructure, and assembly
               | kernels.
               | 
               | The clearest form of this is in BLIS, which is a C
               | framework you can drop your assembly kernel into, and
               | then it makes a BLAS (along with some other stuff) for
               | you. But the idea is also present in OpenBlas.
               | 
               | Lots of this is due to the legacy of gotoBlas (which was
               | forked into OpenBlas, and partially inspired BLIS),
               | written by the somewhat famous (in HPC circles at least)
               | Kazushige Goto. He works at Intel now, so probably they
               | are doing something similar.
        
             | bee_rider wrote:
             | I think you've made a slightly bigger claim than is
              | necessary, which has led to a focus on BLAS, which misses
             | the point.
             | 
             | The _best_ BLAS libraries use C and Assembly. This is
             | because BLAS is the de-facto standard interface for Linear
             | Algebra code, and so it is worthwhile to optimize it to an
             | extreme degree (given infinite programmer-hours, C can beat
             | any language, because you can embed assembly in C).
             | 
             | But for those numerical codes which aren't incredibly hand-
              | optimized, Fortran makes nice assumptions, so it should be
             | able to optimize the output of a moderately skilled
             | programmer pretty well (hey we aren't all experts, right?).
        
             | paulmd wrote:
             | I don't remember the exact specifics, but Fortran disallows
             | some of the constructs that C/C++ struggle with aliasing
             | on, so Fortran can often be (safely) optimized to much
             | higher-performance code because of this
             | limitation/knowledge.
             | 
             | Like, it's always seemed like there's a certain amount of
             | fatalism around Undefined Behavior in C/C++, like this is
             | somehow how it has to be to write fast code but... it's
             | not. You can just declare things as actually forbidden
             | rather than just letting the compiler identify a boo-boo
             | and silently do whatever the hell it wants.
             | 
             | Of course it's not the right tool for every task, I don't
             | think you'd write bit-twiddling microcontroller stuff in
             | fortran, or systems programming. But for the HPC space, and
             | other "scientific" code? Fortran is a good match and very
             | popular despite having an ancient legacy even by C/C++
             | standards (both have, of course, been updated through
             | time). Little less flexible/general, but that allows less-
             | skilled programmers (scientists are not good programmers)
             | to write fast code without arcane knowledge of the gotchas
             | of C/C++ compiler magic.
        
               | jabl wrote:
               | > I don't remember the exact specifics, but Fortran
               | disallows some of the constructs that C/C++ struggle with
               | aliasing on, so Fortran can often be (safely) optimized
               | to much higher-performance code because of this
               | limitation/knowledge.
               | 
               | For a crude approximation, Fortran is somewhat equivalent
               | to C code where all pointer function arguments are marked
               | with the restrict keyword.
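                | 
                | A tiny sketch of that (hypothetical function, using the
                | __restrict__ extension since plain restrict is C99-only):
                | 
                |   // 'restrict' promises the compiler that b and c never
                |   // overlap, so it can keep values in registers and
                |   // vectorize freely -- roughly what Fortran assumes by
                |   // default for dummy arguments.
                |   void axpy(int n, double a,
                |             const double *__restrict__ b,
                |             double *__restrict__ c) {
                |       for (int i = 0; i < n; ++i)
                |           c[i] += a * b[i];
                |   }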
               | 
               | > Like, it's always seemed like there's a certain amount
               | of fatalism around Undefined Behavior in C/C++, like this
               | is somehow how it has to be to write fast code but...
               | it's not. You can just declare things as actually
               | forbidden rather than just letting the compiler identify
               | a boo-boo and silently do whatever the hell it wants.
               | 
                | Well, it's kind of more dangerous than C, in this aspect.
               | The aliasing restriction is a restriction on the Fortran
               | programmer; the compiler or runtime is not required to
               | diagnose it, meaning that the Fortran compiler is allowed
               | to optimize assuming that two pointers don't alias.
               | 
               | That being said, in general I'd say Fortran has less
               | footguns than C or C++, and is thus often a better choice
               | for a domain expert that just wants to crunch numbers.
        
           | jcranmer wrote:
           | From my limited exposure to the HPC groups at the labs,
           | there's a mixture of languages in use. It seems that modern
           | C++ is the dominant language for a lot of new projects--some
           | of the people I talked to were working on libraries that
           | aggressively used C++11/C++14 features.
           | 
           | The biggest challenge the national labs face is that there's
           | not really any budget (or appetite) to _rewrite_ software to
           | take advantage of hardware features (particularly the GPU-
            | based accelerator that's all the rage nowadays). You _might_
           | be able to get a code rewritten once, but an era where every
           | major HPC hardware vendor wants you to rewrite your code into
           | their custom language for their custom hardware results in
           | code that will not take advantage of the power of that custom
           | hardware. OpenMP, being already fairly widespread, ends up
           | becoming the easiest avenue to take advantage of that
           | hardware with minimal rewriting of code (tuning a pragma
            | doesn't really count as rewriting).
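            | 
            | A rough sketch of what "tuning a pragma" looks like (made-up
            | function and values, not from any lab codebase): the same
            | loop can offload to a GPU via OpenMP target directives
            | without restructuring the surrounding code, and it degrades
            | to plain host execution when no accelerator is present.
            | 
            |   #include <cstdio>
            |   #include <vector>
            |   
            |   void saxpy(float a, const float *x, float *y, int n) {
            |       // Offload if the compiler/hardware support it;
            |       // otherwise this runs as an ordinary loop on the host.
            |       #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
            |       for (int i = 0; i < n; ++i)
            |           y[i] = a * x[i] + y[i];
            |   }
            |   
            |   int main() {
            |       int n = 1 << 20;
            |       std::vector<float> x(n, 1.0f), y(n, 2.0f);
            |       saxpy(3.0f, x.data(), y.data(), n);
            |       std::printf("y[0] = %f\n", y[0]);  // expect 5.0
            |   }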
        
           | Symmetry wrote:
           | Also, while NVidia has been adding extra AI acceleration to
            | their chips, AMD has been throwing in extra double precision
           | resources that HPC generally requires. If you're training an
           | AI rather than simulating the climate/a thermonuclear
           | explosion/etc then you're probably better off using NVidia
           | cards but AMD made the right technical investments to get
           | these supercomputer contracts.
        
             | dekhn wrote:
             | It's kind of surprising that nvidia hasn't purchased AMD.
             | It really feels like there's a single company between the
             | two that would be truly effective- AMD for the classic CPU
             | oomph, nvidia for the GPU oomph, combining their strengths
             | in interconnects. It would be a player from the high-end PC
             | to the supercomputer market, without even pretending to go
             | for the low-power market (ARM).
        
               | krylon wrote:
               | Intel and AMD have a patent-licensing agreement where
               | Intel licenses their x86 stuff to AMD, and AMD licenses
               | their amd64 stuff to Intel. AFAIK, the moment AMD gets
               | bought by another company, they can no longer use Intel's
               | patents, and the moment _that_ happens, Intel can no
                | longer use AMD's patents. I'm not sure how much of
               | x86/amd64 you can legally implement without infringing on
               | any of these patents, but it might very well result in a
               | _really_ awkward situation.
               | 
               | Sure, the new owners could re-negotiate with Intel, and
               | maybe nothing would change. But who knows? A combined
                | AMD/nVidia might be a sufficient threat to Intel that they
               | might pull some desperate moves.
               | 
               | (In some timeline, this turns out to be the boost that
               | makes RISC-V the new "standard" ISA, but I am not so
               | optimistic it is the one we live in.)
        
               | ridgered4 wrote:
               | AMD and Nvidia were in talks to merge at one point,
               | apparently the talks fell apart because Nvidia's CEO
               | insisted on being the new CEO of the combined company and
               | AMD would have none of that. So they purchased ATI
                | instead, probably overpaid for it, and probably pushed the
                | Bulldozer concept too hard in an effort to prove it was
               | worth it after all.
               | 
                | Nvidia actually used to develop chipsets for AMD
                | processors, including onboard GPUs; they did for Intel as
                | well, but they had a much more serious relationship with
                | AMD in my estimation. This stopped with the ATI purchase:
                | since ATI was Nvidia's main competitor, the two companies
                | stopped working together. Intel later killed all 3rd-
                | party chipsets altogether, and AMD had to do a lot of
                | chipset work they weren't doing before.
               | 
               | I sometimes wonder what would have happened if they had
                | merged back then. I personally think a Jensen Huang-run
               | AMD would have done much better than AMD+ATI did in that
               | era. I could easily see ATI having collapsed. What would
                | the consoles use now? Would Nvidia have been as
                | aggressive as it has been without the strategic weakness
                | of not controlling the platform its products run on?
        
               | paulmd wrote:
               | I think based on recent history you can argue that NVIDIA
               | is very aware of the potential anticompetitive actions
               | that could result if they kill or even substantially pass
               | AMD.
               | 
               | There really used to be a lot of intra-generational
               | tweaking and refinement, like if you look back at Maxwell
               | there were really at least 3 and I suspect 4 total
               | steppings of the maxwell architecture (GM107,
               | GM204/GM200, and GM206 - and I suspect GM200 was a
               | separate "stepping" too due to how much higher it clocks
               | than GM204 - which is the opposite of what you'd expect
               | from a big chip). Kepler had at least 4 major versions
               | (GK1xx, GK110B, GK2xx, GK210), Fermi had at least 2
               | (although that's where I'm no longer super familiar with
               | the exact details).
               | 
               | Anyway point is there used to be a _lot_ more intra-
               | generational refinement, and I think that has largely
                | stopped; it's just thrown over the wall and done. And I
               | think the reason for _that_ is that if NVIDIA really
                | cranked full-steam ahead they'd be getting far enough
               | ahead of AMD to potentially start raising antitrust
               | concerns. We are now in the era of "metered performance
               | release", just enough to stay ahead of AMD but not enough
               | to actually raise problems and get attention from
               | antitrust regulators.
               | 
               | Same thing for the choice of Samsung 8nm for Ampere and
               | TSMC 12nm for Turing, while AMD was on TSMC 7nm for both
               | of those. Sure, volume was a large part of that decision,
               | but they're already matching AMD with a 1-node deficit
               | (Samsung 8nm is a 10+, and the gap between 10 and TSMC 7
               | is huge to begin with) and they were matching with a 1.5
               | node deficit during the Turing generation (12FFN is a
               | TSMC 16+ node - that is almost 2 full nodes to TSMC 7nm).
               | They _cannot_ just make arbitrarily fast processors that
               | dump on AMD, or regulators will get mad, so in that case
               | they might as well optimize for cost and volume instead.
               | If they had done a TSMC 7nm against RDNA1 they probably
                | would be starting to get in that danger zone - I'm sure
               | they were watching it carefully during the Maxwell era
               | too.
               | 
                | (the people who imagined some giant falling-out between
                | NVIDIA and TSMC are pretty funny in hindsight. (A) NVIDIA still had
               | parts at TSMC anyway, and (B) TSMC obviously couldn't
               | have provided the same volume as Samsung did, certainly
               | not at the same price, and volume ended up being a
               | godsend during the pandemic shortages and mining. Yeah,
               | shortages sucked, but they could still have been worse if
               | NVIDIA was on TSMC and shipping half or 2/3rds of their
               | current volume.)
               | 
               | Of course now we may see that dynamic flip with AMD
               | moving to MCM products earlier, or maybe that won't be
                | for another year or so yet - rumors are suggesting
               | monolithic midrange chips will be AMD's first product. Or
               | perhaps "monolithic", being technically MCM but with
               | cache dies/IO dies rather than multiple compute dies. But
               | with RDNA3 AMD is potentially poised to push NVIDIA a
               | little bit, rather than just the controlled opposition
               | we've seen for the past few generations, hence NVIDIA
               | reportedly moving to TSMC N5P and going quite large with
               | a monolithic chip to compete.
        
               | jcranmer wrote:
               | > It's kind of surprising that nvidia hasn't purchased
               | AMD.
               | 
               | One word: antitrust. The discrete GPU market these days
               | consists of Nvidia and AMD, with Intel only just now
               | dipping its toes into the market (I don't think there's
               | anything saleable to retail customers yet). Nvidia buying
               | AMD would make it a true monopoly in that market, and
               | there's no way that would pass antitrust regulators.
               | Nvidia recently tried to buy ARM, and even that
               | transaction was enough for antitrust regulators to say
               | no.
        
           | [deleted]
        
         | torrance wrote:
         | I'm not using Frontier, but I am using Setonix which is a large
         | AMD cluster being rolled out in Australia. All of AMD's
         | teaching materials are about ROCm so this is very much how
         | they're expecting it to be used.
         | 
         | The real pain for us is that there's no decent consumer grade
         | chips with ROCm compatibility for us to do development on. AMD
         | have made it very clear they only care about the data centre
         | hardware when it comes to ROCm, but I have no idea what kind of
         | developer workflow they're expecting there.
        
           | tormeh wrote:
            | It's slim pickings, but there are chips: https://docs.amd.com
           | /bundle/Hardware_and_Software_Reference_...
        
           | uniqueuid wrote:
           | Interesting. So what is your workflow right now?
        
             | torrance wrote:
             | Develop against CUDA locally. Port my kernels to ROCm, and
             | occupy a whole HPC node for debugging and performance
             | tuning for a week. It's terrible.
             | 
             | Edit: I should say that their recommendation is to write
              | the kernels in HIP, which is supposed to be their cross-
              | device wrapper for both CUDA and ROCm. I'm writing in Julia,
              | however, so that's not possible.
        
               | claforte wrote:
               | The AMD software stack has been behind for a long time
               | but I feel like we're finally catching up. I heard that
                | HIP (and hopefully the rest of ROCm) is now supported on
               | the RX6800XT consumer GPU... maybe that could help? BTW
               | my team at AMD has been using Julia for ML workloads for
               | a while. We should get in touch - maybe some of the
               | lessons we learn can be useful to you. My email is
               | claforte. The domain I'm sure you can guess. ;-)
        
               | vchuravy wrote:
               | If you are using Julia I would recommend looking at
                | AMDGPU.jl and (plugging my own project here)
               | KernelAbstractions.jl
        
               | claforte wrote:
               | BTW have you tried `KernelAbstractions.jl`? With it you
               | can write code once that will run reasonably fast on AMD
               | or NVIDIA GPUs or even on CPU. One of our engineers just
               | started using it and is pleased with it - apparently the
               | performance is nearly equivalent to native CUDA.jl or
               | AMDGPU.jl, and the code is simpler.
        
           | eslaught wrote:
           | I'm surprised you're not using HIP? At least in my experience
           | it seems like HIP is the go-to system for programming the AMD
           | GPUs, in large part because of CUDA compatibility. You can
           | mostly get things to work with a one-line header change [1].
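            | 
            | The shim usually amounts to a small compatibility header along
            | these lines (a rough sketch with a made-up file name, not the
            | actual header from the linked PR):
            | 
            |   // gpu_compat.h - existing CUDA call sites keep compiling,
            |   // but resolve to the HIP runtime on AMD hardware.
            |   #pragma once
            |   #ifdef __HIP_PLATFORM_AMD__
            |     #include <hip/hip_runtime.h>
            |     #define cudaMalloc             hipMalloc
            |     #define cudaMemcpy             hipMemcpy
            |     #define cudaMemcpyHostToDevice hipMemcpyHostToDevice
            |     #define cudaFree               hipFree
            |     #define cudaDeviceSynchronize  hipDeviceSynchronize
            |   #else
            |     #include <cuda_runtime.h>
            |   #endif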
           | 
           | (I work for a DOE lab but views are my own, etc.)
           | 
           | [1] As an example, see the approach in:
           | https://github.com/flatironinstitute/cufinufft/pull/116
        
             | pmarcelll wrote:
             | HIP is just the programming language/runtime, ROCm is the
             | whole software stack/platform.
        
           | sorenjan wrote:
           | Can you write SYCL code and compile it to ROCm for
           | production?
        
           | JonChesterfield wrote:
           | The rocm stack will run on non-datacentre hardware in YMMV
           | fashion. A lot of the llvm rocm development is done on
           | consumer hardware, the rocm stack just isn't officially
           | tested on gaming cards during the release cycle. In my
            | experience codegen is usually fine and the Linux driver is a
            | bit version-sensitive.
        
           | tkinom wrote:
           | NV21?
           | 
           | https://www.phoronix.com/scan.php?page=news_item&px=Radeon-R.
           | ..
        
           | dragontamer wrote:
           | Vega64 or Vega56 seems to work pretty well with ROCm in my
           | experience.
           | 
           | Hopefully AMD gets the Rx 6800xt working with ROCm
           | consistently, but even then, the 6800xt is RDNA2, while the
            | supercomputer MI250x is closer to the Vega64 in more ways.
           | 
           | So all in all, you probably want a Vega64, Radeon VII, or
           | maybe an older MI50 for development purposes.
        
             | slavik81 wrote:
             | > Hopefully AMD gets the Rx 6800xt working with ROCm
             | consistently
             | 
             | I am a maintainer for rocSOLVER (the ROCm LAPACK
             | implementation) and I personally own an RX 6800 XT. It is
             | very similar to the officially supported W6800. Are there
             | any specific issues you're concerned about?
             | 
             | I know the software and I have the hardware. I'd be happy
             | to help track down any issues.
        
               | dragontamer wrote:
               | That's good to hear.
               | 
               | I might be operating off of old news. But IIRC, the 6800
               | wasn't well supported when it first came out, and AMD
               | constantly has been applying patches to get it up-to-
               | speed.
               | 
               | I wasn't sure what the state of the 6800 was (I don't own
               | it myself), so I might be operating under old news. As I
               | said a bit earlier, I use the Vega64 with no issues (for
               | 256-thread workgroups. I do think there's some obscure
               | bug for 1024-thread workgroups, but I haven't really been
               | able to track it down. And sticking with 256-threads is
               | better for my performance anyway, so I never really
               | bothered trying to figure this one out)
        
               | slavik81 wrote:
               | Navi 21 launched in November 2020 but it only got
               | official support with ROCm 5.0 in February 2022.
               | 
               | With respect to your issue running 1024 threads per
               | block, if you're running out of VGPRs, you may want to
                | try explicitly specifying the max threads per block as 1024
               | and see if that helps. I recall that at one point the
               | compiler was defaulting to 256 despite the default being
               | documented as 1024.
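                | 
                | In HIP that's the __launch_bounds__ attribute on the
                | kernel (illustrative kernel, not your code), which tells
                | the compiler to budget registers for the full 1024
                | threads per block:
                | 
                |   __global__ void __launch_bounds__(1024)
                |   scale(float *out, const float *in, int n) {
                |       int i = blockIdx.x * blockDim.x + threadIdx.x;
                |       if (i < n) out[i] = 2.0f * in[i];
                |   }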
        
               | dragontamer wrote:
               | The main issue I have with the idea of Navi 21 is that
                | it's a 32-wide warp, while CDNA2 (like the MI250x) is a
                | 64-wide warp.
               | 
               | Granted, RDNA and CDNA still have largely the same
                | assembly language, so it's still better than using say...
               | NVidia GPUs. But I have to imagine that the 32-wide vs
               | 64-wide difference is big in some use cases. In
               | particular: low-level programs that use warp-level
               | primitives, like DPP, shared-memory details and such.
               | 
               | I assume the super-computer programmers want a cheap
               | system to have under their desk to prototype code that's
               | similar to the big MI250x system. Vega56/64 is several
               | generations old, while 6800 xt is pretty different
               | architecturally. It seems weird that they'd have to buy
               | MI200 GPUs for this purpose, especially in light of
                | NVidia's strategy (where an NVidia A2000 could serve as a
               | close replacement. Maybe not perfect, but closer to the
               | A100 big-daddy than the 6800xt is to the big daddy
               | MI250x).
               | 
               | --------
               | 
               | EDIT: That being said: this is probably completely moot
               | for my own purposes. I can't afford an MI250x system at
               | all. At best I'd make some kind of hand-built consumer
               | rig for my own personal purposes. So 6800 xt would be all
               | I personally need. VRAM-constraints feel quite real, so
               | the 16GBs of VRAM at that price makes 6800xt a very
               | pragmatic system for personal use and study.
        
         | eslaught wrote:
         | Yes, DOE is very interested in DL. I don't work on this
         | personally, but you can see an example e.g. here [1, 2]. You
         | can see in the first link they're using Keras. I'm not up to
         | date on all the details (again, don't work on this personally)
         | but in general the project is commissioned to run on all of
         | DOE's upcoming supercomputers, including Frontier.
         | 
         | [1]: https://github.com/ECP-CANDLE/Benchmarks
         | 
         | [2]: https://www.exascaleproject.org/research-project/candle/
        
         | JonChesterfield wrote:
         | The rocm stack is one of the toolchains deployed on Frontier.
         | With determination, llvm upstream and rocm libraries can be
         | manually assembled into a working toolchain too. It's not so
         | much trickle down improvements as the same code.
        
       | scardycat wrote:
       | Congratulations to AMD, HPE and ORNL! This is an amazing
       | achievement. Can't wait to see the spectacular science results
       | coming from this installation.
       | 
       | Intel was supposed to build the first Exascale system for ANL [1]
        | [2], to be installed by 2018. They completely and utterly messed
        | up the execution, partly driven by the 10nm failure, went back to
        | the drawing board multiple times, and now Raja has switched the
        | whole thing to GPUs, a technology that Intel has no previous
        | success with, and rebased it to 2 ExaFlops peak, meaning they probably
       | expect 1 EF sustained performance, a 50% efficiency. No other
       | facility would ever consider Intel as a prime contractor again.
       | ANL hitched their wagon to the wrong horse.
       | 
       | 1. https://www.alcf.anl.gov/aurora 2.
       | https://insidehpc.com/2020/08/exascale-exasperation-why-doe-...
        
         | throwawaylinux wrote:
         | What is Raja?
        
           | jcranmer wrote:
           | Raja is the head of GPU development at Intel.
        
           | maxwell86 wrote:
           | A person that works at Intel.
        
           | Bayart wrote:
           | Raja Koduri, the head of Graphics at Intel. Before that he
           | was leading the Radeon group at AMD. He's been doing GPU
           | stuff since the 90s.
        
         | interesting_pt wrote:
         | I worked at Intel in a very closely related area.
         | 
          | I quit after getting vaccinated for COVID; I had only stayed
          | because of the pandemic.
         | 
         | The biggest problem was that Intel simply couldn't execute.
         | They couldn't design and manufacture hardware in a timely
         | manner without too many bugs. I think this was due to poor
         | management practices. My direct manager was amazing, but my
          | skiplevel was always dealing with fires. It felt like, instead
          | of the effort being orchestrated, someone approached a
         | crowd of engineers and used a bullhorn to tell them the big
         | goal and that was it. The left hand had no idea what the right
         | hand was doing.
         | 
         | I often called Intel an 'ant hill', because the engineers would
         | swarm a project just like ants do a meal. Some would get there
         | and pull the project forward, some would get on top and
         | uselessly pull upward, and more than I'd like would get behind
         | the project and pull it backwards. Just a mindless swarm of
         | effort, which generally inefficiently kinda did the right thing
         | sometimes.
         | 
          | The inability to execute started to affect my work. When I got
         | a ticket to complete something, I just wouldn't. There was a
         | very good chance that I'd have an extra few weeks (due to
         | slippage) or the task would never need to get done, because the
         | hardware would never appear. Planning was impossible.
         | 
          | Conversely, sometimes hardware _CAME OUT OF NOWHERE_, not
         | simple stuff, but stuff like laptops made by partners. Just
         | randomly my manager would ask me to support a product we were
         | told directly wouldn't exist, but now did. I needed to help our
         | partner with support right now. Our partners were starting to
         | hate us and it was palpable in meetings.
         | 
         | I'm so glad I quit, I was being worked to the bone on a project
         | which will probably fail and be a massive liability. Even if
         | the economy crashes, and I can't get a job for years, and end
         | up broke, it'll still have been worth it. I also only made
         | 110K/yr base.
        
           | allie1 wrote:
           | I've been reading about Pat Gelsinger turning things around
           | on execution, but many of the announced products for this
            | year are already late (Sapphire Rapids, GPUs, even the Alder
            | Lake rollout was late).
           | 
           | Do you know if anything has changed at Intel? Is it
           | reasonable to expect changes within a year and a half of
           | starting on the job given the size of the company and the
           | changes needed?
        
       | moffkalast wrote:
       | Now the real question: Can it run Crysis... without hardware
       | acceleration?
        
         | cesarb wrote:
         | > Can it run Crysis... without hardware acceleration?
         | 
         | I understand you are joking, but it's a legitimate benchmark,
         | one which I've seen at least Anandtech using. For instance, a
         | quick web search found an article from last year
         | (https://www.anandtech.com/show/16478/64-cores-of-
         | rendering-m...) which shows an AMD CPU (a Ryzen 9) running
         | Crysis without hardware acceleration at 1080p at nearly 20 FPS.
         | As that article says, it's hard to go much higher than that,
         | due to limitations of the Crysis engine.
        
       | gsibble wrote:
       | What an incredible achievement. Good for AMD. The Epyc is a
       | fantastic processor.
       | 
       | And there are another 2 (3?) faster systems coming online in the
       | next year or so.
        
         | adrian_b wrote:
         | Besides being the first system exceeding the 1 Exaflop/s
         | threshold, what is more impressive is that this is also the
         | system with the highest ratio between computational speed and
         | power consumption (i.e. the AMD devices have the first place in
         | both Top500 and Green500).
         | 
         | The AMD GPUs with the CDNA ISA have surpassed in energy
          | efficiency both the NVIDIA A100 GPUs and the Fujitsu ARM CPUs
          | with SVE, which had been the best previously.
         | 
         | Unfortunately, AMD has stopped selling at retail such GPUs
         | suitable for double-precision computations.
         | 
         | Until 5 or 6 years ago, the AMD GPUs were neither the fastest
         | nor the most energy-efficient, but they had by far the best
         | performance per dollar of any devices that could be used for
         | double-precision floating-point computations.
         | 
          | However, when they made the transition to RDNA, they
         | separated their gaming and datacenter GPUs. The former are
         | useless for DP computations and the latter cannot be bought by
         | individuals or small companies.
        
           | Const-me wrote:
           | > The former are useless for DP computations
           | 
           | Looking at "double-precision GFlops" columns there [1] they
           | don't seem terribly bad, more than twice as fast compared to
           | similar nVidia chips [2]
           | 
           | While specialized extremely expensive GPUs from both vendors
           | are way faster with many TFlops of FP64 compute throughput, I
           | wouldn't call high-end consumer GPUs useless for FP64
           | workloads.
           | 
           | The compute speed is not terribly bad, and due to some
           | architectural features (ridiculously high RAM bandwidth, RAM
           | latency hiding by switching threads) in my experience they
           | can still deliver a large win compared to CPUs of comparable
           | prices, even in FP64 tasks.
           | 
           | [1]
           | https://en.wikipedia.org/wiki/Radeon_RX_6000_series#Desktop
           | 
           | [2] https://en.wikipedia.org/wiki/GeForce_30_series#GeForce_3
           | 0_(...
        
             | guenthert wrote:
             | SP is sixteen times the performance of DP here for no other
              | reason than market segmentation. Nvidia might have started
             | that, but that's no reason not to call AMD out for it.
        
             | adrian_b wrote:
             | "Useless" means that both DP Gflops/s/W and DP Gflops/s/$
                | are worse for the modern AMD and NVIDIA gaming GPUs than
             | for many CPUs, so the latter are a better choice for such
             | computations.
             | 
             | The opposite relationship between many AMD GPUs and the
             | available CPUs was true until 5-6 years ago, while NVIDIA
             | had reduced the DP computation abilities of their non-
             | datacenter GPUs many years before AMD, despite their
             | previous aggressive claims about GPGPU being the future of
             | computation, which eventually proved to be true only for
             | companies and governments with exceedingly deep pockets.
        
               | Const-me wrote:
                | My desktop PC has a Ryzen 7 5700G; on paper it can do 486
                | GFlops FP64 (8 cores at 3.8 GHz base frequency, two
                | 4-wide FMAs every cycle). However, that would require
                | about 2 TB/sec of memory bandwidth, while the actual
                | figure is 51 GB/second. For large computational tasks
                | where the source data doesn't fit in caches, the CPU can
                | only achieve a small fraction of the theoretical peak
                | performance because it's bottlenecked by memory.
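                | 
                | Back-of-the-envelope, assuming roughly one fresh double
                | has to come from RAM per FMA:
                | 
                |   8 cores x 3.8 GHz x 2 FMA units x 4 lanes x 2 flop
                |     = 486.4 GFLOPS peak
                |   486.4 Gflop/s / 2 flop per FMA x 8 bytes ~= 1.9 TB/s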
               | 
               | The memory in graphics cards is an order of magnitude
                | faster; my current one has 480 GB/sec of bandwidth.
                | For this reason, even gaming GPUs can be much faster than
                | CPUs on some workloads, even though the theoretical peak
                | FP64 GFlops number is about the same.
        
               | adrian_b wrote:
               | You are right that there are problems whose solving speed
               | is limited by the memory bandwidth, and for such problems
               | GPUs may be better than CPUs.
               | 
               | Nevertheless, many of the problems of this kind require
               | more memory than the 8 GB or 16 GB that are available on
               | cheap GPUs, so the CPUs remain better for those.
               | 
               | On the other hand, there are a lot of problems whose
               | time-consuming part can be reduced to multiplications of
               | dense matrices. During the solution of all such problems,
               | the CPUs will reach a large fraction of their maximum
                | computational speed, regardless of whether the operands fit
               | in the caches or not (when they do not fit, the
               | operations can be decomposed into sub-operations on
               | cache-sized blocks, and in such algorithms the cache
               | lines are reused enough times so that the time used for
               | transfers does not matter).
        
               | Const-me wrote:
               | I guess I was lucky with the CAM/CAE software I'm working
               | on. We don't have too many GB of data, the stuff fits in
               | VRAM of inexpensive consumer cards.
               | 
                | One typical problem is multiplying a dense vector by a
               | sparse matrix. Unlike multiplication of two dense
               | matrices, I don't think it's possible to decompose into
               | manageable pieces which would fit into caches to saturate
               | the FP64 math of the CPU cores.
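                | 
                | A minimal CSR sketch of that kernel (illustrative, not
                | our actual code) shows why: each multiply-add needs a
                | column index, a matrix value, and a gathered x[col], so
                | there is very little arithmetic per byte moved, and the
                | access to x is irregular, which defeats cache blocking.
                | 
                |   void spmv_csr(int rows, const int *row_ptr,
                |                 const int *col_idx, const double *val,
                |                 const double *x, double *y) {
                |       for (int r = 0; r < rows; ++r) {
                |           double sum = 0.0;
                |           for (int j = row_ptr[r]; j < row_ptr[r + 1]; ++j)
                |               sum += val[j] * x[col_idx[j]];
                |           y[r] = sum;
                |       }
                |   }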
               | 
               | We have tested our software on nVidia Teslas in a cloud
               | (the expensive ones with many theoretical TFlops of FP64
                | compute); the performance wasn't too impressive.
        
           | visarga wrote:
           | Computational speed is important, but more important is the
           | data transfer speed. At least in ML. Is AMD the best for data
           | transfer speed?
        
       | gigatexal wrote:
       | I am still kicking myself every time I look at AMD's share price.
       | I sold a not-insignificant-to-me amount of shares when the price
       | was basically below 10 a share. Now it's above 100. All this is
       | to say that the turn around at AMD is good to see and the
       | missteps at Intel are hilarious.
       | 
        | This is like the time the Athlon64 and its on-die memory
       | controller was kicking the Pentiums around.
        
         | edm0nd wrote:
         | Now would be a pretty decent time to buy back in if you still
         | wanna go long on AMD again.
        
           | gigatexal wrote:
           | I did a few weeks ago. It's the only thing other than Nvidia
           | that is up in my portfolio right now, lol.
        
       | SoftTalker wrote:
       | Since Cray stopped making their own CPUs, they have been back and
       | forth between AMD and Intel several times.
        
         | wmf wrote:
         | It's not really back and forth; Cray supports Intel, AMD, and
          | ARM CPUs equally, as well as Nvidia, AMD, and Intel GPUs.
        
       | zepmck wrote:
        | The most powerful and unfortunately unusable supercomputer in
        | the world. AMD's approach to GPUs has been on a failing track
        | since its inception. The only software stack available is super
        | fragile, buggy and barely supported. Rather than building an HPL
        | machine, I would have preferred to see public money spent in a
        | different way.
        
         | ghc wrote:
         | It's a supercomputer. The programming model is very, very
         | different. The software stack is full of incredibly fragile
         | stuff from any number of manufacturers. It's honestly hard to
         | even describe how much more difficult using MPI with Fortran on
          | a supercomputer is compared to _anything_ I've ever touched
         | elsewhere. Maybe factory automation comes close?
        
       | photochemsyn wrote:
       | I wonder if having one supercomputer with x number of chips or
       | having eight supercomputers each with x/8 number of chips would
       | be the more practical working setup. Weather forecasting for
       | example is basically a complex probabilistic algorithm, and
       | there's a notion that running eight models in parallel and then
       | comparing and contrasting the results will give better estimates
       | of actual outcomes than running one model on a much more powerful
       | machine.
       | 
       | Is it feasible to run eight models on one supercomputer, or is
       | that inefficient?
        
         | timbargo wrote:
         | You can partition a large compute cluster into many smaller
         | ones. Users can make a request specifying how many processors
         | they want for how long. Check out this link to see the activity
         | of a supercomputer at Argonne.
         | 
         | https://status.alcf.anl.gov/theta/activity
         | 
         | And I believe it is more efficient to have a single large
          | cluster, as there are large overhead costs of power, cooling,
          | and having a physical space to put the machine in, plus a
         | personnel cost to maintain the machines.
        
         | derac wrote:
         | You can run many programs on one supercomputer simultaneously,
         | yes. Check out XSEDE. Cost-wise one big is going to be cheaper
         | than 8 small due to infrastructure issues - cooling,
         | maintenance, space, etc.
        
           | pphysch wrote:
           | "XSEDE" proper is getting EOL'd in a couple months and
           | transitioning to ACCESS [1].
           | 
           | [1] - https://www.hpcwire.com/off-the-wire/nsf-announces-
           | upcoming-...
        
       | marcusjramsey wrote:
       | hmm
        
       | dang wrote:
       | This reads more or less like a press release - (edit: actually,
       | it reads exactly like a press release) - is there a more
       | substantive article on the topic?
        
         | eslaught wrote:
         | It's not an article, but there's always the front page for the
         | supercomputer (includes some limited specs):
         | 
         | https://www.olcf.ornl.gov/frontier/
         | 
         | There's also detailed architecture specs on Crusher, an
         | identical (but smaller) system:
         | 
         | https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide...
        
       | peter303 wrote:
        | One petaflop DP Linpack was achieved in 2008. Supercomputing
        | "Moore's Law" is a doubling of speed every 1.5 years, an order of
        | magnitude every five years, a thousand-fold in 15 years. Pretty
        | close to schedule.
       | 
       | Onward to a zettaflop around 2037?
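        | 
        | Spelled out, roughly:
        | 
        |   2^(5 / 1.5) ~= 10x every five years
        |   1 PF (2008) x 1000 ~= 1 EF by ~2023 (Frontier: 2022)
        |   1 EF (2022) x 1000 ~= 1 ZF by ~2037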
        
       | robswc wrote:
       | Seems I've heard nothing but good things about AMD for the last
       | 10 years or so.
       | 
        | I once had a terrible experience with AMD ~10 years ago that
        | made me swear off them for good. It had something to do with
        | software, but I remember it taking several days of work to solve.
       | 
       | Willing to give them another try soon though. I never seem to
       | even use the full power of whatever CPU I get, lol.
        
         | verst wrote:
         | Late 2020 I switched from Intel to AMD Ryzen 5900X for my
         | gaming PC and only had great experiences as far as gaming is
         | concerned.
         | 
         | I should point out that there were significant USB problems on
         | AMD B550, X570 chipsets (eventually addressed via BIOS
         | updates).
         | 
         | Unfortunately some professional audio gear is only certified
         | for use with Intel chipsets and I have experienced some deal-
         | breaking latency issues with ASIO drivers. For gaming I will be
         | happy to continue using AMD - but for music I will probably
         | switch back to Intel for my next rig.
        
           | robswc wrote:
           | That actually sucks because I do a lot of music stuff and
           | having any issues with ASIO would be a deal breaker. Thanks
           | for the heads up! One of those things I would have never even
           | thought of to check!
           | 
           | Also sums up my AMD experience 10 years ago. Stuff just
           | wasn't working :/
        
         | UberFly wrote:
         | Lisa Su joined AMD in 2012 and in 2017 the first Zen chips were
         | released. Good people making good decisions.
        
       | belter wrote:
       | Thank you to the authors for not calling it the fastest computer
       | in the world :-) and instead, as they should, the most powerful.
        | Clock speed is not the only factor of course, as instructions per
       | cycle and cache sizes have an impact, but for a pure measure of
       | speed, the fastest still is:
       | 
        | - For practical use, and not overclocked, the EC12 at 5.5 GHz:
       | https://www.redbooks.ibm.com/redbooks/pdfs/sg248049.pdf
       | 
       | or
       | 
        | - An AMD FX-8370 floating in liquid nitrogen at 8.7 GHz:
       | https://hwbot.org/benchmark/cpu_frequency/rankings#start=0#i...
        
         | formerly_proven wrote:
         | I guarantee you an FX-8370 isn't even close to being the
         | fastest CPU even at 10 GHz. I bet most desktop CPUs you can buy
         | nowadays will be faster out of the box.
        
           | belter wrote:
           | Tell me what your measure of fast is?
        
             | NavinF wrote:
              | Does it matter? A modern CPU at 5.5 GHz will outperform an 8
              | year old CPU overclocked to 10 GHz on just about any
              | reasonable workload, even if it's single threaded.
        
               | postalrat wrote:
               | Do you have any data to back up your claims?
        
           | PartiallyTyped wrote:
            | It's embarrassing how slow that thing is compared to CPUs
            | from 2 years ago...
            | 
            | The video below compares the 8150 against CPUs from 2020 (i.e.
            | no 5900x or 12900KS) and includes data from the 8370.
           | 
           | https://youtu.be/RpcDF-qQHIo?t=425
        
         | chrisseaton wrote:
         | > Clock speed
         | 
         | When people talk about a supercomputer being 'fast' they
          | generally mean FLOPS - floating point operations per second,
          | which isn't clock speed.
        
           | belter wrote:
           | My algorithm is single threaded :-)
           | 
           | Multiplying the number of processors by the clock speed of
           | the processors, and then multiplying that product by the
           | number of floating-point operations the processors can
            | perform in one cycle, as is done for supercomputer FLOPS, does
           | not help me :-)
        
             | blackoil wrote:
             | More than your algorithm, seems you are on the wrong
             | thread.
        
             | jabl wrote:
             | > My algorithm is single threaded :-)
             | 
             | And why should your algorithm be the benchmark for
             | supercomputer performance, rather than something that is at
             | least somewhat related [1] to the workloads those machines
             | run?
             | 
             | [1] We can of course argue endlessly that HPL is no longer
             | a very representative benchmark for supercomputer
             | workloads, but I digress.
        
               | belter wrote:
                | My initial argument since the beginning of this thread
                | is that it's the most powerful computer, not the fastest,
                | as it will not be the fastest for some single-threaded
                | task. Not really sure what is so controversial about
                | it... :-)
        
               | chrisseaton wrote:
               | > as it will not be, for the case for some single
               | threaded task
               | 
               | Nobody but you is confused about this.
        
               | belter wrote:
                | It's not confusion, it's about clarifying that "fast" is
               | contextual...
        
             | chrisseaton wrote:
             | Why would you run a single-threaded algorithm on a
             | supercomputer?
        
               | _Wintermute wrote:
               | You say this, but unfortunately I've encountered a few
               | life-scientists who think their single threaded R code
               | will run faster because they've requested 128 cores and 4
               | GPUs.
        
               | belter wrote:
               | Because some say they are fastest computers in the world
               | ;-)
        
               | chrisseaton wrote:
               | I think you're possibly misunderstanding what these
               | supercomputers are for. They just aren't designed for
               | whatever single-threaded workload you personally have, so
               | it's not in scope.
        
               | belter wrote:
               | It is clear to me what they are for, and why I would
               | not use one for a single-threaded task.
               | 
               | I was trolling the people who downvoted my measure of
               | speed a little bit :-) because the millions of FLOPS of
               | a supercomputer will help with parallel tasks but will
               | not be "faster" for a common use case.
               | 
               | So fastest computer is one thing, most powerful is
               | another.
        
               | O5vYtytb wrote:
               | "fastest" is accurate. You can get more computation work
               | done in less time given an appropriate workload. No
               | matter what adjective you use, "fastest" or "powerful",
               | you're always within a context of an intended workload.
               | 
               | Your argument is a bit like saying the fastest land speed
               | vehicle isn't really the fastest because you can't go to
               | the grocery store with it.
        
               | stonogo wrote:
               | You don't get to call the supercar slow because your
               | driver doesn't know how to change gears.
        
         | dekhn wrote:
         | Clock rates of CPUs are not a measure of "speed". Time to
         | solution is the measure of speed. There have historically been
         | computers with lower clock rates with higher rates of results
         | production (larger cache, more work done per cycle).
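         | 
         | A toy illustration, with invented numbers: the chip with
         | the lower clock but more work per cycle finishes first.
         | 
         |   // time_to_solution.cpp - hypothetical chips A and B
         |   #include <cstdio>
         |   int main() {
         |     double work = 1e12;      // FLOPs needed for the job
         |     double a = 5.0e9 * 2.0;  // 5 GHz, 2 FLOPs/cycle
         |     double b = 3.0e9 * 8.0;  // 3 GHz, 8 FLOPs/cycle
         |     std::printf("A: %.0f s, B: %.0f s\n",
         |                 work / a, work / b);
         |     // A: 100 s, B: 42 s - B wins despite the lower clock
         |   }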
        
           | belter wrote:
           | That was why I mentioned cache, and of course we could
           | talk MIPS.
        
             | dekhn wrote:
             | But those are only proxy variables to explain
             | "performance", or "throughput", or "latency". No doubt, if
             | I wanted a fast single machine, the two configs you showed
             | would both be nice- the former because it's an off-the-
             | shelf part that just "runs stuff faster" than most slower
             | processors, and the latter because it represents the limit
             | of what a person with some infrastructure can do (although,
             | TBH, I'd double check every result the system generated).
             | 
             | Ultimately, however, no system is measured by its clock
             | rate- or by its cache size- or by its MIPS. Because no real
             | workload is truly determined by a simple linear function of
             | those variables.
        
               | belter wrote:
               | Agreed. Because we have many parameters, and as master
               | of my universe, I selected clock speed as my measure of
               | fast :-)
               | 
               | Time to completion will depend on the task.
        
         | babypuncher wrote:
         | Supercomputer power has been measured in FLOPS for decades now,
         | even in popular media coverage.
        
         | __alexs wrote:
         | Yes the fastest computers are those aboard the Parker Solar
         | Probe at 690,000 km/h.
        
           | belter wrote:
           | For those, the calculations need to include relativistic
           | effects in your algo :-) The Sun's gravity affects clock
           | rates - relativistic time dilation... :-)
           | 
           | https://physics.stackexchange.com/questions/348854/parker-
           | so...
        
       | briffle wrote:
       | What blows my mind is that the newest NOAA supercomputer (which
       | triples the speed of the last one) is a whopping 12 petaflops.
       | It comes online this summer.
       | 
       | It kind of shows the difference in spending priorities, when
       | nuclear labs get >1000-petaflop supercomputers and the weather
       | service (which helps with disasters that affect many Americans
       | each year) gets a new one that is 1.2% of the speed.
       | 
       | https://www.noaa.gov/media-release/us-to-triple-operational-....
        
         | the_svd_doctor wrote:
         | DOE computers are used by a wide variety of
         | people/teams/projects, including academics and other
         | institutions though.
        
         | phonon wrote:
         | You're quite right.
         | 
         | "An estimate of future HPC needs should be both demand-based
         | and reasonable. From an operational NWP perspective, a four-
         | fold increase in model resolution in the next ten years
         | (sufficient for convection-permitting global NWP and kilometer-
         | scale regional NWP) requires on the order of 100 times the
         | current operational computing capacity. Such an increase would
         | imply NOAA needs a few exaflops of operational computing by
         | 2031. Exascale computing systems are already being installed at
         | Oak Ridge National Laboratory (1.5 exa floating point
         | operations per second (EF)) and Argonne Labs (1.0 EF) and it is
         | likely that these national HPC laboratories will approach 100
         | EF by 2031. Because HPC resources are essential to achieving
         | the outcomes discussed in this report, it is reasonable for
         | NOAA to aspire to a few percent of the computing capacity of
         | these other national labs at a minimum. Substantial investments
         | are also needed in weather research computing. To achieve a 3:1
         | ratio of research to operational HPC, NOAA will need an
         | additional 5 to 10 EF of weather research and development
         | computing by 2031. Since research computing generally does not
         | require high-availability HPC, it should cost substantially
         | less than operational HPC and should be able to leverage a
         | hybrid of outsourced, cloud and excess compute resources."[1]
         | 
         | [1]https://sab.noaa.gov/wp-content/uploads/2021/11/PWR-
         | Report_2...
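         | 
         | The back-of-the-envelope behind numbers like "4x resolution
         | needs ~100x compute" (my reading, not spelled out in the
         | report): refining the grid also shrinks the time step, so
         | cost grows roughly like the resolution factor to the 3rd or
         | 4th power.
         | 
         |   // nwp_scaling.cpp - rough scaling sketch only
         |   #include <cmath>
         |   #include <cstdio>
         |   int main() {
         |     double k = 4.0;  // four-fold resolution increase
         |     std::printf("cost factor: %g to %g\n",
         |                 std::pow(k, 3.0), std::pow(k, 4.0));
         |     // prints 64 to 256; "order of 100x" sits in between
         |   }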
        
         | mulmen wrote:
         | Would a faster computer improve outcomes for victims of natural
         | disaster? How much is left undiscovered about weather?
         | 
         | Research spending is based on the potential for discovery. As a
         | species we have studied weather since the beginning of time.
         | How long have we been doing nuclear research? A century?
         | 
         | Is there even an opportunity cost here? Or is it an economy of
         | scale? As we build more supercomputers the costs go down. So
         | NOAA and ORNL both get what they need for less.
        
           | aeroman wrote:
           | Although meteorology is in many ways a much older science, I
           | think you are underselling the difference (and importance of
           | computers here). Better computing power means a more accurate
           | forecast, but typically also a longer forecast horizon. That
           | is critical when preparing for natural disasters and
           | absolutely saves lives all the time.
           | 
           | Even at a 3-day lead time, GFS was still suggesting landfall
           | for Hurricane Sandy outside the New York region; the longer
           | lead times provided by other centers (with more computing
           | power) were very important for preparation [1].
           | 
           | Even on the science side, increased computing power enables a
           | host of new discoveries. Even storing the locations for all
           | the droplets in a small cloud would require an excessive
           | amount of memory, let alone doing any processing [2].
           | Increased computer power enables us to better understand how
           | clouds respond to their environment, which is a key
           | uncertainty in predicting climate change.
           | 
           | Many disciplines of meteorology are also much newer than
           | nuclear physics. Cloud physics (for example) only really got
           | started with the advent of weather radar (so the 1940s).
           | Before that, even simple questions (such as can a cloud
           | without any ice in it produce rain?) were unknown.
           | 
           | Even today, we still have difficulty seeing into the most
           | intense storms. You cannot fly an aircraft in there, and
           | radar has difficulty distinguishing different types of
           | particle (ice, liquid, mushy ice, ice with liquid on the
           | surface, snow) and is not good at counting the number of
           | particles either.
           | 
           | Even after thousands of years, we are only just now getting
           | the tools to understand it. There is a lot left to discover
           | about the weather!
           | 
           | [1] - https://agupubs.onlinelibrary.wiley.com/doi/full/10.100
           | 2/201...
           | 
           | [2] - https://www.cloudsandclimate.com/blog/clouds_and_climat
           | e/#id...
        
           | Kuinox wrote:
           | > Erik P. DeBenedictis of Sandia National Laboratories has
           | theorized that a zettaFLOPS (10^21, or one sextillion, FLOPS)
           | computer is required to accomplish full weather modeling,
           | which could cover a two-week time span
           | accurately.[121][122][123] Such systems might be built around
           | 2030.
           | 
           | https://en.wikipedia.org/wiki/Supercomputer
        
           | ip26 wrote:
           | One of the more commonly discussed values is predicting where
           | a major hurricane makes landfall. We can't reliably do that
           | yet, but if we could, evacuation zones would be both smaller
           | & more effective.
        
           | Twirrim wrote:
           | > Would a faster computer improve outcomes for victims of
           | natural disaster? How much is left undiscovered about
           | weather?
           | 
           | The US is way behind on weather modelling, in part due to a
           | lack of computing power to run the grids at sufficiently
           | small cell sizes compared to Europe and other parts of the
           | world. That means less accurate predictions and less
           | advance notice of impending disasters, which means more risk
           | of loss of life and impact on infrastructure and the economy
           | (and vice versa: inaccuracy can lead to more caution than is
           | necessary, which has an economic impact too). The US has to
           | lean on Europe etc. for predictions.
           | 
           | https://cliffmass.blogspot.com/2020/02/smartphone-weather-
           | ap...
           | 
           | It talks about the fact that IBM / Weather.com actually uses
           | a more accurate system than the NWS does, because the NWS is
           | still stuck on GFS (it's been several years now since
           | Congress passed an act to force NOAA to move away from it,
           | and unfortunately that takes time).
        
             | studmuffin650 wrote:
             | I've heard that was the case with the old GFS model. They
             | just updated the GFS model in 2021 to provide higher
             | accuracy: https://www.noaa.gov/media-release/noaa-upgrades-
             | flagship-us....
             | 
             | I'm not entirely sure how it compared to the ECMWF model
             | during last year's hurricane season, but I do think it's
             | improved substantially.
        
             | jsmith99 wrote:
             | For comparison, the UK government's Met Office installed a
             | similarly sized cluster of Cray XC40 machines about 6
             | years ago, with a 60-petaflop replacement arriving this
             | year.
             | Their forecasts are, anecdotally, locally considered a bit
             | rubbish though.
        
             | quanto wrote:
             | This is an interesting claim. Could you share a reputable
             | source on your claim that the US weather prediction
             | facility is behind its European counterparts? How does the
             | US depend on Europe for weather predictions?
        
         | arinlen wrote:
         | > _(..) when nuclear labs get >1000 petaflop super computers
         | (..)_
         | 
         | Would you prefer that the research be performed by empirical
         | testing instead of by running simulations?
        
         | jcranmer wrote:
         | The national labs aren't purely--or likely even mostly--
         | dedicated to nuclear research. Instead, they cover a lot of the
         | basic science research. These supercomputers will likely be
         | used for projects like exploring cosmological models, or
         | studying intramolecular interactions for chemical compounds, or
         | fine-tuning predictions about properties of the top quark, etc.
        
           | gibolt wrote:
           | Because these are so linked to research, everyone and their
           | cousin is vying for time on them. Even though it may be
           | massive, no individual will get anywhere near peak.
        
             | convolvatron wrote:
             | Usually on the DOE machines some time is reserved for
             | 'hero runs', but it's generally only the classified
             | stockpile work that qualifies.
        
           | gh02t wrote:
           | Oak Ridge in particular is part of the DOE Office of
           | Science. They do some national security work, but their
           | primary focus is basic
           | science. Some of the national labs _do_ primarily do nuclear
           | weapons related research, but not Oak Ridge. Frontier is only
           | doing unclassified work, primarily basic science and
           | engineering.
        
           | briffle wrote:
           | I didn't know that, thank you for clarifying for me!
        
         | irfn wrote:
         | I am curious what class of problems are being solved on
         | these supercomputers. Also, what's the abstraction of
         | computation here? Is it a container?
        
           | convolvatron wrote:
           | No, it's a gang-scheduled process - at least that's been
           | the standard model. Those processes run either close to
           | bare metal or as a process on Linux. Containers would be
           | useful to package up the shared dependencies, so that may
           | have changed.
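           | 
           | For a concrete picture, a minimal sketch of that model
           | using MPI (the launcher starts the same binary on every
           | node and gives each copy a rank):
           | 
           |   // hello_mpi.cpp - build with e.g. mpicxx
           |   #include <mpi.h>
           |   #include <cstdio>
           |   int main(int argc, char** argv) {
           |     MPI_Init(&argc, &argv);
           |     int rank = 0, size = 0;
           |     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
           |     MPI_Comm_size(MPI_COMM_WORLD, &size);
           |     std::printf("rank %d of %d\n", rank, size);
           |     MPI_Finalize();
           |   }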
        
           | chomp wrote:
           | Weather modeling - X kilometers by Y layers of atmosphere
           | can get expensive to compute really quickly. And NOAA does
           | more than just simulate weather: they're running
           | climate/sea-level-rise/Arctic-ice modelling, aggregating
           | sensor data from buoys/balloons/satellites, processing
           | maps, and more.
           | 
           | I can't speak for NOAA, but my experience with supercomputing
           | has been that there is no abstraction of computation, your
           | workload is very much tied to hardware assumptions.
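           | 
           | To give a sense of scale (grid numbers invented purely
           | for illustration): a global grid at a few kilometres of
           | horizontal resolution is already billions of cells per
           | time step, before you store any variables in them.
           | 
           |   // grid_cells.cpp - back-of-the-envelope only
           |   #include <cstdio>
           |   int main() {
           |     double surface_km2 = 5.1e8;  // Earth's surface area
           |     double dx_km       = 3.0;    // hypothetical cell
           |     double layers      = 100.0;  // hypothetical levels
           |     double cells =
           |         surface_km2 / (dx_km * dx_km) * layers;
           |     std::printf("%.1e cells per step\n", cells);
           |     // ~5.7e9 cells, before variables or time stepping
           |   }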
        
             | irfn wrote:
             | In my experience it's very hard to write code for
             | parallel compute workloads, and I am guessing that half
             | of the code written is spent creating abstractions for
             | that.
        
           | jandrewrogers wrote:
           | They are used to do large-scale high-resolution analysis or
           | simulation of complex systems in the physical world. The
           | codes typically run on the bare metal with careful control of
           | resource affinity, often C++ these days.
           | 
           | They aren't just used for global-scale geophysical processes
           | like weather and climate or complex physics simulations. For
           | example, oil companies rent time to analytically reconstruct
           | the 3-dimensional structure of what's underneath the surface
           | of the Earth from seismic recordings.
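           | 
           | The "resource affinity" part often boils down to pinning
           | threads to specific cores; a simplified Linux/glibc
           | sketch of the idea:
           | 
           |   // pin.cpp - g++ -pthread pin.cpp; Linux/glibc only
           |   #ifndef _GNU_SOURCE
           |   #define _GNU_SOURCE
           |   #endif
           |   #include <pthread.h>
           |   #include <sched.h>
           |   #include <cstdio>
           |   int main() {
           |     cpu_set_t set;
           |     CPU_ZERO(&set);
           |     CPU_SET(0, &set);  // core 0, chosen arbitrarily
           |     int rc = pthread_setaffinity_np(
           |         pthread_self(), sizeof(set), &set);
           |     std::printf("pinned: %s\n", rc == 0 ? "yes" : "no");
           |   }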
        
         | systemvoltage wrote:
         | This is a weird take. There are too many things going on
         | behind the scenes to say anything conclusive: different
         | compute loads, different problem domains, different accuracy
         | and predictability requirements, etc.
         | 
         | Cynicism is unwarranted, but it fits the current zeitgeist
         | and biases, and it feels good.
        
         | N1H1L wrote:
         | The nuclear lab computers are also rented out to anyone who
         | applies for an XSEDE grant. Anyone with a successful grant
         | gets free access (obviously limited to a reasonable number of
         | core-hours). Anyway, a ton of university researchers, all the
         | way from materials simulation to weather groups, will be
         | using this computer to run their codes, as they have done
         | with the last ones too.
         | 
         | In fact, such use accounts for the vast majority of the compute
         | use.
        
       | jamesredd wrote:
       | China has two exaflop supercomputers. It's doubtful whether this
       | is the world's most powerful supercomputer.
       | 
       | https://www.nextplatform.com/2021/10/26/china-has-already-re...
        
         | ouid wrote:
         | I'm not really sure why you would trust this claim from
         | China. It's not impossible, but it's also not impossible to
         | lie about.
        
       | anuvrat1 wrote:
       | Can someone please explain how software is made at this scale?
        
         | sydthrowaway wrote:
         | Using an HPC framework, such as OpenMP.
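         | 
         | For a sense of what that looks like, a minimal OpenMP
         | sketch (single node; multi-node codes typically layer MPI
         | on top of something like this):
         | 
         |   // sum_omp.cpp - build with: g++ -fopenmp sum_omp.cpp
         |   #include <omp.h>
         |   #include <cstdio>
         |   int main() {
         |     const int n = 1 << 20;
         |     double sum = 0.0;
         |     #pragma omp parallel for reduction(+:sum)
         |     for (int i = 0; i < n; ++i) sum += i * 0.5;
         |     std::printf("threads=%d sum=%.1f\n",
         |                 omp_get_max_threads(), sum);
         |   }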
        
         | mhh__ wrote:
         | Fairly low tech until you get to the super high end.
         | 
         | You have a blend of very specific domain knowledge (e.g.
         | they know the hardware - the interconnects more than the
         | CPUs) and old skool Unix system administration.
        
       ___________________________________________________________________
       (page generated 2022-05-31 23:01 UTC)