[HN Gopher] AMD powers the most powerful supercomputer
___________________________________________________________________
AMD powers the most powerful supercomputer
Author : lelf
Score : 172 points
Date : 2022-05-31 13:42 UTC (9 hours ago)
(HTM) web link (venturebeat.com)
(TXT) w3m dump (venturebeat.com)
| curiousgal wrote:
| How much of that performance will get undone by the software
| though? Either through AMD's lack of effort or Intel's compiler
| "sabotage".
| ghc wrote:
| It probably won't be a factor. The likelihood of the system
| using standard compilers or drivers is quite low. It's non-
| trivial to optimize a compiler and drivers for a supercomputer,
| so companies like Cray make their own.
| mrb wrote:
| The HN crowd would probably prefer reading the many technical
| details at the ORNL press release:
| https://www.ornl.gov/news/frontier-supercomputer-debuts-worl...
| which I just submitted here:
| https://news.ycombinator.com/item?id=31573066
|
| Also, yesterday Tom's hardware had a detailed article:
| https://www.tomshardware.com/news/amd-powered-frontier-super...
| 29 MW total, 400 kW per rack(!)
|
| And is anyone else like me, wanting to see actual pictures or
| videos of the supercomputer instead of a rendering like in the
| venturebeat article? Well, head here; ORNL has a very short
| video: https://www.youtube.com/watch?v=etVzy1z_Ptg We can see,
| among other things, that it's water-cooled (the blue and red
| tubing), and at 0m3s we see a PCB labelled "Cray Inc Proprietary ...
| Sawtooth NIC Mezzanine Card"
| pvg wrote:
| Not much point submitting a dupe with the discussion already on
| the front page, but you can email your better links to the mods,
| who are looking for a better link:
|
| https://news.ycombinator.com/item?id=31571551
| mihaic wrote:
| The more powerful processors become, the less I feel there's a
| need to build supercomputers.
|
| Thinking about it, the most powerful supercomputer in the world
| is pretty much a million consumer processors, working in
| parallel. That's going to stay pretty constant, since cost scales
| roughly linearly.
|
| If X is the processing power of $1k of consumer hardware, the
| bigger X gets, the less there is a difference in the class of
| problems that you can solve with X or X * 1e6 processing power.
| uniqueuid wrote:
| Sure, but consumer hardware does not have infiniband or other
| high-bandwidth interconnects. That means you can have at most
| ~1-2TB of ram accessible at any point. Some problems need
| coordination, and when you're back at OpenMP etc., a
| supercomputer suddenly makes sense.
| mihaic wrote:
| I agree for right now; I'm thinking maybe in 15 years you can
| have >1PB on a single machine, and then the problems that
| don't fit in that space but do fit in a supercomputer
| become fewer. 2050 will be within our lifetime.
|
| Basically I'm estimating the benefit ratio to be (log
| SupercomputerSize - log ConsumerSize)/log ConsumerSize, and
| that keeps decreasing.
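|
| As a rough illustration of that ratio (a minimal sketch; the
| sizes below are made-up placeholders, not figures from this
| thread):
|
|     #include <cmath>
|     #include <cstdio>
|
|     int main() {
|         // Hypothetical sizes: a single consumer machine vs. a
|         // supercomputer, in bytes of addressable memory.
|         double consumer = 1e12;  // ~1 TB on one box (assumed)
|         double super    = 1e15;  // ~1 PB aggregate (assumed)
|         // Benefit ratio as defined above:
|         // (log S - log C) / log C
|         double ratio = (std::log(super) - std::log(consumer))
|                        / std::log(consumer);
|         std::printf("benefit ratio ~= %.2f\n", ratio);  // ~0.25
|         // As 'consumer' grows toward 'super', the ratio falls
|         // toward zero.
|         return 0;
|     }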
| uniqueuid wrote:
| You're not wrong.
|
| The set of problems that fit into a single node is growing.
| At least in some fields where the added benefit of more
| data is less important than, say, more precise
| measurements.
| hdjjhhvvhga wrote:
| By the way, while cost may scale linearly, the number of cores
| doesn't[0]. We have more powerful computers in our pockets than
| Cray supercomputers from the 80s. And I feel we still haven't
| learned how to use these cores in an efficient way.
|
| [0] https://i.imgur.com/Gad4cKk.png
| mastax wrote:
| The coherent memory interconnects between nodes are typically
| what make supercomputers different from just a bunch of
| consumer hardware. It allows different types of programming or
| at least makes them easier.
| jabl wrote:
| It's a very fast, very low latency network fabric. But it's
| not coherent in the sense of cache coherent multiprocessors,
| and it doesn't offer shared memory style programming where
| you'd just load/store to addresses that happen to be mapped
| to another compute node somewhere in the system.
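|
| A minimal sketch of the explicit message-passing style that is
| used instead (plain MPI point-to-point; the toy program is my
| illustration, not code from this thread):
|
|     #include <mpi.h>
|     #include <cstdio>
|
|     // Data moves between nodes via explicit messages, not via
|     // loads/stores to addresses mapped to a remote node.
|     int main(int argc, char **argv) {
|         MPI_Init(&argc, &argv);
|         int rank;
|         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
|         double x = 0.0;
|         if (rank == 0) {
|             x = 42.0;
|             MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
|         } else if (rank == 1) {
|             MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
|                      MPI_STATUS_IGNORE);
|             std::printf("rank 1 received %f\n", x);
|         }
|         MPI_Finalize();
|         return 0;
|     }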
| l33t2328 wrote:
| I thought DMI allowed for exactly those kinds of load/store
| operations
| xhkkffbf wrote:
| If you think of it this way, aren't some botnets truly the most
| powerful computing systems?
| uniqueuid wrote:
| Since they are using AMD's accelerators as well [1], I do wonder
| whether any usage of these will trickle down and give us
| improvements in ROCm.
|
| Surely the people at these labs will want to run ordinary DL
| frameworks at some point - or do they have the money and time to
| always build entirely custom stacks?
|
| [1] AMD Instinct MI250x in this case.
| mastax wrote:
| These supercomputer contracts typically have a large amount
| dedicated to software support. I remember reading on AnandTech
| (?) that AMD was explicitly putting a bunch of engineers on
| ROCm for this project. It's one of the reasons companies like
| these contracts so much.
| pinhead wrote:
| Surprisingly, ROCm support has been getting a lot better over
| the last few years. In my experience the PyTorch support is
| essentially seamless between CUDA and ROCm. Also, I know some
| popular frameworks like DeepSpeed have announced support and
| benchmarks on it as well:
| https://cloudblogs.microsoft.com/opensource/2022/03/21/suppo...
| dragontamer wrote:
| > Surely the people at these labs will want to run ordinary DL
| frameworks at some point
|
| I don't know about that. A lot of these labs are doing physics
| simulations and are probably happy to stick with their dense-
| matrix multiply / BLAS routines.
|
| Deep learning is a newer thing. These national labs can run
| them of course, but these national labs have existed for many
| decades and have plenty of work to do without deep learning.
|
| > or do they have the money and time to always build entirely
| custom stacks?
|
| Given all the talk about OpenMP compatibility and Fortran... my
| guess is that they're largely running legacy code in Fortran.
|
| Perhaps some new researchers will come in and try to get some
| deep-learning cycles in the lab and try something new.
| marcosdumay wrote:
| > Given all the talk about OpenMP compatibility and
| Fortran... my guess is that they're largely running legacy
| code in Fortran.
|
| The most used linear algebra library is written in Fortran.
| There's nothing "legacy" about it; it's just that nobody has
| been able to replicate its speed in C.
| dragontamer wrote:
| BLAS itself has been rewritten in Nvidia CUDA and AMD HIP,
| and is likely the workhorse in this case. (Remember that
| Frontier is mostly GPUs and the bulk of code should be GPU
| compatible)
|
| Presumably that old Fortran code has survived many
| generations of ports: Connection Machine, DEC Alpha, Intel
| Itanium, SPARC and finally today's GPU heavy systems. The
| BLAS layer keeps getting rewritten but otherwise the bulk
| of the simulators still works.
| nspattak wrote:
| If you are talking about netlib blas/lapack I am very
| confused by what you are saying because the fastest
| blas/lapack implementations are in c/c++.
| jcranmer wrote:
| > The most used linear algebra library is written in
| Fortran.
|
| My understanding is that most supercomputers have the
| vendor provide their implementation of BLAS (e.g., if it's
| Intel-based, you're getting MKL) that's specifically tuned
| for that hardware. And these implementations stand a decent
| chance of being written in _assembly_, not Fortran.
| bee_rider wrote:
| Usually C or Fortran superstructure, and assembly
| kernels.
|
| The clearest form of this is in BLIS, which is a C
| framework you can drop your assembly kernel into, and
| then it makes a BLAS (along with some other stuff) for
| you. But the idea is also present in OpenBLAS.
|
| Lots of this is due to the legacy of GotoBLAS (which was
| forked into OpenBLAS, and partially inspired BLIS),
| written by the somewhat famous (in HPC circles at least)
| Kazushige Goto. He works at Intel now, so probably they
| are doing something similar.
| bee_rider wrote:
| I think you've made a slightly bigger claim than is
| necessary, which has led to a focus on BLAS, which misses
| the point.
|
| The _best_ BLAS libraries use C and Assembly. This is
| because BLAS is the de-facto standard interface for Linear
| Algebra code, and so it is worthwhile to optimize it to an
| extreme degree (given infinite programmer-hours, C can beat
| any language, because you can embed assembly in C).
|
| But for those numerical codes which aren't incredibly hand-
| optimized, Fortran makes nice assumptions, so it should be
| able to optimize the output of a moderately skilled
| programmer pretty well (hey, we aren't all experts, right?).
| paulmd wrote:
| I don't remember the exact specifics, but Fortran disallows
| some of the constructs that C/C++ struggle with aliasing
| on, so Fortran can often be (safely) optimized to much
| higher-performance code because of this
| limitation/knowledge.
|
| Like, it's always seemed like there's a certain amount of
| fatalism around Undefined Behavior in C/C++, like this is
| somehow how it has to be to write fast code but... it's
| not. You can just declare things as actually forbidden
| rather than just letting the compiler identify a boo-boo
| and silently do whatever the hell it wants.
|
| Of course it's not the right tool for every task, I don't
| think you'd write bit-twiddling microcontroller stuff in
| fortran, or systems programming. But for the HPC space, and
| other "scientific" code? Fortran is a good match and very
| popular despite having an ancient legacy even by C/C++
| standards (both have, of course, been updated through
| time). Little less flexible/general, but that allows less-
| skilled programmers (scientists are not good programmers)
| to write fast code without arcane knowledge of the gotchas
| of C/C++ compiler magic.
| jabl wrote:
| > I don't remember the exact specifics, but Fortran
| disallows some of the constructs that C/C++ struggle with
| aliasing on, so Fortran can often be (safely) optimized
| to much higher-performance code because of this
| limitation/knowledge.
|
| For a crude approximation, Fortran is somewhat equivalent
| to C code where all pointer function arguments are marked
| with the restrict keyword.
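|
| A minimal sketch of that idea (my example, not code from the
| thread; standard C spells the qualifier "restrict", and the
| "__restrict__" form below is the common C++ compiler
| extension):
|
|     #include <cstddef>
|
|     // Telling the compiler that x and y never alias lets it
|     // vectorize and reorder loads/stores freely -- roughly the
|     // guarantee a Fortran compiler gets for dummy arguments.
|     void axpy(std::size_t n, double a,
|               const double *__restrict__ x,
|               double *__restrict__ y) {
|         for (std::size_t i = 0; i < n; ++i)
|             y[i] += a * x[i];
|     }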
|
| > Like, it's always seemed like there's a certain amount
| of fatalism around Undefined Behavior in C/C++, like this
| is somehow how it has to be to write fast code but...
| it's not. You can just declare things as actually
| forbidden rather than just letting the compiler identify
| a boo-boo and silently do whatever the hell it wants.
|
| Well, it's kind of more dangerous than C, in this aspect.
| The aliasing restriction is a restriction on the Fortran
| programmer; the compiler or runtime is not required to
| diagnose it, meaning that the Fortran compiler is allowed
| to optimize assuming that two pointers don't alias.
|
| That being said, in general I'd say Fortran has fewer
| footguns than C or C++, and is thus often a better choice
| for a domain expert who just wants to crunch numbers.
| jcranmer wrote:
| From my limited exposure to the HPC groups at the labs,
| there's a mixture of languages in use. It seems that modern
| C++ is the dominant language for a lot of new projects--some
| of the people I talked to were working on libraries that
| aggressively used C++11/C++14 features.
|
| The biggest challenge the national labs face is that there's
| not really any budget (or appetite) to _rewrite_ software to
| take advantage of hardware features (particularly the GPU-
| based accelerator that's all the rage nowadays). You _might_
| be able to get a code rewritten once, but an era where every
| major HPC hardware vendor wants you to rewrite your code into
| their custom language for their custom hardware results in
| code that will not take advantage of the power of that custom
| hardware. OpenMP, being already fairly widespread, ends up
| becoming the easiest avenue to take advantage of that
| hardware with minimal rewriting of code (tuning a pragma
| doesn't really count as rewriting).
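|
| For a flavor of what "tuning a pragma" can mean here (a minimal
| sketch of OpenMP target offload; the loop and names are my own
| illustration, not code from any of these labs):
|
|     // The serial loop stays as-is; the pragma asks the compiler
|     // to offload it to an attached GPU and parallelize it.
|     void scale(double *a, const double *b, double s, long n) {
|         #pragma omp target teams distribute parallel for \
|             map(to: b[0:n]) map(tofrom: a[0:n])
|         for (long i = 0; i < n; ++i)
|             a[i] = s * b[i] + a[i];
|     }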
| Symmetry wrote:
| Also, while NVidia has been adding extra AI acceleration to
| their chips AMD has been throwing in extra double precision
| resources that HPC generally requires. If you're training an
| AI rather than simulating the climate/a thermonuclear
| explosion/etc then you're probably better off using NVidia
| cards but AMD made the right technical investments to get
| these supercomputer contracts.
| dekhn wrote:
| It's kind of surprising that nvidia hasn't purchased AMD.
| It really feels like there's a single company between the
| two that would be truly effective- AMD for the classic CPU
| oomph, nvidia for the GPU oomph, combining their strengths
| in interconnects. It would be a player from the high-end PC
| to the supercomputer market, without even pretending to go
| for the low-power market (ARM).
| krylon wrote:
| Intel and AMD have a patent-licensing agreement where
| Intel licenses their x86 stuff to AMD, and AMD licenses
| their amd64 stuff to Intel. AFAIK, the moment AMD gets
| bought by another company, they can no longer use Intel's
| patents, and the moment _that_ happens, Intel can no
| longer use AMD's patents. I'm not sure how much of
| x86/amd64 you can legally implement without infringing on
| any of these patents, but it might very well result in a
| _really_ awkward situation.
|
| Sure, the new owners could re-negotiate with Intel, and
| maybe nothing would change. But who knows? A combined
| AMD/nVidia might be a sufficient threat to Intel they
| might pull some desperate moves.
|
| (In some timeline, this turns out to be the boost that
| makes RISC-V the new "standard" ISA, but I am not so
| optimistic it is the one we live in.)
| ridgered4 wrote:
| AMD and Nvidia were in talks to merge at one point,
| apparently the talks fell apart because Nvidia's CEO
| insisted on being the new CEO of the combined company and
| AMD would have none of that. So they purchased ATI
| instead, probably overpaid for it and probably pushed the
| Bulldozer concept too hard in an effort to prove it was
| worth it after all.
|
| Nvidia actually used to develop chipsets for AMD
| processors, including onboard GPUs. They did for Intel as
| well, but they had a much more serious relationship with
| AMD in my estimation. This stopped with the ATI purchase:
| since ATI is Nvidia's main competitor, the two companies
| stopped working together. Intel later killed all 3rd-
| party chipsets altogether and AMD had to do a lot of
| chipset work they weren't doing before.
|
| I sometimes wonder what would have happened if they had
| merged back then. I personally think a Jensen Huang-run
| AMD would have done much better than AMD+ATI did in that
| era. I could easily see ATI having collapsed. What would
| the consoles use now? Would Nvidia have been as
| aggressive as it has been without the strategic weakness
| of not controlling the platform its products run on?
| paulmd wrote:
| I think based on recent history you can argue that NVIDIA
| is very aware of the potential anticompetitive actions
| that could result if they kill or even substantially pass
| AMD.
|
| There really used to be a lot of intra-generational
| tweaking and refinement, like if you look back at Maxwell
| there were really at least 3 and I suspect 4 total
| steppings of the maxwell architecture (GM107,
| GM204/GM200, and GM206 - and I suspect GM200 was a
| separate "stepping" too due to how much higher it clocks
| than GM204 - which is the opposite of what you'd expect
| from a big chip). Kepler had at least 4 major versions
| (GK1xx, GK110B, GK2xx, GK210), Fermi had at least 2
| (although that's where I'm no longer super familiar with
| the exact details).
|
| Anyway point is there used to be a _lot_ more intra-
| generational refinement, and I think that has largely
| stopped; it's just thrown over the wall and done. And I
| think the reason for _that_ is that if NVIDIA really
| cranked full-steam ahead they'd be getting far enough
| ahead of AMD to potentially start raising antitrust
| concerns. We are now in the era of "metered performance
| release", just enough to stay ahead of AMD but not enough
| to actually raise problems and get attention from
| antitrust regulators.
|
| Same thing for the choice of Samsung 8nm for Ampere and
| TSMC 12nm for Turing, while AMD was on TSMC 7nm for both
| of those. Sure, volume was a large part of that decision,
| but they're already matching AMD with a 1-node deficit
| (Samsung 8nm is a 10+, and the gap between 10 and TSMC 7
| is huge to begin with) and they were matching with a 1.5
| node deficit during the Turing generation (12FFN is a
| TSMC 16+ node - that is almost 2 full nodes to TSMC 7nm).
| They _cannot_ just make arbitrarily fast processors that
| dump on AMD, or regulators will get mad, so in that case
| they might as well optimize for cost and volume instead.
| If they had done a TSMC 7nm against RDNA1 they probably
| would be starting to get in that danger zone - I'm sure
| they were watching it carefully during the Maxwell era
| too.
|
| (the people who imagined some giant falling-out between
| NVIDIA and TSMC are pretty funny in hindsight. (A) NVIDIA
| still had parts at TSMC anyway, and (B) TSMC obviously couldn't
| have provided the same volume as Samsung did, certainly
| not at the same price, and volume ended up being a
| godsend during the pandemic shortages and mining. Yeah,
| shortages sucked, but they could still have been worse if
| NVIDIA was on TSMC and shipping half or 2/3rds of their
| current volume.)
|
| Of course now we may see that dynamic flip with AMD
| moving to MCM products earlier, or maybe that won't be
| for another year or so yet; rumors are suggesting
| monolithic midrange chips will be AMD's first product. Or
| perhaps "monolithic", being technically MCM but with
| cache dies/IO dies rather than multiple compute dies. But
| with RDNA3 AMD is potentially poised to push NVIDIA a
| little bit, rather than just the controlled opposition
| we've seen for the past few generations, hence NVIDIA
| reportedly moving to TSMC N5P and going quite large with
| a monolithic chip to compete.
| jcranmer wrote:
| > It's kind of surprising that nvidia hasn't purchased
| AMD.
|
| One word: antitrust. The discrete GPU market these days
| consists of Nvidia and AMD, with Intel only just now
| dipping its toes into the market (I don't think there's
| anything saleable to retail customers yet). Nvidia buying
| AMD would make it a true monopoly in that market, and
| there's no way that would pass antitrust regulators.
| Nvidia recently tried to buy ARM, and even that
| transaction was enough for antitrust regulators to say
| no.
| [deleted]
| torrance wrote:
| I'm not using Frontier, but I am using Setonix which is a large
| AMD cluster being rolled out in Australia. All of AMD's
| teaching materials are about ROCm so this is very much how
| they're expecting it to be used.
|
| The real pain for us is that there are no decent consumer-grade
| chips with ROCm compatibility for us to do development on. AMD
| have made it very clear they only care about the data centre
| hardware when it comes to ROCm, but I have no idea what kind of
| developer workflow they're expecting there.
| tormeh wrote:
| It's slim pickings, but there are chips: https://docs.amd.com
| /bundle/Hardware_and_Software_Reference_...
| uniqueuid wrote:
| Interesting. So what is your workflow right now?
| torrance wrote:
| Develop against CUDA locally. Port my kernels to ROCm, and
| occupy a whole HPC node for debugging and performance
| tuning for a week. It's terrible.
|
| Edit: I should say that their recommendation is to write
| the kernels in HIP, which is supposed to be their cross-
| device wrapper for both CUDA and ROCm. I'm writing in Julia
| however, so that's not possible.
| claforte wrote:
| The AMD software stack has been behind for a long time
| but I feel like we're finally catching up. I heard that
| HIP (and hopefully the rest of ROCm) is now supported on
| the RX6800XT consumer GPU... maybe that could help? BTW
| my team at AMD has been using Julia for ML workloads for
| a while. We should get in touch - maybe some of the
| lessons we learn can be useful to you. My email is
| claforte. The domain I'm sure you can guess. ;-)
| vchuravy wrote:
| If you are using Julia I would recommend looking at
| AMDGPU.jl and (pluging my own project here)
| KernelAbstractions.jl
| claforte wrote:
| BTW have you tried `KernelAbstractions.jl`? With it you
| can write code once that will run reasonably fast on AMD
| or NVIDIA GPUs or even on CPU. One of our engineers just
| started using it and is pleased with it - apparently the
| performance is nearly equivalent to native CUDA.jl or
| AMDGPU.jl, and the code is simpler.
| eslaught wrote:
| I'm surprised you're not using HIP? At least in my experience
| it seems like HIP is the go-to system for programming the AMD
| GPUs, in large part because of CUDA compatibility. You can
| mostly get things to work with a one-line header change [1].
|
| (I work for a DOE lab but views are my own, etc.)
|
| [1] As an example, see the approach in:
| https://github.com/flatironinstitute/cufinufft/pull/116
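|
| As a sketch of what that "one-line change" approach looks like
| (illustrative only; the macro subset below is my own
| abbreviation of the idea, not the actual cufinufft header):
|
|     // Keep CUDA spellings in the source and map them to HIP
|     // when compiling for AMD GPUs with hipcc.
|     #ifdef __HIP_PLATFORM_AMD__
|     #include <hip/hip_runtime.h>
|     #define cudaMalloc  hipMalloc
|     #define cudaMemcpy  hipMemcpy
|     #define cudaFree    hipFree
|     #else
|     #include <cuda_runtime.h>
|     #endif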
| pmarcelll wrote:
| HIP is just the programming language/runtime, ROCm is the
| whole software stack/platform.
| sorenjan wrote:
| Can you write SYCL code and compile it to ROCm for
| production?
| JonChesterfield wrote:
| The rocm stack will run on non-datacentre hardware in YMMV
| fashion. A lot of the llvm rocm development is done on
| consumer hardware, the rocm stack just isn't officially
| tested on gaming cards during the release cycle. In my
| experience codegen is usually fine and the Linux driver a bit
| version sensitive.
| tkinom wrote:
| NV21?
|
| https://www.phoronix.com/scan.php?page=news_item&px=Radeon-R.
| ..
| dragontamer wrote:
| Vega64 or Vega56 seems to work pretty well with ROCm in my
| experience.
|
| Hopefully AMD gets the Rx 6800xt working with ROCm
| consistently, but even then, the 6800xt is RDNA2, while the
| supercomputer MI250x is closer to the Vega64 in more ways.
|
| So all in all, you probably want a Vega64, Radeon VII, or
| maybe an older MI50 for development purposes.
| slavik81 wrote:
| > Hopefully AMD gets the Rx 6800xt working with ROCm
| consistently
|
| I am a maintainer for rocSOLVER (the ROCm LAPACK
| implementation) and I personally own an RX 6800 XT. It is
| very similar to the officially supported W6800. Are there
| any specific issues you're concerned about?
|
| I know the software and I have the hardware. I'd be happy
| to help track down any issues.
| dragontamer wrote:
| That's good to hear.
|
| I might be operating off of old news. But IIRC, the 6800
| wasn't well supported when it first came out, and AMD
| constantly has been applying patches to get it up-to-
| speed.
|
| I wasn't sure what the state of the 6800 was (I don't own
| it myself), so I might be operating under old news. As I
| said a bit earlier, I use the Vega64 with no issues (for
| 256-thread workgroups. I do think there's some obscure
| bug for 1024-thread workgroups, but I haven't really been
| able to track it down. And sticking with 256-threads is
| better for my performance anyway, so I never really
| bothered trying to figure this one out)
| slavik81 wrote:
| Navi 21 launched in November 2020 but it only got
| official support with ROCm 5.0 in February 2022.
|
| With respect to your issue running 1024 threads per
| block, if you're running out of VGPRs, you may want to
| try explicitly specifying the max threads per block as 1024
| and see if that helps. I recall that at one point the
| compiler was defaulting to 256 despite the default being
| documented as 1024.
| dragontamer wrote:
| The main issue I have with the idea of Navi 21 is that
| it's a 32-wide warp, while CDNA2 (like the MI250x) is a
| 64-wide warp.
|
| Granted, RDNA and CDNA still have largely the same
| assembly language, so it's still better than using say...
| NVidia GPUs. But I have to imagine that the 32-wide vs
| 64-wide difference is big in some use cases. In
| particular: low-level programs that use warp-level
| primitives, like DPP, shared-memory details and such.
|
| I assume the super-computer programmers want a cheap
| system to have under their desk to prototype code that's
| similar to the big MI250x system. Vega56/64 is several
| generations old, while 6800 xt is pretty different
| architecturally. It seems weird that they'd have to buy
| MI200 GPUs for this purpose, especially in light of
| NVidia's strategy (where an NVidia A2000 could serve as a
| close replacement. Maybe not perfect, but closer to the
| A100 big-daddy than the 6800xt is to the big-daddy
| MI250x).
|
| --------
|
| EDIT: That being said: this is probably completely moot
| for my own purposes. I can't afford an MI250x system at
| all. At best I'd make some kind of hand-built consumer
| rig for my own personal purposes. So 6800 xt would be all
| I personally need. VRAM-constraints feel quite real, so
| the 16GBs of VRAM at that price makes 6800xt a very
| pragmatic system for personal use and study.
| eslaught wrote:
| Yes, DOE is very interested in DL. I don't work on this
| personally, but you can see an example e.g. here [1, 2]. You
| can see in the first link they're using Keras. I'm not up to
| date on all the details (again, don't work on this personally)
| but in general the project is commissioned to run on all of
| DOE's upcoming supercomputers, including Frontier.
|
| [1]: https://github.com/ECP-CANDLE/Benchmarks
|
| [2]: https://www.exascaleproject.org/research-project/candle/
| JonChesterfield wrote:
| The rocm stack is one of the toolchains deployed on Frontier.
| With determination, llvm upstream and rocm libraries can be
| manually assembled into a working toolchain too. It's not so
| much trickle down improvements as the same code.
| scardycat wrote:
| Congratulations to AMD, HPE and ORNL! This is an amazing
| achievement. Can't wait to see the spectacular science results
| coming from this installation.
|
| Intel was supposed to build the first Exascale system for ANL [1]
| [2], to be installed by 2018. They completely and utterly messed
| up the execution, partly driven by the 10nm failure, went back to the
| drawing board multiple times, and now Raja switched the whole
| thing to GPUs, a technology that Intel has no previous success
| with and rebased it to 2 ExaFlops peak, meaning they probably
| expect 1 EF sustained performance, a 50% efficiency. No other
| facility would ever consider Intel as a prime contractor again.
| ANL hitched their wagon to the wrong horse.
|
| 1. https://www.alcf.anl.gov/aurora 2.
| https://insidehpc.com/2020/08/exascale-exasperation-why-doe-...
| throwawaylinux wrote:
| What is Raja?
| jcranmer wrote:
| Raja is the head of GPU development at Intel.
| maxwell86 wrote:
| A person that works at Intel.
| Bayart wrote:
| Raja Koduri, the head of Graphics at Intel. Before that he
| was leading the Radeon group at AMD. He's been doing GPU
| stuff since the 90s.
| interesting_pt wrote:
| I worked at Intel in a very closely related area.
|
| I quit after getting vaccinated for COVID, only stayed because
| of the pandemic.
|
| The biggest problem was that Intel simply couldn't execute.
| They couldn't design and manufacture hardware in a timely
| manner without too many bugs. I think this was due to poor
| management practices. My direct manager was amazing, but my
| skiplevel was always dealing with fires. It felt like instead
| of the effort being orchestrated that someone approached a
| crowd of engineers and used a bullhorn to tell them the big
| goal and that was it. The left hand had no idea what the right
| hand was doing.
|
| I often called Intel an 'ant hill', because the engineers would
| swarm a project just like ants do a meal. Some would get there
| and pull the project forward, some would get on top and
| uselessly pull upward, and more than I'd like would get behind
| the project and pull it backwards. Just a mindless swarm of
| effort, which generally inefficiently kinda did the right thing
| sometimes.
|
| The inability to execute started to affect my work. When I got
| a ticket to complete something, I just wouldn't. There was a
| very good chance that I'd have an extra few weeks (due to
| slippage) or the task would never need to get done, because the
| hardware would never appear. Planning was impossible.
|
| Conversely, sometimes hardware _CAME OUT OF NOWHERE_, not
| simple stuff, but stuff like laptops made by partners. Just
| randomly my manager would ask me to support a product we were
| told directly wouldn't exist, but now did. I needed to help our
| partner with support right now. Our partners were starting to
| hate us and it was palpable in meetings.
|
| I'm so glad I quit, I was being worked to the bone on a project
| which will probably fail and be a massive liability. Even if
| the economy crashes, and I can't get a job for years, and end
| up broke, it'll still have been worth it. I also only made
| 110K/yr base.
| allie1 wrote:
| I've been reading about Pat Gelsinger turning things around
| on execution, but many of the announced products for this
| year are already late (Sapphire Rapids, GPUs, even the Alder
| Lake rollout was late).
|
| Do you know if anything has changed at Intel? Is it
| reasonable to expect changes within a year and a half of
| starting on the job given the size of the company and the
| changes needed?
| moffkalast wrote:
| Now the real question: Can it run Crysis... without hardware
| acceleration?
| cesarb wrote:
| > Can it run Crysis... without hardware acceleration?
|
| I understand you are joking, but it's a legitimate benchmark,
| one which I've seen at least Anandtech using. For instance, a
| quick web search found an article from last year
| (https://www.anandtech.com/show/16478/64-cores-of-
| rendering-m...) which shows an AMD CPU (a Ryzen 9) running
| Crysis without hardware acceleration at 1080p at nearly 20 FPS.
| As that article says, it's hard to go much higher than that,
| due to limitations of the Crysis engine.
| gsibble wrote:
| What an incredible achievement. Good for AMD. The Epyc is a
| fantastic processor.
|
| And there are another 2 (3?) faster systems coming online in the
| next year or so.
| adrian_b wrote:
| Besides being the first system exceeding the 1 Exaflop/s
| threshold, what is more impressive is that this is also the
| system with the highest ratio between computational speed and
| power consumption (i.e. the AMD devices have the first place in
| both Top500 and Green500).
|
| The AMD GPUs with the CDNA ISA have surpassed in energy
| efficiency both the NVIDIA A100 GPUs and the Fujitsu ARM with
| SVE CPUs, which had been the best previously.
|
| Unfortunately, AMD has stopped selling at retail such GPUs
| suitable for double-precision computations.
|
| Until 5 or 6 years ago, the AMD GPUs were neither the fastest
| nor the most energy-efficient, but they had by far the best
| performance per dollar of any devices that could be used for
| double-precision floating-point computations.
|
| However, when they made the transition to RDNA, they
| separated their gaming and datacenter GPUs. The former are
| useless for DP computations and the latter cannot be bought by
| individuals or small companies.
| Const-me wrote:
| > The former are useless for DP computations
|
| Looking at the "double-precision GFlops" column there [1], they
| don't seem terribly bad; more than twice as fast compared to
| similar nVidia chips [2]
|
| While specialized extremely expensive GPUs from both vendors
| are way faster with many TFlops of FP64 compute throughput, I
| wouldn't call high-end consumer GPUs useless for FP64
| workloads.
|
| The compute speed is not terribly bad, and due to some
| architectural features (ridiculously high RAM bandwidth, RAM
| latency hiding by switching threads) in my experience they
| can still deliver a large win compared to CPUs of comparable
| prices, even in FP64 tasks.
|
| [1]
| https://en.wikipedia.org/wiki/Radeon_RX_6000_series#Desktop
|
| [2] https://en.wikipedia.org/wiki/GeForce_30_series#GeForce_3
| 0_(...
| guenthert wrote:
| SP is sixteen times the performance of DP here for no other
| reason than market segmentation. Nvidia might have started
| that, but that's no reason not to call AMD out for it.
| adrian_b wrote:
| "Useless" means that both DP Gflops/s/W and DP Gflops/s/$
| are worse for the modern AMD and NVIDIA gaming GPUs, than
| for many CPUs, so the latter are a better choice for such
| computations.
|
| The opposite relationship between many AMD GPUs and the
| available CPUs was true until 5-6 years ago, while NVIDIA
| had reduced the DP computation abilities of their non-
| datacenter GPUs many years before AMD, despite their
| previous aggressive claims about GPGPU being the future of
| computation, which eventually proved to be true only for
| companies and governments with exceedingly deep pockets.
| Const-me wrote:
| My desktop PC has Ryzen 7 5700G, on paper it can do 486
| GFlops FP64 (8 cores at 3.8 GHz base frequency, two
| 4-wide FMAs every cycle). However, that would require
| 2TB/sec memory bandwidth, while the actual figure is 51
| GB/second. For large computational
| tasks where the source data doesn't fit in caches, the
| CPU can only achieve a small fraction of the theoretical
| peak performance because it's bottlenecked by memory.
|
| The memory in graphics cards is an order of magnitude
| faster, my current one has 480 GB/sec of that bandwidth.
| For this reason, even gaming GPUs can be much faster than
| CPUs on some workloads, despite the theoretical peak FP64
| GFlops number being about the same.
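|
| For the arithmetic behind those numbers (a sketch; the "one
| FMA per streamed double" intensity assumption is mine, not
| the commenter's):
|
|     #include <cstdio>
|
|     int main() {
|         // Peak FP64 for the Ryzen 7 5700G as described above:
|         // 8 cores x 3.8 GHz x 2 FMA units x 4 lanes x 2 flops/FMA
|         double peak_gflops = 8 * 3.8 * 2 * 4 * 2;  // ~486 GFLOPS
|         // Assume ~2 flops (one FMA) per 8-byte double streamed
|         // from RAM, i.e. 0.25 flops/byte (illustrative).
|         double flops_per_byte = 2.0 / 8.0;
|         double needed_gbs = peak_gflops / flops_per_byte; // ~1944
|         std::printf("peak ~%.0f GFLOPS needs ~%.0f GB/s; "
|                     "the 5700G has ~51 GB/s\n",
|                     peak_gflops, needed_gbs);
|         return 0;
|     }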
| adrian_b wrote:
| You are right that there are problems whose solving speed
| is limited by the memory bandwidth, and for such problems
| GPUs may be better than CPUs.
|
| Nevertheless, many of the problems of this kind require
| more memory than the 8 GB or 16 GB that are available on
| cheap GPUs, so the CPUs remain better for those.
|
| On the other hand, there are a lot of problems whose
| time-consuming part can be reduced to multiplications of
| dense matrices. During the solution of all such problems,
| the CPUs will reach a large fraction of their maximum
| computational speed, regardless of whether the operands fit
| in the caches or not (when they do not fit, the
| operations can be decomposed into sub-operations on
| cache-sized blocks, and in such algorithms the cache
| lines are reused enough times so that the time used for
| transfers does not matter).
| Const-me wrote:
| I guess I was lucky with the CAM/CAE software I'm working
| on. We don't have too many GB of data, the stuff fits in
| VRAM of inexpensive consumer cards.
|
| One typical problem is multiplying a dense vector by a
| sparse matrix. Unlike multiplication of two dense
| matrices, I don't think it's possible to decompose it into
| manageable pieces which would fit into caches to saturate
| the FP64 math of the CPU cores.
|
| We have tested our software on nVidia Teslas in a cloud
| (the expensive ones with many theoretical TFlops of FP64
| compute), the performance wasn't too impressive.
| visarga wrote:
| Computational speed is important, but more important is the
| data transfer speed. At least in ML. Is AMD the best for data
| transfer speed?
| gigatexal wrote:
| I am still kicking myself every time I look at AMD's share price.
| I sold a not-insignificant-to-me amount of shares when the price
| was basically below 10 a share. Now it's above 100. All this is
| to say that the turnaround at AMD is good to see and the
| missteps at Intel are hilarious.
|
| This is like the time the Athlon64 and its on-die memory
| controller were kicking the Pentiums around.
| edm0nd wrote:
| Now would be a pretty decent time to buy back in if you still
| wanna go long on AMD again.
| gigatexal wrote:
| I did a few weeks ago. It's the only thing other than Nvidia
| that is up in my portfolio right now, lol.
| SoftTalker wrote:
| Since Cray stopped making their own CPUs, they have been back and
| forth between AMD and Intel several times.
| wmf wrote:
| It's not really back and forth; Cray supports Intel, AMD, and
| ARM CPUs equally as well as Nvidia, AMD, and Intel GPUs.
| zepmck wrote:
| The most powerful and unfortunately unusable supercomputer of the
| world. AMD's approach to GPUs is on a failing track since its
| inception. The only software stack available is super fragile,
| buggy and barely supported. Rather than building an HPL machine I
| would have preferred to see public money spent in a different way.
| ghc wrote:
| It's a supercomputer. The programming model is very, very
| different. The software stack is full of incredibly fragile
| stuff from any number of manufacturers. It's honestly hard to
| even describe how much more difficult using MPI with Fortran on
| a supercomputer is compared to _anything_ I've ever touched
| elsewhere. Maybe factory automation comes close?
| photochemsyn wrote:
| I wonder if having one supercomputer with x number of chips or
| having eight supercomputers each with x/8 number of chips would
| be the more practical working setup. Weather forecasting for
| example is basically a complex probabilistic algorithm, and
| there's a notion that running eight models in parallel and then
| comparing and contrasting the results will give better estimates
| of actual outcomes than running one model on a much more powerful
| machine.
|
| Is it feasible to run eight models on one supercomputer, or is
| that inefficient?
| timbargo wrote:
| You can partition a large compute cluster into many smaller
| ones. Users can make a request specifying how many processors
| they want for how long. Check out this link to see the activity
| of a supercomputer at Argonne.
|
| https://status.alcf.anl.gov/theta/activity
|
| And I believe it is more efficient to have a single large
| cluster, as there are large overhead costs of power, cooling,
| and having a physical space to put the machine in, plus a
| personnel cost to maintain the machines.
| derac wrote:
| You can run many programs on one supercomputer simultaneously,
| yes. Check out XSEDE. Cost-wise, one big one is going to be
| cheaper than eight small ones due to infrastructure issues - cooling,
| maintenance, space, etc.
| pphysch wrote:
| "XSEDE" proper is getting EOL'd in a couple months and
| transitioning to ACCESS [1].
|
| [1] - https://www.hpcwire.com/off-the-wire/nsf-announces-
| upcoming-...
| marcusjramsey wrote:
| hmm
| dang wrote:
| This reads more or less like a press release - (edit: actually,
| it reads exactly like a press release) - is there a more
| substantive article on the topic?
| eslaught wrote:
| It's not an article, but there's always the front page for the
| supercomputer (includes some limited specs):
|
| https://www.olcf.ornl.gov/frontier/
|
| There's also detailed architecture specs on Crusher, an
| identical (but smaller) system:
|
| https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide...
| peter303 wrote:
| One petaflop DP Linpack was achieved in 2008. Supercomputing "Moore's
| Law" is a doubling of speed every 1.5 years, an order of magnitude every
| five years, a thousand-fold every 15 years. Pretty close to schedule.
|
| Onward to a zettaflop around 2037?
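|
| A quick sanity check of that schedule (a sketch; the
| doubling-time model is just the rule of thumb quoted above):
|
|     #include <cmath>
|     #include <cstdio>
|
|     int main() {
|         // Doubling every 1.5 years => 1000x takes about
|         // log2(1000) * 1.5 ~= 15 years.
|         double years_per_1000x = std::log2(1000.0) * 1.5;
|         std::printf("petaflop 2008 -> exaflop ~%.0f -> "
|                     "zettaflop ~%.0f\n",
|                     2008 + years_per_1000x,
|                     2008 + 2 * years_per_1000x);
|         // ~2023 and ~2038, in line with the comment above.
|         return 0;
|     }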
| robswc wrote:
| Seems I've heard nothing but good things about AMD for the last
| 10 years or so.
|
| I once had a terrible experience with AMD ~10 years ago that
| made me swear off them for good. Had something to do with
| software, but I remember it taking several days of work and
| attempted solutions.
|
| Willing to give them another try soon though. I never seem to
| even use the full power of whatever CPU I get, lol.
| verst wrote:
| Late 2020 I switched from Intel to AMD Ryzen 5900X for my
| gaming PC and only had great experiences as far as gaming is
| concerned.
|
| I should point out that there were significant USB problems on
| AMD B550, X570 chipsets (eventually addressed via BIOS
| updates).
|
| Unfortunately some professional audio gear is only certified
| for use with Intel chipsets and I have experienced some deal-
| breaking latency issues with ASIO drivers. For gaming I will be
| happy to continue using AMD - but for music I will probably
| switch back to Intel for my next rig.
| robswc wrote:
| That actually sucks because I do a lot of music stuff and
| having any issues with ASIO would be a deal breaker. Thanks
| for the heads up! One of those things I would have never even
| thought of to check!
|
| Also sums up my AMD experience 10 years ago. Stuff just
| wasn't working :/
| UberFly wrote:
| Lisa Su joined AMD in 2012 and in 2017 the first Zen chips were
| released. Good people making good decisions.
| belter wrote:
| Thank you to the authors for not calling it the fastest computer
| in the world :-) and instead, as they should, the most powerful.
| Clock speed is not the only factor of course, as instructions per
| cycle and cache sizes have an impact, but for a pure measure of
| speed, the fastest still is:
|
| - For practical use, and non-overclocked, the EC12 at 5.5 GHz:
| https://www.redbooks.ibm.com/redbooks/pdfs/sg248049.pdf
|
| or
|
| - An AMD FX-8370 floating in liquid nitrogen at 8.7 GHz:
| https://hwbot.org/benchmark/cpu_frequency/rankings#start=0#i...
| formerly_proven wrote:
| I guarantee you an FX-8370 isn't even close to being the
| fastest CPU even at 10 GHz. I bet most desktop CPUs you can buy
| nowadays will be faster out of the box.
| belter wrote:
| Tell me what your measure of fast is?
| NavinF wrote:
| Does it matter? A modern CPU at 5.5ghz will outperform an 8
| year old CPU overlocked to 10ghz on just about any
| reasonable workload even if it's single threaded.
| postalrat wrote:
| Do you have any data to back up your claims?
| PartiallyTyped wrote:
| It's embarrassing how slow that thing is compared to CPUs from
| 2 years ago...
|
| The video below compares 8150 against CPUs from 2020 (i.e. no
| 5900x or 12900KS), and includes data from the 8370.
|
| https://youtu.be/RpcDF-qQHIo?t=425
| chrisseaton wrote:
| > Clock speed
|
| When people talk about a supercomputer being 'fast' they
| generally mean FLOPS - floating point operations per second,
| which isn't clock-speed.
| belter wrote:
| My algorithm is single threaded :-)
|
| Multiplying the number of processors by the clock speed of
| the processors, and then multiplying that product by the
| number of floating-point operations the processors can
| perform per cycle, as done for supercomputer FLOPS, does
| not help me :-)
| blackoil wrote:
| More than your algorithm, seems you are on the wrong
| thread.
| jabl wrote:
| > My algorithm is single threaded :-)
|
| And why should your algorithm be the benchmark for
| supercomputer performance, rather than something that is at
| least somewhat related [1] to the workloads those machines
| run?
|
| [1] We can of course argue endlessly that HPL is no longer
| a very representative benchmark for supercomputer
| workloads, but I digress.
| belter wrote:
| My initial argument since the beginning of this thread,
| is that it's the most powerful computer not the fastest,
| as it will not be, for the case for some single threaded
| task. Not really sure what is so controversial about
| it...:-)
| chrisseaton wrote:
| > as it will not be, for the case for some single
| threaded task
|
| Nobody but you is confused about this.
| belter wrote:
| It's not confusion, it's about clarifying that "fast" is
| contextual...
| chrisseaton wrote:
| Why would you run a single-threaded algorithm on a
| supercomputer?
| _Wintermute wrote:
| You say this, but unfortunately I've encountered a few
| life-scientists who think their single threaded R code
| will run faster because they've requested 128 cores and 4
| GPUs.
| belter wrote:
| Because some say they are fastest computers in the world
| ;-)
| chrisseaton wrote:
| I think you're possibly misunderstanding what these
| supercomputers are for. They just aren't designed for
| whatever single-threaded workload you personally have, so
| it's not in scope.
| belter wrote:
| It is clear to me what they are for, and why I would not
| use one for a single-threaded task.
|
| I was trolling the people who downvoted my measure of
| speed a little bit :-) because the millions of FLOPS of a
| supercomputer will help for parallel tasks but will not
| be "faster" for a common use case.
|
| So fastest computer is one thing, most powerful is
| another.
| O5vYtytb wrote:
| "fastest" is accurate. You can get more computation work
| done in less time given an appropriate workload. No
| matter what adjective you use, "fastest" or "powerful",
| you're always within a context of an intended workload.
|
| Your argument is a bit like saying the fastest land speed
| vehicle isn't really the fastest because you can't go to
| the grocery store with it.
| stonogo wrote:
| You don't get to call the supercar slow because your
| driver doesn't know how to change gears.
| dekhn wrote:
| Clock rates of CPUs are not a measure of "speed". Time to
| solution is the measure of speed. There have historically been
| computers with lower clock rates with higher rates of results
| production (larger cache, more work done per cycle).
| belter wrote:
| That was why I mentioned cache and of course we could talk
| MIPS.
| dekhn wrote:
| But those are only proxy variables to explain
| "performance", or "throughput", or "latency". No doubt, if
| I wanted a fast single machine, the two configs you showed
| would both be nice- the former because it's an off-the-
| shelf part that just "runs stuff faster" than most slower
| processors, and the latter because it represents the limit
| of what a person with some infrastructure can do (although,
| TBH, I'd double check every result the system generated).
|
| Ultimately, however, no system is measured by its clock
| rate- or by its cache size- or by its MIPS. Because no real
| workload is truly determined by a simple linear function of
| those variables.
| belter wrote:
| Agree. So because we have many parameters, and as master
| of my universe, I selected the clock cycle as my measure
| of fast :-)
|
| Time to completion will depend on task.
| babypuncher wrote:
| Supercomputer power has been measured in FLOPS for decades now,
| even in popular media coverage.
| __alexs wrote:
| Yes the fastest computers are those aboard the Parker Solar
| Probe at 690,000 km/h.
| belter wrote:
| For those, the calculations need to include relativistic
| effects in your algo :-) The Sun's gravity affects clock
| cycles, relativistic time distortions... :-)
|
| https://physics.stackexchange.com/questions/348854/parker-
| so...
| briffle wrote:
| What blows my mind is the newest NOAA super computer (that
| triples the speed of the last one) is a whopping 12 petaflops. It
| comes online this summer.
|
| It kind of shows the difference in priority spending, when
| nuclear labs get >1000 petaflop super computers, and the weather
| service (that helps with disasters that affect many Americans
| each year) gets a new one that is 1.2% of the speed.
|
| https://www.noaa.gov/media-release/us-to-triple-operational-....
| the_svd_doctor wrote:
| DOE computers are used by a wide variety of
| people/teams/projects, including academics and other
| institutions though.
| phonon wrote:
| You're quite right.
|
| "An estimate of future HPC needs should be both demand-based
| and reasonable. From an operational NWP perspective, a four-
| fold increase in model resolution in the next ten years
| (sufficient for convection-permitting global NWP and kilometer-
| scale regional NWP) requires on the order of 100 times the
| current operational computing capacity. Such an increase would
| imply NOAA needs a few exaflops of operational computing by
| 2031. Exascale computing systems are already being installed at
| Oak Ridge National Laboratory (1.5 exa floating point
| operations per second (EF)) and Argonne Labs (1.0 EF) and it is
| likely that these national HPC laboratories will approach 100
| EF by 2031. Because HPC resources are essential to achieving
| the outcomes discussed in this report, it is reasonable for
| NOAA to aspire to a few percent of the computing capacity of
| these other national labs at a minimum. Substantial investments
| are also needed in weather research computing. To achieve a 3:1
| ratio of research to operational HPC, NOAA will need an
| additional 5 to 10 EF of weather research and development
| computing by 2031. Since research computing generally does not
| require high-availability HPC, it should cost substantially
| less than operational HPC and should be able to leverage a
| hybrid of outsourced, cloud and excess compute resources."[1]
|
| [1]https://sab.noaa.gov/wp-content/uploads/2021/11/PWR-
| Report_2...
| mulmen wrote:
| Would a faster computer improve outcomes for victims of natural
| disaster? How much is left undiscovered about weather?
|
| Research spending is based on the potential for discovery. As a
| species we have studied weather since the beginning of time.
| How long have we been doing nuclear research? A century?
|
| Is there even an opportunity cost here? Or is it an economy of
| scale? As we build more supercomputers the costs go down. So
| NOAA and ORNL both get what they need for less.
| aeroman wrote:
| Although meteorology is in many ways a much older science, I
| think you are underselling the difference (and importance of
| computers here). Better computing power means a more accurate
| forecast, but typically also a longer forecast horizon. That
| is critical when preparing for natural disasters and
| absolutely saves lives all the time.
|
| Even at a 3-day lead time, GFS was still suggesting landfall
| for hurricane Sandy outside the New York region, the longer
| lead times provided by other centers (with more computing
| power) were very important for preparation [1].
|
| Even on the science side, increased computing power enables a
| host of new discoveries. Even storing the locations for all
| the droplets in a small cloud would require an excessive
| amount of memory, let alone doing any processing [2].
| Increased computer power enables us to better understand how
| clouds respond to their environment, which is a key
| uncertainty in predicting climate change.
|
| Many disciplines of meteorology are also much newer than
| nuclear physics. Cloud physics (for example) only really got
| started with the advent of weather radar (so the 1940s).
| Before that, even simple questions (such as can a cloud
| without any ice in it produce rain?) were unknown.
|
| Even today, we still have difficulty seeing into the most
| intense storms. You cannot fly an aircraft in there, and
| radar has difficulty distinguishing different types of
| particle (ice, liquid, mushy ice, ice with liquid on the
| surface, snow) and is not good at counting the number of
| particles either.
|
| Even after thousands of years, we are only just now getting
| the tools to understand it. There is a lot left to discover
| about the weather!
|
| [1] - https://agupubs.onlinelibrary.wiley.com/doi/full/10.100
| 2/201...
|
| [2] - https://www.cloudsandclimate.com/blog/clouds_and_climat
| e/#id...
| Kuinox wrote:
| > Erik P. DeBenedictis of Sandia National Laboratories has
| theorized that a zettaFLOPS (10^21 or one sextillion FLOPS)
| computer is required to accomplish full weather modeling,
| which could cover a two-week time span
| accurately.[121][122][123] Such systems might be built around
| 2030.
|
| https://en.wikipedia.org/wiki/Supercomputer
| ip26 wrote:
| One of the more commonly discussed values is predicting where
| a major hurricane makes landfall. We can't reliably do that
| yet, but if we could, evacuation zones would be both smaller
| & more effective.
| Twirrim wrote:
| > Would a faster computer improve outcomes for victims of
| natural disaster? How much is left undiscovered about
| weather?
|
| The US is way behind on weather modelling, in part due to
| lack of computing power available to do the grids at
| sufficiently small cells compared to Europe and other parts
| of the world. That means less accurate predictions and less
| advance notice of impending disasters, which means more risk
| of loss of life and impact on infrastructure and the economy
| (and vice versa, inaccuracy can lead to more caution than is
| necessary, which has economic impact too). The US has to lean
| on Europe etc. for predictions.
|
| https://cliffmass.blogspot.com/2020/02/smartphone-weather-
| ap...
|
| Talks about the fact that IBM / Weather.com actually uses a
| more accurate system than the NWS uses, because the NWS is
| still stuck on GFS (been several years now since congress
| passed an act to force NOAA to update away from it, and
| unfortunately it takes time)
| studmuffin650 wrote:
| I've heard that was the case with the old GFS model. They
| just updated the GFS model in 2021 to provide higher
| accuracy: https://www.noaa.gov/media-release/noaa-upgrades-
| flagship-us....
|
| I'm not entirely sure how it compared to the ECMWF model
| during last year's hurricane season, but I do think it's
| improved substantially.
| jsmith99 wrote:
| For comparison, the UK government Met Office installed a
| similar sized cluster of Cray XC40 machines about 6 years
| ago, with a 60 petaflop replacement arriving this year.
| Their forecasts are, anecdotally, locally considered a bit
| rubbish though.
| quanto wrote:
| This is an interesting claim. Could you share a reputable
| source on your claim that the US weather prediction
| facility is behind its European counterparts? How does the
| US depend on Europe for weather predictions?
| arinlen wrote:
| > _(..) when nuclear labs get >1000 petaflop super computers
| (..)_
|
| Would you prefer the research being performed based on
| empirical testing instead of running simulations?
| jcranmer wrote:
| The national labs aren't purely--or likely even mostly--
| dedicated to nuclear research. Instead, they cover a lot of the
| basic science research. These supercomputers will likely be
| used for projects like exploring cosmological models, or
| studying intramolecular interactions for chemical compounds, or
| fine-tuning predictions about properties of the top quark, etc.
| gibolt wrote:
| Because these are so linked to research, everyone and their
| cousin is vying for time on them. Even though it may be
| massive, no individual will get anywhere near peak.
| convolvatron wrote:
| usually on the DOE machines some time is reserved for 'hero
| runs', but it's generally only the classified stockpile work
| that qualifies
| gh02t wrote:
| Oak Ridge in particular is in DoE Office of Science. They do
| some national security work, but their primary focus is basic
| science. Some of the national labs _do_ primarily do nuclear
| weapons related research, but not Oak Ridge. Frontier is only
| doing unclassified work, primarily basic science and
| engineering.
| briffle wrote:
| I didn't know that, thank you for clarifying for me!
| irfn wrote:
| I am curious as to what class of problems are being solved on
| these supercomputers. Also, what's the abstraction of
| computation here? Is it a container?
| convolvatron wrote:
| no, it's a gang-scheduled process - at least that's been the
| standard model. those processes are run either close to bare
| metal or as a process on linux. containers would be useful to
| package up the shared dependencies, so that may have changed.
| chomp wrote:
| Weather modeling - X kilometers by Y layers of atmosphere can
| get expensive to compute really quick. And NOAA does more
| than just simulate weather, they're running climate/sea level
| rise/arctic ice modelling, aggregating sensor data from
| buoys/balloons/satellites, processing maps, and more.
|
| I can't speak for NOAA, but my experience with supercomputing
| has been that there is no abstraction of computation, your
| workload is very much tied to hardware assumptions.
| irfn wrote:
| In my experience it's very hard to write code for parallel
| compute workloads and I am guessing that half of the code
| written would be creating abstractions about that.
| jandrewrogers wrote:
| They are used to do large-scale high-resolution analysis or
| simulation of complex systems in the physical world. The
| codes typically run on the bare metal with careful control of
| resource affinity, often C++ these days.
|
| They aren't just used for global-scale geophysical processes
| like weather and climate or complex physics simulations. For
| example, oil companies rent time to analytically reconstruct
| the 3-dimensional structure of what's underneath the surface
| of the Earth from seismic recordings.
| systemvoltage wrote:
| This is a weird take. There are so many things behind the
| scenes to say anything conclusively. Different compute loads,
| different problem domains, different accuracy and
| predictability requirements, etc.
|
| Cynicism is unwarranted, but it fits the current zeitgeist,
| biases and feels good.
| N1H1L wrote:
| The nuclear lab computers are also rented out to anyone who
| applies for an XSEDE grant. Anyone with a successful grant gets
| free access (obviously limited to a reasonable number of core-hours).
| Anyway, a ton of university researchers, all the way from
| materials simulations to weather groups will be using this
| computer to run their codes, as they have done for the last
| ones too.
|
| In fact, such use accounts for the vast majority of the compute
| use.
| jamesredd wrote:
| China has two exaflop supercomputers. It's doubtful whether this
| is the world's most powerful supercomputer.
|
| https://www.nextplatform.com/2021/10/26/china-has-already-re...
| ouid wrote:
| I'm not really sure why you would trust this claim from China.
| It's not impossible, but it's also not impossible to lie about it.
| anuvrat1 wrote:
| Can someone please explain how software is made at this scale?
| sydthrowaway wrote:
| Using an HPC framework, such as OpenMP
| mhh__ wrote:
| Fairly low tech until you get to the super high end.
|
| You have a blend of very specific domain knowledge
| (e.g. they know the hardware - the interconnects more than the
| CPUs) and old skool Unix system administration.
___________________________________________________________________
(page generated 2022-05-31 23:01 UTC)