[HN Gopher] Graviton 3: First Impressions
___________________________________________________________________
Graviton 3: First Impressions
Author : ingve
Score : 136 points
Date : 2022-05-29 13:30 UTC (9 hours ago)
(HTM) web link (chipsandcheese.com)
(TXT) w3m dump (chipsandcheese.com)
| jadbox wrote:
| I'd love to see benchmarks for webservers like Node or Py.
| wmf wrote:
| Related Graviton 3 benchmarks:
| https://www.phoronix.com/scan.php?page=article&item=graviton...
| bullen wrote:
| What about memory contention when many cores try to write/read
| the same memory?
|
| There is no point to add more cores if they can't cooperate.
|
    | How come I'm the only one pointing this out?
|
    | I think 4 cores will max out the memory contention, so keep on
    | piling on these 128-core heaters. But they will not outlive a
    | simple Raspberry Pi 4!?
| electricshampo1 wrote:
| The whole chip in general will be used in aggregate by
| independent vms/containers etc that do NOT read and write to
| the same memory. Some kernel datastructures within a given vm
| are still shared, ditto for within a single process, but good
| design minimizes that (per cpu/thread data structures, sharded
| locks, etc etc).
| MaxBarraclough wrote:
| I don't think they were referring to contention across VM
| boundaries.
| gpderetta wrote:
    | Reading the same memory is usually ok.
    |
    | Writing is not, but respecting the single-writer principle is
    | usually rule zero of parallel programming optimisation.
    |
    | If you mean reading/writing to the same memory bus in general,
    | then yes, the bus needs to be sized according to the needs of the
    | expected loads (i.e. the machine needs to be balanced).
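    |
    | As a minimal sketch of the single-writer idea (plain C11 atomics
    | and pthreads; names and sizes are illustrative, not from any
    | particular codebase): each thread increments only its own
    | cache-line-aligned counter, so no line ever has more than one
    | writer, and a single reader aggregates at the end.
    |
    |     /* gcc -O2 -pthread single_writer.c */
    |     #include <pthread.h>
    |     #include <stdatomic.h>
    |     #include <stdio.h>
    |
    |     #define NTHREADS 4
    |     #define ITERS    1000000
    |
    |     struct counter {
    |         /* one counter per cache line, so writers never share a line */
    |         _Alignas(64) _Atomic unsigned long value;
    |     };
    |
    |     static struct counter counters[NTHREADS];
    |
    |     static void *worker(void *arg)
    |     {
    |         struct counter *c = arg;
    |         for (int i = 0; i < ITERS; i++)
    |             /* relaxed ordering is enough: this slot has exactly one writer */
    |             atomic_fetch_add_explicit(&c->value, 1, memory_order_relaxed);
    |         return NULL;
    |     }
    |
    |     int main(void)
    |     {
    |         pthread_t t[NTHREADS];
    |         for (int i = 0; i < NTHREADS; i++)
    |             pthread_create(&t[i], NULL, worker, &counters[i]);
    |         for (int i = 0; i < NTHREADS; i++)
    |             pthread_join(t[i], NULL);
    |
    |         unsigned long total = 0;
    |         for (int i = 0; i < NTHREADS; i++)   /* single reader sums the slots */
    |             total += atomic_load_explicit(&counters[i].value,
    |                                           memory_order_relaxed);
    |         printf("total = %lu\n", total);
    |         return 0;
    |     }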
| Sirened wrote:
| It's likely that it's going to need a post on its own since
| it's an extremely complicated topic. Someone else wrote an
    | awesome post about this for the Neoverse N2 chips [1] and they
    | found that with LSE atomics, the N2 performs as well or better
    | than Icelake. Given Graviton 3 has a much wider fabric, I would
    | assume this lead only improves.
|
| [1] https://travisdowns.github.io/blog/2020/07/06/concurrency-
| co...
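    |
    | For anyone wanting to see the difference locally: LSE (Large
    | System Extensions) atomics arrived with ARMv8.1-A, and GCC only
    | emits them when the target allows it. A rough sketch (the file
    | name is made up; the flags are the standard GCC ones):
    |
    |     /* contended.c
    |      * gcc -O2 -march=armv8.1-a -S contended.c   -> single LSE RMW (ldaddal)
    |      * gcc -O2 -moutline-atomics -S contended.c  -> runtime dispatch between
    |      *                                              LL/SC and LSE (the default
    |      *                                              on AArch64 since GCC 10.1)
    |      */
    |     #include <stdatomic.h>
    |
    |     _Atomic long counter;
    |
    |     void bump(void)
    |     {
    |         /* without LSE this becomes an ldxr/stxr retry loop */
    |         atomic_fetch_add_explicit(&counter, 1, memory_order_seq_cst);
    |     }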
| bullen wrote:
| Ah, yes I remember this post, but it reads pretty cryptic to
| me. I would like to know what the slowdowns actually become
| in practice, does it add latency to the execution of other
| threads and how will the machine as a whole behave?
|
| I know M4 had much better multicore shared memory perf. than
| M3, but now both of those are old and I don't have users to
| test anything now.
| WhitneyLand wrote:
| How much can SVE instructions help with machine learning?
|
| I've wondered why Apple Silicon made the trade off decision to
| not include SVE support yet, given that support for lower
| precision FP vectorization seems like it could have made their
| NVidia perf gap smaller.
| tomrod wrote:
| Very interesting! I'm not terribly well versed in ARM vs x86 so
    | it's helpful to see these kinds of benchmarks and reports.
|
| One bit of feedback for the author: the sliding scale is helpful,
| but the y axes are different between the visualizations so you
| cannot see the apples to apples comparison needed. Suggest
| regenerating those.
| rwmj wrote:
| _> GCC will flat out refuse to emit SVE instructions (at least in
| our limited experience), even if you use assembly,_
|
| This seems ... wrong? I haven't tried it but according to the
| link below SVE2 intrinsics are supported in GCC 10 (and Clang 9):
|
| https://community.arm.com/arm-community-blogs/b/tools-softwa...
| adrian_b wrote:
| Yes, gcc 10.1 has introduced support for the SVE2 intrinsics
| (ACLE).
|
    | Moreover, starting with version 8.1, gcc began to use SVE in
    | certain cases when it succeeded in auto-vectorizing loops (if the
    | correct -march option had been used).
    |
    | Nevertheless, many Linux distributions still ship with older gcc
    | versions, so SVE/SVE2 does not work with the available compiler
    | or cross-compiler.
|
| You must upgrade gcc to 10.1 or a newer version.
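    |
    | With a new enough compiler, the ACLE intrinsics are straightforward
    | to use. A small vector-length-agnostic sketch (the function name and
    | exact flags are just an example; -mcpu=neoverse-v1 should also work
    | on recent gcc):
    |
    |     /* gcc -O2 -march=armv8.2-a+sve -c saxpy_sve.c */
    |     #include <arm_sve.h>
    |
    |     void saxpy(float a, const float *x, float *y, long n)
    |     {
    |         /* svcntw() = number of 32-bit lanes per vector, unknown until runtime */
    |         for (long i = 0; i < n; i += svcntw()) {
    |             svbool_t    pg = svwhilelt_b32(i, n);   /* predicate covers the tail */
    |             svfloat32_t vx = svld1_f32(pg, &x[i]);
    |             svfloat32_t vy = svld1_f32(pg, &y[i]);
    |             vy = svmla_n_f32_x(pg, vy, vx, a);      /* vy += a * vx */
    |             svst1_f32(pg, &y[i], vy);
    |         }
    |     }
    |
    | Plain C loops can also pick up SVE through auto-vectorization with
    | the same -march flag, as mentioned above.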
| ykevinator2 wrote:
| No burstable graviton3's yet :-(
| DeathArrow wrote:
| Such a shame we can't play with a socketed CPU like this and a
| motherboard with EFI support as a workstation.
| jazzythom wrote:
| I hate reading about all the new chips I can't afford. If only
    | there was a standardized universal open source motherboard and
| some type of subscription model where I would always have the
| best chip at the latest fab mailed straight to me on release. I
| mean I only just got my hands on 32 core Epyc. Linus Torvalds has
| had a Threadripper 3970x for years and I still can't afford it
| and I'm still jealous, although to be fair my C skills hit their
| limit when I tried to write pong. I don't like the idea of
| building a new computer around a chip. It's messy and stupid.
| These systems can be made modular if the motherboards packed
| unnecessary bandwidth into the interconnect/planar.
| Erlangen wrote:
| I don't understand these graphs titled "Branch Predictor Pattern
| Recognition". What do they mean? Could someone here explain it a
| bit in detail? Thanks ahead.
| Hizonner wrote:
| It feels like we've gone badly wrong somewhere when processors
| have to spend so many of their resources guessing about the
| program. I am not saying I have a solution, just that feeling.
| staticassertion wrote:
| IDK, that seems like how brains work, and brains are pretty
| cool. They guess all the time in order to save time.
| Cthulhu_ wrote:
    | It always did feel like a weird hack to me to keep parts of the
    | CPU from sitting idle. I mean the performance benefits are there,
    | but it's at the cost of power usage in the end.
    |
    | Can branch prediction be turned off at a compiler or application
    | level? If you're optimizing for energy use, that is.
| Disclaimer: I don't actually know if disabling branch
| prediction is more energy efficient.
| imtringued wrote:
| Turning off branch prediction sounds like a weird hack that
| serves no purpose, just underclock and undervolt your CPU if
| you care about power consumption that much.
| Veedrac wrote:
| Disabling branch prediction would have such a catastrophic
| effect on performance, there is no way it would pay for
| itself. Actually this is true for most parts of a CPU;
| Apple's little cores are extremely power efficient and yet
| they are fairly impressive out-of-order designs. It would
| take a very meaningful redesign of how a CPU works to beat a
| processor like that, at least at useful speeds.
| [deleted]
| tyingq wrote:
| There's the Mill CPU, which sounds terrific on paper. Hard to
| gauge when it might turn into something commercially usable
| though.
| 0xCMP wrote:
| Mill would definitely make things more interesting. They were
| supposed to have their simulator online a while ago, but
| sounds like they needed to redo the work on the compiler
| (from what I understood). Once that comes out it sounds like
| the next step is getting the simulator online for people to
| play with.
| cesaref wrote:
| I thought this was the reasoning behind Itanium, the idea that
| scheduling could be worked out in advance by the compiler
| (probably profile guided from tests or something like that)
| which would reduce the latency and silicon cost of
| implementations.
|
    | However, it wasn't exactly a raging success; I think the
    | predicted amazing compiler tech never materialised. Maybe it is
    | the right answer but the implementation was wrong? I'm no CPU
    | expert...
| Hizonner wrote:
| I'm not sure what happened with Itanium.
|
| I do think a big part of the problem is that people want to
| distribute binaries that will run on a lot of CPUs that are
| physically really different inside. But nowadays there's JIT
| compilation even for JavaScript, so you could distribute
| something like LLVM, or even (ecch) JavaScript itself, and
| have the "compiler scheduling" happen at installation time or
| even at program start.
| imtringued wrote:
| You can't distribute LLVM for that purpose without defining
| a stable format like WebAssembly or SPIR-V.
| Veedrac wrote:
| Itanium was a really badly designed architecture, which a lot
| of people skip over when they try to draw analogies to it. It
| was a worst of three worlds, in that it was big and hot like
| an out-of-order, it had the serial dependency issues of an
| in-order, and it had all the complexity of fancy static
| scheduling without that fancy scheduling actually working.
|
| There have been a small number of attempts since Itanium,
| like NVIDIA's Denver, which make for much better baselines. I
| don't think those are anywhere close to optimal designs, or
| really that they tried hard enough to solve in-order issues
| at all, but they at least seem sane.
| speed_spread wrote:
| Would Itanium have been better served with bytecode and a
| modern JIT? Also, doesn't RISC-V kinda get back on that
    | VLIW track with macro-op fusion, using a very basic
    | instruction set and letting the compiler figure out the
    | best way to order stuff to help the target CPU make sense
    | of it?
| nine_k wrote:
| I heard that the desire to make x86 emulation performant on
| Itanium made things really bad, compared to a "clean" VLIW
| architecture.
| canarypilot wrote:
| Why would you consider prediction based on dynamic conditions
| to be the sign of a dystopian optimization cycle? Isn't it
    | mostly intuitive that interesting program executions are not
    | going to be things you can determine statically (otherwise your
    | compiler would have cleaned them up for you with inlining etc.),
    | or could be determined statically but at too great a cost to meet
    | execution deadlines (JITs and so on) or resource constraints (you
    | don't really want N code clones specialising each branch
    | backtrace to create strictly predictable chains)?
|
    | Or is the worry on the other side: that processors have gotten
    | so out-of-order that only huge dedication to guesswork can keep
    | the beast sated? I don't see this as a million miles from
    | software techniques in JIT compilers that optimistically optimize
    | and later de-optimize when an assumption proves wrong.
|
| I think you might be right to be nervous if you wrote programs
| that took fairly regular data and did fairly regular things to
| it. But, as Itanium learned the hard way, programs have much
| more dynamic, emergent and interesting behaviour than that!
| [deleted]
| amelius wrote:
| I guess the fear is that the CPU might start guessing wrong,
| causing your program to miss deadlines. Also, the heuristics
| are practically useless for realtime computing, where timings
| must be guaranteed.
| nine_k wrote:
| I suppose that if you assume in-order execution and count
| the clock cycles, you should get a guaranteed lower bound
| of performance. It may be, say, 30-40% of the performance
| you really observe, but having some headroom should feel
| good.
| rwmj wrote:
| Uli Drepper has this tool which you can use to annotate source
| code with explanations of which optimisations are applied. In
| this case it would rely on GCC recognizing branches which are
| hard to predict (eg. a branch in an inner loop which is data-
| dependent), and I'm not sure GCC is able to do that.
|
| https://github.com/drepper/optmark
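    |
    | For a concrete picture of the kind of branch meant here (this is
    | just an illustrative snippet, not output from optmark): a
    | data-dependent test in an inner loop, plus the branchless form a
    | compiler may or may not pick on its own.
    |
    |     /* Hard to predict when v[] is effectively random: the branch
    |      * direction depends on the data itself. */
    |     long sum_above(const int *v, long n, int threshold)
    |     {
    |         long sum = 0;
    |         for (long i = 0; i < n; i++) {
    |             if (v[i] > threshold)
    |                 sum += v[i];
    |         }
    |         return sum;
    |     }
    |
    |     /* Branchless form: trades the mispredict-prone branch for a
    |      * conditional select (csel/cmov); compilers sometimes do this
    |      * if-conversion themselves, sometimes not. */
    |     long sum_above_branchless(const int *v, long n, int threshold)
    |     {
    |         long sum = 0;
    |         for (long i = 0; i < n; i++)
    |             sum += (v[i] > threshold) ? v[i] : 0;
    |         return sum;
    |     }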
| bastawhiz wrote:
| Isn't that the whole promise of general purpose computing? That
| you don't need to find specialized hardware for most workloads?
| Nobody wants to be out shopping for CPUs that have features
| that align particularly well with their use case, then
| switching to different CPUs when they need to release an update
| or some customer comes along with data that runs less
| efficiently with the algorithms as written.
|
| Since processors are expensive and hard to change, they do
| tricks to allow themselves to be used more efficiently in
| common cases. That seems like a reasonable behavior to me.
| adrian_b wrote:
| A majority of the non-deterministic and speculative hardware
| mechanisms that exist in a modern CPU are required due to the
| consequences of one single hardware design decision: to use a
| data cache memory.
|
| The data cache memory is one of the solutions to avoid the
| extremely long latency of loading data from a DRAM memory.
|
| The alternative to a data cache memory is to have a hierarchy
| of memories with different speeds, which are addressed
| explicitly.
|
| The latter variant is sometimes chosen for embedded computers
| where determinism is more important than programmer
| convenience. However, for general-purpose computers this
| variant could be acceptable only if the hierarchy of memories
| would be managed automatically by a high-level language
| compiler.
|
| It appears that writing a compiler that could handle the
| allocation of data into a heterogeneous set of memories and the
| transfers between them is a more difficult task than designing
| a CPU that becomes an order of magnitude more complex due to
    | having a hierarchy of data cache memories and a long list of
| other hardware mechanisms that must be added due to the
| existence of the data cache memory.
|
| Once it is decided that the CPU must have a data cache memory,
| a lot of other hardware design decisions follow from it.
|
    | Because load latency grows with the size of a cache, the cache
    | memory must be split into a multi-level hierarchy of cache
    | memories.
|
| To reduce the number of cache misses, data cache prefetchers
| must be added, to speculatively fill the cache lines in advance
| of load requests.
|
    | Now, when a data cache exists, most loads have a small latency,
    | but from time to time there is still a cache miss, where the
    | latency is huge: long enough to execute hundreds of
    | instructions.
|
| There are 2 solutions to the problem of finding instructions to
| be executed during cache misses, instead of stalling the CPU:
| simultaneous multi-threading and out-of-order execution.
|
| For explicitly addressed heterogeneous memories, neither of
| these 2 hardware mechanisms is needed, because independent
| instructions can be scheduled statically to overlap the memory
    | transfers. With a data cache, this is not possible, because it
    | cannot be predicted statically when cache misses will occur,
    | mainly due to the activity of other execution threads. Even an
    | if-then-else can prevent static prediction of the cache state,
    | unless the compiler inserts additional load instructions to
    | ensure that the cache state does not depend on the selected
    | branch of the conditional statement; and that does not work for
    | external library functions or other execution threads.
|
| With a data cache memory, one or both of SMT and OoOE must be
| implemented. If out-of-order execution is implemented, then the
| number of registers needed to avoid false dependencies between
| instructions becomes larger than it is convenient to encode in
    | the instructions, so register renaming must also be
| implemented.
|
| And so on.
|
| In conclusion, to avoid the huge amount of resources needed by
| a CPU for guessing about the programs, the solution would be a
| high-level language compiler able to transparently allocate the
| data into a hierarchy of heterogeneous memories and schedule
| transfers between them when needed, like the compilers do now
| for register allocation, loading and storing.
|
    | Unfortunately, nobody has succeeded in demonstrating a good
| compiler of this kind.
|
    | Moreover, existing compilers frequently have difficulty
    | discovering the optimal allocation and transfer schedule for
    | registers, which is a simpler problem.
    |
    | Doing the same efficiently for a hierarchy of heterogeneous
    | memories seems out of reach for current compilers.
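    |
    | To make the "explicitly addressed memories" alternative concrete,
    | here is a rough sketch of the double-buffered style used on DSPs
    | and Cell-like designs, where a statically scheduled transfer
    | overlaps the compute. dma_start()/dma_wait() are hypothetical
    | placeholders for whatever scratchpad/DMA API a platform provides.
    |
    |     #define TILE 1024
    |
    |     /* hypothetical platform API: start an async copy into local
    |      * memory, and wait for the last started copy to complete */
    |     extern void dma_start(void *dst, const void *src, long bytes);
    |     extern void dma_wait(void);
    |
    |     static float scratch[2][TILE];   /* fast, explicitly managed memory */
    |
    |     void process_all(const float *big, long ntiles, float *out)
    |     {
    |         int cur = 0;
    |         dma_start(scratch[cur], &big[0], sizeof(scratch[cur]));
    |         for (long t = 0; t < ntiles; t++) {
    |             dma_wait();                    /* tile t is now resident */
    |             if (t + 1 < ntiles)            /* statically scheduled prefetch */
    |                 dma_start(scratch[cur ^ 1], &big[(t + 1) * TILE],
    |                           sizeof(scratch[0]));
    |             float acc = 0.0f;
    |             for (int i = 0; i < TILE; i++) /* compute overlaps the transfer */
    |                 acc += scratch[cur][i];
    |             out[t] = acc;
    |             cur ^= 1;
    |         }
    |     }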
| solarexplorer wrote:
| We do have these architectures already in the embedded space
    | and as DSPs. I suppose they would be interesting for
| supercomputers as well. But for general purpose CPUs they
| would be a difficult sell. Since the memory size and latency
| would be part of the ISA, binaries could not run unchanged on
| different memory configurations, you would need another
| software layer to take care of that. Context switching and
| memory mapping would also need some rethinking. Of course,
| all of this can be solved, but it would make adoption more
| difficult.
|
    | And last but not least, unknown memory latency is not the only
| source of problems, branch (mis)predictions are another. And
| they have the same remedies as cache misses: multithreading
| and speculative execution.
|
| So if you wanted to get rid of branch prediction as well, you
| could come up with something like the CRAY-1.
| adrian_b wrote:
| You are right that a kind of multi-threading can be useful
| to mitigate the effects of branch mispredictions.
|
| However, for this, fine-grained multi-threading is enough.
| Simultaneous multi-threading does not bring any advantage,
| because the thread with the mispredicted branch cannot
| progress.
|
| Out-of-order execution cannot be used during branch
    | mispredictions, so as I have said, both SMT and OoOE are
| techniques useful only when a data cache memory exists.
|
    | Any CPU with pipelined instruction execution needs a branch
    | predictor, and it needs to speculatively execute the
    | instructions on the predicted path, in order to avoid the
| pipeline stalls caused by control dependencies between
| instructions. An instruction cache memory is also always
| needed for a CPU with pipelined instruction execution, to
| ensure that the instruction fetch rate is high enough.
|
| Unlike simultaneous multi-threading, fine-grained multi-
| threading is useful in a CPU without a data cache memory,
| not only because it can hide the latencies of branch
| mispredictions, but also because it can hide the latencies
    | of any long operations, as is done in all GPUs.
|
| Fine-grained multi-threading is significantly simpler to
| implement than simultaneous multi-threading.
| mhh__ wrote:
| People have tried over and over again to "fix" this and it
| hasn't worked.
|
| The interesting probabilities are all decided at runtime.
|
    | Now that we have AI workloads, there is a place for a big lump
    | of dumb compute again, but not in general-purpose code.
| Dunedan wrote:
| > The final result is a chip that lets AWS sell each Graviton 3
| core at a lower price, while still delivering a significant
| performance boost over their previous Graviton 2 chip.
|
| That's not correct. AWS sells Graviton 3 based EC2 instances at a
| higher price than Graviton 2 based instances!
|
| For example a c6g.large instance (powered by Graviton 2) costs
| $0.068/hour in us-east-1, while a c7g.large instance (powered by
| Graviton 3) costs $0.0725/hour [1]. Both instances have the same
| core count and memory, although c7g instances have slightly
| better network throughput.
|
| I believe that is pretty unusual as, if my memory serves me
    | right, newer instance family generations are usually cheaper
| than the previous generation.
|
| [1]: https://aws.amazon.com/ec2/pricing/on-demand/
| adrian_b wrote:
| Based on the first published benchmarks, even the programs
| which have not been optimized for Neoverse V1, and which do not
| benefit from its much faster floating-point and large-integer
| computation abilities, still show a performance increase of at
    | least 40%, which is greater than the price increase.
|
| So I believe that using Graviton 3 at these prices is still a
| much better deal than using Graviton 2.
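    |
    | Rough arithmetic with the on-demand prices quoted above (and
    | treating the ~40% figure as a lower bound):
    |
    |     price increase:  (0.0725 - 0.068) / 0.068  ~= 6.6%
    |     perf per dollar: 1.40 / 1.066              ~= 1.31  (~31% better)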
| myroon5 wrote:
| Definitely unusual, as the graph here shows:
| https://github.com/patmyron/cloud/
| WASDx wrote:
| Could it be due to increasing global energy prices?
| usefulcat wrote:
| I don't follow. You seem to be implying that Amazon would
| like to reduce their electricity usage. If so, shouldn't
| they be charging _less_ for the more efficient instance
| type?
| nine_k wrote:
| No, they charge for compute, which the new CPU provides
| more of, even though it consumes the same amount of
| electricity as a unit.
| jeffbee wrote:
| It would be irrational to expect a durable lower price on
| graviton. Amazon will price it lower initially to entice
| customers to port their apps, but after they get a critical
| mass of demand the price will rise to a steady state where it
| costs the same as Intel. The only difference will be which guy
| is taking your money.
| zrail wrote:
| Do you have a cite on Amazon raising prices like that at any
| other point in their history?
| greatpostman wrote:
| I don't think Amazon has ever raised their prices. This
| comment is based on nothing
| losteric wrote:
| Prime has gone up quite a bit
|
| Nearly every business seeks to maximize profit. Right now
| AWS is in growth phase - why wouldn't they raise rates in
| the future?
| orf wrote:
| I mean, they just raised their graviton prices between
| generations.
|
| I don't think the point was that they would increase the
| cost of existing instance types, only that over time and
| generations the price will trend upwards as more workloads
| shift over.
| staticassertion wrote:
| I wouldn't call that "raising prices"... you can still
| use Graviton 2 if it's a better price for you.
| jhugo wrote:
| I dunno, this take is a bit weird to me. The work we did to
| support Graviton wasn't "moving" from Intel to ARM, it was
| making our build pipeline arch-agnostic. If Intel one day
| works out cheaper again we'll use it again with basically
| zero friction.
| ykevinator2 wrote:
| Same
| dilyevsky wrote:
    | Considering the blank stares I get when mentioning ARM as a
    | potential cost-saving measure, it will take years and maybe
    | decades before that happens, by which point you're def getting
    | your money's worth as an early adopter.
| spookthesunset wrote:
| When is the last time Amazon has raised cloud prices?
| jeffbee wrote:
| Literally 6 days ago when they introduced this thing.
| dragonwriter wrote:
| > Literally 6 days ago when they introduced this thing.
|
| Offering a new option is not a price increase. You can
| still do all the same things at the same prices, plus if
| the new thing is more efficient for your particular task
| you have an additional option.
| jeffbee wrote:
| When they introduced c6i they did it at the same price as
| c5, even though the c6i is a lot more efficient. They're
| raising the price on c7g vs. c6g to bring it closer to
| the pricing of c6i, which is pretty much exactly what I
| suggested?
| deanCommie wrote:
| You're being highly obtuse.
|
| Universally everyone understands "raising prices" to be -
| "raising prices without any customer action".
|
| As in you consider your options, take into consideration
| pricing, design your architecture, you deploy it, and you
| get a bill. Then suddenly, later, without any action of
| your own, your bill goes up.
|
| THAT is raising prices, and it is something AWS has
| essentially never done.
|
| What you're describing is a situation where a customer
| CHOOSES to upgrade to a new generation of instances, and
| in doing so gets a larger bill. That is nowhere near the
| same thing.
| arpinum wrote:
| Graviton 2 (c6g) also cost more than the Graviton 1 (a1)
| instances
| mastax wrote:
| Given the surrounding context I read that sentence to mean that
| focusing on compute density allowed them to sell each core at a
| lower price vs focusing on performance, not that Graviton 3 is
| cheaper than Graviton 2.
| invalidname wrote:
| While the article is interesting I would be more interested in
| details about carbon footprint and cost reduction. Also how would
| this impact more typical node, Java loads?
| Hizonner wrote:
| You know, if you wanted to improve carbon footprint, a better
| place to look might be at software bloat. The sheer number of
| times things get encoded and decoded to text is mind boggling.
| Especially in "typical node, Java loads".
| tyingq wrote:
| Logging and Cybersecurity are bloaty areas as well. I've seen
| plenty of AWS cost breakdowns where the cybersec functions
| were the highest percentage of spend. Or desktops where
    | Carbon Black or Windows Defender were using most of the CPU
| or IO cycles. And networks where syslog traffic was the
| biggest percentage of traffic.
| Dunedan wrote:
| As AWS doesn't price services based on carbon footprint,
| you can't infer the carbon footprint from the cost.
|
| I agree however that certain AWS services are
    | disproportionately expensive.
| maxerickson wrote:
| Presumably the price provides some sort of bounds.
|
| (Unless they are doing something like putting profits
| towards some sort of carbon maximization scheme)
| tyingq wrote:
| Well and a fair amount of cybersec oriented services are
| a pattern of _" sniff and copy every bit of data and do
| something with it"_ or _" trawl all state"_. Which is
| inherently heavy.
| orangepurple wrote:
| Norton, Symantec, and McAfee contribute greatly to global
| warming in the financial services sector. At least half of
| CPU cycles on employee laptops are devoted to them.
| Cthulhu_ wrote:
| But do they actually work? For years I've been of the
| opinion that most anti-virus solutions don't actually
| stop virusses, instead they give you a false sense of
| security and their messaging is intentionally alarmist to
| make individuals and organizations pay their subscription
| fees.
|
| In my limited and sheltered experience, the only viruses
    | I've gotten in the past decade or so were from dodgy
| pirated stuff or big "download" button ads on download
| sites.
| MrBuddyCasino wrote:
| At best they don't work, in reality they are an attack
| vector themselves and a performance nightmare. They
| should (mostly) not exist.
| MaxBarraclough wrote:
| Presumably then they're knocking hours off the laptops'
| battery lives?
| jeffbee wrote:
| Virtually 100% of cloud operating expenses are electricity, so
| you can pretty much assume that if it costs less it has a lower
| carbon footprint.
| _joel wrote:
| + Rent, support staff, development costs, regulation and
| compliance, network, maintenance (cooling, fire suppression +
| lots more), marketing.
|
| Speaking as someone who did sys admin for a small independent
| cloud provider, it definitely isn't virtually 100% of
| operating costs
| jeffbee wrote:
| No offense intended to your personal experience, but I
| don't think "small independent cloud" is terribly important
| in the global analysis. This paper concludes that TDP and
| TCO have become the same thing, i.e. power is heat, power
| is opex.
|
| https://www.gwern.net/docs/ai/scaling/hardware/2021-jouppi.
| p...
| shepherdjerred wrote:
    | AWS is pushing to move its internal services (most of which are
    | in Java) to Graviton, so I would expect it to be excellent for
| "normal" workloads/languages
___________________________________________________________________
(page generated 2022-05-29 23:00 UTC)