[HN Gopher] AWS Graviton 3 Instances
___________________________________________________________________
AWS Graviton 3 Instances
Author : Trisell
Score : 156 points
Date : 2021-11-30 17:16 UTC (5 hours ago)
(HTM) web link (aws.amazon.com)
(TXT) w3m dump (aws.amazon.com)
| ksec wrote:
| >Graviton3 will deliver up to 25% more compute performance and up
| to twice as much floating point & cryptographic performance. On
| the machine learning side, Graviton3 includes support for
| bfloat16 data and will be able to deliver up to 3x better
| performance.
|
| >First in the cloud industry to be equipped with DDR5 memory.
|
| Quite hard to tell whether this is Neoverse V1 or N2. Since the
| description fits both . But this SVE extensions will move a lot
| of workload that previously wont suitable for Graviton 2
|
| Edit: Judging from Double floating point performance it should be
| N2 with SVE2. Which also means Graviton 3 will be ARMv9 and on
| 5nm. No wonder why TSMC doubled their 5nm expansion spending. It
| will be interesting to see how they price G3 and G2. And much
| lowered priced G2 instances will be _very_ attractive.
| ksec wrote:
| Cant edit it now, It should be V1, not N2.
| lostmsu wrote:
| Any benchmarks? I'd like to see Geekbench 5 results from a full-
| sized one socket instance.
| lizthegrey wrote:
| No benchmarks yet, but I can get 30% faster latency, on 2/3 the
| number of instances compared to c6g for the same
| uncompress/compress/write to Kafka workload.
| talawahtech wrote:
| I was wondering how they were going to manage the fact that AMDs
| Zen3 based instances would likely be faster than Graviton2. Color
| me impressed. AWS' pace of innovation is blistering.
| b9a2cab5 wrote:
| On a per thread basis - yes. On a per dollar basis on AWS - no.
| Per instance basis - depends a lot.
| thinkafterbef wrote:
| Nice wording with 'per core performance'. We had difficulties
| properly conveying this point in our product when comparing
| our CI runners[1] to GitHub Actions CI Runner. I will be
| using it in our next website update. Tack
|
| [1]https://buildjet.com/for-github-actions
| Rafuino wrote:
| Aren't the Zen3 instances still faster than Graviton 3? DDR5 is
| interesting, and while lower power is nice, the customers don't
| benefit from that much, mostly AWS itself with its power bill.
| I haven't seen pricing yet, but assume AWS will price their own
| stuff lower to win customers and create further lock-in
| opportunities (and even take a loss like with Alexa).
| ksec wrote:
| Where is this narrative that Zen 3 instances are faster than
| Graviton 3 instances coming from?
|
| I am pretty sure Zen 3 doesn't bring 25% ST performance
| improvement compared to Zen 2.
| aPoCoMiLogin wrote:
| amd cailms 19% improvements, in some cases is more than
| that: https://www.anandtech.com/show/16214/amd-zen-3-ryzen-
| deep-di...
| b9a2cab5 wrote:
| Unfortunately those claims don't translate to servers,
| because the IO die's power usage increased and perf/w
| isn't much better [1]. Do the math and you get around 14%
| gain in SPEC MT workloads.
|
| [1]: https://www.anandtech.com/show/16778/amd-epyc-milan-
| review-p...
| tyingq wrote:
| >Aren't the Zen3 instances still faster
|
| I would guess price/performance matters more than peak
| performance for a lot of use cases. With prior Graviton
| releases, AWS has made it so they are better
| price/performance. Keep in mind that a vCPU on Graviton is a
| full core rather than SMT/Hyperthread (half a core).
| staticassertion wrote:
| How does Graviton create lock-in? It's ARM.
| jffry wrote:
| I think the idea is by attracting new customers to EC2 via
| performance/price, and then enticing them to integrate with
| other harder-to-leave AWS services
| staticassertion wrote:
| That might make sense for Lambdas, but I don't see how
| that's the case with EC2, or how that's specific to
| Graviton vs x86 etc.
| justicezyx wrote:
| What's the motivation behind this question? Or why do you
| think Amazon wants to create lockin for Graviton
| processors.
|
| Note that graviton represent a classic "disruptive
| technology" that is outside of the main stream market's
| "value network". I.e., it provides something that is
| valuable to marginal customers who are far from the primary
| revenue source of the larger market.
| staticassertion wrote:
| The post I was responding to said:
|
| > but assume AWS will price their own stuff lower to win
| customers and create further lock-in opportunities (
|
| Implying that by pricing Graviton lower users will be
| 'locked in' to AWS.
| dragontamer wrote:
| I don't know. I prefer it when companies actually give the
| technical details.
|
| > While we are still optimizing these instances, it is clear
| that the Graviton3 is going to deliver amazing performance. In
| comparison to the Graviton2, the Graviton3 will deliver up to
| 25% more compute performance and up to twice as much floating
| point & cryptographic performance. On the machine learning
| side, Graviton3 includes support for bfloat16 data and will be
| able to deliver up to 3x better performance.
|
| This means nothing to me. Why is there more floating point and
| cryptographic performance? Did Amazon change the Neoverse core?
| Is this N1 cores still? Did they tweak the L1 caches?
|
| I don't think Amazon has the ability to change the core design
| unfortunately. This suggests to me that maybe Amazon is using
| N2 cores now?
|
| But it'd be better if Amazon actually said what the core design
| changes are. Even just saying "updated to Neoverse N2" would go
| a long way to our collective understanding.
| talawahtech wrote:
| AWS re:Invent is this week. This was announced as part of the
| CEO's keynote. I am sure we will get more details throughout
| the week in some of the more technical sessions.
| ksec wrote:
| >Why is there more floating point and cryptographic
| performance?
|
| You can infer this to N2 which ARM gave their own results
| [1], N2 uses SVE2 256bit.
|
| [1] https://community.arm.com/arm-community-
| blogs/b/architecture...
| SecurityPony wrote:
| N2 does not use 256b SVE2, though its cousin Neoverse V1
| does. I think there's a very real chance that Grav3 is
| actually V1, not N2. (N2 uses 128b SVE vectors, as does the
| Cortex-A710 it's based on.)
| ksec wrote:
| Oh YES [1] . I got the two mixed up. I should have
| doubled checked. So it is V1.
|
| The V1 design is available in both 7nm and 5nm.
|
| Oh well I guess if AWS had it in 5nm they would have at
| least marketed it as such. So may be it is the same 7nm.
|
| [1] https://images.anandtech.com/doci/16640/Neoverse_Intr
| o_3.png
| wayoutthere wrote:
| Honestly, they're not innovating so much as forcing a product
| market fit that everyone knew existed, but didn't have the
| business case to develop without a hyperscalar anchor customer
| (like AWS!)
|
| If anything, this is just another data point that shows how
| truly commoditized tech is. I just worry what happens when
| Amazon decides to "differentiate" after they lock you in.
| haukem wrote:
| Does Graviton3 use the Neoverse V1? The Graviton2 used the
| Neoverse N1.
|
| The features listed here match the core:
| https://developer.arm.com/ip-products/processors/neoverse/ne...
|
| The N2 misses the bfloat, but it could be that the ARM marketing
| named it differently: https://developer.arm.com/ip-
| products/processors/neoverse/ne...
| dragontamer wrote:
| N2 has BFloat16 instructions. EDIT: I probably should have a
| citation:
| https://developer.arm.com/documentation/PJDOC-466751330-1825...
|
| Page 50 of 92 shows off BFCVTN, BFDOT, BFMMLA (matrix multiply
| and accumulate), BFCVT, and other BF16 instructions on the N2.
|
| I'd assume this Graviton 3 is a N2 core. But that's just me
| assuming.
| haukem wrote:
| Thanks for the document, so the ARM marketing just confused
| me. ;-)
|
| Yes N2 is more likely than V1. N2 has the better PPA ratio.
| Own CPU core is very unlikely as I am not aware of any rumors
| which we would have notice before.
|
| N2 also supports ARMv9 which is nice.
| sabujp wrote:
| yeah signed stack pointers is a thing since at least 2017 :
| https://lwn.net/Articles/718888/
| sydthrowaway wrote:
| aka arm64e
| givemeethekeys wrote:
| Side-note: Polly's intonation makes for a fairly confusing
| listening session.
| jfbaro wrote:
| This seems to be a good fit Lambda as well. Looking forward to
| seeing more information about the CHIP design and capabilities.
| amelius wrote:
| I really hate it that big companies are rolling their own CPU
| now. Soon, you're not a serious developer if you don't have your
| own CPU. And everybody is stuck in some walled garden.
|
| I mean, it's great that the threshold to produce ICs is now
| lower, but is this really the way forward? Shouldn't we have
| separate CPU companies, so that everybody can benefit from
| progress, not only the mega corporations?
| minedwiz wrote:
| I mean, it's just ARM - pretty standard architecture these
| days. If the big companies want to compete on chip design, I
| don't see it as all that different from AMD, Intel (and Via if
| you count them) competing on x86-compatibles.
| zamadatix wrote:
| AMD/Intel/Via/IBM/ARM are in horizontal competition on chip
| design, Amazon/Google/Microsoft/Apple are in vertical
| competition on chip design. Vertical competition typically
| results in far less ability for the market to optimize.
|
| Compare for instance the Zen 3 upset vs the M1 upset. Zen 3
| allowed the market to pick what they thought was the best
| CPU, the M1 allowed the market to pick if they wanted to buy
| an entire computer, OS, and software set because the CPU was
| good. Similarly with Graviton and Amazon, you can't just say
| Amazon is competing the same as Via, their interest is in
| selling the AWS ecosystem not in providing the best
| individual components. Same with Google and their custom
| chips and Microsoft with theirs now. Yes many are "just ARM"
| but due to custom extensions/chips and (in some cases) lack
| of standard ARM features that doesn't mean they are the same
| ARM.
|
| Of course that's not to argue it's wrong because it's
| vertical integration, many will think that's the better way
| to make complicated products, but that's not the point - the
| way big companies are competing on chip design is very
| different than if one acted like an AMD/Intel/Via competitor
| to actually compete in the chip space instead of a larger
| space.
| zokier wrote:
| I do point out that vendor-specific architectures were the norm
| for long time in the history and the sky didn't fall down. The
| x86 dominance was relatively short anomaly more than anything.
| amelius wrote:
| The sky didn't fall down because business folks didn't yet
| figure out how to build the most effective walled gardens.
| ksec wrote:
| >I really hate it that big companies are rolling their own CPU
| now. Soon, you're not a serious developer if you don't have
| your own CPU. And everybody is stuck in some walled garden.
|
| It is still just ARM. You can buy ARM chip everywhere. There is
| no walled garden.
|
| > Shouldn't we have separate CPU companies, so that everybody
| can benefit from progress,
|
| You are benefiting the same CPU design from ARM, and same Fab
| improvement from TSMC. Amortised across the whole industry.
| Doesn't get any better than that.
| amelius wrote:
| > It is still just ARM. You can buy ARM chip everywhere.
|
| Only large companies can build CPUs based on ARM. Also, now
| companies might rely on everything in vanilla ARM, but soon
| they will be adding parts of their own ISA, improvements to
| the memory hierarchy, a GPU, or perhaps even their own
| management engine to keep an eye on things or to keep things
| locked down.
|
| > There is no walled garden.
|
| There is huge potential for walled gardens, just look at
| Apple.
| ksec wrote:
| >Only large companies can build CPUs based on ARM....
|
| Only large companies can build _any_ modern CPUs.
|
| > improvements to the memory hierarchy, a GPU, or perhaps
| even their own management engine
|
| None of these has anything to do with ISA nor ARM. As a
| matter of fact adding any of these does not contribute to
| lock in. They are ( potentially ) fragmentation in terms of
| optimisation requirement.
|
| If I am inferring correctly, the only thing that wouldn't
| count as locked down and walled garden by your definition
| would be a total open source design from Hardware to
| Software.
| amelius wrote:
| > If I am inferring correctly, the only thing that
| wouldn't count as locked down and walled garden by your
| definition would be a total open source design from
| Hardware to Software.
|
| No. I'm totally fine with CPUs with fully open
| documentation and preferably designed by a company that
| specializes in CPUs. Open documentation + liberal license
| should allow other CPU manufacturers to compete based on
| the same ISA.
| glogla wrote:
| ARM in general is available. But good ARM is only in
| proprietary walled gardens.
|
| Compare RaspberryPi or Snapdragon with Apple M1.
| b9a2cab5 wrote:
| You can buy Ampere Arm CPUs that are based on the same core
| as AWS Graviton2. And in general mobile CPUs are a
| generation ahead of server architecturally.
| xxpor wrote:
| Only in theory. Try to buy, as a consumer, a bare Ampere
| CPU right now. It's impossible.
| floatboth wrote:
| Apple M1 isn't in a proprietary walled garden, it's a
| general purpose computer like any good old x86 laptop. It's
| not designed with industry standards in mind and doesn't
| have any official documentation on internals, but it's not
| locked down in any way, and reverse engineering is solving
| the documentation problem already.
|
| (Also Qualcomm Snapdragon is a far more cursed platform
| internally.)
| oblio wrote:
| This:
|
| > Apple M1 isn't in a proprietary walled garden, it's a
| general purpose computer like any good old x86 laptop.
|
| is functionally contradicted by this:
|
| > It's not designed with industry standards in mind and
| doesn't have any official documentation on internals
|
| And this:
|
| > but it's not locked down in any way, and reverse
| engineering is solving the documentation problem already.
|
| only improves things partially.
|
| > (Also Qualcomm Snapdragon is a far more cursed platform
| internally.)
|
| The TL;DR would be more that ARM in practice and SoCs for
| sure, suck. They're functionally their own little
| islands, compared to the PC.
| lizthegrey wrote:
| You can get Ampere chips which are not proprietary to any
| specific cloud provider. Both Packet/Equinix Metal and Oracle
| use them.
| [deleted]
| qbasic_forever wrote:
| Where are Google and Azure with ARM instances? It's been nothing
| but crickets for years now... this is starting to get silly that
| their customers can't at least start getting workloads on a
| different architecture, nevermind get better performance per
| dollar, etc. too. The silence is deafening.
| baybal2 wrote:
| > Where are Google and Azure with ARM instances?
|
| I think they are bound by long term supply agreements with
| Intel. They will just bargain for better prices with Intel.
|
| Not an easy task it will be, given that Intel is capacity
| jammed.
| ksec wrote:
| >The silence is deafening
|
| Remember Amazon brought Annapurna Labs in _2015_. And only
| released their first Graviton instances in 2018. The lead time
| for a Server CPU product is at least a year even when you have
| blueprints. That is ignoring fab capacity booking and many
| other things like testing. And without scale ( AWS is bigger
| than GCP and Azure _combined_ ) it is hard to gain competitive
| advantage ( which often delays management decision making ).
|
| I think you should see Azure and GCP ARM offering in late 2022.
| Marvel's exit statement on ARM server SoC pretty much all but
| confirmed Google and Microsoft are working on their own ARM
| offering.
| baybal2 wrote:
| > And without scale ( AWS is bigger than GCP and Azure
| combined )
|
| Azure cloud is bigger than AWS
|
| https://cloudwars.co/microsoft/microsoft-q2-cloud-revenue-
| st...
|
| In terms of physical servers. I believe Chinese cloud
| providers, and their monster CNDs (China has terribly slow
| Internet, and superlocal CDNs are the only way out there)
| overtook AWS quite a few years ago.
| lostmsu wrote:
| Revenue != compute resources. That revenue includes Office
| Online subscriptions AFAIK as a way to game the marketing.
| pdelgallego wrote:
| And if I recall correctly, AWS needed to develop Nitro before
| offering Graviton.
| MangoCoffee wrote:
| https://interestingengineering.com/microsoft-will-build-adva...
|
| "The second phase will see these entities develop custom
| integrated chips and System on a Chip (SoC) with "lower power
| consumption, improved performance, reduced physical size, and
| improved reliability for application in DoD systems."
|
| maybe its coming?
| lostmsu wrote:
| Funnily, IBM offers ARM instances. Even on free tier.
| Uehreka wrote:
| Doesn't IBM also offer a bunch of weird architectures that
| can't be found anywhere else? One day I was looking up some
| old PowerPC and s390 architectures that are supported by a
| lot of docker images, trying to figure out why anyone would
| want them, and it appears the answer is that they're used in
| IBM mainframes.
| oblio wrote:
| IBM had computer architectures before there were computer
| architectures, so it's not exactly a fair comparison :-p
| tyingq wrote:
| Same for Oracle.
| lostmsu wrote:
| I am sorry, I was wrong. I actually meant Oracle. I don't
| see any ARM options in IBM cloud.
| Alex3917 wrote:
| How long until we get T5g with Graviton3?
| croddin wrote:
| Graviton2 was announced at re:invent 2019 and t4g came out in
| September 2020 so my guess is we will see t5g instances by
| September 2022.
| shaicoleman wrote:
| They don't always update the T series instances for every
| generation, so I wouldn't hold my breath.
| mwcampbell wrote:
| If I'm not mistaken, they have updated the T series for every
| generation since the introduction of their Nitro
| virtualization.
| ghshephard wrote:
| So - for those deeper into security - is this useful?
|
| "Graviton3 processors also include a new pointer authentication
| feature that is designed to improve security. Before return
| addresses are pushed on to the stack, they are first signed with
| a secret key and additional context information, including the
| current value of the stack pointer. When the signed addresses are
| popped off the stack, they are validated before being used. An
| exception is raised if the address is not valid, thereby blocking
| attacks that work by overwriting the stack contents with the
| address of harmful code. We are working with operating system and
| compiler developers to add additional support for this feature,
| so please get in touch if this is of interest to you"
| staticassertion wrote:
| I've heard very promising things about pointer authentication.
| tyingq wrote:
| It would take the heat off for mitigating buffer overflow CVEs
| in a rushed way. There are many of those that give remote code
| execution, so typically a frenzied patching exercise. A little
| more time to do the patching in a more deliberate way would be
| nice.
| staticassertion wrote:
| I think in some cases this would effectively mitigate a
| vulnerability entirely. If you require control over the
| return address you're basically shit out of luck. A buffer
| overflow at that point is going to have to target some other
| function pointer or data, which may not be feasible in a
| given function.
| ComputerGuru wrote:
| These slides should help:
| https://llvm.org/devmtg/2019-10/slides/McCall-Bougacha-arm64...
| spijdar wrote:
| Very useful, depending on the implementation and potential
| trade-offs. If the performance is good, this is a nice extra
| layer that makes return-oriented programming more difficult.
| Combined with NX bits, it really raises the difficulty in
| developing/using many types of exploits.
|
| (it's not impossible to bypass, I'm vaguely aware it's been
| done on Apple's new chips that implement a similar (the same?)
| ARM extension, but there's no perfect security)
| pram wrote:
| Yup arm64e in general has pointer authentication, iOS and
| MacOS already implement it.
| philistine wrote:
| It's incredible how Apple went from a laggard in new
| technologies to a trailblazer.
|
| I guess that's the power of a corporation on the scale of
| an 18th century trade monopoly.
| pjmlp wrote:
| Except Oracle did it first with SPARC ADI, Solaris on
| SPARC is one of the few UNIXes that have tamed C for
| couple of years now.
|
| And actually pointer tagging architectures go all the way
| back to the early 1960's, with Burroughs being one of the
| first ones having a go at it.
| zokier wrote:
| It's not much of trailblazing if nobody follows on the
| trail.
| pjmlp wrote:
| Indeed, Apple is following the trail of Burroughs, iAPX
| 432, MPX, SPARC ADI, Lisp Machines, among others.
| Accujack wrote:
| Performance is what I wonder about. The idea sounds good, but
| what crypto scheme can perform encryption of a signature both
| securely and fast enough to keep up with every pointer pushed
| on the stack?
|
| What's the trade-off?
| staticassertion wrote:
| > but what crypto scheme can perform encryption of a
| signature both securely and fast enough
|
| XOR, I assume.
|
| https://pure.tugraz.at/ws/portalfiles/portal/37604654/ind_b
| r...
|
| > On average, encoding addresses and verifying them at each
| indirect branch using the dedicated blraaz and braaz
| instructions yields a performance overhead of 1.50%. The
| protection of the link between indirect control-flow
| transfers induces a runtime overhead of 0.83% on average.
| For the combination of both protection mechanism, we
| measured an average performance overhead of 2.34%.
| my123 wrote:
| https://eprint.iacr.org/2016/444.pdf is the cipher used
| for pointer auth on Arm-designed cores.
| staticassertion wrote:
| Wonderful, thank you.
| MaxBarraclough wrote:
| This isn't something I know a lot about but it sounds like the
| idea of a _shadow stack_ [0] but implemented with crypto. See
| also _Intel CET_ (for _Control-flow Enforcement Technology_ ).
| [1]
|
| [0] https://en.wikipedia.org/wiki/Shadow_stack
|
| [1]
| https://www.intel.com/content/www/us/en/developer/articles/t...
| [deleted]
| ljhsiung wrote:
| Pointer authentication has been around for several years
| already. As with many things in hardware, though, it takes time
| for the software ecosystem around it to mature. Still, I've
| found it to be quite influential.
|
| Here are a couple "real world" examples--
|
| Project Zero had a blogpost about some of the weaknesses on the
| original Pointer Auth spec [0], and even had a follow up [1].
|
| Here is an example of what some mitigation might look like,
| showing how gets(), which is a classically trivially vulnerable
| primitive, becomes not-so-trivial (but still feasible enough to
| do in a blogpost, obviously) [2].
|
| Cost-wise, in terms of both hardware and software, it's rather
| cheap. The hardware to support this isn't too expensive, about
| on par with a multiplier. On the software end, like I said,
| it's taken some time to mature and gotten to a pretty good
| state IMO, with basically all compilers providing simple usage
| since 2019-- just turn on a flag!
|
| ARM also did a performance vs. ROP gadget reduction analysis
| [3]. The takeaway is, as others have mentioned, while it
| doesn't completely mitigate, it does heavily increase the
| complexity for rather cheap.
|
| In fact, I'm rather annoyed Amazon didn't include this feature
| on Graviton2, and to claim it as new or innovative on their end
| feels just like marketing speak. Any CPU that claims to be
| ARMv8.5-a compliant *must* have this feature, and that's been
| around for quite a few years now.
|
| [0]: https://googleprojectzero.blogspot.com/2019/02/examining-
| poi...
|
| [1]: https://bazad.github.io/presentations/BlackHat-
| USA-2020-iOS_...
|
| [2]: https://blog.ret2.io/2021/06/16/intro-to-pac-arm64/
|
| [3]:
| https://developer.arm.com/documentation/102433/0100/Applying...
| aclelland wrote:
| So that's the c6g and c7g but x86 instances are still on c5. Will
| AWS ever release an x86 computing instance again or is this just
| a sign that x86 has reached peak performance on AWS?
| messe wrote:
| They did. A month ago.
|
| https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-ec...
| aclelland wrote:
| I missed this. Thanks!
| pella wrote:
| (29 NOV 2021) "New - Amazon EC2 M6a Instances Powered By 3rd
| Gen AMD EPYC Processors"
|
| https://aws.amazon.com/blogs/aws/new-amazon-ec2-m6a-instance...
|
| _" Up to 35 percent higher price performance per vCPU versus
| comparable M5a instances, up to 50 Gbps of networking speed,
| and up to 40 Gbps bandwidth of Amazon EBS, more than twice that
| of M5a instances."_
|
| _" Larger instance size with 48xlarge with up to 192 vCPUs and
| 768 GiB of memory, enabling you to consolidate more workloads
| on a single instance. M6a also offers Elastic Fabric Adapter
| (EFA) support for workloads that benefit from lower network
| latency and highly scalable inter-node communication, such as
| HPC and video processing."_
|
| _" Always-on memory encryption and support for new AVX2
| instructions for accelerating encryption and decryption
| algorithms"_
| aclelland wrote:
| Missed this one too. Awesome, thanks.
___________________________________________________________________
(page generated 2021-11-30 23:01 UTC)