[HN Gopher] AWS unveils Graviton4 & Trainium2
       ___________________________________________________________________
        
       AWS unveils Graviton4 & Trainium2
        
       Author : skilled
       Score  : 57 points
       Date   : 2023-11-28 16:41 UTC (6 hours ago)
        
 (HTM) web link (press.aboutamazon.com)
 (TXT) w3m dump (press.aboutamazon.com)
        
       | fhub wrote:
        | Not much to discuss until there is pricing. I have a bunch of
        | Graviton2 instances that it didn't make sense to upgrade to any
        | Graviton3 instances, due to the pricing bump for 16 GB / 4 cores
        | (t4g.xlarge).
        
       | monlockandkey wrote:
        | What Arm core is Graviton4 using? A 30% performance uplift is a
        | good amount.
        
         | cherioo wrote:
         | Likely Neoverse V2 architecture, based on A710 cores
        
         | aeyes wrote:
         | > Neoverse V2
         | 
         | https://aws.amazon.com/blogs/aws/join-the-preview-for-new-me...
        
       | PedroBatista wrote:
        | What happens to old and used Graviton3 chips?
        | 
        | At least in the "old days" there was (and still is) a secondary
        | market for used server parts.
        | 
        | I don't know how companies like Amazon, Microsoft and Google
        | would frame a question like this so their "green" narratives
        | wouldn't be hurt, but I'm sure they'll do an excellent job.
        
         | rstupek wrote:
         | They'll continue to run in their datacenter since they're still
         | basically brand new?
        
         | baz00 wrote:
         | They're still running prehistoric Intel Xeons. I'm sure they'll
         | just rot slowly until the instances fail.
        
         | threeseed wrote:
         | As a user you don't get much visibility into the specs of
         | managed services e.g. DynamoDB.
         | 
         | So that's an obvious home for the chips that are no longer
         | available to users.
        
         | asperous wrote:
          | If you haven't used AWS much you might not know this, but old
          | instance types stick around and you can still use them,
          | especially as "spot" instances, which let you bid for server
          | time.
          | 
          | I had a CPU-bound science project, and it turns out that
          | because people bid based on performance, the old chips end up
          | costing about the same in terms of CPU work done per dollar
          | (older chips cost less per hour but do less).
          | 
          | AWS overall was by far the most expensive, though, so
          | switching to something like Oracle with their Ampere Arm
          | instances was a lot cheaper for me.
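          | 
          | Rough illustration with made-up spot prices and relative
          | speeds, just to show the cost-per-work comparison:
          | 
          |     instances = {
          |         # name: (spot $/hour, relative CPU work per hour)
          |         "older gen xlarge": (0.06, 1.0),
          |         "newer gen xlarge": (0.09, 1.5),
          |     }
          |     for name, (price, work) in instances.items():
          |         print(f"{name}: ${price / work:.3f} per unit of work")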
        
         | discodave wrote:
         | They just... don't retire them? The most expensive thing in a
         | DC is the chips, so it's worth it to just build more datacenter
         | space and keep the old ones around.
         | 
         | In 2019, before I left the EC2 Networking / VPC team, we were
         | using M3 instances for our internal services... those machines
         | were probably installed in 2013 or 2014, making them over 5
         | years old.
         | 
         | With the slowdown in Moore's law and chip speeds, I'd wager
         | that team is still using those M3s now.
         | 
         | Eventually the machines actually start failing, so they need to
         | be retired, but a large portion of machines likely make it to
         | 10 years.
        
           | temp0826 wrote:
            | They can for sure find a use for them internally. Hat-tip to
            | the less-shiny teams like Glacier that have to endlessly put
            | out fires on dilapidated old S3 compute/array hand-me-downs.
        
         | whalesalad wrote:
         | I can't wait to find old surplus custom ARM silicon from this
         | period at the recycler or on eBay.
         | 
          | As a kid I always wanted one of those yellow Google Search
          | Appliances, and now you can find them everywhere, being used
          | as lawn ornaments.
        
         | cjsplat wrote:
         | Depending on the numbers involved, previous generation hardware
         | can waterfall to infrastructure apps that are throughput based.
         | 
         | Things accessed through network APIs and billed per op or in
         | aggregate. Distributed file systems, databases, even build and
         | regression suite systems.
         | 
         | Another key point is that older generations of servers for full
         | custom cloud environments tend to co-evolve with their
         | environments. The amount of power and cooling for a rack may
         | not support a modern deployment.
         | 
         | Especially if a generation lasts 6 years. You might be able to
         | cascade gen N+1 to N, but N+6 may require a full retrofit. A 6
         | year old data center that is partially filled as individual
         | servers fail may justify waiting for N+7 or even 8 to cover the
         | cost of the downtime and retrofit.
         | 
         | There is a reason Google announced that they are depreciating
         | servers over 6 years and Meta is at 5 years, vs the old
         | accounting standard of 3 years.
         | 
         | Then of course there is a secondary market for memory and
         | standard PCI cards, but the market for 6 year old tech is
         | mainly spares, so it is unlikely to absorb the full size of the
         | N-6 year data center build.
         | 
         | If you are considering a refurb style resale market for 6 year
         | old tech, it is often the case that the performance per dollar
         | is a non-starter because of the amount of power the older tech
         | consumes.
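          | 
          | A toy calculation, with all numbers invented, of how the
          | power bill erodes a refurb's price advantage:
          | 
          |     kwh = 0.12             # assumed $/kWh
          |     hours = 3 * 365 * 24   # three years of 24/7 operation
          | 
          |     def cost_per_perf(purchase, watts, relative_perf):
          |         total = purchase + watts / 1000 * hours * kwh
          |         return total / relative_perf
          | 
          |     # 6-year-old refurb box vs. a current-gen server
          |     print(cost_per_perf(500, watts=400, relative_perf=1.0))
          |     print(cost_per_perf(3500, watts=300, relative_perf=3.0))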
        
         | aseipp wrote:
         | They don't sell these. They reuse them and perform maintenance
         | on them until their last breath and part them out once they
         | die.
         | 
         | Hyperscalers design their own datacenter "SKUs" for
         | storage/compute, all the way from power delivery to networking
          | to chassis. These servers are going to be heavily customized,
          | and it's unlikely that, even if they fit normal form factors,
          | they will work in the same way as COTS devices or things
          | you would buy from Supermicro.
         | 
         | You could possibly make it work. If they sold them. But they
         | don't, and if you're in the market for that stuff, Supermicro
         | will just design it for you anyway, because presumably you have
         | actual money.
         | 
          | And the reality is they probably either break even or come out
          | greener doing it this way, as opposed to washing their hands
          | of it and selling servers on eBay so they can eventually get
          | thrown into landfills wholesale by nerds once their startups
          | fail or they get bored of them. Just because you stick your
          | head in the sand doesn't mean it doesn't end up in a landfill.
        
       | buildbot wrote:
        | The scale they are quoting, 100,000-chip clusters and 65
        | exaflops, seems impossible. At 800W per chip, that's 80MW of
       | power! Unless they literally built an entire DC of these things,
       | nobody is training anything on the entire cluster at once. It's
       | probably 10-20 separate datacenters being combined for marketing
       | reasons here.
        
         | tempay wrote:
         | What makes you think it's 800W per chip?
        
           | buildbot wrote:
            | It's about what I thought the H100 was; it's 700W actually.
            | But even at, say, 400W, that's 40MW of power. From some
            | quick googling, some datacenters are built in the 40-100MW
            | range, but I really doubt they can actually network 100,000
            | chips together in any sort of performant way; that's
            | supercomputer-level interconnect. I don't think most
            | datacenters support the highly interlinked network fabric
            | this would need, either.
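            | 
            | Back-of-envelope math (per-chip wattage is a guess, since
            | Trainium2's TDP isn't published; 700W is roughly an H100
            | SXM):
            | 
            |     chips = 100_000
            |     for watts in (200, 400, 700):
            |         mw = chips * watts / 1e6
            |         print(f"{watts} W/chip -> {mw:.0f} MW")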
        
             | tempay wrote:
             | They have instances with 16 chips so I presume there are at
             | least 16 chips per server. I'd also expect the power
             | consumption to be more like 100-200W given they seem more
              | like Google's TPUs than an H100.
             | 
             | For the interconnect I doubt this is their typical
             | interconnect but it doesn't seem completely unreasonable.
             | Even when not running massive clusters they'll still need
             | the interconnect to pair the random collections of machines
             | that people are using.
        
           | bluedino wrote:
            | Per server? Our dual-CPU Intel servers take about 800-900W
            | at full power.
        
       | LunaSea wrote:
       | Since Graviton3 still isn't available in most regions, especially
       | on the RDS side, I'm really not holding my breath.
        
       | ilaksh wrote:
       | Do you need specific software to train a model using Trainium2?
       | For example, what about fine-tuning a language model? Will
       | something like QLoRA work?
        
       | snewman wrote:
       | > Graviton4 processors deliver up to 30% better compute
       | performance, 50% more cores, and 75% more memory bandwidth than
       | Graviton3.
       | 
       | This seems ambiguous. Presumably this is 50% more cores per chip.
       | What about "30% better compute performance" and "75% more memory
       | bandwidth": is that per core, or per chip? If the latter, then
       | per-core compute performance would actually be lower.
       | 
       | Also, "up to" could be hiding almost anything. Has anyone seen a
       | source with clearer information as to how per-core application
       | performance compares to earlier Graviton generations?
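        | 
        | The arithmetic that makes the ambiguity matter (assuming, for
        | the sake of argument, that both figures were per chip):
        | 
        |     per_chip_uplift = 1.30
        |     core_count_ratio = 1.50
        |     print(per_chip_uplift / core_count_ratio)  # ~0.87
        |     # i.e. per-core performance would actually have regressed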
        
         | p1esk wrote:
         | Wait, how could "50% more cores, and 75% more memory bandwidth"
         | result in anything less than 50% of better compute performance?
        
           | jagger27 wrote:
           | Clock speed
        
         | bluedino wrote:
         | It's the best Graviton processor yet!
        
         | kevincox wrote:
          | I would assume that "up to" means that, across all the
          | workloads they benchmarked, the best result was 30% better
          | compute performance. Not a very useful number, as your
          | workload is very unlikely to hit the right set of requirements
          | to see that uplift.
        
         | otterley wrote:
         | AWS Graviton specialist here!
         | 
         | The performance improvement is on a per-core basis. The pending
         | availability of 96-vCPU Graviton4 instances is icing on the
         | cake!
        
       | aseipp wrote:
        | Neoverse V2, so this will probably be the first widely
        | available ARMv9 server with SVE2, a server-class SKU you can
        | actually get your hands on (i.e. not a mobile
        | phone/Grace/Fugaku). It's about damn time!
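        | 
        | A quick way to confirm SVE2 once you can get a shell on one of
        | these (on Linux it shows up as a CPU feature flag):
        | 
        |     flags = set()
        |     with open("/proc/cpuinfo") as f:
        |         for line in f:
        |             if line.startswith("Features"):
        |                 flags.update(line.split(":", 1)[1].split())
        |     print("sve2" in flags)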
        
       ___________________________________________________________________
       (page generated 2023-11-28 23:01 UTC)