[HN Gopher] The first two custom silicon chips designed by Micro...
___________________________________________________________________
The first two custom silicon chips designed by Microsoft for its
cloud
Author : buildbot
Score : 104 points
Date : 2023-11-15 16:02 UTC (6 hours ago)
(HTM) web link (www.theverge.com)
(TXT) w3m dump (www.theverge.com)
| spandextwins wrote:
| So does NVIDIA. It just turns out NVIDIA can profit more because
| their software, and the ecosystem around it, adds so much value
| that nobody can compete. It's gonna take a lot of work and many
| years to approach that, even by leveraging AI. And by diverging
| with their own chips, they're gonna miss out on the mainstream.
| ugh123 wrote:
| I don't know much about the software side of Nvidia/GPUs +
| LLMs. Can you catch me up on what software they've created
| that acts as a differentiator? Is that CUDA? How does this
| relate to things like TensorFlow with Google's chips?
| jedberg wrote:
| It was only a matter of time. Google announced theirs years ago,
| Amazon announced theirs last year.
|
| Right now NVIDIA has the lead because they have the better
| software, but they can't make the chips fast enough. Will be
| interesting to see if their better software continues to keep
| them in the lead or if people are more interested in getting the
| capacity in any form.
| rehitman wrote:
| I would say software is very important. We already have tons of
| different models, standards, libraries, etc. I usually have a
| smooth experience if I am using Nvidia, but with any other
| variation I have to spend some time getting things started.
|
| Support for the common libraries that I use is very important
| when I choose a cloud platform.
| Jagerbizzle wrote:
| Considering that Jensen is on stage with Satya at the moment
| sharing the keynote of Microsoft Ignite, I suspect NVIDIA won't
| be going anywhere anytime soon.
| paulpan wrote:
| A bit surprised to see Jensen's stage appearance since
| clearly Microsoft's success with its own AI chips means less
| business for Nvidia's chips.
|
| Unlike the ARM chip also announced at the same Ignite event,
| Microsoft doesn't exactly "need", nor can it fully utilize, an
| AI chip. Google trains its foundational models (e.g. Gemini)
| on its own TPU hardware, but Microsoft is heavily reliant on
| OpenAI for its generative AI serving needs.
|
| Unless Microsoft is planning to acquire OpenAI fully and
| switch over from Nvidia hardware...
| echelon wrote:
| > Unless Microsoft is planning to acquire OpenAI fully
|
| They're going to play a modified version of the old
| Rareware trick.
|
| It's also a pretty great game to buy up OpenAI equity,
| which ultimately gets spent on Microsoft compute. Two
| birds, one stone.
| WanderPanda wrote:
| I don't think the margins here are so considerable that
| we can assume revenue = profit
| echelon wrote:
| They can be losing money, but gaining in market share and
| moat.
|
| Almost nobody in this game cares about profit right now.
| kmeisthax wrote:
| ...they're going to overbid on a studio that was actively
| falling apart, after being rebuffed from buying one of
| the biggest giants in the business[0], all as part of an
| ill-advised attempt to muscle into a game business they
| didn't understand?
|
| [0] Microsoft tried to buy Nintendo very early on
| aseipp wrote:
| Microsoft absolutely runs their own models on their own
| hardware, at scale, and they have done so for years just
| like every other hyperscaler -- Project Brainwave was first
| publicly talked about as far back as 2018. The generative
| LLM craze is a recent phenomenon in comparison. They are
| absolutely going to go all in on putting AI functionality
| in Bing, in Excel, in Windows, etc etc. To do that, you
| need hardware.
|
| None of this is really strange. It also wasn't strange when
| Google announced H100 systems while also pushing TPUs they
| developed. Microsoft has Jensen on stage because customers
| of Microsoft Azure demand Nvidia products. Customers of
| Google Cloud demand Nvidia products. So, they provide them
| those products, because not providing them loses those
| customers. It's that simple. Everyone involved in these
| deals acknowledges this.
| buildbot wrote:
| Awesome! Someone who knows about Brainwave!
| aseipp wrote:
| Yeah! Did/do you work on it? The original publications
| were good timing; I was working as a consultant on an
| FPGA-based ML accelerator at the time the original stuff
| was talked about, and I really enjoyed reading everything
| I could about Brainwave! Really neat project, from the system
| design perspective (e.g. the heterogeneous compiler) to the
| choice of using and interconnecting FPGAs and integrating the
| network/software/ML stack (IIRC, there was a good paper on the
| overlay network they used to make those custom functions
| available on the global network fabric).
|
| I'm guessing at this point the ASICs make a lot more
| economic sense, though. :)
| rapsey wrote:
| nVidia still has the monopoly on training. Everyone else is
| just making chips for inference.
| muro wrote:
| https://cloud.google.com/tpu/docs/training-on-tpu-pods
| adrian_b wrote:
| While NVIDIA provides the best absolute performance for
| training, Intel (i.e. Gaudi) already provides much better
| training performance per dollar.
|
| The funny thing is that this fact has been shown
| inadvertently by NVIDIA:
|
| https://www.servethehome.com/nvidia-shows-intel-
| gaudi2-is-4x...
| sgillen wrote:
| That article shows that it takes about 50x as long to train
| GPT-3 with Intel's offering vs Nvidia's. At least in the
| current environment, if you are training LLMs I think almost
| no amount of cost savings can justify that.
| adrian_b wrote:
| That 50x is only if you can afford one thousand NVIDIA
| H100s.
|
| There cannot be more than a handful of companies in the
| entire world that could afford such a huge price (tens of
| millions of $).
|
| In comparison with a still extremely expensive cluster of
| 64 NVIDIA H100s, the difference in speed would reduce to
| only two to three times, and paying several times less
| for the entire training becomes very attractive.
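|
| A back-of-envelope sketch of that argument in Python (all the
| numbers are made-up placeholders; only the structure of the
| comparison matters):
|
|   # Compare equal-budget clusters instead of whatever cluster
|   # sizes the vendor benchmarks happened to use. Assumes
|   # (optimistically) perfect linear scaling with chip count.
|   def days_to_train(days_on_one_chip, n_chips):
|       return days_on_one_chip / n_chips
|
|   budget = 2_000_000  # hypothetical budget in $
|   fast_chip = {"price": 30_000, "days_on_one_chip": 3000}
|   cheap_chip = {"price": 8_000, "days_on_one_chip": 9000}
|
|   for name, chip in [("fast", fast_chip), ("cheap", cheap_chip)]:
|       n = budget // chip["price"]
|       days = days_to_train(chip["days_on_one_chip"], n)
|       print(f"{name}: {n} chips -> {days:.0f} days")
|
|   # fast:  66 chips -> 45 days
|   # cheap: 250 chips -> 36 days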
| jedberg wrote:
| Amazon announced Trainium last year and recently announced
| their partnership with Anthropic, where Claude will be
| trained on Trainium.
|
| https://press.aboutamazon.com/2023/9/amazon-and-anthropic-
| an...
| belval wrote:
| > Amazon announced theirs last year
|
| Inferentia (inf1) was GA'ed in December 2019 so it's actually
| almost 4 years old now. The Trainium (trn1) chips and the
| Inferentia 2 (inf2) refresh are indeed 1 year old though.
| jedberg wrote:
| I was referring to Trainium.
| ShamelessC wrote:
| Yikes those names are horrible.
| bonton89 wrote:
| They'll sound better after we hear Microsoft's names.
| cameronh90 wrote:
| Microsoft Azure Inference for Cloud Apps 365 Pro Live
| Series X
| Someone wrote:
| I think you mean
|
| Microsoft Azure(tm) Inference for Cloud Apps(c) 365
| Pro(r) Live Series X(tm)(r)
| mdaniel wrote:
| I'm still _so bitter_ about this nonsense
| https://www.microsoft.com/en-
| us/security/business/identity-a...
|
| > Azure Active Directory is now Microsoft Entra ID
|
| ok, geez, thanks
| bee_rider wrote:
| I don't really know what an active directory is, but I
| assume that the default type of directory is a passive
| one, in that it just holds files or subdirectories (it
| doesn't act). An active directory sounds like a directory
| that is going to play tricks on me.
|
| Entra ID sounds like a type of ID.
|
| I'm not sure how something could legitimately have each
| of these names. I assume the functionality changed pretty
| dramatically over the lifespan of the product?
| fragmede wrote:
| Active Directory is Microsoft's directory service, used
| mostly for user logins and associated data.
| belval wrote:
| Why? Inferentia => inference, trainium => training. Given
| the usual naming of AWS products, having one where the
| name roughly matches what it does is pretty good?
|
| TPU is pretty good but is associated with Google. MTIA is
| an acronym but still maps to what the chip does. ~~"Cobalt"
| is worse as it does not mean anything~~ . Cobalt is the CPU
| chip, MAIA is the accelerator so this matches Meta's
| naming.
| ShamelessC wrote:
| > Why? Inferentia => inference, trainium => training.
|
| Funny, that's precisely why I think the names are bad.
| It's like if Google had chosen "Search-ola" as their
| name. Way too on the nose and/or lazy. Having said that,
| I don't really care all that much and I imagine that may
| have been the spirit of those who chose the names.
| mdaniel wrote:
| heh, as someone who has to deal with this nonsense all
| day <https://aws.amazon.com/products/> I would for sure
| welcome some straightforward naming. $(echo "AWS Fargate"
| | sed s/Fargate/ServerlessContainerium/)
| organsnyder wrote:
| My previous role was a lot of AWS, and I became convinced
| that the value of an AWS cert was mostly learning how to
| map all of the product names to their actual functions.
| esafak wrote:
| They're searchable and self-explanatory so they're not bad.
| coredog64 wrote:
| Graviton CPU is a year older.
| scarface_74 wrote:
| If they "can't make the chips fast enough" being TSMC's second
| highest volume customer behind Apple and probably second in
| priority, what chance does Microsoft have getting enough of
| TSMCs capacity?
| donatzsky wrote:
| Microsoft does have the benefit of not having any customers
| other than themselves, so volumes are smaller.
| tambre wrote:
| They're more constrained by advanced packaging (CoWoS)
| capacity rather than the manufacturing of the silicon.
| tyfighter wrote:
| But from the pictures, the Maia 100 is also using CoWoS
| packaging. That will be necessary for any new HBM chip.
| kristianp wrote:
| https://archive.is/jzxDG
| pseudosavant wrote:
| Not a lot of information about the chips yet. About 100B
| transistors in the AI chip. For comparison, an RTX 4090 has 76B,
| and an H100 has about 80B. So the Maia chip is pretty massive.
| hulitu wrote:
| This is like a performance review based on lines of code written.
| Slartie wrote:
| GPUs (and AI chips) are highly parallel, containing thousands
| upon thousands of identical compute units. The performance of
| these chips depends very much on the sheer number of
| transistors available to form as many compute units as
| possible.
|
| If we assume that Microsoft is roughly able to architect
| compute units with a performance-to-transistor-count ratio
| similar to Nvidia's, then having twice the number of
| transistors should roughly result in twice the performance.
|
| That is very different from typical software. If a programmer
| needs 100 lines of code to solve a given problem, handing them
| another 100 lines to fill won't let them simply copy-paste the
| original 100 lines and thereby solve the problem twice as
| fast. With GPU compute units, such copy-pasting is exactly
| what's being done (at least until you hit the limits of other
| resources such as management units, memory bandwidth, etc.).
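|
| A minimal roofline-style sketch of that caveat (the numbers are
| illustrative placeholders, not the specs of any real chip):
|
|   def attainable_tflops(peak_tflops, bw_tbs, flops_per_byte):
|       # Classic roofline: you get the lesser of the compute
|       # roof and bandwidth * arithmetic intensity.
|       return min(peak_tflops, bw_tbs * flops_per_byte)
|
|   peak, bw = 500.0, 2.0   # say 500 TFLOPs peak, 2 TB/s HBM
|   for ai in (50, 500):    # arithmetic intensity, FLOPs/byte
|       base = attainable_tflops(peak, bw, ai)
|       doubled = attainable_tflops(peak * 2, bw, ai)
|       print(f"AI={ai}: {base:.0f} -> {doubled:.0f} TFLOPs")
|
|   # AI=50:  100 -> 100  (memory-bound; extra units are wasted)
|   # AI=500: 500 -> 1000 (compute-bound; scales with unit count)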
| virgildotcodes wrote:
| This seems a little unclear to me. They're saying this isn't
| meant to compete with Nvidia, and that it's more of an
| ARM-based CPU?
|
| So more of an SoC a la AWS Graviton or Apple Silicon than a
| pure GPU?
| jsnell wrote:
| There are two separate chip announcements here. One is an ML
| accelerator (Maia). The other is an ARM CPU (Cobalt).
| conradev wrote:
| "Microsoft gave few technical details that would allow gauging
| the chips' competitiveness versus those of traditional
| chipmakers"
|
| Clearly. All I got was "using ARM IP" and "TSMC N5"
| buildbot wrote:
| Also the new MX datatypes
| (https://www.opencompute.org/blog/amd-arm-intel-meta-
| microsof...), from the article:
|
| "Manufactured on a 5-nanometer TSMC process, Maia has 105
| billion transistors -- around 30 percent fewer than the 153
| billion found on AMD's own Nvidia competitor, the MI300X AI
| GPU. "Maia supports our first implementation of the sub 8-bit
| data types, MX data types, in order to co-design hardware and
| software," says Borkar. "This helps us support faster model
| training and inference times.""
| conradev wrote:
| Good catch! That is very cool
| minedwiz wrote:
| Looking over another story on this at
| https://www.zdnet.com/article/microsoft-unveils-first-ai-chi...,
| Cobalt seems to be a general purpose ARM CPU.
| monlockandkey wrote:
| Arm is inevitable for the server. It's interesting how,
| nowadays, efficiency/power consumption is a consideration over
| pure raw performance.
| wargames wrote:
| I genuinely don't see how x86 architecture will continue to
| survive the next 10 years. It will of course take longer to
| change home desktop users to new architectures; they will be
| the last segment to switch, but it seems all but inevitable.
|
| BTW, I'm not even speaking to whether x86 can compete on
| performance per watt... I think it just won't make sense
| financially to be out of sync with the industry.
| jimmaswell wrote:
| I care vastly more about raw performance than energy usage
| for my home systems. I also have good reasons to care about
| the best single core performance. I don't see x86 going away
| that fast.
| magnio wrote:
| FWIW, the current kings of single-core Geekbench are the M3
| chips. Even the base M3 scores as high as the i9-14900K and
| higher than the Ryzen 9 7950X3D, at less than half their
| TDPs.
| monlockandkey wrote:
| Mobile, desktop, laptop, edge, server. These are the
| domains of compute, and 4 out of the 5 value power
| efficiency. Laptops that were once x86 are now coming round
| to Arm because it really does make a better product, i.e.
| battery life and thermals. For the server, the savings in
| energy and chip manufacturing cost benefit datacentres and
| users alike.
| adrian_b wrote:
| Even if it is possible to design Arm CPUs competitive with
| x86 CPUs, there are a lot of application domains in which no
| vendor of Arm CPUs has ever attempted to compete.
|
| For example, for scientific computation and computer-aided
| design, Fujitsu is the only company that has designed Arm
| CPUs that can compete with the x86 CPUs, but they do not sell
| their CPUs on the free market.
|
| For a huge company, the floating-point performance of the
| CPUs is less important, because it can use datacenter GPUs
| with even greater throughput, so the existing Arm server CPUs
| could be good enough even for a supercomputer, as they only
| have to move the data to and from the GPUs. However, small
| businesses and individuals cannot use datacenter GPUs, which
| carry huge prices, so they can use only x86 CPUs, and there
| is not the slightest chance of an alternative appearing soon.
|
| Another application domain in which no Arm vendor has ever
| made competitive devices is cheap personal computers.
|
| Nothing that Apple does matters, because they do not sell
| computers; they only lend computers that remain under their
| control and that are much more expensive than the
| alternatives anyway.
|
| Besides Apple, only Qualcomm, Mediatek and NVIDIA are able to
| make Arm CPUs with performance similar to the cheapest of the
| Intel and AMD CPUs, but all three of these companies demand
| prices for their CPUs that are several times higher than the
| prices of comparable x86 CPUs.
|
| As with CPUs with high floating-point or big-integer
| performance, there is not the slightest chance of any company
| appearing that would be willing to sell Arm CPUs that are
| both cheap and fast.
|
| As for server CPUs, the companies that have attempted to
| design Arm-based server CPUs have never designed models
| suitable for small businesses or individuals, only models
| that can realistically be bought by very big companies.
|
| I would not mind switching from x86 to Arm, but there is
| absolutely no prospect of that.
|
| If x86 CPUs were to disappear, that would be a catastrophe
| for people who do not want to depend on the mercy of the
| big companies. It would be a return to the times before
| personal computers, when all computing had to be done
| remotely, in the computing centers of big companies, which
| have now been renamed "clouds".
| wmf wrote:
| Grace is an ARM HPC CPU.
|
| I agree that Qualcomm/Mediatek/Rockchip/Nvidia pricing is
| really terrible but I guess prices don't matter when
| there's almost no demand anyway.
| hulitu wrote:
| > I genuinely don't see how x86 architecture will continue to
| survive the next 10 years.
|
| ARM is ok only for reasonable performance at low power (if we
| forget about VIA).
| bee_rider wrote:
| Most of the nodes I see every day are still x86. But I'm in an
| academic environment, maybe things are slower over here. Does
| ARM actually seem to have legs outside? (Other than, like,
| nodes subsidized by Amazon's wish to in-house everything they
| can).
| monlockandkey wrote:
| It's going to take time, but momentum is seriously starting
| to build up now. The laptop market is going to pick up with
| Snapdragon X, and cloud providers are going to continue with
| more powerful designs.
| bacchusracine wrote:
| But will these run Linux, run AI stuff the way the Apple
| Silicon seems to be able to do?
|
| Because right now I'm looking to save up for a majorly
| spec'd Apple MacBook Pro just to be able to do this stuff on
| a *nix operating system. I have no great love for Apple but
| the abilities of their chips and the vast software
| offerings are tempting this Linux guy in that direction.
|
| Something that Microsoft cannot seem to do any more. I used
| Windows from 3.x-WinME; NT3.51-WinXP, getting off before
| Vista. What I've seen since then has done nothing to tempt
| me back to their side. Since I unfortunately must deal with
| Windows 10 at work, it definitely reinforces my distaste
| for their systems....
|
| So despite thinking OSX has been rendered ugly for the past
| ten years now, I'm still thinking heavily in that
| direction, even with the high costs. Snapdragon X sounds
| nice enough, but based on past behavior I have zero
| expectation of it getting decent Linux support any time
| soon. And no one else seems to even be trying, that one
| Thinkpad aside.
| bee_rider wrote:
| Microsoft has taken a couple swings at making ARM laptops
| (which, we should note, doesn't appear to be what they
| are announcing here).
|
| I'd expect a future hypothetical Microsoft ARM laptop to
| be like a surface-RT; some Windows dropped on a third
| party ARM chip. Microsoft is a software company, after
| all. So it is more of a matter of, do they happen to have
| bought a chip that supports Linux (probably yes, because
| what hardware manufacturer wants to be dependent on one
| company for OS support?) and can you get past Secureboot
| (probably yes, after a couple years at least, when the
| jailbreak happens).
| swozey wrote:
| I converted my corp apps to ARM (Fargate Graviton) last year
| and our AWS bill plummeted and the time to fully initialize a
| container did so as well.
|
| I'd never tell the higher ups this but it was pretty easy, too.
| I'll let them bask in my glory of saving the company
| $60k/month.
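|
| For anyone curious: most of the work is rebuilding the image for
| linux/arm64; the ECS side is basically one task-definition flag.
| A rough boto3 sketch (names and ARNs are placeholders):
|
|   import boto3
|
|   ecs = boto3.client("ecs", region_name="us-east-1")
|   exec_role = "arn:aws:iam::123456789012:role/ecsTaskExecRole"
|
|   ecs.register_task_definition(
|       family="my-service",
|       requiresCompatibilities=["FARGATE"],
|       networkMode="awsvpc",
|       cpu="512",
|       memory="1024",
|       # the key change: ask Fargate for Graviton (ARM64) capacity
|       runtimePlatform={"cpuArchitecture": "ARM64",
|                        "operatingSystemFamily": "LINUX"},
|       executionRoleArn=exec_role,
|       containerDefinitions=[{
|           "name": "app",
|           "image": "my-registry/my-service:arm64",  # arm64 build
|           "essential": True,
|       }],
|   )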
| andy_xor_andrew wrote:
| Damn, Microsoft is playing all sides of this, huh.
|
| Within a ten minute window:
|
| - Satya announced GPT-4 runs (at least partly) on a new AMD
| offering
|
| - Satya announced an in-house chip for ML acceleration
|
| - Satya brings NVidia CEO Jensen Huang on stage
|
| they've got every horse in the race, huh
|
| (disclaimer, I work for MS but all the stuff talked about here
| so far is waaaay above my paygrade haha, and it's all brand new
| info to me)
| onlyrealcuzzo wrote:
| Microsoft really likes being valued as a growth stock...
|
| Obviously they're going to play every angle.
| kyboren wrote:
| > - Satya brings NVidia CEO Jensen Huang on stage
|
| This should be ringing alarm bells at FTC and DoJ.
|
| You know what's even better than trust busting and breaking up
| cartels? Preventing the formation of cartels and trusts in the
| first place.
| scarface_74 wrote:
| Raising alarm bells because Microsoft is actually dealing
| with multiple companies?
|
| Would it be better for competition if Microsoft only used one
| supplier?
| aiman3 wrote:
| NVIDIA should start offering an AI cloud, buy DigitalOcean, and
| counterattack Microsoft, Google, and Amazon, since they are
| attacking NVIDIA's territory.
| stanac wrote:
| They already have a gaming cloud; it wouldn't be unthinkable
| to offer a GPGPU cloud.
| solardev wrote:
| It's a hell of a cloud too. Geforce Now performs several
| times better than Microsoft's crappy offering, xCloud (or
| whatever it's called now).
|
| Nvidia made some really amazing strides in the past few
| years, taking over cloud gaming where Onlive and Stadia
| utterly failed, making DLSS, etc.
|
| I just hope they don't abandon us gamers for their AI stuff
| :( Probably the entire gaming market is way smaller than the
| potential AI market, just hopefully not too small to matter.
| gary_0 wrote:
| > Microsoft said it does not plan to sell the chips
|
| Add it to the list of things you can't buy at any price, and can
| only rent. That list is getting pretty long, especially if you
| count "any electronic device you can't fully control or modify".
| ronsor wrote:
| Google does the same thing with their TPUs. The masses will be
| left with the NVidia monopoly, while large companies will be
| able to free themselves from that.
| wmf wrote:
| MI300 is coming.
| machinekob wrote:
| This time AMD will fight with NV for sure (it's only failed
| 20 times already, copium)
| wmf wrote:
| On one hand this is a fair prediction but Triton exists
| now and it didn't exist last time.
| FuriouslyAdrift wrote:
| December 6 launch date:
|
| https://ir.amd.com/news-events/press-
| releases/detail/1168/am...
| llm_nerd wrote:
| nvidia is a $1.2 trillion company (the 6th largest company by
| market cap), and at this point AI is a huge component of that
| valuation. It has appreciated by 3.3x since just the beginning
| of this year.
|
| If any of these companies _truly_ made competitive silicon
| they absolutely would commercialize it.
|
| I suspect they aren't as competitive as the press releases
| hold them to be, and this Microsoft entrant is likely to
| follow the same path. Like Google, Tesla, Amazon and others
| it seems mostly an initiative to negotiate discounts from
| nvidia.
|
| It would be great if there were real competition. When
| Google was hyped about their Tensor chips, they did have a
| period where they were looking to commercialize them, and
| there are some pretty crappy USB products they sell.
| jsnell wrote:
| They are commercializing the silicon, by selling access to
| it on their clouds.
|
| Now, I know that what you actually mean is selling the
| chips themselves to third parties :) But it's not obvious
| that there's any point to it given their already existing
| model of commercializing the chips.
|
| First, literally everyone is already supply-constrained due
| to limits on high end foundry capacity. Nvidia has a ton of
| capacity because they're one of TSMC's top two customers.
| The big tech companies will have much smaller allocations
| which are used up just supplying their own clouds. Even if
| the demand for buying these chips rather than renting were
| there, they just don't have the chips to sell without
| losing out on the customers who want to rent capacity.
|
| Second, the chips by themselves are probably not all that
| useful. A lot of the benefit is coming from the
| silicon/system/software co-design. (E.g. the TPUv4 papers
| spent as much attention on the optical interconnect as the
| chips). Selling just chips or accelerator cards wouldn't do
| much good to any customers. Nor can they just trust that
| systems integrators could buy the cards and build good
| systems to house them in. They need to sell and support
| massive large scale custom systems to third parties. That's
| not a core competency for any of them, it'll take years to
| build up that org if you start now. And it means they need
| to ship the software to the customers, it can't continue
| being the secret sauce any more.
|
| Nvidia on the other hand has been building up an ecosystem
| and organization for exactly this for the last decade.
| scarface_74 wrote:
| > Nvidia has a ton of capacity because they're one of
| TSMC's top two customers. The big tech companies will
| have much smaller allocations which are used up just
| supplying their own clouds.
|
| And TSMCs top customer is not even playing in the cloud
| space.
| aleph_minus_one wrote:
| > The masses will be left with the NVidia monopoly, while
| large companies will be able to free themselves from that.
|
| My bet: if it really becomes clear what capabilities an AI
| accelerator chip needs _and_ lots of people want to run (or
| even train) AIs _on their own_ computers, AI accelerators
| will appear on the market. This is how capitalism typically
| works.
|
| My further bet: these AI accelerators will initially come
| from China.
|
| Just look at the history of Bitcoin: initially the blocks
| were mined on CPUs, but then the miners switched to GPUs and
| "everybody" was complaining about increasing GPU prices
| because of all the Bitcoin mining. At some moment, Bitcoin
| mining ASICs appeared from China and after those spread, GPUs
| were not attractive anymore for Bitcoin mining (of course the
| cryptocurrency fans who bought the GPUs for mining attempted
| to use their investment for mining other cryptocurrencies).
| brucethemoose2 wrote:
| The capital costs are enormous, not even counting the CUDA
| moat. It takes years to start producing a big AI processor.
|
| Yet many startups and existing designers anticipated this
| demand correctly, years in advance, and they are all
| _still_ kinda struggling. Nvidia is massively supply
| constrained. AI customers would be buying up MI250s, CS-2s,
| IPUs, Tenstorrent accelerators, Gaudi 2s and so on en masse
| if they wanted to... But they are not, and it's not going to
| get any easier once the supply catches up.
|
| Unless there's a big one in stealth mode, I think we are
| stuck with the hardware companies we have.
| solardev wrote:
| Is there not a distributed computing potential here like
| there was for crypto mining? Some sort of seti@home/boinc
| like setup where home users can donate or sell compute
| time?
| latchkey wrote:
| The capex/opex is different for ML/AI than it was for
| crypto mining. Totally different hardware profiles.
| fragmede wrote:
| You can set up a computer and sell time on it on a couple
| of SaaS platforms, but only for inference. For training,
| the slowness of the interconnect between nodes becomes a
| bottleneck.
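|
| A crude model of why: every step of data-parallel training has to
| ship the gradients between nodes, so step time is roughly compute
| time plus gradient_bytes / bandwidth. The numbers below are
| order-of-magnitude guesses, not measurements:
|
|   params = 7e9               # a 7B-parameter model
|   grad_bytes = params * 2    # fp16 gradients
|   links = {"datacenter (~100 GB/s)": 100e9,
|            "home upload (~10 MB/s)": 10e6}
|   for name, bw in links.items():
|       sync_s = grad_bytes / bw
|       print(f"{name}: ~{sync_s:.1f}s of gradient sync per step")
|
|   # datacenter: ~0.1s of sync per step
|   # home upload: ~1400.0s of sync per step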
| solardev wrote:
| I see, thanks for the explanation!
| brucethemoose2 wrote:
| Yes, see projects like the AI Horde and Petals. I highly
| recommend the Horde in particular.
|
| There's also some kind of actual AI crypto project that I
| wouldn't touch with a 10 foot pole.
|
| But ultimately, even if true distribution like Petals
| figures out the inefficiency (and that's hard), it has the
| same issue as non-Nvidia hardware: it's not turnkey.
| aleph_minus_one wrote:
| > Yet many startups and existing designers anticipated
| this demand correctly, years in advance, and they are all
| _still_ kinda struggling.
|
| As I already hinted in my post: I see a huge problem in
| the fact that, in my opinion, it is still not completely
| clear to this day which capabilities an AI accelerator
| really needs - too much is still in a state of flux.
| brucethemoose2 wrote:
| The answer is kinda "whatever Nvidia implements."
| Research papers literally build around their hardware
| capabilities.
|
| A good example of this is Intel canceling, and AMD
| sidelining, their unified-memory CPU/GPU chips for AI. They
| are super useful!.. In theory. But in practice they are
| totally useless, because no one is programming frameworks
| with unified-memory SoCs in mind, as Nvidia does not make
| anything like that.
| tomcam wrote:
| > My bet: if it really becomes clear what capabilities an
| AI accelerator chip needs and lots of people want to run
| (or even train) AIs on their own computers, AI accelerators
| will appear at the market.
|
| My bet: in 6 months jart will have models running on local
| or server, with support for all platforms and using only
| 88K of ram ;)
| layer8 wrote:
| Sounds like an opportunity for hardware startups.
| wmf wrote:
| There are over a dozen AI hardware startups.
| layer8 wrote:
| Sounds like they are taking the opportunity.
| Tempest1981 wrote:
| Waiting to be assimilated by the big guys
| scarface_74 wrote:
| Hardware startups aren't going to stand a chance because they
| have to fight for the scraps of capacity that are left over
| after Apple, NVidia, and the cloud providers use what they
| can.
| cjdrake wrote:
| You can buy the Ampere chips:
| https://www.adlinktech.com/en/ampere-based-solution
| er4hn wrote:
| This is a custom chip that they are making. I don't think that
| they should be required to sell it, but if others find it
| valuable you could expect to see hardware startups making their
| own RISC-V AI chips as well that you could buy.
| tmikaeld wrote:
| Another AI chip made by TSMC, they make all of them?
| jon-wood wrote:
| TSMC manufacture (more or less) all of them. There are very
| few companies in the world capable of manufacturing
| high-performance chips.
| dragontamer wrote:
| That's like complaining that all books are made by Penguin
| Press or something, ignoring the effort individual authors
| make.
|
| Most of the value of chips is in their design, which is owned
| by different entities. Manufacturing is important too (only
| TSMC can make these advanced designs at scale and at lower
| costs than the competition).
|
| The question I have is whether Cobalt has any innovations in
| its design, or if it's just bog-standard ARM Neoverse cores.
| It's not too big of a deal to download ARM's latest designs
| and slap them into... erm... your designs. But hopefully
| Microsoft added value somewhere along the road (the uncore
| remains important: cache sizes, communications, and the
| like).
| adrian_b wrote:
| Microsoft claims that Cobalt has a much lower power
| consumption than any other Arm CPUs that they have used.
|
| Presumably this means that Cobalt has a much lower power
| consumption than the current Ampere CPUs used by Azure.
|
| Most of the power consumption reduction for a given
| performance may have come from using a more recent TSMC
| process, together with a more recent Arm Neoverse core, but
| perhaps there might be also some other innovation in the MS
| design.
| solardev wrote:
| It's not a matter of giving due credit, but supply
| constraints. Books aren't limited by the availability of
| printing presses, are they? (maybe they are and I just didn't
| know?)
|
| But if TSMC is the only company that can do this, they're a
| bottleneck for the entire world. Not to mention a strategic
| and geopolitical risk for the West.
|
| It'd be nice if some domestic companies invested in fabs
| again...
| lnsru wrote:
| I wish I could work in some team designing these chips. Maia is
| probably my dream product to work on. Super new, super cool and
| one of its kind.
| geodel wrote:
| I mean, that looks cool and exciting if it is a really small,
| colocated team, or if one is the lead engineer or director of
| a large team of engineers, so that they can learn and do
| things that interest them and assign the boring/routine work
| to others. Otherwise it would be just another job where
| people work on assigned JIRA stories and go home in the
| evening.
| buildbot wrote:
| Well, azure devops in this case ;)
| aleph_minus_one wrote:
| > I wish I could work in some team designing these chips. Maia
| is probably my dream product to work on. Super new, super cool
| and one of its kind.
|
| You likely became bewitched by their glamorous marketing side.
| I'd bet that the real work that the team does is very similar
| to the work that basically _every_ ASIC design team does.
| asdfman123 wrote:
| Think of all the requirements meetings!
| hulitu wrote:
| > Maia is probably my dream product to work on.
|
| I bet you haven't used any Microsoft product before. /s
| neilv wrote:
| Initially looks pretty, but will mess you up:
| https://en.wikipedia.org/wiki/Cobalt#Health_issues
| opcode84 wrote:
| The Microsoft chip has MX data types:
| https://news.ycombinator.com/item?id=37930663
|
| https://arxiv.org/abs/2310.10537
| buildbot wrote:
| Yes! Great catch :)
| xnx wrote:
| Chip manufacturers (including Nvidia) really missed where the
| market was going if customers like Microsoft, Amazon, etc. feel
| the need to make their own chips.
| adgjlsfhk1 wrote:
| I think they got the direction right but the price wrong.
| They are used to dealing with supercomputer customers as
| their main server clients, who aren't big enough to fight
| back if prices creep too high.
| airstrike wrote:
| Microsoft, Amazon, etc. feel the need to make their own chips
| so that they don't let NVIDIA take all the profits, not because
| they think NVIDIA is incompetent
| vb-8448 wrote:
| My guess is that it's more about the cloud vendors' wish to
| control everything from hardware to software: it's called
| vertical integration, and it's common in a lot of businesses.
|
| It makes a lot of sense from the point of view of cloud giants.
| abraae wrote:
| Or cloud vendors have decided that at their scale owning their
| own chips represents a valuable differentiation opportunity,
| and they don't think of them as commodities.
| User3456335 wrote:
| The demand for chips has increased so much that it's profitable
| for these customers to start producing their own chips. This
| doesn't mean Nvidia's chips are bad or that they missed
| anything.
| dragontamer wrote:
| Alternatively: ARM has just made Neoverse designs so easy to
| use that no one feels like going through another middleman
| anymore.
| lotsofpulp wrote:
| As far as I understand, Nvidia does not manufacture chips, they
| only design them and create the software.
| justapassenger wrote:
| It's just part of the cycle. For a new paradigm, you have
| companies jumping in to build their own things. Later it
| consolidates and you get a few main leaders.
|
| It wasn't that long ago that computer manufacturers would build
| their own chips.
| solardev wrote:
| Nvidia rode a gaming high from RTX straight into a crypto high
| and then straight into the AI high. Their products just print
| money right now and nobody else is close yet. They can always
| lower prices later, but for now they're getting filthy rich...
| MR4D wrote:
| Every day we get closer and closer to the hypothetical "five
| computers". [0] Software was first, and now every one of them
| have hardware too.
|
| Amazon
|
| Apple
|
| Google
|
| Microsoft
|
| [0] - https://engines.egr.uh.edu/episode/1059
| asdfman123 wrote:
| Five mainframes, maybe. I've got at least five terminals for
| those mainframes attached to, or inside of, my TV alone.
| tibbydudeza wrote:
| Does it run Linux or the Windows for Azure (definitely not a
| run-of-the-mill Windows for DataCenter)?
| kmod wrote:
| I thought an interesting point was the liquid cooling -- unclear
| how important this is to them, but I'm guessing it means that
| they designed it with a TDP that requires liquid cooling.
|
| This (wanting higher density) is the opposite of the trade-off
| that I was expecting. In my (limited and out of date) experience,
| power was the limiting factor before space, and I believe AI
| racks have very high power draws already.
|
| I would have guessed this would be because larger nodes would be
| better for AI's tight communication patterns, but they
| specifically call out datacenter space as the constraint. Curious
| if anyone knows more about this
| asdfman123 wrote:
| What would be even more impressive would be making the chip in
| the US so we weren't so completely reliant on TSMC.
| ThinkBeat wrote:
| I am a little confused by all this. I thought conventional wisdom
| was that creating "super good / top performant" computer
| processors was hard, and, unless you could produce them at scale,
| prohibitively expensive.
|
| Is this done as a bridge until/if Nvidia is able to deliver their
| processors fast enough?
|
| I would think that Nvidia would have serious competitors on
| the market already if even Microsoft, for whom producing
| hardware is not the main business focus, can compete.
| p1esk wrote:
| It remains to be seen how well these chips will compete with
| Nvidia.
| s0rce wrote:
| Presumably Microsoft won't be making the chips, I guess they
| will also be made by TSMC.
| thetrb wrote:
| Why would Microsoft not produce them at scale? With Azure and
| internal use cases I'm sure they're at a good scale.
| karmasimida wrote:
| AI, or the creation of AI, will very soon leave the hands of
| common folks at this rate. The cost will soon be unattainable.
| er4hn wrote:
| This is likely to be cyclical though. Fast cars[1], large
| amounts of processing power[2], and access to cryptographic
| algorithms are all things that started out expensive on the
| cutting edge (some still are), but then became more affordable
| for the consumer over time. AI has already seen explorations
| into training models with limited resources. It's feasible, it
| just has tradeoffs that will hopefully get better over time.
|
| [1] A 1970 Corvette 427 has a 0 - 60 mph of 5.3 seconds (src:
| https://www.caranddriver.com/features/g15379023/the-
| chevrole...) and cost around $44k inflation adjusted dollars.
| You can buy a 2008 Nissan 350Z Enthusiast that will do it in
| 5.2 (src: https://www.zeroto60times.com/vehicle-
| make/nissan-0-60-mph-t...?) for around $13k today.
|
| [2] I'm too lazy to calculate relative cost / cycle in old
| warehouse computers vs phones but it's gotten _better_.
| mackid wrote:
| "People who are really serious about software should make their
| own hardware." - Alan Kay
___________________________________________________________________
(page generated 2023-11-15 23:00 UTC)