[HN Gopher] Nvidia sheds almost $600B in market cap, biggest one...
___________________________________________________________________
Nvidia sheds almost $600B in market cap, biggest one-day loss in US
history
Author : mfiguiere
Score : 154 points
Date : 2025-01-27 21:13 UTC (1 hour ago)
(HTM) web link (www.cnbc.com)
(TXT) w3m dump (www.cnbc.com)
| ChrisArchitect wrote:
| More discussion: https://news.ycombinator.com/item?id=42839650
| KarmaArchitect wrote:
| Thank you.
| booleandilemma wrote:
| Are you two related?
| pinkmuffinere wrote:
| Pet-peeve: I hate titles like this, "biggest loss in history" is
| meaningless unless it's inflation-adjusted, and it almost never
| is. Is this a bigger loss in real terms? Or is this just a big
| number? I'm guessing it's the second.
|
| Edit: I read the article, it's definitely the second case. Is
| this _also_ the biggest loss in real terms? I don't know.
| Terr_ wrote:
| Similarly, "new record for highest-grossing film", or "new
| record for number of popular votes received."
| lotsofpulp wrote:
| "xyz company [in business that historically earns low single
| digit profit margins] reports highest profits"
| paxys wrote:
| And "record profits" were actually less than previous
| year's profits + inflation.
| paulddraper wrote:
| Hey, they clearly just keep making more and more popular
| movies.
|
| /s
|
| FYI for the curious: the highest inflation-adjusted lifetime
| grossing film is Gone with the Wind.
| magicalhippo wrote:
| I like the "fastest growing app/magazine/whatever" when the
| competition has been there for 50+ years.
| coliveira wrote:
| Well, this is what the media does; it's low-hanging fruit
| that will attract readers. The reality is that NVDA's price was
| way too high and would have fallen anyway. This was just a
| catalyst.
| amazingamazing wrote:
| Counterpoint - generally larger companies should be less
| susceptible to the type of volatility that leads to the title,
| so I still think it's newsworthy. Even if you changed the
| title to "largest single-day percentage loss among $3
| trillion+ companies" it would still be true (it might even be
| true among $2 trillion+ companies as well).
| Retric wrote:
| Inflation adjusted is still limited to the largest companies.
| It's simply using 2025 dollars for companies rather than
| comparing 1950 or whatever dollars vs 2025 dollars.
| amazingamazing wrote:
| but inflation adjustment doesn't make any sense here. things
| are less volatile now: there's high speed trading, better
| knowledge, easier trades, more automation, etc.
|
| even now compared to 2020 there has been a huge change in
| amount of retail investors.
| Retric wrote:
| Things aren't less volatile in terms of extreme events. Look
| in depth at, say, the 2010 flash crash, one of the all-time
| great examples of very short-term volatility.
|
| Similarly, longer term we still see huge shifts such as
| the COVID dip before all that money flooded the markets.
| amazingamazing wrote:
| i'm talking strictly about the biggest companies
| walterbell wrote:
| Would inflation change the ranking?
|
| _> For Nvidia, the loss was more than double the $279 billion
| drop the company saw in September, which was the biggest one-
| day market value loss in history at the time, unseating Meta's
| $232 billion loss in 2022. Before that, the steepest drop was
| $182 billion by Apple in 2020._
| seizethecheese wrote:
| Not compared to these recent losses, but compared to all of
| history it's very possible; market values drop by orders of
| magnitude as you go back in time.
| Andrex wrote:
| It's not meaningless to people who accept that the vast
| majority of these reports _always_ use non-adjusted numbers
| and usually call out when that isn't the case.
| grajaganDev wrote:
| This could be an extinction level event for some VCs.
| echelon wrote:
| Only if their portcos don't build products people buy.
|
| Making a pure-research, foundation model company is silly. Make
| a product company that sells products.
| grajaganDev wrote:
| Google and Microsoft both bundled their AI offering into
| their office suites because they weren't getting traction as
| an add-on.
|
| Startups don't have that option.
| echelon wrote:
| They did that because
|
| 1) Their initial AI offerings weren't real products
| customers would use or pay for
|
| 2) They weren't seeing sufficient adoption to justify the
| expense
|
| 3) They have insane levels of distribution in their
| existing product lines and can incrementally add AI
| features
|
| This is entirely orthogonal to whether or not other
| startups can build AI-first products or whether they can
| position themselves to compete with the giants.
| 2OEH8eoCRo0 wrote:
| Still overvalued IMO. Their market cap remains ludicrous.
| m3kw9 wrote:
| Everything is overvalued if you think in terms of earnings
| multiples and expect P/S or P/E to be 1:1
| andrewmcwatters wrote:
| Who's out here buying businesses for 1x the sales revenue
| volume? What a silly concept. If businesses could be so
| cheap, you'd just double down every single year until you
| owned every business on the planet.
| dinkblam wrote:
| 1:1? so that dividends cover the share price in the first
| year of ownership?
| justahuman74 wrote:
| Earnings multiples choose an arbitrary time length of one year.
|
| What you're really trying to purchase is a machine that
| creates more money than it uses. You need to guess whether that
| machine will still do its job at an arbitrary point in the
| future, and how well it will do it. Those factors are only
| loosely correlated with current P/E.
| phyzix5761 wrote:
| Discounted future cash flows. If you buy an asset that
| produces $100 of profit for you every year, it's worth more
| than $100. You're buying the ability to produce profits in the
| future, not the profits it has produced in the past. Those
| past profits belong to the shareholders who have already
| cashed them out (through dividends or reinvestment).
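|
| A rough sketch of that idea with made-up numbers (a $100/year
| profit stream for 10 years, discounted at 10%; Python):
|
|     # Present value of an asset paying $100/year for 10 years,
|     # discounted at 10% per year (illustrative numbers only).
|     profit, rate, years = 100.0, 0.10, 10
|     pv = sum(profit / (1 + rate) ** t
|              for t in range(1, years + 1))
|     print(round(pv, 2))  # ~614.46, i.e. ~6x one year's profit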
| saulpw wrote:
| No, that would mean they're earning enough in one year to
| cover their entire valuation. You want something like a 10:1
| P/E, which means the next 10 years' earnings are factored in
| to cover the present valuation.
| h1h1hh1h1h1 wrote:
| So the Chinese graciously gift a paper and model describing
| methods that radically increase hardware efficiency, which
| should let US AI firms create much better models given their
| significantly larger stock of AI hardware, and people are
| bearish on US AI now?
| baal80spam wrote:
| Don't look for logic in the market, I suppose.
| magic_hamster wrote:
| I think the idea that SOTA models can run on limited hardware
| makes people think that Nvidia sales will take a hit.
|
| But if you think about it for two more seconds you realize
| that if SOTA was trained on mid-level hardware, top-of-the-line
| hardware could still put you ahead. DeepSeek is also open
| source, so it won't take long to see what this architecture
| can do on high-end cards.
| amazingamazing wrote:
| there's no reason to believe that performance will continue
| to scale with compute, though. _that's_ why there's a rout.
| more simply, if you assume maximum performance with the
| current LLM/transformer architecture is, say, twice as good as
| what humanity is capable of now, then that would mean you're
| approaching 50%+ of that performance with orders of magnitude
| less compute. there's just no way you could justify the amount
| of money being spent on nvidia cards if that's true, hence the
| selloff.
| futureshock wrote:
| Wait no, there is actually PLENTY of evidence that
| performance continues to scale with more compute. The
| entire point of the o3 announcement and benchmark results
| of throwing a million bucks of test time compute at ARC-AGI
| is that the ceiling is really really high. We have 3
| verified scaling laws of pre-training corpus size,
| parameter count, and test time compute. More efficiency is
| fantastic progress, but we will always be able to get more
| intelligence by spending more. Scale is all you need.
| DeepSeek did not disprove that.
| amazingamazing wrote:
| there's evidence that performance increases with compute, but
| not that it _scales_ with compute, e.g. linearly or
| exponentially. the SOTA models are already seeing diminishing
| returns w.r.t. parameter size, training time and generally
| just engineering effort. it's a fact that doubling, say,
| parameter size does not double benchmark performance.
|
| would love to see evidence to the contrary. my assertion
| comes from seeing claude, gemini and o1.
|
| if anything I feel performance is more of a function of the
| quality of data than anything else.
| bcrosby95 wrote:
| If people are bullish on Nvidia because the hot new thing
| requires tons of Nvidia hardware and someone releases a paper
| showing you need 1/45th of Nvidia's hardware to get the same
| results, of course there's going to be pullback.
|
| Whether its justified or not is outside my wheelhouse. There's
| too many "it depends" involved that, best case, only people
| working in the field can answer, worst case, no one can answer
| right now.
| pokstad wrote:
| Or you could argue you can now do 45x greater things with the
| same hardware. You can take an optimistic stance on this.
| c0redump wrote:
| Except it's not clear at all that this is actually the
| case. It's entirely conjecture on your part.
| cortesoft wrote:
| For the overall economy, sure... for Nvidia, no
|
| A huge increase in fuel efficiency is great for the
| economy, horrible for fuel companies
| DiscourseFan wrote:
| No, because what this implies is that the Chinese have better
| labor power in the tech sector than the US, considering how
| much more efficient this technology is. Which means that even
| if US companies adopt these practices, the best workers will
| still be in China, communicating largely in Chinese, building
| relationships with other Chinese-speaking people purchasing
| Chinese-speaking labor. These relationships are already
| present. It would be difficult for OpenAI to catch up.
| brokencode wrote:
| What a stretch. One Chinese model makes a breakthrough in
| efficiency and suddenly China has all the best people in the
| world?
|
| What about all the people who invented LLMs and all the
| necessary hardware here in the US? What about all the models
| that leapfrog each other in the US every few months?
|
| One breakthrough implies that they had a great idea and
| implemented it well. It doesn't imply anything more than
| that.
| jonatron wrote:
| I can't say about how good they are, but over 400,000 CS
| graduates in China [1] per year sounds like a lot.
| https://www.ctol.digital/news/chinas-it-boom-slows-
| computer-...
| tokioyoyo wrote:
| Chinese tech companies are also investing in AI. The DeepSeek
| team isn't the only one (and is probably the least funded
| one?) within the mainland. This is mostly a challenge to the
| "American AI is years ahead" illusion, and a demonstration
| that maybe investing only in American companies isn't the
| smartest move, as others might beat them at their own game.
| bilbo0s wrote:
| I think it's probably more accurate to say that people are now
| a bit more bullish on what the Chinese will be able to
| accomplish even in the face of trade restrictions. Now whether
| or not it makes sense to be bearish on US AI is a totally
| different issue.
|
| Personally I think being bearish on US AI makes zero sense. I'm
| almost positive there will be restrictions on using Chinese
| models forthcoming in the near to medium term. I'm not saying
| those restrictions will make sense. I'm just saying they will
| steer people in the US market towards US offerings.
| paulddraper wrote:
| US AI is only somewhat related though.
|
| The subject is NVIDIA.
| dragonwriter wrote:
| I think the market perception of NVidia's value is
| currently heavily driven by the expected demand for
| datacenter chips following anticipated trendlines of the
| big US AI firms; I think DeepSeek disrupted that (I think
| when the implications of greater value per unit of compute
| applied to AI are realized, it will end up being seen as
| beneficial to the GPU market in general and, barring a big
| challenge appearing in the very near future, NVidia
| specifically, but I think that's a slower process.)
| nine_k wrote:
| Not the AI proper, but the need for _additional_ AI hardware
| down the line. Especially the super-expensive, high-margin,
| huge AI hardware that DeepSeek seems not to require.
|
| Similarly, microcomputers led to an explosion of the computer
| market, but definitely limited the market for mainframe
| behemoths.
| paulddraper wrote:
| https://www.reddit.com/r/investing/comments/1ib5vf9/deepseek...
|
| They did it by using H800 chips, not H100 or B200 or anything
| crazy.
|
| This means NVIDIA may not be the only game in town.
|
| E.g. Chinese manufacturers.
| paxys wrote:
| You can be bullish about US AI but at the same time not believe
| that the industry is worth $10T+ right now.
| linkregister wrote:
| > the Chinese
|
| Daya Guo, Dejian Yang, Haowei Zhang, et al., quant researchers
| at High-Flyer, a hedge fund based in China, open-sourced their
| work on a chain-of-thought reasoning model, based on Qwen and
| Llama (open-source LLMs).
|
| It would be somewhat bizarre to describe Meta's open-sourcing
| of Llama as "the Americans gifting a model", despite Meta
| having its corporate headquarters in the United States.
| skizm wrote:
| I like the chart Bloomberg has of the top 10 largest single day
| stock drops in history. 8 out of the 10 are NVDA (Meta and Amazon
| are the other two).
| culi wrote:
| Why mention it if you're not gonna link it?
| paxys wrote:
| IMO this is less about DeepSeek and more that Nvidia is
| essentially a bubble/meme stock that is divorced from the reality
| of finance and business. People/institutions who bought on
| nothing but hype are now panic selling. DeepSeek provided the
| spark, but that's all that was needed, just like how a vague
| rumor is enough to cause bank runs.
| onlyrealcuzzo wrote:
| Hard to argue Nvidia is a meme stock and that Tesla is not a
| bigger meme stock.
|
| If meme stocks were imploding, why is Tesla fine?
|
| This is about DeepSeek.
| paxys wrote:
| What does any of this have to do with Tesla? Even if Tesla is
| a bigger bubble, not all bubbles have to pop at the same
| time.
| coliveira wrote:
| Tesla is playing the political game with Trump. They're
| riding that wave. Musk always finds some new reason for people
| to believe in the stock.
| pokstad wrote:
| The market can stay irrational longer than you can stay
| solvent.
| tgtweak wrote:
| Hype buyers are also hype sellers. Anything Nvidia was last
| week is exactly what it is this week; DeepSeek doesn't really
| have any impact on Nvidia sales. Some argument could be made
| that this can shift compute off of the cloud and onto end-user
| devices, but that really seems like a stretch given what I've
| seen running this locally.
| Analemma_ wrote:
| I agree hype is a big portion of it, but if DeepSeek really
| has found a way to train models just as good as frontier ones
| for a hundredth of the hardware investment, that is a
| substantial material difference for Nvidia's future earnings.
| zozbot234 wrote:
| > if DeepSeek really has found a way to train models just
| as good as frontier ones for a hundredth of the hardware
| investment
|
| Frontier models are heavily compute-constrained - the
| leading AI model makers have got _way_ more training data
| already than they could do anything with. Any improvement
| in training compute-efficiency is great news for them, no
| matter where it comes from. Especially since the DeepSeek
| folks have gone into great detail w.r.t. documenting their
| approach.
| rpcope1 wrote:
| > leading AI model makers have got way more training data
| already than they could do anything with.
|
| Citation needed.
| echelon wrote:
| 1. Nobody has replicated DeepSeek's results on their reported
| budget yet. Scale.ai's Alexandr Wang says they're lying and
| that they have a huge, clandestine H100 cluster. HuggingFace
| is assembling an effort to publicly reproduce the paper's
| claims.
|
| 2. Even if DeepSeek's budget claims are true, they trained
| their model on the outputs of an expensive foundation model
| built from a massive capital outlay. To replicate these
| results, it might require an expensive model upstream.
| throwup238 wrote:
| Is it? Training is only done once; inference requires GPUs
| to scale, especially for a 685B model. And now, there's an
| open source o1 equivalent model that companies can run
| locally, which means that there's a much bigger market for
| underutilized on-prem GPUs.
| zozbot234 wrote:
| The full DeepSeek model is ~700B params or so - _way_ too
| large for most end users to run locally. What some folks are
| running locally is fine-tuned versions of Llama and Qwen,
| that are not going to be directly comparable in any way.
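|
| Back-of-the-envelope on why (assuming ~700B parameters and one
| byte per weight at 8-bit quantization):
|
|     # Rough memory needed just to hold the weights.
|     # Assumption: ~700B params, 1 byte/param (8-bit quantized);
|     # activations and KV cache not counted.
|     params = 700e9
|     bytes_per_param = 1
|     gib = params * bytes_per_param / 2**30
|     # ~652 GiB of weights alone, vs ~24 GiB of VRAM on a
|     # high-end consumer GPU.
|     print(f"{gib:.0f} GiB")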
| dgemm wrote:
| I think it's less that and more about real risks - Nvidia
| legitimately has the earnings right now. The question is how
| sustainable that is when most of it is coming from 5 or so
| customers that are both motivated and capable of taking back
| those 90% margins for themselves.
| jhickok wrote:
| Regarding their earnings at the moment, I know it doesn't
| mean everything, but a ~50 P/E is still fairly high, although
| not insane. I think Cisco's was over 200 during the dotcom
| bubble. I think your question about the 5 major customers is
| really interesting, and we will continue to see those
| companies peck at custom silicon until they can maybe bridge
| the gap from just running inference to training as well.
| belevme wrote:
| I don't think it's fair to say NVDA is a meme stock, having
| reported $35B in revenue last quarter.
| master_crab wrote:
| True but with that revenue number it would mean that before
| today it was valued at ~100x revenue. That's pretty bubbly.
| acchow wrote:
| That's 100x quarterly revenue, or 25x annualized revenue.
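|
| Rough arithmetic, using the ~$35B quarterly figure above and
| assuming a pre-drop market cap of roughly $3.5T (an assumption,
| not a number from the article):
|
|     market_cap = 3.5e12
|     quarterly_revenue = 35e9
|     print(market_cap / quarterly_revenue)        # ~100x quarterly
|     print(market_cap / (4 * quarterly_revenue))  # ~25x annualized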
| danpalmer wrote:
| I'd say it's a meme stock and based on meme revenue. Much of
| the $35B comes from the fact that companies believe Nvidia
| make the best chips, and that they have to have the best
| chips or they'll be out of the game.
|
| DeepSeek supposedly nullifies that last part.
| buffington wrote:
| Didn't DeepSeek train on Nvidia hardware though?
|
| I can't see how DeepSeek hurts Nvidia, if Nvidia is what
| enables DeepSeek.
| amazingamazing wrote:
| that's not entirely relevant.
|
| the simplest way to present the counter argument is:
|
| - suppose you could train the best model with a single
| H100 in an hour. would that help or harm nvidia?
|
| - suppose you could serve 1000x the users with 1/1000 the
| number of gpus. would that help or harm nvidia?
|
| the question is how big you think the market size is, and
| how fast you get to saturation. once things are saturated,
| efficiency just results in less demand.
| paxys wrote:
| Nvidia's annual revenue in 2024 was $60B. In comparison,
| Apple made $391B. Microsoft made $245B. Amazon made $575B.
| Google made $278B. And Nvidia is worth more than all of them.
| You'd have to go _very_ far down the list to find a company
| with a comparable ratio of revenue or income to market cap as
| Nvidia.
| nl wrote:
| Nvidia's revenue growth rate was 94% and income growth rate
| was 109% for the Oct 2024 quarter. This compares to Apple's
| 6% and -35%.
|
| Nvidia is growing profits faster than revenue.
|
| Nvidia's net profit margin is 55% (vs Apple's 15%) and they
| have an operating income of $21B vs Apple's $29.5B.
|
| These are some pretty impressive financial results - those
| growth rates are the reason people are bullish on it.
| paxys wrote:
| Yes revenue has grown xx% in the last quarter and year,
| but the stock is valued as if it will keep growing at
| that rate for years to come and no one will challenge
| them. That is the definition of a bubble.
|
| How sound is the investment thesis when a bunch of online
| discussions about a technical paper on a new model can
| cause a 20% overnight selloff? Does Apple drop 20% when
| Samsung announces a new phone?
| amazingamazing wrote:
| to be fair, there's no way these rates will be sustained
| for a decade.
| segasaturn wrote:
| That's the thing. Nvidia's future growth has been
| potentially kneecapped by R1's leaps in efficiency.
| numba888 wrote:
| P/E ratio is a better indicator. Price/Earnings. NVidia: 46,
| Microsoft: 35, Apple: 34, Amazon: 50.
|
| As you can see, NVidia doesn't stand out much; it's even lower
| than Amazon.
| energy123 wrote:
| This is a cookie cutter comment that appears to have been copy
| pasted from a thread about Gamestop or something. DeepSeek R1
| allegedly being almost 50x more compute efficient isn't just a
| "vague rumor". You do this community a disservice by commenting
| before understanding what investors are thinking at the current
| moment.
| paxys wrote:
| Has anyone verified DeepSeek's claims about R1? They have
| literally published one single paper and it has been out for
| a week. Nothing about what they did changed Nvidia's
| fundamentals. In fact there was no additional news over the
| weekend or this morning. The entire market movement is
| because of a single statement by DeepSeek's CEO from over a
| week ago. People sold because other people sold. This is
| exactly how a panic selloff happens.
| energy123 wrote:
| They have not verified the claims but those claims are not
| a "vague rumor". Expectations of discounted cash flows,
| which is primarily what drives large cap stock prices,
| operates on probability, not strange notions of "we must be
| absolutely certain that something is true".
|
| A credible lab making a credible claim to massive
| efficiency improvements is a credible threat to Nvidia's
| future earnings. Hence the stock got sold. It's not more
| complicated than that.
| KiwiJohnno wrote:
| Not a true verification, but I have tried the DeepSeek R1 7b
| model running locally; it runs on my 6gb laptop GPU and the
| results are impressive.
|
| It's obviously constrained by this hardware and this model
| size, as it does some strange things sometimes and it is slow
| (30 secs to respond), but I've got it to do some impressive
| things that GPT4 struggles with or fails on.
|
| Also of note, I asked it about Taiwan and it parroted the
| official CCP line about Taiwan being part of China, without
| even the usual delay while it generated the result.
| jdietrich wrote:
| The weights are public. We can't verify their claims about
| the amount of compute used for training, but we can
| trivially verify the claims about inference cost and
| benchmark performance. On both those counts, DeepSeek have
| been entirely honest.
| codingwagie wrote:
| No, the reality of AI models fundamentally changed
| segasaturn wrote:
| Correct. Nvidia has been on this bubble-like trajectory since
| before the stock was split last year. I would argue that
| today's drop is a precursor to a much larger crash to come.
| KiwiJohnno wrote:
| Not quite. I believe this sell-off was caused by DeepSeek
| showing with their new model that the hardware demands of AI
| are not necessarily as high as everyone has assumed (i.e. as
| required by competing models).
|
| I've tried their 7b model, running locally on a 6gb laptop
| GPU. It's not fast, but the results I've had have rivaled
| GPT4. It's impressive.
| xiphias2 wrote:
| I believe you that it had to do with the selloff, but I
| believe that efficiency improvements are good news for
| NVIDIA: each card just got 20x more useful
| amazingamazing wrote:
| each card is not 20x more useful lol. there's no evidence
| yet that the deepseek architecture would even yield a
| substantially (20x) more performant model with more
| compute.
|
| if there's evidence to the contrary I'd love to see it. in
| any case I don't think an H100 is even 20x better than an
| H800 anyway, so the 20x increase has to be wrong.
| jdietrich wrote:
| We need GPUs for inference, not just training. The Jevons
| Paradox suggests that reducing the cost per token will
| increase the overall demand for inference.
|
| Also, everything we know about LLMs points to an entirely
| predictable correlation between training compute and
| performance.
| amazingamazing wrote:
| the jevons paradox isn't about any particular product or
| company's product, so it's irrelevant here. the relevant
| resource here is _compute_, which is already a commodity.
| secondly, even if it were about GPUs in particular, there's no
| evidence that nvidia would be able to sustain such high
| margins if fewer GPUs were necessary for equivalent
| performance. things are currently supply-constrained, which
| gives nvidia price optionality.
| Scoundreller wrote:
| Uhhh, isn't it about coal?
| segasaturn wrote:
| That still means that AI firms don't have to buy as many of
| Nvidia's chips, which is the whole thing that Nvidia's price
| was predicated on. FB, Google and Microsoft just had their
| billions of dollars in Nvidia GPU capex blown out by a $5M
| side project. Tech firms are probably not going to be as
| generous shelling out whatever overinflated price Nvidia was
| asking for as they were a week ago.
| chgs wrote:
| Imagine what you can do with all that Nvidia hardware
| using the DeepSeek techniques.
| siwakotisaurav wrote:
| None of the models other than the 600b one are R1. They're
| just prev-gen models like Llama or Qwen trained on R1 output,
| making them slightly better.
| doctorpangloss wrote:
| Yeah but the second comment you see believes they are, and
| belief is truth when it comes to stock market gambling.
| httpz wrote:
| Nvidia has a P/E of 47. While it may be a bit high for a
| semiconductor company, it's definitely not a meme stock figure.
| antigeox wrote:
| I just want to know if I can buy a gaming video card at a
| reasonable price or if I should hold off on it. I don't care
| about the AI shit. And yes, I'd prefer Nvidia because their
| closest competitor can't tape a box together, never mind
| develop and assemble a graphics card.
| oblio wrote:
| Unlikely. Companies have doubled down on AI so expect the fall
| to be long and slow (think 2 years).
| dtquad wrote:
| DeepSeek has humiliated the entire US tech sector. I wonder if
| they will learn from this, fire their useless middle management
| and product managers with sociology degrees, and actually pivot
| to being technology companies?
| browningstreet wrote:
| "Again, just to emphasize this point, all of the decisions
| DeepSeek made in the design of this model only make sense if
| you are constrained to the H800; if DeepSeek had access to
| H100s, they probably would have used a larger training cluster
| with much fewer optimizations specifically focused on
| overcoming the lack of bandwidth."
|
| https://stratechery.com/2025/deepseek-faq/
| 66yatman wrote:
| With DeepSeek R1 running on just 6 Mac minis, sure, Nvidia
| can feel the pressure.
| lvl155 wrote:
| But Apple was up...indexing at its finest.
| zozbot234 wrote:
| Not surprising there - a maxed out Mac Studio is a great AI
| homelab, giving you way more bang for the buck than nVidia
| offerings.
| jhickok wrote:
| Think that was the cause of their stock increase? I feel like
| investors use opportunities like this to pile money into
| safer bets rather than just bail on stocks altogether.
| kiratp wrote:
| This is really dumb.
|
| Deepseek showing that you can do pure online RL for LLMs means we
| now have a clear path to just keep throwing more compute at the
| problem! If anything we made the whole "we are hitting a data
| wall" problem even smaller.
|
| Additionally, it's yet another proof point that scaling
| inference compute is a way forward. Models that think for
| hours or days are the future.
|
| As we move further into the regime of long sequence inference,
| compute scales by the square of the sequence length.
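|
| A minimal sketch of that scaling, counting only attention
| score entries and ignoring constant factors, heads and layers:
|
|     # Each generated token attends to every earlier position,
|     # so total attention work over a sequence of length n grows
|     # as roughly n*(n+1)/2, i.e. O(n^2).
|     def attention_ops(n: int) -> int:
|         return sum(t for t in range(1, n + 1))
|
|     for n in (1_000, 10_000, 100_000):
|         print(n, attention_ops(n))  # 10x longer -> ~100x work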
|
| The lesson here was not "training is going to be cheaper than we
| thought". It's "we must construct additional pylons uhhh _PUs"
|
| Markets remain irrational and all that...
| aurareturn wrote:
| Nvidia chip demand should increase from DeepSeek's release.
|
| This market doesn't make any sense.
| zozbot234 wrote:
| Didn't DeepSeek also show that pure RL leads to low-quality
| results compared to also doing old-fashioned supervised
| learning on a "problem solving step by step" dataset? I'm not
| sure why people are getting excited about the pure-RL approach,
| seems just overly complicated for no real gain.
| deepsquirrelnet wrote:
| If I'm understanding their paper correctly (I might not be,
| but I've spent a little time trying to understand it), they
| showed you only need a small amount of supervised fine-tuning
| ("SFT") to "seed" the base model, followed by pure RL. Pure RL
| alone was their R1-Zero model, which worked but produces weird
| artifacts like switching languages or excessive repetition.
|
| The SFT training data is hard to produce, while the RL they
| used was fairly uncomplicated heuristic evaluations and not a
| secondary critic model. So their RL is a simple approach.
|
| If I've said anything wrong, feel free to correct me.
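|
| To make the "heuristic evaluations" part concrete, my reading
| is that the rewards are simple rule-based checks, roughly like
| this (illustrative only, not their actual code; the tag names
| follow the paper's prompt template as I understand it):
|
|     import re
|
|     # Format reward: did the model wrap its reasoning in think
|     # tags? Accuracy reward: does the final answer match a
|     # known ground truth? No learned critic model involved.
|     def reward(completion: str, gold_answer: str) -> float:
|         r = 0.0
|         if re.search(r"<think>.*</think>", completion, re.S):
|             r += 0.5
|         m = re.search(r"<answer>(.*?)</answer>", completion, re.S)
|         if m and m.group(1).strip() == gold_answer.strip():
|             r += 1.0
|         return r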
| jaggs wrote:
| The real hidden message is not that bigger compute produces
| better results, but that the average user probably doesn't need
| the top results.
|
| In the same way that medium range laptops are now 'good enough'
| for most people's needs, medium range (e.g. DeepSeek R1x) AI will
| probably be good enough for most business and user needs.
|
| Up till now everyone assumed that only giga-sized server farms
| could produce anything decent. Doesn't seem to be true any more.
| And that's a problem for mega-corps maybe?
| jdietrich wrote:
| _> medium range (e.g. DeepSeek R1x) AI will probably be good
| enough for most business and user needs_
|
| Except R1 isn't "medium range" - it's fully competitive with
| SOTA models at a fraction of the cost. Unless you need
| multimodal capability or you're desperate to wring out the last
| percentage point of performance, there's no good reason to use
| a more expensive model.
|
| The real hidden message is that we're still barely getting
| started. DeepSeek have completely exploded the idea that LLM
| architecture has peaked and we're just in an arms race for more
| compute. 100 engineers found an order of magnitude's worth of
| low-hanging fruit. What will other companies be able to do
| with a similar architectural approach? What other
| straightforward optimisations are just waiting to be
| implemented? What will R2 look like if they decide to spend
| $60m or $600m on a training run?
| jaggs wrote:
| Yes absolutely. I guess I meant medium range in terms of dev
| and running costs. R1 is a premium product at a corner store
| price. :)
|
| People are also forgetting that High-Flyer's ultimate goal is
| not applications, it's AGI. Hence the open source. They want
| to accelerate that process out in the open as fast as they
| can.
| Kon-Peki wrote:
| If you look at total volume of shares traded, this would be
| somewhere in the range of 200th highest.
|
| If you look at the total monetary value of those shares traded,
| this would be in the top 5, all of which have happened in the
| past 5 years. #1 is probably Tesla on Dec 18 2020 (right before
| it joined the S&P500). It lost ~6% that day.
|
| Don't get me wrong, this is definitely a big day. Just not "lose
| your mind" big. It's clear that most shareholders just sat things
| out.
| kd913 wrote:
| The biggest discussion I have been having on this is the
| implications of DeepSeek for, say, the RoI on an H100. Will a
| sudden spike in available GPUs and a reduction in demand (from
| more efficient GPU usage) dramatically shock the cost per hour
| to rent a GPU? This, I think, is the critical value for
| measuring the investment case for Blackwell now.
|
| The price per hour for an H100 has gone from a peak of $8.42
| to about $1.80.
|
| An H100 consumes 700W; let's say $0.10 per kWh?
|
| An H100 costs around $30,000.
|
| Given DeepSeek, can the price drop further, given that a much
| larger supply of available GPUs can now be shown to be
| unlocked (MI300X, H200s, H800s, etc.)?
|
| Now that LLMs have effectively become a commodity, with a
| significant price floor, is this new value ahead of what is
| profitable for the card?
|
| Given that the new Blackwell is $70,000, are there sufficient
| applications that enable customers to get an RoI on the new
| card?
|
| I am curious about this as I think I am currently ignorant of
| the types of applications that businesses can use to outweigh
| the costs. I predict the cost per hour of the GPU will drop
| such that it isn't such a no-brainer investment compared to
| previously. Especially if it is now possible to unlock
| potential from much older platforms running at lower
| electricity rates.
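|
| Plugging those numbers in (assuming full utilization and
| ignoring cooling, hosting and opportunity cost):
|
|     # Rough payback time for an H100 at the numbers above:
|     # $1.80/hr rental, 700 W at $0.10/kWh, $30,000 purchase.
|     rent_per_hour = 1.80
|     power_cost_per_hour = 0.700 * 0.10   # ~$0.07/hr
|     net_per_hour = rent_per_hour - power_cost_per_hour
|     hours = 30_000 / net_per_hour
|     print(f"{hours:,.0f} h, ~{hours / 8760:.1f} years")
|     # ~17,341 h, i.e. roughly two years of continuous rental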
| jmward01 wrote:
| So, what are investors thinking to warrant this? If it is
| 'DeepSeek means you don't need the compute' that is definitely
| wrong. Making a more efficient x almost always leads to more of x
| being sold/used, not less. In the long term, does anyone
| believe we won't keep needing more compute?
|
| I think the market believes that high end compute is not needed
| anymore so the stuff in datacenters suddenly just became 10x
| over-provisioned and it will take a while to fill up that
| capacity. Additionally, things like the Mac and AMD unified
| memory architectures and consumer GPUs are all now suddenly able
| to run SOTA models locally. So a triple whammy. The competition
| just caught up, demand is about to drop in the short term for any
| datacenter compute and the market for exotic, high margin, GPUs
| might have just evaporated. At least that is what I think the
| market is thinking. I personally believe this is a short term
| correction since the long term demand is still there and we will
| keep wanting more big compute for a long time.
| amazingamazing wrote:
| selling more does not necessarily mean you make more money.
| more efficiency could lead to lower margins even if volume is
| higher.
|
| moreover, even if things are incredibly efficient, the bar for
| sufficiently good AI _in practice_ (e.g. applications) might
| be met with commodity compute, pretty much locking out nvidia,
| who generally sells high-margin, high-performance chips to
| whales.
| acchow wrote:
| But the SOTA models basically all suck today. If people don't
| think they suck, definitely in 1 year they'll look back and
| consider those older models unusably bad
| fspeech wrote:
| The key question is: has demand elasticity increased for Nvidia
| cards? An increase in elasticity means people are more willing to
| wait for hardware price to drop because they can do more with
| existing hardware. Elasticity could increase even if demand is
| still growing. Not all growth is equally profitable. Current
| high prices are extremely profitable for Nvidia. If elasticity is
| increasing future growth may not be as profitable as the
| projection from when Deepseek was relatively unknown.
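|
| For reference, the textbook quantity being discussed (a toy
| calculation, with invented numbers for illustration):
|
|     # Price elasticity of demand:
|     # e = (% change in quantity) / (% change in price).
|     def elasticity(q0, q1, p0, p1):
|         return ((q1 - q0) / q0) / ((p1 - p0) / p0)
|
|     # |e| < 1: buyers pay up anyway, so high prices hold.
|     # |e| > 1: buyers wait or substitute, so future growth
|     #          comes at lower margins.
|     print(elasticity(q0=100, q1=95, p0=30_000, p1=33_000))  # ~ -0.5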
| noname120 wrote:
| Time to buy some Nvidia
| ozten wrote:
| NVIDIA sells shovels to the gold rush. One miner (Liang Wenfeng),
| who has previously purchased at least 10,000 A100 shovels... has
| a "side project" where they figured out how to dig really well
| with a shovel and shared their secrets.
|
| The gold rush, whether real or a bubble, is still there!
| NVIDIA will still sell every shovel they can manufacture, as
| soon as it is available in inventory.
|
| Fortune 100 companies will still want the biggest toolshed to
| invent the next paradigm or to be the first to get to AGI.
| culi wrote:
| Yeah, but NVIDIA's amazing digging technique that could only
| be accomplished with NVIDIA shovels is now irrelevant, meaning
| there are more people selling shovels for the gold rush.
| flowerlad wrote:
| A similar efficiency event has occurred in the recent past.
| Blackwell is 25x more energy-efficient for generative AI tasks
| and offers up to 2.5x faster AI training performance overall.
| When Blackwell was announced nobody said "great, we will
| invest less in GPUs". DeepSeek is just another efficiency
| event. Like Blackwell, it enables you to do more with less.
| numba888 wrote:
| Looks like a panic sell-off, but the main question is:
|
| do models with the DeepSeek architecture still scale up?
|
| If yes, then bigger clusters will outperform in the near
| future. NVidia wins as the rising tide lifts all boats, and
| them first.
|
| If not, then it's still possible to run several models in
| parallel to do the same, potentially big, job. Just like a
| human team. All we need is to learn how to do it efficiently.
| This way bigger clusters win again.
___________________________________________________________________
(page generated 2025-01-27 23:00 UTC)