[HN Gopher] The impact of competition and DeepSeek on Nvidia
___________________________________________________________________
The impact of competition and DeepSeek on Nvidia
Author : eigenvalue
Score : 560 points
Date : 2025-01-25 15:30 UTC (2 days ago)
(HTM) web link (youtubetranscriptoptimizer.com)
(TXT) w3m dump (youtubetranscriptoptimizer.com)
| eigenvalue wrote:
| Yesterday I wrote up all my thoughts on whether NVDA stock is
| finally a decent short (or at least not a good thing to own at
| this point). I'm a huge bull when it comes to the power and
| potential of AI, but there are just too many forces arrayed
| against Nvidia for it to sustain supernormal profits.
|
| Anyway, I hope people here find it interesting to read, and I
| welcome any debate or discussion about my arguments.
| patrickhogan1 wrote:
| Good article. Maybe I missed it, but I see lots of analysis
| without a clear concluding opinion.
| scsilver wrote:
| Wanted to add a preface: Thank you for your time on this
| article, I appreciate your perspective and experience, hoping
| you can help refine and rein in my bull case.
|
| Where do you expect NVDA's forward and current EPS to land?
| What revenue drop-off are you expecting in late 2025/2026? Part
| of my bull case for NVDA, continuing, is its very reasonable
| multiple on insane revenue. A leveling off can be expected,
| but I still feel bullish on it hitting $200+ (5 trillion market
| cap? on ~195B revenue for fiscal year 2026 (calendar 2025) at
| 33 EPS) based on this year's revenue according to their guidance
| and the hyperscalers' spending guidance. Finding a sell
| point is a whole different matter to being actively short. I
| can see the case to take some profits, hard for me to go short,
| especially in an inflationary environment (tariffs, electric
| energy, bullying for lower US interest rates).
|
| The scale of production of Grace Hopper and Blackwell amazes me:
| 800k units of Blackwell coming out this quarter. Is there even
| production room for AMD to get their chips made? (Looking at
| the new chip factories in Arizona.)
|
| R1 might be nice for reducing LLM inferencing costs, though I'm
| unsure about the local Llama ones' accuracy (couldn't get it to
| correctly spit out the NFL teams and their associated
| conferences, kept mixing NFL with Euro football), but I still
| want to train YOLO vision models on faster chips like A100s vs
| T4s (4-5x speed multiples for me).
|
| Lastly, if the robot/autonomous vehicle ML wave hits within the
| next year (first drones and cars -> factories -> humanoids), I
| think it can sustain NVDA compute demand.
|
| The real mystery is how we power all this within 2 years...
|
| * This is not financial advice and some of my numbers might be
| a little off, still refining my model and verifying sources and
| numbers
| zippyman55 wrote:
| So at some point we will have too many cannon ball polishing
| factories and it will become apparent the cannon ball trajectory
| is not easily improved on.
| j7ake wrote:
| This was an amazing summary of the landscape of ML currently.
|
| I think the title does the article an injustice, or maybe it's
| too long for people to read far enough to appreciate it (e.g.
| the DeepSeek stuff could be an article in itself).
|
| Whatever the case, the ones with longer attention spans will
| benefit from this read.
|
| Thanks for summarising it!
| eigenvalue wrote:
| Thanks! I was a bit disappointed that no one saw it on HN
| because I think they'd like it a lot.
| j7ake wrote:
| I think they would like it a lot, but I think the title
| doesn't match the content, and it takes too much reading
| before one realises it goes beyond the title.
|
| Keep it up!
| dang wrote:
| We've changed the title to a different one suggested by the
| author.
| metadat wrote:
| The site is currently offline, here's a snapshot:
|
| https://archive.today/y4utp
| diesel4 wrote:
| Link isn't working. Is there another or a cached version?
| eigenvalue wrote:
| Try again! Just rebooted the server since it's going viral now.
| OutOfHere wrote:
| It seems like a pointless discussion since DeepSeek uses Nvidia
| GPUs after all.
| jjeaff wrote:
| it uses a fraction of the GPUs though.
| breadwinner wrote:
| As it says in the article, you are talking about a mere
| constant of proportionality, a single multiple. When you're
| dealing with an exponential growth curve, that stuff gets
| washed out so quickly that it doesn't end up mattering all that
| much.
|
| Keep in mind that the goal everyone is driving towards is
| AGI, not simply an incremental improvement over the latest
| model from OpenAI.
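|
| A quick back-of-the-envelope sketch in Python (the 10x/year
| demand growth rate here is purely a hypothetical assumption for
| illustration, not a figure from the article):
|
|     import math
|
|     # Illustrative only: a one-time constant efficiency gain
|     # versus demand that grows by a constant factor each year.
|     efficiency_gain = 45.0
|     annual_growth = 10.0   # hypothetical 10x per year
|
|     # Years of exponential growth needed to absorb the one-time gain
|     years = math.log(efficiency_gain) / math.log(annual_growth)
|     print(f"{efficiency_gain:.0f}x saving absorbed in {years:.1f} years")
|
| A one-time 45x saving gets eaten by less than two years of that
| kind of growth, which is the sense in which the constant washes
| out.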
| high_na_euv wrote:
| Why do you assume that exponential growth curve is real?
| ithkuil wrote:
| Which due to the Jevons Paradox may ultimately cause more
| shovels to be sold
| cma wrote:
| Their loss curve with the RL didn't level off much though; it
| could be taken a lot further and scaled up to more parameters
| on the big Nvidia mega-clusters out there. And the
| architecture is heavily tuned to Nvidia optimizations.
| UltraSane wrote:
| Jevons Paradox states that increasing efficiency can cause an
| even larger increase in demand.
| dutchbookmaker wrote:
| "wait" I suspect we are all in a bit of denial.
|
| When was the last time the US got their lunch ate in
| technology?
|
| Sputnik might be a bit hyperbolic but after using the model
| all day and as someone who had been thinking of a pro
| subscription, it is hard to grasp the ramifications.
|
| There is just no good reference point that I can think of.
| blackeyeblitzar wrote:
| Yep, some CEO said they have 50K GPUs of the prior generation.
| They probably accumulated them through intermediaries that are
| basically helping Nvidia sell to sanctioned parties by proxy.
| idonotknowwhy wrote:
| DeepSeek was their side project. They had a lot of GPUs from
| their crypto mining project.
|
| Then Ethereum turned off PoW mining, so they looked into
| other things to do with their GPUs, and started DeepSeek.
| saagarjha wrote:
| Mining crypto on H100s?
| arcanus wrote:
| > Amazon gets a lot of flak for totally bungling their internal
| AI model development, squandering massive amounts of internal
| compute resources on models that ultimately are not competitive,
| but the custom silicon is another matter
|
| Juicy. Anyone have a link or context to this? I'd not heard of
| this reception to NOVA and related.
| simonw wrote:
| I think Nova may have changed things here. Prior to Nova their
| LLMs were pretty rubbish - Nova only came out in December but
| seems a whole lot better, at least from initial impressions:
| https://simonwillison.net/2024/Dec/4/amazon-nova/
| arcanus wrote:
| Thanks! That's consistent with my impression.
| snowmaker wrote:
| This is an excellent article, basically a patio11 / matt levine
| level breakdown of what's happening with the GPU market.
| lxgr wrote:
| Couldn't agree more! If this is the byproduct, these must be
| some optimized Youtube transcripts :)
| eprparadox wrote:
| link seems to be dead... is this article still up somewhere?
| jazzyjackson wrote:
| It's back up, but just in case:
|
| https://archive.is/y4utp
| eigenvalue wrote:
| Sorry, my blog crashed! Had a stupid bug where it was calling
| GitHub too frequently to pull in updated markdown for the posts
| and kept getting rate limits. Had to rewrite it but it should be
| much better now.
| breadwinner wrote:
| Great article but it seems to have a fatal flaw.
|
| As pointed out in the article, Nvidia has several advantages
| including:
|
| - Better Linux drivers than AMD
| - CUDA - PyTorch is optimized for Nvidia
| - High-speed interconnect
|
| Each of these advantages is under attack:
|
| - George Hotz is making better drivers for AMD
| - MLX, Triton, JAX: higher-level abstractions that compile down
|   to CUDA
| - Cerebras and Groq solve the interconnect problem
|
| The article concludes that NVIDIA faces an unprecedented
| convergence of competitive threats. The flaw in the analysis is
| that these threats are not unified. Any serious competitor must
| address ALL of Nvidia's advantages. Instead Nvidia is being
| attacked by multiple disconnected competitors, and each of those
| competitors is only attacking one Nvidia advantage at a time.
| Even if each of those attacks is individually successful, Nvidia
| will remain the only company that has ALL of the advantages.
| toisanji wrote:
| I want the NVIDIA monopoly to end, but there is still no real
| competition.
|
| * George Hotz has basically given up on AMD:
| https://x.com/__tinygrad__/status/1770151484363354195
|
| * Groq can't produce more hardware past their "demo". It seems
| like they haven't grown capacity in the years since they
| announced, and they switched to a complete SaaS model and don't
| even sell hardware anymore.
|
| * I don't know enough about MLX, Triton, and JAX.
| simonw wrote:
| That George Hotz tweet is from March last year. He's gone
| back and forth on AMD a bunch more times since then.
| bdangubic wrote:
| is that good or bad?
| simonw wrote:
| Honestly I tried searching his recent tweets for AMD and
| there was way too much noise in there to figure out his
| current position!
| zby wrote:
| " we are going to move it off AMD to our own or partner
| silicon. We have developed it to be very portable."
|
| https://x.com/__tinygrad__/status/1879617702526087346
| infecto wrote:
| Honest question. That sounds more difficult that getting
| things to play with commodity hardware. Maybe I am
| oversimplifying it though.
| whizzter wrote:
| They have their own NN etc. libraries, so adapting should
| be fairly contained, and AMD drivers historically have a
| hilariously bad reputation among people who program GPUs
| (I've been bitten a couple of times myself by weirdness).
|
| Think of it this way: if they're trying to avoid Nvidia and
| make sure their code isn't tied to Nvidia-isms, and AMD is
| troublesome enough even for the basics, then the step to a
| customized solution is small enough to be worthwhile for
| something even cheaper than AMD.
| solarkraft wrote:
| I consider it a good sign that he hasn't completely given
| up. But it sure all seems shaky.
| roland35 wrote:
| The same Hotz who lasted like 4 weeks at Twitter after
| announcing that he'd fix everything? It doesn't really
| inspire a ton of confidence that he can single handedly
| take down Nvidia...
| bfung wrote:
| It looks like he's close to having own AMD stack, tweet
| linked in the article, Jan 15,2025:
| https://x.com/__tinygrad__/status/1879615316378198516
| saagarjha wrote:
| $1000 bounty? That's like 2 hours of development time at
| market rate lol
| htrp wrote:
| We'll check in again with him in 3 months and he'll still
| be just 1 piece away.
| billconan wrote:
| I also noticed that Groq's Chief Architect now works for
| NVIDIA.
|
| https://research.nvidia.com/person/dennis-abts
| Herring wrote:
| He's setting up a case for shorting the stock, ie if the growth
| or margins drop a little from any of these (often well-funded)
| threats. The accuracy of the article is a function of the
| current valuation.
| eigenvalue wrote:
| Exactly. You just need to see a slight deceleration in
| projected revenue growth (which has been running 120%+ YoY
| recently) and some downward pressure on gross margins, and
| maybe even just some market share loss, and the stock could
| easily fall 25% from that.
| breadwinner wrote:
| AMD P/E ratio is 109, NVDA is 56. Which stock is
| overvalued?
| eigenvalue wrote:
| If it were all so simple, they wouldn't pay hedge fund
| analysts so much money...
| pineaux wrote:
| No, that's not true. Hedge funds get paid so well because
| getting a small percentage of a big bag of money is still
| a big bag of money. This statement is more true the
| closer the big bag of money is to infinity.
| daveguy wrote:
| That is extraordinarily simplistic. If NVDA is slowing
| and AMD has gains to realize compared to NVDA, then the
| 10x difference in market cap would imply that AMD is the
| better buy. Which is why I am long in AMD. You can't just
| look at the current P/E delta. You have to look at
| expectations of one vs the other. AMD gaining 2x over
| NVDA means they are approximately equivalently valued. If
| there are unrealized AI related gains all bets are off.
| AMD closing 50% of the gap in market cap value between
| NVDA and AMD means AMD is ~2.5x undervalued.
|
| Disclaimer: long AMD, and not precise on percentages.
| Just illustrating a point.
| flowerlad wrote:
| The point is, it should not be taken for granted that
| NVDA is overvalued. Their P/E is low enough that if
| you're going to state that they are overvalued you have
| to make the case. The article, while well written, fails
| to make the case because it has a flaw: it assumes that
| addressing just one of Nvidia's advantages is enough to
| make it crash, and that's just not true.
| lxgr wrote:
| If investing were as simple as looking at the P/E, all
| P/Es would already be at 15-20, wouldn't they?
| flowerlad wrote:
| Not saying it is as simple as looking at P/E
| lxgr wrote:
| My point is that you have to make the case for _anything_
| being over /undervalued. The null hypothesis is that the
| market has correctly valued it, after all.
| omgwtfbyobbq wrote:
| In the long run, probably yes, but a particular stock is
| less likely to be accurately valued in the short run.
| fldskfjdslkfj wrote:
| If you believe the space will eventually get commoditized
| in the medium to long term, the bear case is obvious.
| And based on history there's a pretty high likelihood of
| that happening.
| bdangubic wrote:
| glad you are not my financial adviser :)
| lxgr wrote:
| On the other hand, getting a bigger slice of the existing
| cake as a smaller challenger can be easier than baking a
| bigger cake as the incumbent.
| baq wrote:
| Hey let's buy intel
| dismalaf wrote:
| NVDA is valued at $3.5 trillion, which means investors
| think it will grow to around $1 trillion in yearly
| revenue. Current revenue is around $35 billion per
| quarter, so call it $140 billion yearly. Investors are
| betting on a 7x increase in revenue. Not impossible, even
| plausible, but you need to assume AMD, INTC, GOOG,
| AMZN, and all the others who make GPUs/TPUs either won't
| take market share or that the market will be worth multiple
| trillions per year.
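|
| Rough arithmetic behind that 7x (a sketch; the ~3.5x
| steady-state price-to-sales multiple is my own assumption to
| make the numbers tie out, not something stated in the thread):
|
|     market_cap = 3.5e12            # NVDA valuation
|     quarterly_revenue = 35e9       # roughly current run rate
|     current_revenue = 4 * quarterly_revenue
|
|     mature_ps_multiple = 3.5       # assumed "grown up" price/sales
|     implied_revenue = market_cap / mature_ps_multiple
|     print(implied_revenue / current_revenue)   # ~7x growth needed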
| kimbler wrote:
| I thought the days of valuing public companies at 3x
| revenues or 5x earnings had long since sailed?
| idonotknowwhy wrote:
| Intel had a great P/E a couple of years ago as well :)
| hmm37 wrote:
| You have to look at non-GAAP numbers, and therefore
| looking at forward P/E ratios is necessary. When you look
| at that, AMD is cheaper than NVDA. Moreover, AMD's P/E
| ratio looks high because they bought Xilinx, and the
| acquisition accounting (done to save on taxes) makes the
| ratio look really high.
| htrp wrote:
| rofl Forward PE ....
| 2-3-7-43-1807 wrote:
| > The accuracy of the article is a function of the current
| valuation.
|
| ah ... no ... that's nonsense trying to hide behind stilted
| math lingo.
| dralley wrote:
| >So how is this possible? Well, the main reasons have to do
| with software-- better drivers that "just work" on Linux and
| which are highly battle-tested and reliable (unlike AMD, which
| is notorious for the low quality and instability of their Linux
| drivers)
|
| This does not match my experience from the past ~6 years of
| using AMD graphics on Linux. Maybe things are different with
| AI/Compute, I've never messed with that, but in terms of normal
| consumer stuff the experience of using AMD is vastly superior
| to trying to deal with Nvidia's out-of-tree drivers.
| saagarjha wrote:
| They are.
| thousand_nights wrote:
| > George Hotz is making better drivers for AMD
|
| lol
| saagarjha wrote:
| *George Hotz is making posts online talking about how AMD
| isn't helping him
| latchkey wrote:
| George Hotz tried to extort AMD into giving him $500k in
| free hardware and $2m cash, and they politely declined.
| grajaganDev wrote:
| There is not enough water (to cool data centers) to justify
| NVDA's current valuation.
|
| The same is true of electricity - neither nuclear power nor
| fusion will be online anytime soon.
| lxgr wrote:
| Those are definitely not the limiting factors here.
|
| Not nearly all data centers are water cooled, and there is
| this amazing technology that can convert sunlight into
| electricity in a relatively straightforward way.
|
| AI workloads (at least training) are just about as
| geographically distributable as it gets due to not being
| very latency-sensitive, and even if you can't obtain
| sufficient grid interconnection or buffer storage, you can
| always leave them idle at night.
| grajaganDev wrote:
| Right - they are not limiting factors, they are reasons
| that NVDA is overvalued.
|
| Stock price is based on future earnings.
|
| The smart money knows this and is reacting this morning -
| thus the drop in NVDA.
| energy123 wrote:
| Solar microgrids are cheaper and faster than nuclear. New
| nuclear isn't happening on the timescales that matter, even
| assuming significant deregulation.
| grajaganDev wrote:
| Can you back up that solar microgrids will supply enough
| power to justify NVDA's current valuation?
| aorloff wrote:
| The unification of the flaws is the scarcity of H100s.
|
| He says this and talks about it in The Fallout section: even
| at BigCos with megabucks, the teams are starved for time on the
| Nvidia chips, and if these innovations work, other teams will
| use them - and then, boom, Nvidia's moat is truncated, which
| doesn't look good at such lofty multiples.
| isatty wrote:
| Sorry, I don't know who George Hotz is, but why isn't AMD
| making better drivers for AMD?
| adastra22 wrote:
| George Hotz is a hot Internet celebrity who has basically
| accomplished nothing of value but has a large cult following.
| You can safely ignore.
|
| (Famous for hacking the PS3 - except he just took credit for a
| separate group's work. And for making a self-driving car in
| his garage - except oh wait, that didn't happen either.)
| Den_VR wrote:
| You're not wrong, but after all these years it's fair to
| give benefit of the doubt - geohot may have grown as a
| person. The PS3 affair was incredibly disappointing.
| xuki wrote:
| He was famous before the PS3 hack, he was the first person
| to unlock the original iPhone.
| adastra22 wrote:
| Yes, but it's worth mentioning that the break consisted
| of opening up the phone and soldering on a bypass for the
| carrier card locking logic. That certainly required some
| skills to do, but is not an attack Apple was defending
| against. This unlocking break didn't really lead to
| anything, and was unlike the later software unlocking
| methods that could be widely deployed.
| SirMaster wrote:
| Well he also found novel exploits in multiple later
| iPhone hardware/software models and implemented complete
| jailbreak applications.
| hshshshshsh wrote:
| What about comma.ai?
| sebmellen wrote:
| Comma.ai works really well. I use it every day in my car.
| medler wrote:
| He took an "internship" at Twitter/X with the stated goal
| of removing the login wall, apparently failing to realize
| that the wall was a deliberate product decision, not a
| technical challenge. Now the X login wall is more intrusive
| than ever.
| epolanski wrote:
| > Any serious competitor must address ALL of Nvidia's
| advantages.
|
| Not really, his article focuses on Nvidia's being valued so
| highly by stock markets, he's not saying that Nvidia's destined
| to lose its advantage in the space in the short term.
|
| In any case, I also think that the likes of MSFT/AMZN/etc will
| be able to reduce their capex spending eventually by being able
| to work on a well integrated stack on their own.
| madaxe_again wrote:
| They have an enormous amount of catching up to do, however;
| Nvidia have created an entire AI ecosystem that touches
| almost every aspect of what AI can do. Whatever it is, they
| have a model for it, and a framework and toolkit for working
| with or extending that model - _and the ability to design
| software and hardware in lockstep_. Microsoft and Amazon have
| a very diffuse surface area when it comes to hardware, and
| being a decent generalist doesn't make you a good specialist.
|
| Nvidia are doing phenomenal things with robotics, and that is
| likely to be the next shoe to drop, and they are positioned
| for another catalytic moment similar to that which we have
| seen with LLMs.
|
| I do think we will see some drawback or at least deceleration
| this year while the current situation settles in, but within
| the next three years I think we will see humanoid robots
| popping up all over the place, particularly as labour
| shortages arise due to political trends - and somebody is
| going to have to provide the compute, both local and cloud,
| and the vision, movement, and other models. People will turn
| to the sensible and known choice.
|
| So yeah, what you say is true, but I don't think is going to
| have an impact on the trajectory of nvidia.
| csomar wrote:
| > - Better Linux drivers than AMD
|
| Unless something radically changed in the last couple years, I
| am not sure where you got this from? (I am specifically talking
| about GPUs for computer usage rather than training/inference)
| idonotknowwhy wrote:
| > Unless something radically changed in the last couple
| years, I am not sure where you got this from?
|
| This was the first thing that stuck out to me when I skimmed
| the article, and the reason I decided to invest the time
| reading it all. I can tell the author knows his shit and
| isn't just parroting everyone's praise for AMD Linux drivers.
|
| > (I am specifically talking about GPUs for computer usage
| rather than training/inference)
|
| Same here. I suffered through the Vega 64 after everyone said
| how great it is. So many AMD-specific driver bugs, AMD driver
| devs not wanting to fix them for non-technical reasons, so
| many hard-locks when using less popular software.
|
| The only complaints about Nvidia drivers I found were "it's
| proprietary", "you have to rebuild the modules when you
| update the kernel", or "doesn't work with Wayland".
|
| I'd hesitate to ever touch an AMD GPU again after my
| experience with it; I haven't had a single hiccup in the
| years since switching to Nvidia.
| csomar wrote:
| Wayland was a requirement for me. I've used an AMD GPU for
| years. I had a bug exactly once with a Linux update, but it
| has been stable since.
| surajrmal wrote:
| Wayland doesn't matter in the server space though.
| cosmic_cheese wrote:
| Another ding against Nvidia for Linux desktop use is that
| only some distributions either make it easy to install and
| keep the proprietary drivers updated (e.g. Ubuntu) and/or
| ship variants with the proprietary drivers preinstalled
| (Mint, Pop!_OS, etc).
|
| This isn't a barrier for Linux veterans but it adds
| significant resistance for part-time users, even those that
| are technically inclined, compared to the "it just works"
| experience one gets with an Intel/AMD GPU under just about
| every Linux distro.
| fragmede wrote:
| they are, unless you get distracted by things like licensing
| and out of tree drivers and binary blobs. If you'd rather
| pontificate about open source philosophy and rights than get
| stuff done, go right ahead.
| yapyap wrote:
| Geohot still at it?
|
| goat.
| willvarfar wrote:
| A new entrant, with an order of magnitude advantage in e.g.
| cost or availability or exportability, can succeed even with
| poor drivers and no CUDA etc. It's only when you cost nearly as
| much as Nvidia that the tooling costs become relevant.
| latchkey wrote:
| George is writing software to directly talk to consumer AMD
| hardware, so that he can sell more Tinyboxes. He won't be doing
| that for enterprise.
|
| Cerebras and Groq need to solve the memory problem. They can't
| scale without adding 10x the hardware.
| slightwinder wrote:
| > - Better Linux drivers than AMD
|
| In which way? As a user who switched from an AMD GPU to an
| Nvidia GPU, I can only report a continued stream of problems
| with Nvidia's proprietary driver, and none with AMD. Is this
| maybe about the open-source drivers, or usage for AI?
| queuebert wrote:
| Don't forget they bought Mellanox and have their own HBA and
| switch business.
| gnlrtntv wrote:
| > While Apple's focus seems somewhat orthogonal to these other
| players in terms of its mobile-first, consumer oriented, "edge
| compute" focus, if it ends up spending enough money on its new
| contract with OpenAI to provide AI services to iPhone users, you
| have to imagine that they have teams looking into making their
| own custom silicon for inference/training
|
| This is already happening today. Most of the new LLM features
| announced this year are primarily on-device, using the Neural
| Engine, and the rest is in Private Cloud Compute, which is also
| using Apple-trained models, on Apple hardware.
|
| The only features using OpenAI for inference are the ones that
| announce the content came from ChatGPT.
| simonw wrote:
| "if it ends up spending enough money on its new contract with
| OpenAI to provide AI services to iPhone users"
|
| John Gruber says neither Apple nor OpenAI are paying for that
| deal: https://daringfireball.net/linked/2024/06/13/gurman-
| openai-a...
| lxgr wrote:
| Mark Gurman (from Bloomberg) is saying that.
| uncletaco wrote:
| When he says better linux drivers than AMD he's strictly talking
| about for AI, right? Because for video the opposite has been the
| case for as far back as I can remember.
| eigenvalue wrote:
| Yes, AMD drivers work fine for games and things like that.
| Their problem is they basically only focused on games and other
| consumer applications and, as a result, ceded this massive
| growth market to Nvidia. I guess you can sort of give them a
| pass because they did manage to kill their archrival Intel in
| data center CPUs, but it's a massive strategic failure if you
| look at how much it has cost them.
| simonw wrote:
| This is excellent writing.
|
| Even if you have no interest at all in stock market shorting
| strategies there is _plenty_ of meaty technical content in here,
| including some of the clearest summaries I've seen anywhere of
| the interesting ideas from the DeepSeek v3 and R1 papers.
| eigenvalue wrote:
| Thanks Simon! I'm a big fan of your writing (and tools) so it
| means a lot coming from you.
| punkspider wrote:
| I was excited as soon as I saw the domain name. Even after a
| few months, this article[1] is still at the top of my mind.
| You have a certain way of writing.
|
| I remember being surprised at first because I thought it
| would feel like a wall of text. But it was such a good read
| and I felt I gained so much.
|
| 1: https://youtubetranscriptoptimizer.com/blog/02_what_i_lear
| ne...
| eigenvalue wrote:
| I really appreciate that, thanks so much!
| nejsjsjsbsb wrote:
| I was put off by the domain, biased against something that
| sounds like a company blog. Especially a "YouTube
| something".
|
| You may get more mileage from excellent writing on a
| yourname.com. This is a piece that sells you, not this
| product, plus it feels more timeless. In 2050 someone may
| point to this post. Better if it were under your own name.
| eigenvalue wrote:
| I had no idea this would get so much traction. I wanted
| to enhance my organic search ranking of my niche web app,
| not crash the global stock market!
| dabeeeenster wrote:
| Many thanks for writing this - it's extremely interesting and
| very well written - I feel like I've been brought up to date,
| which is hard in the AI world!
| andrewgross wrote:
| > The beauty of the MOE model approach is that you can decompose
| the big model into a collection of smaller models that each know
| different, non-overlapping (at least fully) pieces of knowledge.
|
| I was under the impression that this was not how MoE models work.
| They are not a collection of independent models, but instead a
| way of routing to a subset of active parameters at each layer.
| There is no "expert" that is loaded or unloaded per question. All
| of the weights are loaded in VRAM; it's just a matter of which
| are actually loaded to the registers for calculation. As far as
| I could tell from the DeepSeek v3/v2 papers, their MoE approach
| follows this instead of being an explicit collection of experts.
| If that's the case, there's no VRAM saving to be had using an
| MoE, nor an ability to extract the weights of an expert to run
| locally (aside from distillation or similar).
|
| If there is someone more versed on the construction of MoE
| architectures I would love some help understanding what I missed
| here.
| Kubuxu wrote:
| Not sure about DeepSeek R1, but you are right in regards to
| previous MoE architectures.
|
| It doesn't reduce memory usage, as each subsequent token might
| require a different expert, but it reduces per-token
| compute/bandwidth usage. If you place experts on different
| GPUs and run batched inference, you would see these benefits.
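|
| To make the per-token routing concrete, here's a toy sketch of
| one routed-experts layer (illustrative only, not DeepSeek's
| actual code; their real layers add shared experts and
| load-balancing tweaks on top of this):
|
|     import torch
|     import torch.nn as nn
|
|     class ToyMoELayer(nn.Module):
|         def __init__(self, d_model=64, n_experts=8, top_k=2):
|             super().__init__()
|             # every expert's weights stay resident in memory
|             self.experts = nn.ModuleList(
|                 [nn.Linear(d_model, d_model) for _ in range(n_experts)])
|             self.router = nn.Linear(d_model, n_experts)
|             self.top_k = top_k
|
|         def forward(self, x):                 # x: (tokens, d_model)
|             scores = self.router(x).softmax(dim=-1)
|             weights, idx = scores.topk(self.top_k, dim=-1)
|             out = torch.zeros_like(x)
|             # ...but each token only runs through its top_k experts
|             for t in range(x.shape[0]):
|                 for w, e in zip(weights[t], idx[t]):
|                     out[t] += w * self.experts[int(e)](x[t])
|             return out
|
| All experts stay loaded; only top_k of them do work for a given
| token, which is why the win is compute/bandwidth rather than
| VRAM.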
| andrewgross wrote:
| Is there a concept of an expert that persists across layers?
| I thought each layer was essentially independent in terms of
| the "experts". I suppose you could look at what part of each
| layer was most likely to trigger together and segregate those
| by GPU though.
|
| I could be very wrong on how experts work across layers
| though, I have only done a naive reading on it so far.
| rahimnathwani wrote:
| > I suppose you could look at what part of each layer was
| > most likely to trigger together and segregate those by GPU
| > though
|
| Yes, I think that's what they describe in section 3.4 of
| the V3 paper. Section 2.1.2 talks about "token-to-expert
| affinity". I think there's a layer which calculates these
| affinities (between a token and an expert) and then sends
| the computation to the GPUs with the right experts.
|
| This doesn't sound like it would work if you're running
| just one chat, as you need all the experts loaded at once
| if you want to avoid spending lots of time loading and
| unloading models. But at scale with batches of requests it
| should work. There's some discussion of this in 2.1.2 but
| it's beyond my current ability to comprehend!
| andrewgross wrote:
| Ahh got it, thanks for the pointer. I am surprised there
| is enough correlation there to allow an entire GPU to be
| specialized. I'll have to dig in to the paper again.
| Kubuxu wrote:
| I don't think an entire GPU is specialised, nor will a single
| token always use the same expert. I think about it as a
| gather-scatter operation at each layer.
|
| Let's say you have an inference batch of 128 chats, at
| layer `i` you take the hidden states, compute their
| routing, scatter them along with the KV for those layers
| among GPUs (each one handling different experts), the
| attention and FF happens on these GPUs (as model params
| are there) and they get gathered again.
|
| You might be able to avoid the gather by performing the
| routing on each of the GPUs, but I'm generally guessing
| here.
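|
| Rough shape of the scatter step, just to make it concrete (a
| plain-Python sketch; the per-token expert ids and the
| expert-to-GPU map are assumed inputs, not anything from the
| paper):
|
|     from collections import defaultdict
|
|     def scatter_by_expert(hidden_states, expert_ids, expert_to_gpu):
|         """Group per-token hidden states by the GPU hosting the
|         expert chosen for that token at this layer."""
|         buckets = defaultdict(list)
|         for token_idx, expert in enumerate(expert_ids):
|             gpu = expert_to_gpu[expert]
|             buckets[gpu].append((token_idx, hidden_states[token_idx]))
|         return buckets   # each bucket goes to one GPU; results are
|                          # gathered back in token order afterwards
|
| With big enough batches, each GPU gets enough work per layer for
| the communication to be worth it.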
| liuliu wrote:
| It does. They have 256 experts per MLP layer, and some
| shared ones. The minimal deployment for decoding (aka.
| token generation) they recommend is 320 GPUs (H800). It
| is all in the DeepSeek v3 paper that everyone should read
| rather than speculating.
| andrewgross wrote:
| Got it. I'll review the paper again for that portion.
| However, it still sounds like the end result is not VRAM
| savings but efficiency and speed improvements.
| liuliu wrote:
| Yeah, if you look at the DeepSeek v3 paper more deeply, each saving
| on each axis is understandable. Combined, they reach some
| magic number people can talk about (10x!): FP8: ~1.6 to
| 2x faster than BF16 / FP16; MLA: cut KV cache size by 4x
| (I think); MTP: converges 2x to 3x faster; DualPipe:
| maybe ~1.2 to 1.5x faster.
|
| If you look deeper, many of these are only applicable to
| training (we already do FP8 for inference, MTP is to
| improve training convergence, and DualPipe is to
| overlap communication / compute, mostly for training
| purposes too). The efficiency improvement on inference
| IMHO is overblown.
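|
| Multiplying the midpoints out (nothing here beyond the ranges
| above; the MLA factor is a memory win, so I leave it out of the
| speed product):
|
|     fp8      = 1.8    # ~1.6-2x vs BF16/FP16
|     mla_kv   = 4.0    # KV cache reduction (memory, not speed)
|     mtp      = 2.5    # 2-3x faster convergence, training only
|     dualpipe = 1.35   # ~1.2-1.5x, training only
|
|     training_speedup = fp8 * mtp * dualpipe
|     print(training_speedup)   # ~6x before counting the MLA memory win
|
| That's roughly how individually modest factors compound into the
| headline numbers, while only the FP8 and MLA parts carry over to
| inference.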
| rahimnathwani wrote:
| > If you place experts in different GPUs
|
| Right, this is described in the Deepseek V3 paper (section
| 3.4 on pages 18-20).
| metadat wrote:
| _> Another very smart thing they did is to use what is known as a
| Mixture-of-Experts (MOE) Transformer architecture, but with key
| innovations around load balancing. As you might know, the size or
| capacity of an AI model is often measured in terms of the number
| of parameters the model contains. A parameter is just a number
| that stores some attribute of the model; either the "weight" or
| importance a particular artificial neuron has relative to another
| one, or the importance of a particular token depending on its
| context (in the "attention mechanism")._
|
| Has a wide-scale model analysis been performed inspecting the
| parameters and their weights for all popular open / available
| models yet? The impact and effects of disclosed inbound data and
| tuning parameters on individual vector tokens will prove highly
| informative and clarifying.
|
| Such analysis will undoubtedly help semi-literate AI folks level
| up and bridge any gaps.
| naveen99 wrote:
| Deepseek iOS app makes TikTok ban pointless.
| pavelstoev wrote:
| Interesting take. They are now reading our minds vs looking at
| our kids and interiors.
| naveen99 wrote:
| yeah, what's stopping zoom from integrating Deepseek and
| doing an end run around Microsoft teams.
| lxgr wrote:
| Man, do I love myself a deep, well-researched long-form
| contrarian analysis published as a tangent of an already niche
| blog on a Sunday evening! The old web isn't dead yet :)
| eigenvalue wrote:
| Hah thanks, that's my favorite piece of feedback yet on this.
| pavelstoev wrote:
| English economist William Stanley Jevons vs the author of the
| article.
|
| Will NVIDIA be in trouble because of DSR1? Interpreting the Jevons
| effect: if LLMs are "steam engines" and DSR1 brings a 90%
| efficiency improvement for the same performance, more of it will
| be deployed. This is not considering the increase due to <think>
| tokens.
|
| More NVIDIA GPUs will be sold to support growing use cases of
| more efficient LLMs.
| breadwinner wrote:
| Part of the reason Musk, Zuckerberg, Ellison, Nadella and other
| CEOs are bragging about the number of GPUs they have (or plan to
| have) is to attract talent.
|
| Perplexity CEO says he tried to hire an AI researcher from Meta,
| and was told to 'come back to me when you have 10,000 H100 GPUs'
|
| See https://www.businessinsider.nl/ceo-says-he-tried-to-hire-
| an-...
| mrbungie wrote:
| Maybe DeepSeek ain't it, but I expect a big "box of scraps"[1]
| moment soon. Constraint is the mother of invention, and they are
| evading constraints with a promise of never-ending scale.
|
| [1] https://youtu.be/9foB2z_OVHc?si=eZSTMMGYEB3Nb4zI
| rat9988 wrote:
| That's a weird way to read into it.
| TwoFerMaggie wrote:
| This reminds me of the joke in physics, in which theoretical
| particle physicists told experimental physicists, over and over
| again, "trust me bro, the standard model will be proven at 10x
| the eV, we just need a bigger collider bro" after each new
| world's-biggest collider was built.
|
| Wondering if we are in a similar position with "trust me bro
| AGI will be achieved with 10x more GPUs".
| vonneumannstan wrote:
| The difference is the AI researchers have clear plots showing
| capabilities scaling with GPUs and there's not a sign that it
| is flattening so they actually have a case for saying that
| AGI is possible at N GPUs.
| segasaturn wrote:
| Sauce? How do you even measure "capabilities" in that
| regard, just writing answers to standard tests? Because
| being able to ace a test doesn't mean it's AGI, it means
| its good at taking standard tests.
| vonneumannstan wrote:
| This is the canonical paper. Nothing I've seen seems to
| indicate the curves are flattening, you can ask "scaling
| what" but the trend is clear.
|
| https://arxiv.org/pdf/2001.08361
| jms55 wrote:
| Great article, thanks for writing it! Really great summary of the
| current state of the AI industry for someone like me who's
| outside of it (but tangential, given that I work with GPUs for
| graphics).
|
| The one thing from the article that sticks out to me is that the
| author/people are assuming that DeepSeek needing 1/45th the
| amount of hardware means that the other 44/45ths that large tech
| companies have invested was wasteful.
|
| Does software not scale to meet hardware? I don't see this as
| 44/45ths wasted hardware, but as a free increase in the amount of
| hardware people have. Software needing less hardware means you
| can run even _more_ software without spending more money, not
| that you need less hardware, right? (for the top-end, non-
| embedded use cases).
|
| ---
|
| As an aside, the state of the "AI" industry really freaks me out
| sometimes. Ignoring any sort of short or long term effects on
| society, jobs, people, etc, just the sheer amount of money and
| time invested into this one thing is, insane?
|
| Tons of custom processing chips, interconnects, compilers,
| algorithms, _press releases!_, etc all for one specific field.
| It's like someone taking the last decade of advances in
| computers, software, etc, and shoving it in the space of a year.
| For comparison, Rust 1.0 is 10 years old - I vividly remember the
| release. And even then it took years to propagate out as a
| "thing" that people were interested in and invested significant
| time into. Meanwhile deepseek releases a new model (complete with
| a customer-facing product name and chat interface, instead of
| something boring and technical), and in 5 days it's being
| replicated (to at least some degree) and copied by competitors.
| Google, Apple, Microsoft, etc are all making custom chips and
| investing insane amounts of money into different compilers,
| programming languages, hardware, and research.
|
| It's just, kind of disquieting? Like everyone involved in AI
| lives in another world operating at breakneck speed, with
| billions of dollars involved, and the rest of us are just
| watching from the sidelines. Most of it (LLMs specifically) is no
| longer exciting to me. It's like, what's the point of spending
| time on a non-AI related project? We can spend some time writing
| a nice API and working on a cool feature or making a UI prettier
| and that's great, and maybe with a good amount of contributors
| and solid, sustained effort, we can make a cool project that's
| useful and people enjoy, and earns money to support people if
| it's commercial. But then for AI, github repos with shiny well-
| written readmes pop up overnight, tons of text is being written,
| thought, effort, and billions of dollars get burned or speculated
| on in an instant on new things, as soon as the next marketing
| release is posted.
|
| How can the next advancement in graphics, databases,
| cryptography, etc compete with the sheer amount of societal
| attention AI receives?
|
| Where does that leave writing software for the rest of us?
| mgraczyk wrote:
| The beginning of the article was good, but the analysis of
| DeepSeek and what it means for Nvidia is confused and clearly out
| of the loop.
|
| * People have been training models at <fp32 precision for many
| years; I did this in 2021 and it was already easy in all the
| major libraries.
|
| * GPU FLOPs are used for many things besides training the final
| released model.
|
| * Demand for AI is capacity-limited, so it's possible and likely
| that increasing AI/FLOP would not substantially reduce the price
| of GPUs.
| lysecret wrote:
| Where do you get this "capacity" limit from? I can get as many
| H100s from GCP or wherever as I wish; the only things that are
| capacity-limited are 100k clusters a la Elon/X. But what
| DeepSeek (and the recent evidence of a limit in pure base-model
| scaling) shows is that this might actually not be profitable,
| and we may end up with much smaller base models scaled at
| inference time. The moat for Nvidia in this inference-time
| scaling is much smaller; also, you don't need the humongous
| clusters for that either, you can just distribute the inference
| (and in the future run it locally too).
| mgraczyk wrote:
| What's your GPU quota in GCP? How did you get it increased
| that much?
| saagarjha wrote:
| Asking GCP to give you H100s on-demand is nowhere near cost
| efficient.
| aorloff wrote:
| His DeepSeek argument was essentially that the experts who look
| at the economics of running these teams (e.g., ha ha, the
| engineers themselves might dabble) are looking over the hedge at
| DeepSeek's claims, and they are really awestruck.
| mkalygin wrote:
| This is such a comprehensive analysis, thank you. For someone
| just starting to learn about the field, it's a great way to
| understand what's going on in the industry.
| miraculixx wrote:
| If we are to get to AGI, why do we need to train on all data?
| That's silly, and all we get is compression and probabilistic
| retrieval.
|
| Intelligence by definition is not compression, but the ability to
| think and act according to new data, based on experience.
|
| Truly AGI models will work on this principle, not on the best
| compression of as much data as possible.
|
| We need a new approach.
| eigenvalue wrote:
| Actually, compression is an incredibly good way to think about
| intelligence. If you understand something really well then you
| can compress it a lot. If you can compress most of human
| knowledge effectively without much reconstruction error while
| shrinking it down by 99.5%, then you must have in the process
| arrived at a coherent and essentially correct world model,
| which is the basis of effective cognition.
| chpatrick wrote:
| "If you can't explain it to a six year old, you don't
| understand it yourself." -> "If you can compress knowledge, you
| understand it."
| AnotherGoodName wrote:
| Fwiw there are highly cited papers that literally map AGI to
| compression. As in, they map to the same thing, and people write
| widely respected papers on this fact. Basically a
| prediction engine can be used to make a compression tool and an
| AI equally.
|
| The tl;dr: given inputs and a system that can accurately
| predict the next sequence, you can either compress that data
| using that prediction (arithmetic coding), or you can take
| actions based on that prediction to achieve an end goal, mapping
| predictions of new inputs to possible outcomes and then taking
| the path to a goal (AGI). They boil down to one and the same.
| So it's weird to have someone state they are not the same when
| it's widely accepted they absolutely are.
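|
| To make the compression half concrete: an arithmetic coder can
| get arbitrarily close to -log2 p(next symbol) bits per symbol, so
| you can score any predictor by the code length it implies. A tiny
| sketch with two made-up toy predictors:
|
|     import math
|
|     def code_length_bits(sequence, predict):
|         """Ideal compressed size if we arithmetic-code `sequence`
|         using `predict(prefix) -> {symbol: probability}`."""
|         total = 0.0
|         for i, sym in enumerate(sequence):
|             probs = predict(sequence[:i])
|             total += -math.log2(probs[sym])
|         return total
|
|     text = "abababababab"
|     uniform = lambda prefix: {"a": 0.5, "b": 0.5}
|     alternating = lambda prefix: (
|         {"a": 0.9, "b": 0.1} if not prefix or prefix[-1] == "b"
|         else {"a": 0.1, "b": 0.9})
|
|     print(code_length_bits(text, uniform))      # 12.0 bits
|     print(code_length_bits(text, alternating))  # ~1.8 bits
|
| The better predictor of the sequence is, by construction, the
| better compressor of it.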
| jwan584 wrote:
| The point about using FP32 for training is wrong. Mixed precision
| (FP16 multiplies, FP32 accumulates) has been in use for years - the
| original paper came out in 2017.
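|
| For reference, the standard PyTorch mixed-precision loop has
| looked roughly like this for years (a sketch with a dummy model
| and random data, using the torch.cuda.amp APIs):
|
|     import torch
|
|     model = torch.nn.Linear(1024, 1024).cuda()
|     opt = torch.optim.SGD(model.parameters(), lr=1e-3)
|     scaler = torch.cuda.amp.GradScaler()   # keeps FP32 master state
|
|     for _ in range(10):
|         x = torch.randn(32, 1024, device="cuda")
|         with torch.cuda.amp.autocast():     # FP16 matmuls, FP32 accum
|             loss = model(x).square().mean()
|         scaler.scale(loss).backward()       # scaled to avoid underflow
|         scaler.step(opt)
|         scaler.update()
|         opt.zero_grad()
|
| DeepSeek's FP8 recipe goes a step beyond this, but the FP16/FP32
| mix itself is old news.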
| eigenvalue wrote:
| Fair enough, but that still uses a lot more memory during
| training than what DeepSeek is doing.
| suraci wrote:
| DeepSeek is not the black swan.
|
| NVDA was already overpriced even without R1; the market is
| full of air GPUs hiding in the capex of tech giants like MSFT.
|
| If orders are canceled or delivery fails for any reason, NVDA's
| EPS would be pulled back to its fundamentally justified level.
|
| Or if all those air GPUs do get produced and delivered in the
| coming years and the demand keeps rising? Well, that will be a
| crazy world then.
|
| It's a finance game, not related to the real world.
| naiv wrote:
| I used to own several adult companies in the past. Incredibly
| huge margins, and then along came Pornhub and we could barely
| survive after it, as we did not adapt.
|
| With DeepSeek this is now the 'Pornhub of AI' moment. Adapt or
| die.
| logicchains wrote:
| Curious what Pornhub did better, if you're able to say. Provide
| content at much lower cost, like DeepSeek?
| naiv wrote:
| Yes, close-to-free content.
|
| They understood the DMCA brilliantly, so they did bulk cheap
| content purchases and hid behind the DMCA for all non-
| licensed content which was "uploaded by users". They did bulk
| purchases of cheap content from some studios, but that was
| just a fraction.
|
| Of course their risk of going advertising-revenue-only was high,
| and in the beginning mostly only cam providers would
| advertise.
|
| Our problem was that we had contracts and close relationships
| with all the big studios, so going the DMCA route would have
| severed these ties for an unknown risk. In hindsight, not
| creating a company which abused the DMCA was the right
| decision. I am very loyal and it would have felt like
| cheating.
|
| Now it's a different story after the credit card shakedown,
| when they had to remove millions of videos and be able to
| provide 2257 documentation for each video.
| nejsjsjsbsb wrote:
| That analogy would be right if a startup could dredge beach
| sand and pump out trillions of AI chips.
|
| What actually happened was that a better algorithm was created
| and people are betting against the main game in town for running
| said algorithm.
|
| If someone came up with a CPU-superior AI, that'd be worrying
| for Nvidia.
| naiv wrote:
| Groq LPU inference chip?
| nejsjsjsbsb wrote:
| You heard my 26khz whistle!
| homarp wrote:
| see also https://news.ycombinator.com/item?id=42839650
| chvid wrote:
| For sure NVIDIA is priced for perfection, perhaps more than any
| of the others of similar market value.
|
| I think two threats are the biggest:
|
| First Apple. TSMC's largest customer. They are already making
| their own GPUs for their data centers. If they were to sell these
| to others they would be a major competitor.
|
| You would have the same GPU stack on your phone, laptop, PC,
| and data center. Already big developer mind share. Also useful in
| a world where LLMs run (in part) on the end user's local machine
| (like Apple Intelligence).
|
| Second is China - Huawei, Deepseek etc.
|
| Yes - there will be no GPUs from Huawei in the US in this decade.
| And the Chinese won't win in a big massive battle. Rather it is
| going to be death by a thousand cuts.
|
| Just as happened with the Huawei Mate 60. It is only sold in
| China, but today Apple is losing business big time in China.
|
| In the same manner OpenAi and Microsoft will have their business
| hurt by Deepseek even if Deepseek was completely banned in the
| west.
|
| Likely we will see news on Chinese AI accelerators this year and
| I wouldn't be surprised if we soon saw Chinese hyperscalers
| offering cheaper GPU cloud compute than the West due to a
| combination of cheaper energy, labor cost, and sheer scale.
|
| Lastly AMD is no threat to NVIDIA as they are far behind and
| follow the same path with little way of differentiating
| themselves.
| Giorgi wrote:
| Looks like huge astroturfing effort from CCP. I am seeing these
| coordinated propaganda inside every AI related sub on reddit, on
| social media and now - here.
| chasd00 wrote:
| Yeah I get that feeling too. Lots of old school astroturfing
| going on.
| dartos wrote:
| This just in.
|
| Competition lowers the value of monopolies.
| manojlds wrote:
| >With the advent of the revolutionary Chain-of-Thought ("COT")
| models introduced in the past year, most noticeably in OpenAI's
| flagship O1 model (but very recently in DeepSeek's new R1 model,
| which we will talk about later in much more detail), all that
| changed. Instead of the amount of inference compute being
| directly proportional to the length of the output text generated
| by the model (scaling up for larger context windows, model size,
| etc.), these new COT models also generate intermediate "logic
| tokens"; think of this as a sort of scratchpad or "internal
| monologue" of the model while it's trying to solve your problem
| or complete its assigned task.
|
| Is this right? I thought CoT was a prompting method - are we now
| calling reasoning models CoT models?
| veesahni wrote:
| Reasoning models are a result of the learnings from CoT
| prompting.
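|
| A toy way to see the compute-side difference the article is
| pointing at, using the rough ~2 x parameters FLOPs per generated
| token rule of thumb (all numbers below are made up for
| illustration):
|
|     params = 70e9              # hypothetical model size
|     answer_tokens = 300        # visible answer
|     reasoning_tokens = 5000    # hidden "scratchpad" tokens
|
|     flops_plain = 2 * params * answer_tokens
|     flops_cot   = 2 * params * (answer_tokens + reasoning_tokens)
|     print(flops_cot / flops_plain)   # ~17.7x more inference compute
|
| So "CoT model" in the article means a model trained to produce
| that scratchpad itself, rather than CoT as a prompting trick.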
| s1mplicissimus wrote:
| I'm curious what are the key differences between "a reasoning
| model" and good old CoT prompting. Is there any reason to
| believe that the fundamental limitations of prompting don't
| apply to "reasoning models"? (hallucinations, plainly wrong
| output, bias towards the training data mean, etc.)
| itchyjunk wrote:
| The level of sophistication of CoT models varies. "Good old
| CoT prompting" is you hoping the model generates some
| reasoning tokens prior to the final answer. When it did,
| the answers tended to be better for certain classes of
| problems. But you had no control over what type of
| reasoning tokens it was generating. There were hypotheses
| that just having <pause> tokens in between generated
| better answers, as it allowed n+1 steps to generate an
| answer instead of n. I would consider Meta's "continuous chain
| of thought" to be on the other end of "good old CoT prompting",
| where they are passing the next tokens from the latent
| space back into the model, getting a "BHF" like effect. Who
| knows what's happening with O3 and Anthropic's O3-like
| models. The problems you mentioned are very broad and not
| limited to prompting. Reasoning models tend to outperform
| older models on math problems. So I'd assume they do reduce
| hallucination on certain classes of problems.
| kimbler wrote:
| Nvidia seem to be one step ahead of this and you can see their
| platform efforts are pushing towards creating large volumes of
| compute that are easy to manage for whatever your compute
| requirements are, be that training, inference or whatever comes
| next and whatever form. People are maybe tackling some of these
| areas in isolation but you do not want to build datacenters where
| everything is ringfenced per task or usage.
| colinnordin wrote:
| Great article.
|
| > _Now, you still want to train the best model you can by
| cleverly leveraging as much compute as you can and as many
| trillion tokens of high quality training data as possible, but
| that 's just the beginning of the story in this new world; now,
| you could easily use incredibly huge amounts of compute just to
| do inference from these models at a very high level of confidence
| or when trying to solve extremely tough problems that require
| "genius level" reasoning to avoid all the potential pitfalls that
| would lead a regular LLM astray._
|
| I think this is the most interesting part. We always knew a huge
| fraction of the compute would be on inference rather than
| training, but it feels like the newest developments is pushing
| this even further towards inference.
|
| Combine that with the fact that you can run the full R1 (680B)
| distributed on 3 consumer computers [1].
|
| If most of NVIDIAs moat is in being able to efficiently
| interconnect thousands of GPUs, what happens when that is only
| important to a small fraction of the overall AI compute?
|
| [1]: https://x.com/awnihannun/status/1883276535643455790
| tomrod wrote:
| Conversely, how much larger can you scale if frontier models
| only currently need 3 consumer computers?
|
| Imagine having 300. Could you build even better models? Is
| DeepSeek the right team to deliver that, or can OpenAI, Meta,
| HF, etc. adapt?
|
| Going to be an interesting few months on the market. I think
| OpenAI lost a LOT in the board fiasco. I am bullish on HF. I
| anticipate Meta will lose folks to brain drain in response to
| management equivocation around company values. I don't put much
| stock into Google or Microsoft's AI capabilities, they are the
| new IBMs and are no longer innovating except at obvious
| margins.
| danaris wrote:
| This assumes no (or very small) diminishing returns effect.
|
| I don't pretend to know much about the minutiae of LLM
| training, but it wouldn't surprise me at all if throwing
| massively more GPUs at this particular training paradigm only
| produces marginal increases in output quality.
| tomrod wrote:
| I believe the margin to expand is on CoT, where tokens can
| grow dramatically. If there is value in putting more
| compute towards it, there may still be returns to be
| captured on that margin.
| stormfather wrote:
| Google is silently catching up fast with Gemini. They're also
| pursuing next gen architectures like Titan. But most
| importantly, the frontier of AI capabilities is shifting
| towards using RL at inference (thinking) time to perform
| tasks. Who has more data than Google there? They have a
| gargantuan database of queries paired with subsequent web
| nav, actions, follow up queries etc. Nobody can recreate
| this, Bing failed to get enough marketshare. Also, when you
| think of RL talent, which company comes to mind? I think
| Google has everyone checkmated already.
| moffkalast wrote:
| Never underestimate Google's ability to fall flat on their
| face when it comes to shipping products.
| _DeadFred_ wrote:
| How quickly the narrative went from 'Google silently has
| the most advanced AI but they are afraid to release it' to
| 'Google is silently catching up' all using the same 'core
| Google competencies' to infer Google's position of
| strength. Wonder what the next lower level of Google
| silently leveraging their strength will be?
| stormfather wrote:
| Google is clearly catching up. Have you tried the recent
| Gemini models? Have you tried deep research? Google is
| like a ship that is hard to turn around but also hard to
| stop once in motion.
| shwaj wrote:
| Can you say more about using RL at inference time, ideally
| with a pointer to read more about it? This doesn't fit into
| my mental model, in a couple of ways. The main way is right
| in the name: "learning" isn't something that happens at
| inference time; inference is generating results from
| already-trained models. Perhaps you're conflating RL with
| multistage (e.g. "chain of thought") inference? Or maybe
| you're talking about feeding the result of inference-time
| interactions with the user back into subsequent rounds of
| training? I'm curious to hear more.
| stormfather wrote:
| I wasn't clear. Model weights aren't changing at
| inference time. I meant at inference time the model will
| output a sequence of thoughts and actions to perform
| tasks given to it by the user. For instance, to answer a
| question it will search the web, navigate through some
| sites, scroll, summarize, etc. You can model this as a
| game played by emitting a sequence of actions in a
| browser. RL is the technique you want to train this
| component. To scale this up you need to have a massive
| amount of examples of sequences of actions taken in the
| browser, the outcome it led to, and a label for if that
| outcome was desirable or not. I am saying that by
| recording users googling stuff and emailing each other
| for decades Google has this massive dataset to train
| their RL-powered browser-using agent. DeepSeek proving
| that simple RL can be cheaply applied to a frontier LLM
| and have reasoning organically emerge makes this approach
| more obviously viable.
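|
| Concretely, the kind of record I mean (a hypothetical schema,
| just to illustrate the shape of the data, not anything Google
| has described):
|
|     from dataclasses import dataclass
|
|     @dataclass
|     class BrowsingEpisode:
|         query: str            # what the user was trying to do
|         actions: list[str]    # e.g. ["search ...", "click result 2",
|                               #       "scroll", "copy text"]
|         outcome: str          # final page / extracted answer
|         success: bool         # reward signal: did the user stop,
|                               # or rephrase and retry?
|
| Decades of logs give you implicit (query, actions, success)
| triples like this; that's the RL dataset I'm pointing at.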
| onlyrealcuzzo wrote:
| If you watch this video, it explains well what the major
| difference is between DeepSeek and existing LLMs:
| https://www.youtube.com/watch?v=DCqqCLlsIBU
|
| It seems like there is MUCH to gain by migrating to this
| approach - and it _theoretically_ should not cost more to
| switch to that approach than vs the rewards to reap.
|
| I expect all the major players are already working full-steam
| to incorporate this into their stacks as quickly as possible.
|
| IMO, this seems incredibly bad for Nvidia, and incredibly good
| for everyone else.
|
| I don't think this seems particularly bad for ChatGPT.
| They've built a strong brand. This should just help them
| reduce - by far - one of their largest expenses.
|
| They'll have a slight disadvantage to, say, Google - who can
| much more easily switch from GPU to TPU. ChatGPT _could_ have
| some growing pains there. Google would not.
| wolfhumble wrote:
| > I don't think this seems particularly bad for ChatGPT.
| They've built a strong brand. This should just help them
| reduce - by far - one of their largest expenses.
|
| Often expenses like that are keeping your competitors away.
| onlyrealcuzzo wrote:
| Yes, but it typically doesn't matter if someone can reach
| parity or even surpass you - they have to surpass you by
| a step function to take a significant number of your
| users.
|
| This is a step function in terms of efficiency (which
| presumably will be incorporated into ChatGPT within
| months), but not in terms of end user experience. It's
| only slightly better there.
| ReptileMan wrote:
| One data point, but my ChatGPT subscription is set to
| cancel every month, so each month I make an active
| decision to resub. And because the cost of switching is
| essentially zero - the moment a better service is up
| there I will switch in an instant.
| onlyrealcuzzo wrote:
| There are obviously people like you, but I hope you
| realize this is not the typical user.
| tomrod wrote:
| That is a fantastic video, BTW.
| simpaticoder wrote:
| _> Imagine having 300._
|
| Would it not be useful to have multiple independent AIs
| observing and interacting to build a model of the world? I'm
| thinking something roughly like the "counselors" in the
| Civilization games, giving defense/economic/cultural advice,
| but generalized over any goal-oriented scenario (and
| including one to take the "user" role). A group of AIs with
| specific roles interacting with each other seems like a good
| area to explore, especially now given the downward
| scalability of LLMs.
| tomrod wrote:
| Yes; to my understanding that is MoE.
| JoshTko wrote:
| This is exactly where Deepseeks enhancements come into
| play. Essentially deepseek lets the model think out loud
| via chain of thought (o1 and Claude also do this) but DS
| also does not supervise the chain of thought, and simply
| rewards CoTs that get the answer correct. This is just one
| of the half dozen training optimizations that Deepseek has
| come up with.
| neuronic wrote:
| > NVIDIAs moat
|
| Offtopic, but your comment finally pushed me over the edge to
| semantic satiation [1] regarding the word "moat". It is
| incredible how this word turned up a short while ago and now it
| seems to be a key ingredient of every second comment.
|
| [1] https://en.wikipedia.org/wiki/Semantic_satiation
| mikestew wrote:
| _It is incredible how this word turned up a short while
| ago..._
|
| I'm sure if I looked, I could find quotes from Warren Buffett
| (the recognized originator of the term) going back a few
| decades. But your point stands.
| mikeyouse wrote:
| Yeah, he's been talking about "economic moats" since at
| least 1995:
|
| https://www.berkshirehathaway.com/letters/1995.html
| pillefitz wrote:
| Nobody claimed it's a new word. Still, the frequency has
| increased 100x over the last few days, subjectively speaking.
| kccqzy wrote:
| The earliest occurrence of the word "moat" that I could
| find online from Buffett is from 1986:
| https://www.berkshirehathaway.com/letters/1986.html That
| shareholder letter is charmingly old-school.
|
| Unfortunately letters before 1977 weren't available online
| so I wasn't able to search.
|
| It also helps that I've been to several cities with an
| actual moat so this word is familiar to me.
| fastasucan wrote:
| The word moat was first used in english in the 15th century
| https://www.merriam-webster.com/dictionary/moat
| ljw1004 wrote:
| I'm struggling to understand how a moat can have a CRACK in
| it.
| cwmoore wrote:
| https://en.wikipedia.org/wiki/Frequency_illusion
| bn-l wrote:
| Link has all the params but running at 4 bit quant.
| tw1984 wrote:
| > If most of NVIDIAs moat is in being able to efficiently
| interconnect thousands of GPUs
|
| nah. Its moat is CUDA and the millions of devs using CUDA,
| aka the ecosystem.
| mupuff1234 wrote:
| But if it's not combined with super high end chips with
| massive margins that moat is not worth anywhere close to 3T
| USD.
| ReptileMan wrote:
| And then some Chinese startup creates an amazing compiler
| that takes CUDA and moves it to X (AMD, Intel, ASIC) and we
| are back at square one.
|
| So far it seems that the best investment is in RAM producers.
| Unlike compute, the RAM requirements seem to be stubborn.
| 01100011 wrote:
| Don't forget that "CUDA" involves more than language
| constructs and programming paradigms.
|
| With NVDA, you get tools to deploy at scale, maximize
| utilization, debug errors and perf issues, share HW between
| workflows, etc. These things are not cheap to develop.
| Symmetry wrote:
| It might not be cheap to develop them but if you can save
| $10B in hardware costs by doing so you're probably
| looking at positive ROI.
| a_wild_dandan wrote:
| Running a 680-billion parameter frontier model on a few Macs
| (at 13 tok/s!) is nuts. That's _two years_ after ChatGPT was
| released. That rate of progress just blows my mind.
| brandonpelfrey wrote:
| Great article. I still feel like very few people are viewing the
| Deepseek effects in the right light. If we are 10x more efficient
| it's not that we use 1/10th the resources we did before, we
| expand to have 10x the usage we did before. All technology
| products have moved this direction. Where there is capacity, we
| will use it. This argument would not work if we were close to AGI
| or something and didn't need more, but I don't think we're
| actually close to that at all.
| VHRanger wrote:
| Correct. This effect is known in economics since forever - new
| technology has
|
| - An "income effect". You use the thing more because it's
| cheaper - new usecases come up
|
| - A "substitution effect." You use other things more because of
| the savings.
|
| I got into this on labor economics here [1] - you have
| counterintuitive examples with ATMs actually increasing the
| number of bank branches for several decades.
|
| [1]: https://singlelunch.com/2019/10/21/the-economic-effects-
| of-a...
| neuronic wrote:
| Would this not mean we need much much more training data to
| fully utilize the now "free" capacities?
| vonneumannstan wrote:
| It's pretty clear that the reasoning models are using mass
| amounts of synthetic data so it's not a bottleneck.
| jnwatson wrote:
| This is called Jevons Paradox.
|
| https://en.wikipedia.org/wiki/Jevons_paradox.
| aurareturn wrote:
| Yep. I've been harping on this. DeepSeek is bullish for Nvidia.
| ReptileMan wrote:
| >DeepSeek is bullish for Nvidia.
|
| DeepSeek is bullish for the semiconductor industry as a
| whole. Whether it is for Nvidia remains to be seen. Intel was
| in Nvidia's position in 2007 and they didn't want to trade
| margins for volumes in the phone market. And there they are
| today.
| aurareturn wrote:
| Why wouldn't it be for Nvidia? Explain more.
| mvdtnz wrote:
| Great, now I can rewrite 10x more emails or solve 10x more
| graduate level programming tasks (mostly incorrectly). Brave
| new world.
| p0w3n3d wrote:
| > which require low-latency responses, such as content
| moderation, fraud detection, _dynamic pricing_ , etc.
|
| Is it even legal to give different prices to different customers?
| jnwatson wrote:
| Of course it is. That's how the airlines stay in business.
| p0w3n3d wrote:
| However imagine entering a store where the camera looks up
| your face in a shared database and profiles you as a person who
| will pay higher prices - and the prices are displayed near
| you according to your profile...
| esafak wrote:
| It depends on what basis. You can't discriminate based on
| protected classes.
| typeofhuman wrote:
| I'm rooting for DeepSeek (or any competitor) against OpenAI
| because I don't like Sam Altman. I'm confident in admitting it.
| 1970-01-01 wrote:
| The enemy of your enemy is only temporarily your friend.
| typeofhuman wrote:
| Wise words from the epoch of time.
| TypingOutBugs wrote:
| As a European I really don't see the difference between US
| and Chinese tech right now - the last week from Trump has
| made me feel more threatened from the US than I ever have
| been by China (Greenland, living in a Nordic country with
| treaties to defend it).
|
| I appreciate China has censorship, but the US is going that
| way too (recent "issues" for search terms). Might be
| different scales now, but I think it'll happen. I don't care
| as much if a Chinese company wins the LLM space than I did
| last year.
| rwoerz wrote:
| Indeed! Just ask DeepSeek something about Tiananmen or
| Taiwan. Answering seems to be an absolute "no-brainer" for
| it.
| liuliu wrote:
| This is a humble and informed article (compared to others
| written by financial analysts over the past few days). But it
| still has the flaw of over-estimating the efficiency of
| deploying a 687B MoE model on commodity hardware (to use
| locally; cloud providers will do efficient batching and that is
| different): you cannot do that on any single piece of Apple
| hardware (you need to hook up at least 2 M2 Ultras). You can
| barely deploy it on desktop computers, because non-registered
| DDR5 tops out at 64GiB per stick (so you can get to 512GiB of
| RAM). Now coming to PCIe bandwidth: 37B parameters activated
| per token means exactly that - each token requires a new set of
| 37B weights, so you need to transfer 18GiB per token into VRAM
| (assuming 4-bit quant). PCIe 5 (5090) has 64GB/s transfer
| speed, so your upper bound is limited to ~4 tok/s with a well
| balanced, purpose-built PC (and custom software). For
| programming tasks that usually require ~3000 tokens of
| thinking, we are looking at ~12 mins per interaction.
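|
| To make the arithmetic explicit (a back-of-the-envelope check of
| the numbers above, under the same assumptions: 37B activated
| parameters per token, 4-bit weights, PCIe 5.0 x16 at roughly
| 64GB/s, and the worst case where every token touches a fresh set
| of experts):
|
|     active_params = 37e9      # parameters activated per token
|     bytes_per_param = 0.5     # 4-bit quantization
|     pcie_bw = 64e9            # bytes/s, roughly PCIe 5.0 x16
|
|     bytes_per_token = active_params * bytes_per_param   # ~18.5 GB
|     tok_per_s = pcie_bw / bytes_per_token                # ~3.5 tok/s
|
|     thinking_tokens = 3000
|     minutes = thinking_tokens / tok_per_s / 60           # ~14 min
|
|     print(f"{bytes_per_token/1e9:.1f} GB/token, "
|           f"{tok_per_s:.1f} tok/s, {minutes:.0f} min")
|
| Which lands in the same ballpark as the ~4 tok/s and ~12 minutes
| quoted above; in practice some experts repeat between adjacent
| tokens, so actual throughput can be somewhat higher.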
| lvass wrote:
| Is it really 37B different parameters for each token? Even with
| the "multi-token prediction system" that the article mentions?
| liuliu wrote:
| I don't think anyone uses MTP for inference right now. Even
| if you use MTP for drafting, you need batching in the next
| round to "verify" it is the right token, and if that happens
| you need to activate more experts.
|
| DELETED: If you don't use MTP for drafting, and use MTP to
| skip generations, sure. But you also need to evaluate your
| use case to make sure you don't get penalized for doing that.
| Their evaluation in the paper doesn't use MTP for generation.
|
| EDIT: Actually, you cannot use MTP other than drafting
| because you need to fill in these KV caches. So, during
| generation, you cannot save your compute with MTP (you save
| memory bandwidth, but this is more complicated for MoE model
| due to more activated experts).
| pjdesno wrote:
| The description of DeepSeek reminds me of my experience in
| networking in the late 80s - early 90s.
|
| Back then a really big motivator for Asynchronous Transfer Mode
| (ATM) and fiber-to-the-home was the promise of video on demand,
| which was a huge market in comparison to the Internet of the day.
| Just about all the work in this area ignored the potential of
| advanced video coding algorithms, and assumed that broadcast TV-
| quality video would require about 50x more bandwidth than today's
| SD Netflix videos, and 6x more than 4K.
|
| What made video on the Internet possible wasn't a faster
| Internet, although the 10-20x increase every decade certainly
| helped - it was smarter algorithms that used orders of magnitude
| less bandwidth. In the case of AI, GPUs keep getting faster, but
| it's going to take a hell of a long time to achieve a 10x
| improvement in performance per cm^2 of silicon. Vastly improved
| training/inference algorithms may or may not be possible
| (DeepSeek seems to indicate the answer is "may") but there's no
| physical limit preventing them from being discovered, and the
| disruption when someone invents a new algorithm can be nearly
| immediate.
| TMWNN wrote:
| >but there's no physical limit preventing them from being
| discovered, and the disruption when someone invents a new
| algorithm can be nearly immediate.
|
| The rise of the net is Jevons paradox fulfilled. The orders of
| magnitude less bandwidth needed per cat video drove much more
| than that in overall growth in demand for said videos. During
| the dotcom bubble's collapse, bandwidth use kept going up.
|
| Even if there is a near-term bear case for NVDA (dotcom
| bubble/bust), history indicates a bull case for the sector
| overall and related investments such as utilities (the entire
| history of the tech sector from 1995 to today).
| accra4rx wrote:
| Love those analogies. This is one of the main reasons I love
| Hacker News / Reddit. Honest golden experiences.
| AlanYx wrote:
| Another aspect that reinforces your point is that the ATM push
| (and subsequent downfall) was not just bandwidth-motivated but
| also motivated by a belief that ATM's QoS guarantees were
| necessary. But it turned out that software improvements,
| notably MPLS to handle QoS, were all that was needed.
| pjdesno wrote:
| Nah, it's mostly just buffering :-)
|
| Plus the cell phone industry paved the way for VOIP by
| getting everyone used to really, really crappy voice quality.
| Generations of Bell Labs and Bellcore engineers would rather
| have resigned than be subjected to what's considered
| acceptable voice quality nowadays...
| hedgehog wrote:
| Yes, I think most video on the Internet is HLS and similar
| approaches which are about as far from the ATM circuit-
| switching approach as it gets. For those unfamiliar HLS is
| pretty much breaking the video into chunks to download over
| plain HTTP.
| nyarlathotep_ wrote:
| >> Plus the cell phone industry paved the way for VOIP by
| getting everyone used to really, really crappy voice
| quality
|
| What accounts for this difference? Is there something
| inherently worse about the nature of cell phone
| infrastructure over land-line use?
|
| I'm totally naive on such subjects.
|
| I'm just old enough to remember landlines being widespread,
| but nearly all of my phone calls have been via cell since
| the mid 00s, so I can't judge quality differences given the
| time that's passed.
| hnuser123456 wrote:
| Because at some point, someone decided that 8 kbps makes
| for an acceptable audio stream per subscriber. And at
| first, being able to call anyone anywhere, even with this
| awful quality, was novel enough that people would accept
| it. And most people did until the
| carriers decided they could allocate a little more with
| VoLTE, if it works on your phone in your area.
| ipdashc wrote:
| > Because at some point, someone decided that 8 kbps
| makes for an acceptable audio stream per subscriber.
|
| Has it not been like this for a very long time? I was
| under the impression that "voice frequency" being defined
| as up to 4 kHz was a very old standard - after all,
| (long-distance) phone calls have always been multiplexed
| through coaxial or microwave links. And it follows that
| 8kbps is all you need to losslessly digitally sample
| that.
|
| I assumed it was jitter and such that lead to lower
| quality of VoIP/cellular, but that's a total guess. Along
| with maybe compression algorithms that try to squeeze the
| stream even tighter than 8kbps? But I wouldn't have
| figured it was the 8kHz sample rate at fault, right?
| hnuser123456 wrote:
| Sure, if you stop after "nobody's vocal cords make
| noises above 4khz in normal conversation", but the
| rumbling of the vocal cords isn't the entire audio data
| which is present in-person. Clicks of the tongue and
| smacking of the lips make much higher frequencies, and
| higher sample rates capture the timbre/shape of the
| soundwave instead of rounding it down to a smooth sine
| wave. Discord defaults to 64kbps, but you can push it up
| to 96kbps or 128kbps with nitro membership, and it's not
| hard to hear an improvement with the higher bitrates. And
| if you've ever used bluetooth audio, you know the
| difference in quality between the bidirectional call
| profile, and the unidirectional music profile, and wished
| to have the bandwidth of the music profile with the low
| latency of the call profile.
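|
| The rough numbers behind this, for anyone curious (the cellular
| codec rates are ballpark figures):
|
|     # Classic digital landline voice (G.711): 8 kHz sample rate,
|     # 8 bits per sample -> 64 kbps uncompressed.
|     landline_kbps = 8_000 * 8 / 1000   # 64.0
|
|     # So a 4 kHz voice band needs an 8 kHz sample rate (Nyquist),
|     # but that works out to 64 kbps, not 8 kbps. Typical cellular
|     # codecs (e.g. AMR-NB) run roughly 4.75-12.2 kbps, i.e. they
|     # compress that same stream ~5-13x, which is where much of
|     # the quality goes.
|     amr_low_kbps, amr_high_kbps = 4.75, 12.2
|     print(landline_kbps / amr_high_kbps,   # ~5.2x compression
|           landline_kbps / amr_low_kbps)    # ~13.5x compression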
| WalterBright wrote:
| I've noticed this when talking on the phone with someone
| with a significant accent.
|
| 1. it takes considerable work on my part to understand it
| on a cell phone
|
| 2. it's much easier on POTS
|
| 3. it's not a problem on VOIP
|
| 4. no issues in person
|
| With all the amazing advances in cell phones, the voice
| quality of cellular is stuck in the 90's.
| bayindirh wrote:
| I generally travel to Europe, and it baffles me why I can't
| use VoLTE there (maybe my roaming doesn't allow that) and
| have to fall back to 3G for voice calls.
|
| At home, I use VoLTE and the sound is almost impeccable,
| very high quality, but in the places I roam to, what I
| get is FM quality 3G sound.
|
| It's not that cellular network is incapable of that sound
| quality, but I don't get to experience it except my home
| country. Interesting, indeed.
| tlb wrote:
| And memory. In the heyday of ATM (late 90s) a few megabytes
| was quite expensive for a set-top box, so you couldn't buffer
| many seconds of compressed video.
|
| Also, the phone companies had a pathological aversion to
| understanding Moore's law, because it suggested they'd have
| to charge half as much for bandwidth every 18 months. Long
| distance rates had gone down more like 50%/decade, and even
| that was too fast.
| aurareturn wrote:
| Doesn't your point about video compression tech support
| Nvidia's bull case?
|
| Better video compression led to an explosion in video
| consumption on the Internet, leading to much more revenue for
| companies like Comcast, Google, T-Mobile, Verizon, etc.
|
| More efficient LLMs lead to much more AI usage. Nvidia, TSMC,
| etc will benefit.
| vFunct wrote:
| I agree that advancements like DeepSeek, like transformer
| models before it, are just going to end up increasing demand.
|
| It's very shortsighted to think we're going to need fewer
| chips because the algorithms got better. The system became
| more efficient, which causes induced demand.
| snailmailstare wrote:
| It improves TSMC's case. Paying Nvidia would be like paying
| Cray for every smartphone that is faster than a supercomputer
| of old.
| 9rx wrote:
| Yes, over the long haul, probably. But as far as individual
| investors go they might not like that Nvidia.
|
| Anyone currently invested is presumably in because they like
| the insanely high profit margin, and this is apt to quash
| that. There is now much less reason to give your first born
| to get your hands on their wares. Comcast, Google, T-Mobile,
| Verizon, etc., and especially those not named Google, have
| nothingburger margins in comparison.
|
| If you are interested in what they can do with volume, then
| there is still a lot of potential. They may even be more
| profitable on that end than a margin play could ever hope
| for. But that interest is probably not from the same person
| who currently owns the stock, it being a change in territory,
| and there is apt to be a lot of instability as stock changes
| hands from the one group to the next.
| onlyrealcuzzo wrote:
| No - because this eliminates entirely or shifts the majority
| of work from GPU to CPU - and Nvidia does not sell CPUs.
|
| If the AI market gets 10x bigger, and GPU work gets 50%
| smaller (which is still 5x larger than today) - but Nvidia is
| priced on 40% growth for the next ten years (28x larger) -
| there is a price mismatch.
|
| It is _theoretically_ possible for a massive reduction in GPU
| usage or shift from GPU to CPU to benefit Nvidia if that
| causes the market to grow enough - but it seems unlikely.
|
| Also, _I believe_ (someone please correct if wrong) DeepSeek
| is claiming a 95% overall reduction in GPU usage compared to
| traditional methods (not the 50% in the example above).
|
| If true, that is a death knell for Nvidia's growth story
| after the current contracts end.
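|
| The arithmetic behind that mismatch, taking the comment's own
| assumptions (10x market growth, GPU share of the work halving,
| and a hypothetical 40%/year growth rate priced in for a decade):
|
|     market_growth = 10
|     gpu_share_change = 0.5
|     gpu_demand_growth = market_growth * gpu_share_change  # 5x today
|
|     priced_in_growth = 1.40 ** 10                          # ~28.9x
|
|     print(gpu_demand_growth, round(priced_in_growth, 1))   # 5 28.9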
| munksbeer wrote:
| I can see close to zero possibility that the majority of
| the work will be shifted to the CPU. Anything a CPU can do
| can just be done better with specialised GPU hardware.
| lokar wrote:
| People have been saying the exact same thing about other
| workloads for years, and always been wrong. Mostly
| claiming custom chips or FPGAs will beat out general
| purpose CPUs.
| Vegenoid wrote:
| Then why do we have powerful CPUs instead of a bunch of
| specialized hardware? It's because the value of a CPU is
| in its versatility and ubiquity. If a CPU can do a thing
| good enough, then most programs/computers will do that
| thing on a CPU instead of having the increased complexity
| and cost of a GPU, even if a GPU would do it better.
| chrisco255 wrote:
| We have both? Modern computing devices like smart phones
| use SoCs with integrated GPUs. GPUs aren't really
| specialized hardware, either, they are general purpose
| hardware useful in many scenarios (built for graphics
| originally but clearly useful in other domains including
| AI).
| ozten wrote:
| > Anything a CPU can do can just be done better
|
| Nope. Anything inherently serial is better off on the
| CPU due to caching and its architecture.
|
| Many things that are highly parallelizable are getting GPU
| enabled. Games and ML are GPU by default, but many things
| are migrating to CUDA.
|
| You need both for cheap, high performance computing. They
| are different workloads.
| aurareturn wrote:
| > No - because this eliminates entirely or shifts the
| > majority of work from GPU to CPU - and Nvidia does not
| > sell CPUs.
|
| I'm not even sure how to reply to this. GPUs are
| fundamentally much more efficient for AI inference than
| CPUs.
| snailmailstare wrote:
| I think SIMD is not so much better than SIMT for solved
| problems as a level in claiming a problem as solved.
| pjdesno wrote:
| No, it doesn't.
|
| Not only are 10-100x changes disruptive, but the players who
| don't adopt them quickly are going to be the ones who
| continue to buy huge amounts of hardware to pursue old
| approaches, and it's hard for incumbent vendors to avoid
| catering to their needs, up until it's too late.
|
| When everyone gets up off the ground after the play is over,
| Nvidia might still be holding the ball but it might just as
| easily be someone else.
| mandevil wrote:
| It lead to more revenue for the industry as a whole. But not
| necessarily for the individual companies that bubbled the
| hardest: Cisco stock is still to this day lower than it was
| at peak in 2000, to point to a significant company that sold
| actual physical infra products necessary for the internet and
| is still around and profitable to this day. (Some companies that
| bubbled did quite well, AMZN is like 75x from where it was in
| 2000. But that's a totally different company that captured an
| enormous amount of value from AWS that was not visible to the
| market in 2000, so it makes sense.)
|
| If stock market-cap is (roughly) the market's aggregated best
| guess of future profits integrated over all time, discounted
| back to the present at some (the market's best guess of the
| future?) rate, then increasing uncertainty about the
| predicted profits 5-10 years from now can have enormous
| influence on the stock. Does NVDA have an AWS within it now?
| aurareturn wrote:
| >It lead to more revenue for the industry as a whole. But
| not necessarily for the individual companies that bubbled
| the hardest: Cisco stock is still to this day lower than it
| was at peak in 2000, to point to a significant company that
| sold actual physical infra products necessary for the
| internet and still around and profitable to this day. (Some
| companies that bubbled did quite well, AMZN is like 75x
| from where it was in 2000. But that's a totally different
| company that captured an enormous amount of value from AWS
| that was not visible to the market in 2000, so it makes
| sense.)
|
| Cisco in 1994: $3.
|
| Cisco after dotcom bubble: $13.
|
| So is Nvidia's stock price closer to 1994 or 2001?
| fspeech wrote:
| If you normalize Nvidia's gross margin and take competitors
| into account, sure. But its current high margin is driven by
| Big Tech FOMO. Do keep in mind that 90% margin or 10x cost to
| 50% margin or 2x cost is a 5x price reduction.
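|
| Spelled out (cost normalized to 1):
|
|     def price_over_cost(gross_margin):
|         # price such that (price - cost) / price == gross_margin
|         return 1 / (1 - gross_margin)
|
|     high = price_over_cost(0.90)      # 10x cost
|     low = price_over_cost(0.50)       # 2x cost
|     print(high, low, high / low)      # 10.0 2.0 5.0 -> 5x cheaper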
| aurareturn wrote:
| So why would DeepSeek decrease FOMO? It should increase it
| if anything.
| Vegenoid wrote:
| Because DeepSeek demonstrates that loads of compute isn't
| necessary for high-performing models, and so we won't
| need as much and as powerful of hardware as was
| previously thought, which is what Nvidia's valuation is
| based on?
| vFunct wrote:
| I worked on a network that used a protocol very similar to ATM
| (actually it was the first Iridium satellite network). An
| internet based on ATM would have been amazing. You're basically
| guaranteeing a virtual switched circuit, instead of the packets
| we have today. The horror of packet switching is all the
| buffering it needs, since it doesn't guarantee circuits.
|
| Bandwidth is one thing, but the real benefit is that ATM also
| guaranteed minimal latencies. You could now shave off another
| 20-100ms of latency for your FaceTime calls, which is subtle
| but game changing. Just instant-on high def video
| communications, as if it were on closed circuits to the next
| room.
|
| For the same reasons, the AI analogy could benefit from both
| huge processing as well as stronger algorithms.
| lxgr wrote:
| > You're basically guaranteeing a virtual switched circuit
|
| Which means you need state (and the overhead that goes with
| it) for each connection _within the network_. That's
| horribly inefficient, and precisely the reason packet-
| switching won.
|
| > An internet based on ATM would have been amazing.
|
| No, we'd most likely be paying by the socket connection (as
| somebody has to pay for that state keeping overhead), which
| sounds horrible.
|
| > You could now shave off another 20-100ms of latency for
| your FaceTime calls, which is subtle but game changing.
|
| Maybe on congested Wi-Fi (where even circuit switching would
| struggle) or poorly managed networks (including shitty ISP-
| supplied routers suffering from horrendous bufferbloat).
| Definitely not on the majority of networks I've used in the
| past years.
|
| > The horror of packet switching is all the buffering it
| needs [...]
|
| The ideal buffer size is exactly the bandwidth-delay product.
| That's really not a concern these days anymore. If anything,
| buffers are much too large, causing unnecessary latency;
| that's where bufferbloat-aware scheduling comes in.
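|
| For a sense of scale (illustrative numbers, not any particular
| link):
|
|     # Bandwidth-delay product: the data "in flight" on a path, and
|     # the classic rule of thumb for how much buffer the bottleneck
|     # needs.
|     bandwidth_bps = 100e6    # 100 Mbit/s
|     rtt_s = 0.050            # 50 ms round trip
|     bdp_bytes = bandwidth_bps * rtt_s / 8
|     print(f"{bdp_bytes / 1e6:.3f} MB")   # 0.625 MB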
| vFunct wrote:
| The cost for interactive video would be a requirement of
| 10x bandwidth, basically to cover idle time. Not efficient
| but not impossible, and definitely wouldn't change ISP
| business models.
|
| The latency benefit would outweigh the cost. Just
| absolutely instant video interaction.
| foobarian wrote:
| It is fascinating to think that before digital circuits
| phone calls were accomplished by an end-to-end electrical
| connection between the handsets. What luxury that must
| have been! If only those ancestors of ours had modems and
| computers to use those excellent connections for low-
| latency gaming... :-)
| thijson wrote:
| I remember my professor saying how the fixed packet size in
| ATM (53 bytes) was a committee compromise. North America
| wanted 64 bytes, Europe wanted 32 bytes. The committee chose
| around the midway point.
| wtallis wrote:
| The 53-byte frame is what results from the exact compromise
| of 48 bytes for the _payload_ size (the midpoint of 32 and
| 64) plus a 5-byte header.
| pjdesno wrote:
| Man, I saw a presentation on Iridium when I was at Motorola
| in the early 90s, maybe 92? Not a marketing presentation -
| one where an engineer was talking, and had done their own
| slides.
|
| What I recall is that it was at a time when Internet folks
| had made enormous advances in understanding congestion
| behavior in computer networks, and other folks (e.g. my
| division of Motorola) had put a lot of time into
| understanding the limited burstiness you get with silence
| suppression for packetized voice, and these folks knew
| nothing about it.
| paulddraper wrote:
| I love algorithms as much as the next guy, but not really.
|
| DCT was developed in 1972 and has a compression ratio of 100:1.
|
| H.264 compresses 2000:1.
|
| And standard resolution (480p) is ~1/30th the resolution of 4k.
|
| ---
|
| I.e. Standard resolution with DCT is smaller than 4k with
| H.264.
|
| Even high-definition (720p) with DCT is only twice the
| bandwidth of 4k H.264.
|
| Modern compression has allowed us to add a bunch more pixels,
| but it was hardly a requirement for internet video.
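|
| Roughly, treating required bandwidth as pixel count divided by
| compression ratio (crude, but it's the comparison being made):
|
|     pixels = {"480p": 640 * 480, "720p": 1280 * 720, "4k": 3840 * 2160}
|
|     def rel_bw(res, compression_ratio):
|         return pixels[res] / compression_ratio
|
|     sd_dct = rel_bw("480p", 100)     # standard def at DCT-era 100:1
|     hd_dct = rel_bw("720p", 100)     # 720p at 100:1
|     uhd_h264 = rel_bw("4k", 2000)    # 4K at H.264-era 2000:1
|
|     print(sd_dct / uhd_h264)   # ~0.74 -> 480p/DCT < 4K/H.264
|     print(hd_dct / uhd_h264)   # ~2.2  -> 720p/DCT ~ twice 4K/H.264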
| wtallis wrote:
| The web didn't go from streaming 480p straight to 4k. There
| were a couple of intermediate jumps in pixel count that were
| enabled in large part by better compression. Notably, there
| was a time period where it was important to ensure your
| computer had hardware support for H.264 decode, because it
| was taxing on low-power CPUs to do at 1080p and you weren't
| going to get streamed 1080p content in any simpler, less
| efficient codec.
| paulddraper wrote:
| Right.
|
| Modern compression algorithms had been developed but weren't
| even computationally feasible for some of that time.
| WhitneyLand wrote:
| DCT is not an algorithm at all, it's a mathematical
| transform.
|
| It doesn't have a compression ratio.
| paulddraper wrote:
| > DCT compression, also known as block compression,
| compresses data in sets of discrete DCT blocks.[3] DCT
| blocks sizes including 8x8 pixels for the standard DCT, and
| varied integer DCT sizes between 4x4 and 32x32
| pixels.[1][4] The DCT has a strong energy compaction
| property,[5][6] capable of achieving high quality at high
| data compression ratios.[7][8] However, blocky compression
| artifacts can appear when heavy DCT compression is applied.
|
| https://en.wikipedia.org/wiki/Discrete_cosine_transform
| foobarian wrote:
| I'm sure it helped, but yeah, not only e2e bandwidth but also
| the total network throughput increased by vast orders of
| magnitude.
| eigenvalue wrote:
| Yes, that is a very apt analogy!
| tuna74 wrote:
| I always like the "look" of high bit rate Mpeg2 video. Download
| HD japanese TV content from 2005-2010 and it still looks really
| good.
| TheCondor wrote:
| It seems more stark even. The energy costs that are current and
| then projected for AI are _staggering_. At the same time, I
| think it has been MS that has been publishing papers on LLMs
| that are smaller (so called small language models) but more
| targeted and still achieving a fairly high "accuracy rate."
|
| Didn't TSMC say that SamA came for a visit and said they needed
| $7T in investment to keep up with the pending demand needs?
|
| This stuff is all super cool and fun to play with, I'm not a
| naysayer, but it almost feels like these current models are
| "bubble sort" and who knows how it will look if "quicksort" for
| them becomes invented.
| aurareturn wrote:
| > Perhaps most devastating is DeepSeek's recent efficiency
| > breakthrough, achieving comparable model performance at
| > approximately 1/45th the compute cost. This suggests the
| > entire industry has been massively over-provisioning compute
| > resources.
|
| I wrote in another thread why DeepSeek should increase demand for
| chips, not lower it.
|
| 1. More efficient LLMs should lead to more usage, which means
| more AI chip demand. Jevon's Paradox.
|
| 2. Even if DeepSeek is 45x more efficient (it is not), models
| will just become 45x+ bigger. They won't stay small.
|
| 3. To build a moat, OpenAI and American AI companies need to up
| their datacenter spending even more.
|
| 4. DeepSeek's breakthrough is in distilling models. You still
| need a ton of compute to train the foundational model to distill.
|
| 5. DeepSeek's conclusion in their paper says more compute is
| needed for the next breakthrough.
|
| 6. DeepSeek's model is trained on GPT4o/Sonnet outputs. Again,
| this reaffirms the fact that in order to take the next step, you
| need to continue to train better models. Better models will
| generate better data for next-gen models.
|
| I think DeepSeek hurts OpenAI/Anthropic/Google/Microsoft. I think
| DeepSeek helps TSMC/Nvidia.
|
| > Combined with the emergence of more efficient inference
| > architectures through chain-of-thought models, the aggregate
| > demand for compute could be significantly lower than current
| > projections assume.
|
| This is misguided. Let's think logically about this.
|
| More thinking = smarter models
|
| Faster hardware = more thinking
|
| More/newer Nvidia GPUs, better TSMC nodes = faster hardware
|
| Therefore, you can conclude that Nvidia and TSMC demand should go
| up because of CoT models. In 2025, CoT models are clearly
| bottlenecked by not having enough compute.
|
| > The economics here are compelling: when DeepSeek can match
| > GPT-4 level performance while charging 95% less for API calls,
| > it suggests either NVIDIA's customers are burning cash
| > unnecessarily or margins must come down dramatically.
|
| Or that in order to build a moat, OpenAI/Anthropic/Google and
| other labs need to double down on even more compute.
| outside1234 wrote:
| But Microsoft hosts 3rd party models too, and cheaper models
| means more usage, which means more $$$ to scaled cloud
| providers right?
| clvx wrote:
| It means they can serve more with what they have if they
| implement models with deepseek's optimizations. More usage
| doesn't mean Nvidia will get the same margins when cloud
| providers scale out with this innovation.
| AnotherGoodName wrote:
| I agree with this.
|
| Fwiw many of the improvements in Deepseek were already in other
| 'can run on your personal computer' AIs such as Meta's Llama.
| Deepseek is actually very similar to Llama in efficiency.
| People were already running that on home computers with M3's.
|
| A couple of examples; Meta's multi-token prediction was
| specifically implemented as a huge efficiency improvement that
| was taken up by Deepseek. REcurrent ADaption (READ) was another
| big win by Meta that Deepseek utilized. Multi-head Latent
| Attention is another technique, not pioneered by Meta but used
| by both Deepseek and Llama.
|
| Anyway Deepseek isn't some independent revolution out of
| nowhere. It's actually very very similar to the existing state
| of the art and just bundles a whole lot of efficiency gains in
| one model. There's no secret sauce here. It's much better than
| what openAI has but that's because openAI seem to have
| forgotten 'The Bitter Lesson'. They have been going at things
| in an extremely brute force way.
|
| Anyway why do i point out that Deepseek is very similar to
| something like Llama? Because Meta's spending 100's of billions
| on chips to run it. It's pretty damn efficient, especially
| compared to openAI but they are still spending billions on
| datacenter build-outs.
| crubier wrote:
| > openAI seem to have forgotten 'The Bitter Lesson'. They
| have been going at things in an extremely brute force way.
|
| Isn't the point of 'The Bitter Lesson' precisely that in the
| end, brute force wins, and hand-crafted optimizations like
| the ones you mention llama and deepseek use are bound to lose
| in the end?
| AnotherGoodName wrote:
| Imho the tldr is that the wins are always from 'scaling
| search and learning'.
|
| Any customisations that aren't related to the above are
| destined to be overtaken by someone that can improve the
| scaling of compute. OpenAI do not seem to be doing as much
| to improve the scaling of the compute in software terms
| (they are doing a lot in hardware terms admittedly). They
| have models at the top of the charts for various benchmarks
| right now but it feels like a temporary win from chasing
| those benchmarks outside of the focus of scaling compute.
| macawfish wrote:
| This is exactly where project digits comes in. Nvidia needs to
| pivot toward being a local inference platform if they want to
| survive the next shift.
| skizm wrote:
| I'm wondering if there's a (probably illegal) strategy in the
| making here:
|
| - Wait till NVDA rebounds in price.
|
| - Create an OpenAI "competitor" that is powered by Llama or a
| similar open weights model.
|
| - Obscure the fact that the company runs on this open tech and
| make it seem like you've developed your own models, but don't
| outright lie.
|
| - Release an app and whitepaper (whitepaper looks and sounds
| technical, but is incredibly light on details, you only need to
| fool some new-grad stock analysts).
|
| - Pay some shady click farms to get your app to the top of
| Apple's charts (you only need it to be there for like 24 hours
| tops).
|
| - Collect profits from your NVDA short positions.
| tw1984 wrote:
| this is exactly what DeepSeek is doing, the only difference is
| they built the real model, not a fake one.
| startupsfail wrote:
| - Fail at the above.
|
| I don't think this is what happened with DeepSeek. It seems
| that they've genuinely optimized their model for efficiency and
| used GPUs properly (tiled FP8 trick and FP8 training). And came
| out on top.
|
| The impact on the NVIDIA stock is ridiculous. DeepSeek took
| advantage of the flexible GPU architecture (unlike inflexible
| hardware acceleration).
| mmiliauskas wrote:
| This is what I still don't understand, how much of what they
| claim has been actually replicated? From what I understand
| the "50x cheaper" inference is coming from their pricing
| page, but is it actually 50x cheaper than the best open
| source models?
| zamadatix wrote:
| 50x cheaper than OpenAI's pricing, on an open source model
| that doesn't require giving up that quality level. The
| best open source models were already much closer in pricing,
| but V3/R1 hit that price while also topping the results
| charts.
| fairity wrote:
| DeepSeek just further reinforces the idea that there is a first-
| mover _disadvantage_ in developing AI models.
|
| When someone can replicate your model for 5% of the cost in 2
| years, I can only see 2 rational decisions:
|
| 1) Start focusing on cost efficiency today to reduce the
| advantage of the second mover (i.e. trade growth for
| profitability)
|
| 2) Figure out how to build a real competitive moat through one or
| more of the following: economies of scale, network effects,
| regulatory capture
|
| On the second point, it seems to me like the only realistic
| strategy for companies like OpenAI is to turn themselves into a
| platform that benefits from direct network effects. Whether
| that's actually feasible is another question.
| Mistletoe wrote:
| I feel like AI tech just reverse scales and reverse flywheels,
| unlike the tech giant walls and moats now, and I think that is
| wonderful. OpenAI has really never made sense from a financial
| standpoint and that is healthier for humans. There's no network
| effect because there's no social aspect to AI chatbots. I can
| hop on DeepSeek from Google Gemini or OpenAI with ease because I
| don't have to have friends there and/or convince them to move.
| AI is going to be a race to the bottom that keeps prices low to
| zero. In fact I don't know how they are going to monetize it at
| all.
| tw1984 wrote:
| > DeepSeek just further reinforces the idea that there is a
| first-mover disadvantage in developing AI models.
|
| you are assuming that what DeepSeek achieved can be reasonably
| easily replicated by other companies. Then the question is: when
| all big techs and tons of startups in China and the US are
| involved, how come none of those companies succeeded?
|
| deepseek is unique.
| 11101010001100 wrote:
| Deepseek is unique, but the US has consistently
| underestimated Chinese R&D, which is not a winning strategy
| in iterated games.
| rightbyte wrote:
| There seems to be a 100-fold uptick in jingoists in the last
| 3-4 years, which makes my head hurt, but I think there is no
| consistent "underestimation" in academic circles? I think I
| have read articles about the up and coming Chinese STEM for
| like 20 years.
| coliveira wrote:
| Yes, for people in academia the trend is clear, but it
| seems that Wall Street didn't believe this was possible.
| They assume that spending more money is all you need to
| dominate technology. Wrong! Technology is about human
| potential. If you have less money but bigger investment
| in people you'll win the technological race.
| rightbyte wrote:
| I think Wall Street is in for a surprise as they have been
| profiting from liquidating the inefficiency of worker
| trust and loyalty for quite some time now.
|
| I think they think American engineering excellence was
| due to neoliberal ingenuity vis-a-vis the USSR, not the
| engineers and the transfer of academic legacy from
| generation to generation.
| coliveira wrote:
| This is even more apparent when large tech corporations
| are, supposedly, in a big competition but at the same
| time firing thousands of developers and scientists. Are
| they interested in making progress or just reducing
| costs?
| corimaith wrote:
| What does DeepSeek or really High Flyer do that is
| particularly exceptional regarding employees? HFT firms and
| other elite law or hedge funds are known to have pretty
| zany benefits.
| 11101010001100 wrote:
| Precisely. This is the view from the ivory tower.
| corimaith wrote:
| That doesn't change the calculus regarding the actions you
| would pick externally; in fact it only strengthens the case for
| increased tech restrictions and more funding.
| rightbyte wrote:
| Unique, yeah, but isn't their method open? I read something
| about a group replicating a smaller variant of their main
| model.
| ghostzilla wrote:
| Which brings the question, if LLMs are an asset of such
| strategic value, why did China allow the DeepSeek to be
| released?
|
| I see two possibilities here: either the CCP is not as
| all-reaching as we think, or the value of the technology
| isn't critical, and the release was in fact cleared with
| the CCP and maybe even timed to come right
| after Trump's announcement of American AI supremacy.
| rightbyte wrote:
| It is hard to estimate how much it is "didn't care",
| "didn't know" or "did it" I think. Rather pointless
| unless there are public party discussions about it to
| read.
| fairity wrote:
| It's early innings, and supporting the open source
| community could be viewed by the CCP as an effective way
| to undermine the US's lead in AI.
|
| In a way, their strategy could be:
|
| 1) Let the US invest $1 trillion in R&D
|
| 2) Support the open source community such that their
| capability to replicate these models only marginally lags
| the private sector
|
| 3) When R&D costs are more manageable, lean in and play
| catch up
| creato wrote:
| I really doubt there was any intention behind it at all.
| I bet deepseek themselves are surprised at the impact
| this is having, and probably regret releasing so much
| information into the open.
| lenerdenator wrote:
| It will be assumed by the American policy establishment
| that this represents what the CCP doesn't consider
| important, meaning that they have even better stuff in
| store. It will also be assumed that this was timed to
| take a dump on Trump's announcement, like you said.
|
| And it did a great job. Nvidia stock's sunk, and
| investors are going to be asking if it's really that
| smart to give American AI companies their money when the
| Chinese can do something similar for significantly less
| money.
| jerjerjer wrote:
| We have one success after ~two years of ChatGPT hype (and
| therefore subsequent replication attempts). That's as fast as
| it gets.
| aurareturn wrote:
| This is wrong. First mover advantage is strong. This is why
| OpenAI is much bigger than Mistral despite what you said.
|
| First mover advantage acquires and keeps subscribers.
|
| No one really cares if you matched GPT4o one year later. OpenAI
| has had a full year to optimize the model, build tools around
| the model, and used the model to generate better data for their
| next generation foundational model.
| jaynate wrote:
| They also burnt a hell of a lot more cash. That's a
| disadvantage.
| itissid wrote:
| OpenAI does not have a business model that is cashflow
| positive at this point and/or a product that gives them a
| significant leg up in the same moat sense Office/Teams might
| give to Microsoft.
| aurareturn wrote:
| Companies in the mobile era took a decade or more to become
| profitable. For example, Uber and Airbnb.
|
| Why do you expect OpenAI to become profitable after 3 years
| of chatgpt?
| meiraleal wrote:
| Nobody expects it but what we know for sure is that they
| have burnt billions of dollars. If other startups can get
| there spending millions, the fact is that openai won't
| ever be profitable.
|
| And more important (for us), let the hiring frenzy start
| again :)
| aurareturn wrote:
| They have a ton of revenue and high gross margins. They
| burn billions because they need to keep training ever
| better models until the market slows and competition
| consolidates.
| fairity wrote:
| The counter argument is that they won't be able to
| sustain those gross margins when the market matures
| because they don't have an effective moat.
|
| In this world, R&D costs and gross margin/revenue are
| inextricably correlated.
| aurareturn wrote:
| When the market matures, there will be fewer competitors
| so they won't need to sustain the level of investment.
|
| The market always consolidates when it matures. Every
| time. The market always consolidates into 2-3 big
| players. Often a duopoly. OpenAI is trying to be one of
| the two or three companies left standing.
| physicsguy wrote:
| Interest rates have an effect too, Uber and Airbnb were
| starting out in a much more fundraising-friendly time.
| dplgk wrote:
| What is OpenAI's first-mover moat? I switched to Claude with
| absolutely no friction or moat-jumping.
| xxpor wrote:
| What is Google's first mover moat? I switched to
| Bing/DuckDuckGo with absolutely no friction or moat
| jumping.
|
| Brands are incredibly powerful when talking about consumer
| goods.
| bpt3 wrote:
| Google's moat _was_ significantly better results than the
| competition for about 2 decades.
|
| Your analogy is valid at this time, but proves the GP's
| point, not yours.
| fairity wrote:
| I think it's worth double clicking here. _Why_ did Google
| have significantly better search results for a long time?
|
| 1) There was a data flywheel effect, wherein Google was
| able to improve search results by analyzing the vast
| amount of user activity on its site.
|
| 2) There were real economies of scale in managing the
| cost of data centers and servers
|
| 3) Their advertising business model benefited from
| network effects, wherein advertisers don't want to bother
| giving money to a search engine with a much smaller user
| base. This profitability funded R&D that competitors
| couldn't match.
|
| There are probably more that I'm missing, but I think the
| primary takeaway is that Google's scale, in and of
| itself, led to a better product.
|
| Can the same be said for OpenAI? I can't think of any
| strong economies of scale or network effects for them,
| but maybe I'm missing something. Put another way, how
| does OpenAI's product or business model get significantly
| better as more people use their service?
| aurareturn wrote:
| They have more data on what people want from models?
|
| Their SOTA models can generate better synthetic data for
| the next training run - leading to a flywheel effect?
| rayval wrote:
| In theory, the more people use the product, the more
| OpenAI knows what they are asking about and what they do
| after the first result, the better it can align its model
| to deliver better results.
|
| A similar dynamic occurred in the early days of search
| engines.
| visarga wrote:
| I call it the experience flywheel. Humans come with
| problems, the AI assistant generates some ideas, the human tries
| them out and comes back to iterate. The model gets
| feedback on prior ideas. So you could say AI tested an
| idea in the real world, using a human. This happens many
| times over for 300M users at OpenAI. They put a trillion
| tokens into human brains, and as many into their logs.
| The influence is bidirectional. People adapt to the
| model, and the model adapts to us.. But that is in
| theory.
|
| In practice I never heard OpenAI mention how they use
| chat logs for improving the model. They are either afraid
| to say, for privacy reasons, or want to keep it secret
| for technical advantage. But just think about the
| billions of sessions per month. A large number of them
| contain extensive problem solving. So the LLMs can
| collect experience, and use it to improve problem
| solving. This makes them into a flywheel of human
| experience.
| nyrikki wrote:
| You are forgetting a bit, I worked in some of the large
| datacenters where both Google and Yahoo had cages.
|
| 1) Google copied the hotmail model of strapping commodity
| PC components to cheap boards and building software to
| deal with complexity.
|
| 2) Yahoo had a much larger cage, filled with very very
| expensive and large DEC machines, with one poor guy
| sitting in a desk in there almost full time rebooting the
| systems etc....I hope he has any hearing left today.
|
| 3) Just right before the .com crash, I was in a cage next
| to Google's racking dozens of brand new Netra T1s, which
| were pretty slow and expensive...that company I was
| working for died in the crash.
|
| Look at Google's web page:
|
| https://www.webdesignmuseum.org/gallery/google-1999
|
| Compare that to Yahoo:
|
| https://www.webdesignmuseum.org/gallery/yahoo-in-1999
|
| Or the company they originally tried to sell Google to,
| Excite:
|
| https://www.webdesignmuseum.org/gallery/excite-2001
|
| Google grew to be profitable because they controlled
| costs, invested in software vs service contracts and
| enterprise gear, had a simple non-intrusive text based ad
| model etc...
|
| Most of what you mention above came well after that; the
| model focused on users and thrift is what allowed them to
| scale, and the rest is survivorship bias. Internal incentives
| that directed capital expenditures to meet the mission vs
| protect people's backs were absolutely related to their
| survival.
|
| Even though it was a metasearch, my personal preference
| was SavvySearch until it was bought and killed, or whatever
| that story was.
|
| OpenAI is far more like Yahoo than Google.
| WalterBright wrote:
| > I hope he has any hearing left today
|
| I opted for a fanless graphics board, for just that
| reason.
| talldayo wrote:
| > What is Google's first mover moat?
|
| AdSense
| eikenberry wrote:
| Google wasn't the first mover in search. They were at
| least second if not third.
| aurareturn wrote:
| OpenAI has a lot more revenue than Claude.
|
| Late in 2024, OpenAI had $3.7b in revenue. Meanwhile,
| Claude's mobile app hit $1 million in revenue around the
| same time.
| apwell23 wrote:
| > Late in 2024, OpenAI had $3.7b in revenue
|
| Where do they report these ?
|
| edit i found it here
| https://www.cnbc.com/2024/09/27/openai-sees-5-billion-
| loss-t...
|
| "OpenAI sees roughly $5 billion loss this year on $3.7
| billion in revenue"
| kpennell wrote:
| almost everyone I know is the same. 'Claude seems to be
| better and can take more data' is what I hear a lot.
| ed wrote:
| One moat will eventually come in the form of personal
| knowledge about you - consider talking with a close friend
| of many years vs a stranger
| kgc wrote:
| Couldn't you just copy all your conversations over?
| moralestapia wrote:
| *sigh*
|
| This broken record again.
|
| Just observe reality. OpenAI is leading, by far.
|
| All these "OpenAI has no moat" arguments will only make
| sense whenever there's a material, _observable_ (as in not
| imaginary), shift on their market share.
| ransom1538 wrote:
| I moved 100% over to deepseek. No switch cost. Zero.
| lxgr wrote:
| > First mover advantage acquires and keeps subscribers.
|
| Does it? As a chat-based (Claude Pro, ChatGPT Plus etc.)
| user, LLMs have zero stickiness to me right now, and the APIs
| hardly can be called moats either.
| distances wrote:
| If it's for the mass consumer market then it does matter. Ask
| any non-technical person around you. High chance is that
| they know ChatGPT but can't name a single other AI model or
| service. Gemini, just a distant maybe. Claude, definitely
| not -- I'm positive I'd be hard pressed to find anyone among
| my _technical_ friends who knows about Claude.
| xmodem wrote:
| They probably know CoPilot as the thing Microsoft is
| trying to shove down their throat...
| boringg wrote:
| You're making some big assumptions projecting into the future.
| One, that DeepSeek takes market position; two, that the
| information they have released is honest regarding training
| usage, spend etc.
|
| There's a lot more still to unpack and I don't expect this to
| stay solely in the tech realm. Seems too politically sensitive.
| meiraleal wrote:
| DeepSeek is profitable, openai is not. That big expensive moat
| won't help much when the competition knows how to fly.
| aurareturn wrote:
| DeepSeek is not profitable. As far as I know, they don't have
| any significant revenue from their models. Meanwhile, OpenAI
| had $3.7b in revenue at last report and has high gross
| margins.
| meiraleal wrote:
| tell that to the stock market then, it might change the
| graph direction back to green.
| aurareturn wrote:
| I'm doing the best I can.
| 11101010001100 wrote:
| I think this is just a(nother) canary for many other markets in
| the US v China game of monopoly. One weird effect in all this is
| that US Tech may go on to be over valued (i.e., disconnect from
| fundamentals) for quite some time.
| btbuildem wrote:
| I always appreciate reading a take from someone who's well versed
| in the domains they have opinions about.
|
| I think longer-term we'll eat up any slack in efficiency by
| throwing more inference demands at it -- but the shift is
| tectonic. It's a cultural thing. People got acclimated to
| shlepping around morbidly obese node packages and stringing
| together enormous python libraries - meanwhile the deepseek guys
| out here carving bits and bytes into bare metal. Back to FP!
| vonneumannstan wrote:
| This is a bizarre take. First Deepseek no doubt is still using
| the same bloated Python ML packages as everyone else. Second
| since this is "open source" it's pretty clear that the big labs
| are just going to replicate this basically immediately and with
| their already massive compute advantages put models out that
| are an extra OOM larger/better/etc. than what Deepseek can
| possibly put out. There's just no reason to think that e.g. a
| 10x increase in training efficiency does anything but increase
| the size of the next model generation by 10x.
| christkv wrote:
| All this is good news for all of us. Bad news probably for
| Nvidia's margins long term but who cares. If we can train and
| run inference in fewer cycles and watts, that is awesome.
| qwertox wrote:
| Considering the fact that current models were trained on top-
| notch books, those read and studied by the most brilliant
| engineers, the models are pretty dumb.
|
| They are more like the thing which enabled computers to work with
| and digest text instead of just code. The fact that they can
| parrot pretty interesting relationships from the texts they've
| consumed kind of proves that they are capable of statistically
| "understanding" what we're trying to talk with them about, so
| it's a pretty good interface.
|
| But going back to the really valuable content of the books
| they've been trained on, they just don't understand it. There's
| other AI which needs to get created which can really learn the
| concepts taught in those books instead of just the words and the
| value of the proximities between them.
|
| To learn that other missing part will require hardware just as
| uniquely powerful and flexible as what Nvidia has to offer. Those
| companies now optimizing for inference and LLM training will be
| good at it and have their market share, but they need to ensure
| that their entire stack is as capable as Nvidia's stack, if they
| also want to be part of future developments. I don't know if
| Tenstorrent or Groq are capable of doing this, but I doubt it.
| lenerdenator wrote:
| I think it's more than just the market effect on "established" AI
| players like Nvidia.
|
| I don't think it's necessarily a coincidence that DeepSeek
| dropped within a short time frame of the announcement of the AI
| investment initiative by the Trump administration.
|
| The idea is to get the money from investors who want to earn a
| return. Lower capex is attractive to investors, and DS drops
| capex dramatically. It makes Chinese AI talent look like the
| smart, safe bet. Nothing like DS could happen in China unless the
| powers-that-be knew about it and got some level of control. I'm
| also willing to bet that this isn't the best they've got.
|
| They're saying "we can deliver the same capabilities for far
| less, and we're not going to threaten you with a tariff for not
| complying".
| robomartin wrote:
| Despite the fact that this article is very well written and
| certainly contains high quality information, I choose to remain
| skeptical as it pertains to Nvidia's position in the market. I'll
| come right out and say that my experience likely makes me see
| this from a biased position.
|
| The premise is simple: Business is warfare. Anything you can do
| to damage or slow down the market leader gives you more time to
| get caught up. FUD is a powerful force.
|
| My bias comes from having been the subject of such attacks in my
| prior tech startup. Our technology was destroying the offerings
| of the market leading multi-billion-dollar global company that
| pretty much owned the sector. The natural processes of such a
| beast caused them not to be able to design their way out of a
| paper bag. We clearly had an advantage. The problem was that we
| did not have the deep pockets necessary to flood the market with
| it and take them out.
|
| What did they do?
|
| They started a FUD campaign.
|
| They went to every single large customer and our resellers (this
| was a hardware/software product) a month or two before the two
| main industry tradeshows, and lied to them. They promised that
| they would show market-leading technology "in just a couple of
| months" and would add comments like "you might want to put your
| orders on hold until you see this". We had multi-million dollar
| orders held for months in anticipation of these product
| unveilings.
|
| And, sure enough, they would announce the new products with a
| great marketing push at the next tradeshow. All demos were
| engineered and manipulated to deceive, all of them. Yet, the
| incredible power of throwing millions of dollars at this effort
| delivered what they needed, FUD.
|
| The problem with new products is that it takes months for them to
| be properly validated. So, if the company that had frozen a $5MM
| order for our products decided to verify the claims of our
| competitor, it typically took around four months. In four months,
| they would discover that the new shiny object was shit and far
| less stellar than what they were told. In other words, we won.
| Right?
|
| No!
|
| The mega-corp would then reassure them that they had iterated
| vast improvements into the design and that those would be
| presented --I kid you not-- at the next tradeshow. By spending
| millions of dollars they had, at this point, denied us millions
| of dollars of revenue for approximately one year. FUD, again.
|
| The next tradeshow came and went and the same cycle repeated...it
| would take months for customers to realize the emperor had no
| clothes. It was brutal to be on the receiving end of this without
| the financial horsepower to be able to break through the FUD. It
| was a marketing arms race and we were unprepared to win it. In
| this context, the idea that a better mouse trap always wins is
| just laughable.
|
| This did not end well. They were not going to survive another FUD
| cycle. Reality eventually comes into play. Except that, in this
| case, 2008 happened. The economic implosion caught us in serious
| financial peril due to the damage done by the FUD campaign.
| Ultimately, it was not survivable and I had to shut down the
| company.
|
| It took this mega-corp another five years to finally deliver a
| product that approximated what we had and another five years
| after that to match and exceed it. I don't even want to imagine
| how many hundreds of millions they spent on this.
|
| So, long way of saying: China wants to win. No company in China
| is independent from government forces. This is, without a doubt,
| a war for supremacy in the AI world. It is my opinion that, while
| the technology, as described, seems to make sense, it is highly
| likely that this is yet another form of a FUD campaign to gain
| time. If they can deny Nvidia (and others) the orders needed to
| maintain the current pace, they gain time to execute on a
| strategy that could give them the advantage.
|
| Time will tell.
| samiv wrote:
| I think the biggest threat to NVIDIA's future right now is their
| own current success.
|
| Their software platforms and CUDA are a very strong moat against
| everyone else. I don't see anyone beating them on that front right
| now.
|
| The problem is that I'm afraid that all that money sloshing
| around inside the company is rotting the culture and that will
| compromise future development.
|
| - Grifters are filling positions in many orgs, only trying to
| milk it as much as possible.
|
| - Old employees become complacent with their nice RSU packages
| and rest & vest.
|
| NVIDIA used to be extremely nimble and was fighting way above
| its weight class. Prior to the Mellanox acquisition they had only
| around 10k employees, and afterwards another 10k more.
|
| If there's a real threat to their position at the top of the AI
| offerings, will they be able to roll up their sleeves and get
| back to work, or will the organizations be unable to move ahead?
|
| Long term I think it's inevitable that China will take over the
| technology leadership. They have the population and they have the
| education programs and the skill to do this. At the same time in
| the old western democracies things are becoming stagnant and I
| even dare to say that the younger generations are declining. In
| my native country the educational system has collapsed: over 20%
| of kids who finish elementary school cannot read or write. They
| can mouth-breathe and scroll TikTok, though only barely, since
| their attention span is about the same as a goldfish's.
| _DeadFred_ wrote:
| LOL. This isn't rot, it is reaching the end goal: the people
| doing the work reap the rewards they were working towards. Rot
| would imply management should somehow prevent rest-and-vest, but
| that is the exact model on which they acquired their talent. You
| would have to remove capitalism from companies when companies
| win at capitalism, making it all just a giant rug pull for
| employees.
| scudsworth wrote:
| what a compelling domain name. it compels me not to click on it
| indymike wrote:
| This story could be applied to every tech breakthrough. We start
| where the breakthrough is moated by hardware, access to
| knowledge, and IP. Over time:
|
| - Competition gets crucial features into cheaper hardware
|
| - Work-arounds for most IP are discovered
|
| - Knowledge finds a way out of the castle
|
| This leads to a "Cambrian explosion" of new devices and software
| that usually gives rise to some game-changing new ways to use the
| new technology. I'm not sure where we all thought this somehow
| wouldn't apply to AI. We've seen the pattern with almost every
| new technology you can think of. It's just how it works. Only the
| time it takes for patents to expire changes this... so long as
| everyone respects the patent.
| _DeadFred_ wrote:
| It's still wild to me that toasters have always been $20 but
| extremely expensive lasers, digital chips, amps, motors, LCD
| screens worked their way down to $20 CD players.
| indymike wrote:
| So... Electric toasters came to market in the 1920s, priced
| from $15, eventually getting as low as $5. Adjusting for
| inflation, that $15 toaster cost $236.70 in 2025 USD. Today's
| $15 toaster would be about 90¢ in 1920s dollars... so it
| follows the story.
| lxgr wrote:
| The most important part for me is:
|
| > DeepSeek is a tiny Chinese company that reportedly has under
| 200 employees. The story goes that they started out as a quant
| trading hedge fund similar to TwoSigma or RenTec, but after Xi
| Jinping cracked down on that space, they used their math and
| engineering chops to pivot into AI research.
|
| I guess now we have the answer to the question that countless
| people have already asked: Where could we be if we figured out
| how to get most math and physics PhDs to work on things other
| than picking up pennies in front of steamrollers (a.k.a. HFT)
| again?
| rfoo wrote:
| This is completely fake though. It was more like their founder
| decided to start a branch to do AI research. It was well
| planned: they bought significantly more GPUs than they could use
| for quant research even before they started to do anything in AI.
|
| There was a crackdown on algorithmic trading, but it didn't
| have much impact, and IMO someone higher up definitely does not
| want to kill these trading firms.
| lxgr wrote:
| The optimal amount of algorithmic trading is definitely more
| than none (I appreciate liquidity and price quality as much
| as the next guy), but arguably there's a case here that we've
| overshot a bit.
| rightbyte wrote:
| The price data I (we?) get is 15 minute delayed. I would
| guess most of the profiteering is from consumers not
| knowing the last transaction prices? I.e. an artificially
| created edge by the broker who then sells the API to clean
| their hands of the scam.
| lxgr wrote:
| Real-time price data is indeed not free, but widely
| available even in retail brokerages. I've never seen a 15
| minute delay in any US based trade, and I think I can
| even access level 2 data a limited number of times on
| most exchanges (not that it does me much good as a retail
| investor).
|
| > I would guess most of the profiteering is from
| consumers not knowing the last transaction prices?
|
| No, not at all. And I wouldn't even necessarily call it
| profiteering. Ironically, as a retail investor you even
| _benefit_ from hedge funds and HFTs being a counterparty
| to your trades: You get on average better (and worst case
| as good) execution from PFOF.
|
| Institutional investors (which include pension funds,
| insurances etc.) are a different story.
| rightbyte wrote:
| OK ty I guess I got it wrong. I thought it was way more
| common than for my scrappy bank.
| doctorpangloss wrote:
| Who knows? That too is a bunch of mythmaking. One thing's for
| sure, there are no moats or secrets.
| auntienomen wrote:
| DeepSeek is a subsidiary of a relatively successful Chinese
| quant trading firm. It was the boss' weird passion project,
| after he made a few billion yuan from his other passion,
| trading. The whole thing was funded by quant trading profits,
| which kind of undermines your argument. Maybe we should just
| let extremely smart people work on the things that catch their
| interest?
| lxgr wrote:
| The interest of extremely smart people is often strongly
| correlated with potential profits, and these are very much
| correlated with policy, which in the case of financial
| regulation shapes market structures.
|
| Another way of saying this: It's a well-known fact that
| complicated puzzles with a potentially huge reward attached
| to them attract the brightest people, so I'm arguing that we
| should be very conscious of the types of puzzles we
| implicitly come up with, and consider this an externality to
| be accounted for.
|
| HFT is, to a large extent, a product of policy, in particular
| Reg NMS, based on the idea that we need to have many
| competing exchanges to make our markets more efficient. This
| has worked well in breaking down some inefficiencies, but has
| created a whole set of new ones, which are the basis of HFT
| being possible in the first place.
|
| There are various ideas on whether different ways of
| investing might be more efficient, but these largely focus on
| benefits to investors (i.e. less money being "drained away"
| by HFT). What I'm arguing is that the "draining" might not
| even be the biggest problem, but rather that the people doing
| it could instead contribute to equally exciting, non-zero-sum
| games.
|
| We definitely want to keep around the part of HFT that
| contributes to more efficient resource allocation (an
| inherently hard problem), but wouldn't it be great if we
| could avoid the part that only works around the kinks of a
| particular market structure emergent from a particular piece
| of regulation?
| godelski wrote:
| Interestingly a lot of the math and physics people in the ML
| community are considered "grumpy researchers", a joke made
| apparent by this starter pack[0].
|
| From my personal experience (undergrad physics, worked as
| engineer, came to CS & ML because I liked the math), there's a
| lot of pushback.
|
| - I've been told that the math doesn't matter/you don't need
| math.
|
| - I've heard very prominent researchers say "fuck theorists".
|
| - I've seen papers routinely rejected for improving training
| techniques, with reviewers saying "just tune a large model".
|
| - I've seen papers that show improvements when comparisons are
| conditioned on compute constraints rejected because "not enough
| datasets" or "but does it scale" (these questions can always be
| asked but require exponentially more work).
|
| - I've been told I'm gatekeeping for saying "you don't need math
| to make good models, but you need it to know why your models are
| wrong" (yes, this is a reference).
|
| - When pointing out math or statistical errors, I'm told it
| doesn't matter.
|
| - And much more.
|
| I've heard this from my advisor, dissertation committee,
| bosses[1], peers, and others (of course, HN). If my experience
| is anything but rare, I think it explains the grumpy
| group[2]. But I'm also not too surprised with how common it is
| in CS for people to claim that everything is easy or that leet
| code is proof of competence (as opposed to evidence).
|
| I think unfortunately the problem is a bit bigger, but it isn't
| unsolvable. Really, it is "easily" solvable since it just
| requires us to make different decisions. Meaning _each and
| every one of us_ has a direct impact on making this change.
| Maybe I'm grumpy because I want to see this better world. Maybe
| I'm grumpy because I know it is possible. Maybe I'm grumpy
| because it is my job to see problems and try to fix them lol
|
| [0] https://bsky.app/starter-
| pack/roydanroy.bsky.social/3lba5lii... (not perfect, but
| there's a high correlation and I don't think that's a
| coincidence)
|
| [1] Even after _demonstrating_ how my points directly improve
| the product, more than doubling performance on _customer_ data.
|
| [2] Not to mention the way experiments are done, since physicists
| are taught that empirics alone are not enough.
| https://www.youtube.com/watch?v=hV41QEKiMlM
| lxgr wrote:
| Is this in academia?
|
| Arguably, the emergence of quant hedge funds and private AI
| research companies is at least as much a symptom of the
| dysfunctions of academia (and society's compensation of
| academics on dimensions monetary and beyond) as it is of the
| ability of Wall Street and Silicon Valley to treat former
| scientists better than that.
| godelski wrote:
| > Is this in academia?
|
| Yes and no. Industry AI research is currently tightly
| coupled with academic research. Most of the big papers you
| see are either directly from the big labs or in
| partnership. Not even labs like Stanford have sufficient
| compute to train GPT from scratch (maybe enough for
| DeepSeek). Here's Fei-Fei Li discussing the issue[0]. Stanford
| has something like 300 GPUs[1]? And those have to be split
| across labs.
|
| The thing is that there's always a pipeline. Academia does
| most of the low level research, say TRL[2] 1-4,
| partnerships happen between 4-6, and industry takes over
| the rest (with some wiggle room on these numbers). Much of
| ML academic research right now is tuning large models, made
| by big labs. This isn't low TRL. Additionally, a lot of
| research is rejected for not out-performing technologies
| that are already at TRL 5-7. See Mamba for a recent
| example. You could also point to KANs, which are probably
| around TRL 3.
|
| > Arguably, the emergence of
| quant hedge funds and private AI research companies is at
| least as much a symptom of the dysfunctions of academia
|
| Which is where I, again, both agree and disagree. It is not
| _just_ a symptom of the dysfunction of academia, but _also_
| industry. The reason I pointed out the grumpy researchers
| is because a lot of these people have been discussing
| techniques that DeepSeek used, long before they were used.
| DeepSeek looks like what happens when you set these people
| free. Which is my argument, that we should do that. Scale
| Maximalists (also called "Bitter Lesson Maximalists", but I
| dislike the term) have been dominating ML research, and
| DeepSeek shows that scale isn't enough. So this will hopefully
| give the mathy people more weight. But then again, isn't the
| common way monopolies fall that they become too arrogant and
| incestuous?
|
| So mostly, I agree; I'm just pointing out that there is a
| bit more subtlety, and I think we need to recognize that to
| make progress. There are a lot of physicists and mathy
| people who like ML and have been doing research in the area
| but are often pushed out because of the thinking I listed.
| Though part of the success of the quant industry is
| recognizing that the strong math and modeling skills of
| physicists generalize pretty well: you go after people
| who recognize that an equation describing a spring
| isn't only useful for springs, but for anything
| that oscillates. That level of mathematical understanding
| is very powerful, and boy are there a lot of people who
| want the opportunity to demonstrate this in ML; they just
| never get similar GPU access.
|
| [0] https://www.ft.com/content/d5f91c27-3be8-454a-bea5-bb8f
| f2a85...
|
| [1] https://archive.is/20241125132313/https://www.thewrap.c
| om/un...
|
| [2]
| https://en.wikipedia.org/wiki/Technology_readiness_level
| hn_throwaway_99 wrote:
| I'm curious if someone more informed than me can comment on this
| part:
|
| > Besides things like the rise of humanoid robots, which I
| suspect is going to take most people by surprise when they are
| rapidly able to perform a huge number of tasks that currently
| require an unskilled (or even skilled) human worker (e.g., doing
| laundry ...
|
| I've always said that the real test for humanoid AI is folding
| laundry, because it's an incredibly difficult problem. And I'm
| not talking about giving a machine clothing piece-by-piece
| flattened so it just has to fold, I'm talking about saying to a
| robot "There's a dryer full of clothes. Go fold it into separate
| piles (e.g. underwear, tops, bottoms) and don't mix the husband's
| clothes with the wife's". That is, something most humans in the
| developed world have to do a couple times a week.
|
| I've been following some of the big advances in humanoid robot
| AI, but the above task _still_ seems miles away given current
| tech. So is the author's quote just more unsubstantiated hype
| that I'm constantly bombarded with in the AI space, or have there
| been advancements recently in robot AI that I'm unaware of?
| ieee2 wrote:
| I saw demos of such robots doing exactly that on YouTube/X - not
| very precisely yet, but almost well enough. And it is
| just the beginning. Considering that the majority of laundry is
| very similar (shirts, t-shirts, trousers, etc.), I think this
| will be solved soon with enough training.
| hn_throwaway_99 wrote:
| Can you share what you've seen? Because from what I've seen,
| I'm far from convinced. E.g. there is this,
| https://youtube.com/shorts/CICq5klTomY , which nominally does
| what I've described. Still, as impressive as that is, I think
| the distance from what that robot does to what a human can do
| is a _lot_ farther than it seems. Besides noticing that the
| folded clothes are more like a neatly arranged pile, what
| about all the edge cases? What about static cling? Can it
| match socks? What if something gets stuck in the dryer?
|
| I'm just very wary of looking at that video and saying "Look!
| It's 90% of the way there! And think how fast AI advances!",
| because that critical last 10% can often be harder than the
| first 90% and then some.
| Nition wrote:
| First problem with that demo is that putting all your
| clothes in a dryer is a very American thing. Much of the
| world pegs their washing on a line.
| rattray wrote:
| https://physicalintelligence.company is working on this - see a
| demo where their robot does ~exactly what you said, I believe
| based on a "generalist" model (not pretrained on the tasks):
| https://www.youtube.com/watch?v=J-UTyb7lOEw
| delusional wrote:
| There are so many cuts in that 1 minute video, Jesus Christ.
| You'd think it was produced for TikTok.
| hnuser123456 wrote:
| 2 months ago, Boston Dynamics' Atlas was barely able to put
| solid objects in open cubbies. [1] Folding, hanging, and
| dresser drawer operation appears to be a few years out still.
|
| https://www.youtube.com/watch?v=F_7IPm7f1vI
| rashidae wrote:
| While Nvidia's valuation may feel bloated due to AI hype, AMD
| might be the smarter play.
| UncleOxidant wrote:
| Even if DeepSeek has figured out how to do more (or at least as
| much) with less, doesn't the Jevons Paradox come into play? GPU
| sales would actually increase because even smaller companies
| would get the idea that they can compete in a space that only 6
| months ago we assumed would be the realm of the large mega tech
| companies (the Metas, Googles, OpenAIs) since the small players
| couldn't afford to compete. Now that story is in question since
| DeepSeek only has ~200 employees and claims to be able to train a
| competitive model for about 20X less than the big boys spend.
| samvher wrote:
| My interpretation is that yes in the long haul, lower
| energy/hardware requirements might increase demand rather than
| decrease it. But right now, DeepSeek has demonstrated that the
| current bottleneck to progress is _not_ compute, which
| decreases the near term pressure on buying GPUs at any cost,
| which decreases NVIDIA's stock price.
| kemiller wrote:
| Short term, I 100% agree, but it remains to be seen what "short"
| means. According to at least some benchmarks, Deepseek is
| _two full orders of magnitude_ cheaper for comparable
| performance. Massive. But that opens the door for much more
| elaborate "architectures" (chain of thought,
| architect/editor, multiple choice) etc, since it's possible
| to run it over and over to get better results, so raw speed &
| latency will still matter.
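|
| To make "run it over and over" concrete, here is a minimal
| best-of-N sketch; generate() and score() below are just
| stand-ins for real model and verifier calls, not any
| particular API:
|
|       import random
|
|       def generate(prompt):
|           # Stand-in for one call to a cheap model.
|           return f"candidate-{random.randint(0, 9)}"
|
|       def score(prompt, answer):
|           # Stand-in for a verifier or reranker (could itself
|           # be another model call).
|           return random.random()
|
|       def best_of_n(prompt, n=16):
|           # Cheap per-call inference makes it affordable to
|           # sample many answers and keep the best one, trading
|           # extra latency for quality.
|           candidates = [generate(prompt) for _ in range(n)]
|           return max(candidates, key=lambda a: score(prompt, a))
|
|       print(best_of_n("Sum of two even numbers is even?"))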
| yifanl wrote:
| It does, but proving that it can be done with cheaper (and, more
| importantly for NVidia, lower-margin) chips breaks the spell
| that NVidia will just be eating everybody's lunch until the end
| of time.
| aurareturn wrote:
| If demand for AI chips increases due to Jevons paradox,
| why would Nvidia's chips become cheaper?
|
| In the long run, yes, they will be cheaper due to more
| competition and better tech. But next month? It will be more
| expensive.
| yifanl wrote:
| The usage of existing but cheaper nvidia chips to make
| models of similar quality is the main takeaway.
|
| It'll be much harder to convince people to buy the latest
| and greatest with this out there.
| UncleOxidant wrote:
| The sweet spot for running local LLMs (from what I'm
| seeing on forums like r/localLlama) is 2 to 4 3090s each
| with 24GB of VRAM. NVidia (or AMD or Intel) would clean
| up if they offered a card with 3090 level performance but
| with 64GB of VRAM. Doesn't have to be the leading edge
| GPU, just a decent GPU with lots of VRAM. This is kind of
| what Digits will be (though the memory bandwidth is going
| to be slower because it'll be DDR5) and kind of what
| AMD's Strix Halo is aiming for - unified memory systems
| where the CPU & GPU have access to the same large pool of
| memory.
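|
| A rough back-of-envelope for why that VRAM number is the
| bottleneck (weights-only estimate; the 1.2x overhead
| factor for KV cache and runtime buffers is just a guess):
|
|       def vram_gb(params_b, bits_per_weight, overhead=1.2):
|           # params_b is the parameter count in billions;
|           # overhead is a fudge factor for KV cache,
|           # activations and runtime buffers.
|           return params_b * (bits_per_weight / 8) * overhead
|
|       for size in (8, 70):
|           for bits in (16, 8, 4):
|               gb = vram_gb(size, bits)
|               print(f"{size}B at {bits}-bit: ~{gb:.0f} GB")
|
| By that estimate a 4-bit 70B model needs roughly 42GB,
| which is exactly why 2x 24GB 3090s is the sweet spot and
| why a single 64GB card at 3090-level performance would
| clean up.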
| redlock wrote:
| The issue here is that, even with a lot of VRAM, you may
| be able to run the model, but with a large context, it
| will still be too slow. (For example, running LLaMA 70B
| with a 30k+ context prompt takes minutes to process.)
| aurareturn wrote:
| > The usage of existing but cheaper nvidia chips to make
| models of similar quality is the main takeaway.
|
| So why not buy a more expensive Nvidia chip to run a
| better model?
| yifanl wrote:
| Is there still evidence that more compute = better model?
| aurareturn wrote:
| Yes. Plenty of evidence.
|
| The DeepSeek R1 model people are freaking out about runs
| better with more compute because it's a chain-of-thought
| model.
| Vegenoid wrote:
| Because if you don't have infinite money, considering
| whether to buy a thing is about the ratio of price to
| performance, not just performance. If you can get enough
| performance for your needs out of a cheaper chip, you buy
| the cheaper chip.
| hodder wrote:
| Jevons paradox isn't some iron law like gravity.
| trgn wrote:
| Feels like it is in tech. Any gains from hardware or algorithmic
| advances immediately get consumed by increases in data
| retention and software bloat.
| gamblor956 wrote:
| Important to note: the $5 million alleged cost is just the GPU
| compute cost for the final version of the model; it's not the
| cumulative cost of the research to date.
|
| The analogous costs would be what OpenAI spent to go from GPT 4
| to GPT 4o (i.e., to develop the reasoning model from the most
| up-to-date LLM model). $5 million is still less than what
| OpenAI spent, but it's not an order of magnitude lower. (OpenAI
| spent up to $100 million on GPT-4 but a fraction of that to get
| GPT-4o.
| Will update comment if I can find numbers for 4o before edit
| window closes)
| fspeech wrote:
| It doesn't make sense to compare individual models. A better
| way is to look at total compute consumed, normalized by the
| output. In the end what counts is the cost of providing
| tokens.
| fspeech wrote:
| But why would the customers accept the high prices and high
| gross margin of Nvidia if they no longer fear missing out with
| insufficient hardware?
| tedunangst wrote:
| Selling 100 chips for $1 profit is less profitable than selling
| 20 chips for $10 profit.
| mackid wrote:
| Microsoft did a bunch of research into low-bit weights for
| models. I guess OAI didn't look at this work.
|
| https://proceedings.neurips.cc/paper/2020/file/747e32ab0fea7...
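|
| Not the scheme from that paper, just a toy sketch of the core
| idea behind low-bit weights (symmetric per-tensor quantization
| in NumPy), to show why memory use falls roughly linearly with
| bit width:
|
|       import numpy as np
|
|       def quantize_int4(w):
|           # Map float weights to integers in [-7, 7] sharing
|           # one scale (stored in an int8 container here; real
|           # kernels pack two 4-bit values per byte).
|           scale = np.abs(w).max() / 7.0
|           q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
|           return q, scale
|
|       def dequantize(q, scale):
|           return q.astype(np.float32) * scale
|
|       w = np.random.randn(256, 256).astype(np.float32)
|       q, s = quantize_int4(w)
|       err = np.abs(w - dequantize(q, s)).max()
|       print("max abs error:", float(err))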
| highfrequency wrote:
| The R1 paper (https://arxiv.org/pdf/2501.12948) emphasizes their
| success with reinforcement learning without requiring any
| supervised data (unlike RLHF for example). They note that this
| works well for math and programming questions with verifiable
| answers.
|
| What's totally unclear is what data they used for this
| reinforcement learning step. How many math problems of the right
| difficulty with well-defined labeled answers are available on the
| internet? (I see about 1,000 historical AIME questions, maybe
| another factor of 10 from other similar contests). Similarly,
| they mention LeetCode - it looks like there are around 3000
| LeetCode questions online. Curious what others think - maybe the
| reinforcement learning step requires far less data than I would
| guess?
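|
| My guess is that the reward is purely rule-based, so the bar per
| example is "mechanically checkable" rather than "human-labeled".
| A minimal sketch of what a verifiable reward could look like
| (the exact checkers and data DeepSeek used aren't public; these
| helpers are illustrative):
|
|       import re
|
|       def math_reward(completion, gold_answer):
|           # 1.0 if the last number in the completion matches
|           # the reference answer, else 0.0; no human labels or
|           # learned reward model needed.
|           nums = re.findall(r"-?\d+(?:\.\d+)?", completion)
|           return 1.0 if nums and nums[-1] == gold_answer else 0.0
|
|       def code_reward(program, tests, run):
|           # Fraction of unit tests passed; `run` executes the
|           # program in a sandbox and returns stdout (stubbed
|           # with a trivial echo in the example below).
|           passed = sum(run(program, i).strip() == o
|                        for i, o in tests)
|           return passed / len(tests) if tests else 0.0
|
|       print(math_reward("so the answer is 42", "42"))  # 1.0
|       print(code_reward("src", [("7", "7")],
|                         lambda p, i: i))               # 1.0
|
| Since the check is automatic, the constraint may be less "how
| many labeled answers exist" and more "how many problems have
| answers you can check mechanically", which is still a real
| question for a few thousand AIME/LeetCode-style items.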
| mrinterweb wrote:
| The vast majority of Nvidia's current value is tied to their
| dominance in AI hardware. That value could be threatened if LLMs
| could be trained and/or run efficiently using a CPU or a quantum
| chip. I don't understand enough about the capabilities of quantum
| computing to know if running or training a LLM would be possible
| using a quantum chip, but if it becomes possible, NVDA stock is
| unlikely to fare well (unless they are making the new chip).
| tempeler wrote:
| First of all, I don't invest in Nvidia, and I don't like
| oligopolies. But it is too early to talk about Nvidia's future.
| People are just betting and wishing about Nvidia's future. No one
| knows what people will do in the future, or what they will think.
| It's just guessing and betting. Their real competitor is not
| DeepSeek. Did AMD or others release something new that competes
| with Nvidia's products? If Nvidia remains the market leader, that
| means they will set the price. Being an oligopoly is something
| like that: they don't need to compete with competitors on price.
| wtcactus wrote:
| To me, this seems like we are back again in 1953 and a company
| just announced they are now capable of building one of IBM's 5
| computers for 10% of the price.
|
| I really don't understand the rationale of "We can now train GPT
| 4o for 10% of the price, so that will bring demand for GPUs
| down." If I can train GPT 4o for 10% of the price, and I have a
| budget of 1B USD, that means I'm now going to use the same budget
| and train my model for 10x as long (or 10x bigger).
|
| At the same time, a lot of small players that couldn't properly
| train a model before, because the starting point was simply out
| of their reach, will now be able to purchase equipment that's
| capable of something of note, and they will buy even more GPUs.
|
| P.S. Yes, I know that the original quote "I think there is a
| world market for maybe five computers", was taken out of context.
|
| P.P.S. In this rationale, I'm also operating under the assumption
| that Deepseek numbers are real. Which, given the track record of
| Chinese companies, is probably not true.
| ozten wrote:
| NVIDIA sells shovels to the gold rush. One miner (Liang Wenfeng),
| who has previously purchased at least 10,000 A100 shovels... has
| a "side project" where they figured out how to dig really well
| with a shovel and shared their secrets.
|
| The gold rush, whether real or a bubble, is still there! NVIDIA
| will still sell every shovel they can manufacture, as soon as it
| is available in inventory.
|
| Fortune 100 companies will still want the biggest toolshed to
| invent the next paradigm or to be the first to get to AGI.
| 0n0n0m0uz wrote:
| Please tell me if I am wrong. I know very few details and have
| heard a few headlines, and my hasty conclusion is that this
| development clearly shows the exponential nature of AI
| development in terms of how people are able to piggyback on the
| resources, time, and money of the previous iteration. They used
| the output from ChatGPT as the input to their model. Is this
| true, more or less accurate, or off base?
___________________________________________________________________
(page generated 2025-01-27 23:01 UTC)