[HN Gopher] The impact of competition and DeepSeek on Nvidia
       ___________________________________________________________________
        
       The impact of competition and DeepSeek on Nvidia
        
       Author : eigenvalue
       Score  : 560 points
       Date   : 2025-01-25 15:30 UTC (2 days ago)
        
 (HTM) web link (youtubetranscriptoptimizer.com)
 (TXT) w3m dump (youtubetranscriptoptimizer.com)
        
       | eigenvalue wrote:
       | Yesterday I wrote up all my thoughts on whether NVDA stock is
       | finally a decent short (or at least not a good thing to own at
       | this point). I'm a huge bull when it comes to the power and
       | potential of AI, but there are just too many forces arrayed
       | against them to sustain supernormal profits.
       | 
       | Anyway, I hope people here find it interesting to read, and I
       | welcome any debate or discussion about my arguments.
        
         | patrickhogan1 wrote:
         | Good article. Maybe I missed it, but I see lots of analysis
         | without a clear concluding opinion.
        
         | scsilver wrote:
          | Wanted to add a preface: thank you for your time on this
          | article. I appreciate your perspective and experience, and I'm
          | hoping you can help refine and rein in my bull case.
         | 
          | Where do you expect NVDA's forward and current EPS to land?
          | What revenue drop-off are you expecting in late 2025/2026?
          | Part of my continuing bull case for NVDA is its very
          | reasonable multiple on insane revenue. A leveling off can be
          | expected, but I still feel bullish on it hitting $200+ (a $5
          | trillion market cap? on ~$195B revenue for fiscal year 2026
          | (calendar 2025) at 33 EPS) based on this year's revenue
          | according to their guidance and the guidance of the
          | hyperscalers' spending. Finding a sell point is a whole
          | different matter from being actively short. I can see the case
          | to take some profits; it's hard for me to go short, especially
          | in an inflationary environment (tariffs, electric energy,
          | bullying for lower US interest rates).
         | 
          | The scale of production of Grace Hopper and Blackwell amazes
          | me: 800k units of Blackwell coming out this quarter. Is there
          | even production room for AMD to get their chips made? (Looking
          | at the new chip factories in Arizona.)
         | 
          | R1 might be nice for reducing LLM inferencing costs; I'm
          | unsure about the local Llama ones' accuracy (couldn't get it
          | to correctly spit out the NFL teams and their associated
          | conferences; it kept mixing the NFL with European football),
          | but I still want to train YOLO vision models on faster chips
          | like A100s vs. T4s (4-5x multiples in speed for me).
         | 
          | Lastly, if the robot/autonomous-vehicle ML wave hits within
          | the next year (first drones and cars -> factories ->
          | humanoids), I think that wave can sustain NVDA's compute
          | demand.
         | 
         | The real mystery is how we power all this within 2 years...
         | 
          | * This is not financial advice, and some of my numbers might
          | be a little off; I'm still refining my model and verifying
          | sources and numbers.
        
       | zippyman55 wrote:
        | So at some point we will have too many cannonball-polishing
        | factories, and it will become apparent that the cannonball's
        | trajectory is not easily improved on.
        
       | j7ake wrote:
        | This was an amazing summary of the current ML landscape.
        | 
        | I think the title does the article an injustice, or maybe it's
        | too long for people to read far enough to appreciate it (e.g.
        | the DeepSeek stuff could be an article in itself).
        | 
        | Whatever the case, those with longer attention spans will
        | benefit from this read.
        | 
        | Thanks for summarising it all!
        
         | eigenvalue wrote:
         | Thanks! I was a bit disappointed that no one saw it on HN
         | because I think they'd like it a lot.
        
           | j7ake wrote:
           | I think they would like it a lot, but I think the title
           | doesn't match the content, and it takes too much reading
           | before one realises it goes beyond the title.
           | 
           | Keep it up!
        
         | dang wrote:
         | We've changed the title to a different one suggested by the
         | author.
        
         | metadat wrote:
         | The site is currently offline, here's a snapshot:
         | 
         | https://archive.today/y4utp
        
       | diesel4 wrote:
       | Link isn't working. Is there another or a cached version?
        
         | eigenvalue wrote:
         | Try again! Just rebooted the server since it's going viral now.
        
       | OutOfHere wrote:
       | It seems like a pointless discussion since DeepSeek uses Nvidia
       | GPUs after all.
        
         | jjeaff wrote:
          | it uses a fraction of the GPUs though.
        
           | breadwinner wrote:
            | As it says in the article, you are talking about a mere
            | constant of proportionality, a single multiple. When you're
            | dealing with an exponential growth curve, that stuff gets
            | washed out so quickly that it doesn't end up mattering all
            | that much.
            | 
            | Keep in mind that the goal everyone is driving towards is
            | AGI, not simply an incremental improvement over the latest
            | model from OpenAI.
        
             | high_na_euv wrote:
              | Why do you assume that the exponential growth curve is
              | real?
        
           | ithkuil wrote:
           | Which due to the Jevons Paradox may ultimately cause more
           | shovels to be sold
        
           | cma wrote:
            | Their loss curve with the RL didn't level off much though;
            | it could be taken a lot further and scaled up to more
            | parameters on the big Nvidia mega-clusters out there. And
            | the architecture is heavily tuned to Nvidia optimizations.
        
           | UltraSane wrote:
           | Jevons Paradox states that increasing efficiency can cause an
           | even larger increase in demand.
        
           | dutchbookmaker wrote:
           | "wait" I suspect we are all in a bit of denial.
           | 
            | When was the last time the US got its lunch eaten in
            | technology?
           | 
           | Sputnik might be a bit hyperbolic but after using the model
           | all day and as someone who had been thinking of a pro
           | subscription, it is hard to grasp the ramifications.
           | 
           | There is just no good reference point that I can think of.
        
         | blackeyeblitzar wrote:
          | Yep, some CEO said they have 50K GPUs of the prior generation.
          | They probably accumulated them through intermediaries that are
          | basically helping Nvidia sell to sanctioned parties by proxy.
        
           | idonotknowwhy wrote:
            | DeepSeek was their side project. They had a lot of GPUs from
            | their crypto mining project.
            | 
            | Then Ethereum turned off PoW mining, so they looked into
            | other things to do with their GPUs, and started DeepSeek.
        
             | saagarjha wrote:
             | Mining crypto on H100s?
        
       | arcanus wrote:
       | > Amazon gets a lot of flak for totally bungling their internal
       | AI model development, squandering massive amounts of internal
       | compute resources on models that ultimately are not competitive,
       | but the custom silicon is another matter
       | 
        | Juicy. Anyone have a link or context for this? I hadn't heard
        | of this reception of Nova and related models.
        
         | simonw wrote:
         | I think Nova may have changed things here. Prior to Nova their
         | LLMs were pretty rubbish - Nova only came out in December but
         | seems a whole lot better, at least from initial impressions:
         | https://simonwillison.net/2024/Dec/4/amazon-nova/
        
           | arcanus wrote:
           | Thanks! That's consistent with my impression.
        
       | snowmaker wrote:
       | This is an excellent article, basically a patio11 / matt levine
       | level breakdown of what's happening with the GPU market.
        
         | lxgr wrote:
          | Couldn't agree more! If this is the byproduct, these must be
          | some optimized YouTube transcripts :)
        
       | eprparadox wrote:
       | link seems to be dead... is this article still up somewhere?
        
         | jazzyjackson wrote:
         | It's back up, but just in case:
         | 
         | https://archive.is/y4utp
        
       | eigenvalue wrote:
        | Sorry, my blog crashed! It had a stupid bug where it was calling
        | GitHub too frequently to pull in updated markdown for the posts
        | and kept getting rate limited. I had to rewrite it, but it
        | should be much better now.
        
       | breadwinner wrote:
       | Great article but it seems to have a fatal flaw.
       | 
        | As pointed out in the article, Nvidia has several advantages,
        | including:
        | 
        |   - Better Linux drivers than AMD
        |   - CUDA
        |   - PyTorch is optimized for Nvidia
        |   - High-speed interconnect
        | 
        | Each of these advantages is under attack:
        | 
        |   - George Hotz is making better drivers for AMD
        |   - MLX, Triton, JAX: higher-level abstractions that compile
        |     down to CUDA
        |   - Cerebras and Groq solve the interconnect problem
       | 
        | The article concludes that NVIDIA faces an unprecedented
        | convergence of competitive threats. The flaw in the analysis is
        | that these threats are not unified. Any serious competitor must
        | address ALL of Nvidia's advantages. Instead, Nvidia is being
        | attacked by multiple disconnected competitors, each of which is
        | only attacking one Nvidia advantage at a time. Even if each of
        | those attacks is individually successful, Nvidia will remain the
        | only company that has ALL of the advantages.
        
         | toisanji wrote:
          | I want the NVIDIA monopoly to end, but there is still no real
          | competition.
          | 
          | * George Hotz has basically given up on AMD:
          | https://x.com/__tinygrad__/status/1770151484363354195
          | 
          | * Groq can't produce more hardware past their "demo". It seems
          | like they haven't grown capacity in the years since they
          | announced, and they switched to a complete SaaS model and
          | don't even sell hardware anymore.
          | 
          | * I don't know enough about MLX, Triton, and JAX.
        
           | simonw wrote:
           | That George Hotz tweet is from March last year. He's gone
           | back and forth on AMD a bunch more times since then.
        
             | bdangubic wrote:
             | is that good or bad?
        
               | simonw wrote:
               | Honestly I tried searching his recent tweets for AMD and
               | there was way too much noise in there to figure out his
               | current position!
        
               | zby wrote:
               | " we are going to move it off AMD to our own or partner
               | silicon. We have developed it to be very portable."
               | 
               | https://x.com/__tinygrad__/status/1879617702526087346
        
               | infecto wrote:
                | Honest question: that sounds more difficult than
                | getting things to play well with commodity hardware.
                | Maybe I am oversimplifying it though.
        
               | whizzter wrote:
                | They have their own NN libraries etc., so adapting
                | should be fairly focused, and AMD drivers historically
                | have a hilariously bad reputation among people who
                | program GPUs (I've been bitten a couple of times myself
                | by weirdness).
                | 
                | I think you should consider it this way: if they're
                | trying to avoid Nvidia and make sure their code isn't
                | tied to Nvidia-isms, and AMD is troublesome enough for
                | the basics, then the step to customized solutions is
                | small enough to be worthwhile for something even cheaper
                | than AMD.
        
               | solarkraft wrote:
               | I consider it a good sign that he hasn't completely given
               | up. But it sure all seems shaky.
        
             | roland35 wrote:
              | The same Hotz who lasted like 4 weeks at Twitter after
              | announcing that he'd fix everything? It doesn't really
              | inspire a ton of confidence that he can single-handedly
              | take down Nvidia...
        
           | bfung wrote:
            | It looks like he's close to having his own AMD stack; tweet
            | linked in the article, Jan 15, 2025:
            | https://x.com/__tinygrad__/status/1879615316378198516
        
             | saagarjha wrote:
             | $1000 bounty? That's like 2 hours of development time at
             | market rate lol
        
             | htrp wrote:
             | We'll check in again with him in 3 months and he'll still
             | be just 1 piece away.
        
           | billconan wrote:
           | I also noticed that Groq's Chief Architect now works for
           | NVIDIA.
           | 
           | https://research.nvidia.com/person/dennis-abts
        
         | Herring wrote:
          | He's setting up a case for shorting the stock, i.e. if the
          | growth or margins drop a little from any of these (often well-
          | funded) threats. The accuracy of the article is a function of
          | the current valuation.
        
           | eigenvalue wrote:
           | Exactly. You just need to see a slight deceleration in
           | projected revenue growth (which has been running 120%+ YoY
           | recently) and some downward pressure on gross margins, and
           | maybe even just some market share loss, and the stock could
           | easily fall 25% from that.
        
             | breadwinner wrote:
              | AMD's P/E ratio is 109, NVDA's is 56. Which stock is
              | overvalued?
        
               | eigenvalue wrote:
               | If it were all so simple, they wouldn't pay hedge fund
               | analysts so much money...
        
               | pineaux wrote:
                | No, that's not true. Hedge funds get paid so well
                | because getting a small percentage of a big bag of money
                | is still a big bag of money. This statement gets truer
                | the closer the big bag of money is to infinity.
        
               | daveguy wrote:
                | That is extraordinarily simplistic. If NVDA is slowing
                | and AMD has gains to realize compared to NVDA, then the
                | 10x difference in market cap would imply that AMD is the
                | better buy. Which is why I am long AMD. You can't just
                | look at the current P/E delta; you have to look at the
                | expectations for one vs. the other. AMD gaining 2x over
                | NVDA means they are approximately equivalently valued.
                | If there are unrealized AI-related gains, all bets are
                | off. AMD closing 50% of the gap in market cap between
                | NVDA and AMD means AMD is ~2.5x undervalued.
                | 
                | Disclaimer: long AMD, and not precise on percentages.
                | Just illustrating a point.
        
               | flowerlad wrote:
                | The point is, it should not be taken for granted that
                | NVDA is overvalued. Their P/E is low enough that if
                | you're going to state that they are overvalued, you have
                | to make the case. The article, while well written, fails
                | to make the case because it has a flaw: it assumes that
                | addressing just one of Nvidia's advantages is enough to
                | make it crash, and that's just not true.
        
               | lxgr wrote:
               | If investing were as simple as looking at the P/E, all
               | P/Es would already be at 15-20, wouldn't they?
        
               | flowerlad wrote:
               | Not saying it is as simple as looking at P/E
        
               | lxgr wrote:
                | My point is that you have to make the case for
                | _anything_ being over- or undervalued. The null
                | hypothesis is that the market has correctly valued it,
                | after all.
        
               | omgwtfbyobbq wrote:
                | In the long run, probably yes, but a particular stock is
                | less likely to be accurately valued in the short run.
        
               | fldskfjdslkfj wrote:
                | If you believe the space will eventually get
                | commoditized in the medium to long term, the bear case
                | is obvious. And based on history, there's a pretty high
                | likelihood of that happening.
        
               | bdangubic wrote:
               | glad you are not my financial adviser :)
        
               | lxgr wrote:
               | On the other hand, getting a bigger slice of the existing
               | cake as a smaller challenger can be easier than baking a
               | bigger cake as the incumbent.
        
               | baq wrote:
               | Hey let's buy intel
        
               | dismalaf wrote:
                | NVDA is valued at $3.5 trillion, which means investors
                | think it will grow to around $1 trillion in yearly
                | revenue. Current revenue is around $35 billion per
                | quarter, so call it $140 billion yearly. Investors are
                | betting on a 7x increase in revenue. Not impossible, and
                | it sounds plausible, but you need to assume AMD, INTC,
                | GOOG, AMZN, and all the others who make GPUs/TPUs either
                | won't take market share or that the market will be worth
                | multiple trillions per year.
        
               | kimbler wrote:
                | I thought the days of valuing public companies at 3x
                | revenues or 5x earnings had long since sailed?
        
               | idonotknowwhy wrote:
               | Intel had a great P/E a couple of years ago as well :)
        
               | hmm37 wrote:
                | You have to look at non-GAAP numbers, and therefore
                | looking at forward P/E ratios is necessary. When you
                | look at that, AMD is cheaper than NVDA. Moreover, the
                | reason AMD's P/E ratio looks high is that they bought
                | Xilinx, and accounting for that deal in a way that saves
                | on taxes makes their P/E ratio look really high.
        
               | htrp wrote:
               | rofl Forward PE ....
        
           | 2-3-7-43-1807 wrote:
           | > The accuracy of the article is a function of the current
           | valuation.
           | 
           | ah ... no ... that's nonsense trying to hide behind stilted
           | math lingo.
        
         | dralley wrote:
         | >So how is this possible? Well, the main reasons have to do
         | with software-- better drivers that "just work" on Linux and
         | which are highly battle-tested and reliable (unlike AMD, which
         | is notorious for the low quality and instability of their Linux
         | drivers)
         | 
          | This does not match my experience from the past ~6 years of
          | using AMD graphics on Linux. Maybe things are different with
          | AI/compute, which I've never messed with, but for normal
          | consumer stuff the experience of using AMD is vastly superior
          | to dealing with Nvidia's out-of-tree drivers.
        
           | saagarjha wrote:
           | They are.
        
         | thousand_nights wrote:
         | > George Hotz is making better drivers for AMD
         | 
         | lol
        
           | saagarjha wrote:
           | *George Hotz is making posts online talking about how AMD
           | isn't helping him
        
             | latchkey wrote:
             | George Hotz tried to extort AMD into giving him $500k in
             | free hardware and $2m cash, and they politely declined.
        
         | grajaganDev wrote:
          | There is not enough water (to cool data centers) to justify
          | NVDA's current valuation.
          | 
          | The same is true of electricity: neither nuclear power nor
          | fusion will be online anytime soon.
        
           | lxgr wrote:
           | Those are definitely not the limiting factors here.
           | 
           | Not nearly all data centers are water cooled, and there is
           | this amazing technology that can convert sunlight into
           | electricity in a relatively straightforward way.
           | 
            | AI workloads (at least training) are just about as
            | geographically distributable as it gets, due to not being
            | very latency-sensitive, and even if you can't obtain
            | sufficient grid interconnection or buffer storage, you can
            | always leave them idle at night.
        
             | grajaganDev wrote:
             | Right - they are not limiting factors, they are reasons
             | that NVDA is overvalued.
             | 
             | Stock price is based on future earnings.
             | 
             | The smart money knows this and is reacting this morning -
             | thus the drop in NVDA.
        
           | energy123 wrote:
           | Solar microgrids are cheaper and faster than nuclear. New
           | nuclear isn't happening on the timescales that matter, even
           | assuming significant deregulation.
        
             | grajaganDev wrote:
             | Can you back up that solar microgrids will supply enough
             | power to justify NVDA's current valuation?
        
         | aorloff wrote:
          | The thing unifying the threats is the scarcity of H100s.
          | 
          | He says this and talks about it in The Fallout section: even
          | at BigCos with megabucks, the teams are starved for time on
          | the Nvidia chips, and if these innovations work, other teams
          | will use them, and then, boom, Nvidia's moat is truncated
          | somehow, which doesn't look good at such lofty multiples.
        
         | isatty wrote:
         | Sorry, I don't know who George Hotz is, but why isn't AMD
         | making better drivers for AMD?
        
           | adastra22 wrote:
            | George Hotz is a hot Internet celebrity who has basically
            | accomplished nothing of value but has a large cult
            | following. You can safely ignore him.
            | 
            | (Famous for hacking the PS3, except he just took credit for
            | a separate group's work. And for making a self-driving car
            | in his garage, except oh wait, that didn't happen either.)
        
             | Den_VR wrote:
             | You're not wrong, but after all these years it's fair to
             | give benefit of the doubt - geohot may have grown as a
             | person. The PS3 affair was incredibly disappointing.
        
             | xuki wrote:
             | He was famous before the PS3 hack, he was the first person
             | to unlock the original iPhone.
        
               | adastra22 wrote:
               | Yes, but it's worth mentioning that the break consisted
               | of opening up the phone and soldering on a bypass for the
               | carrier card locking logic. That certainly required some
               | skills to do, but is not an attack Apple was defending
               | against. This unlocking break didn't really lead to
               | anything, and was unlike the later software unlocking
               | methods that could be widely deployed.
        
               | SirMaster wrote:
               | Well he also found novel exploits in multiple later
               | iPhone hardware/software models and implemented complete
               | jailbreak applications.
        
             | hshshshshsh wrote:
             | What about comma.ai?
        
             | sebmellen wrote:
             | Comma.ai works really well. I use it every day in my car.
        
             | medler wrote:
             | He took an "internship" at Twitter/X with the stated goal
             | of removing the login wall, apparently failing to realize
             | that the wall was a deliberate product decision, not a
             | technical challenge. Now the X login wall is more intrusive
             | than ever.
        
         | epolanski wrote:
         | > Any serious competitor must address ALL of Nvidia's
         | advantages.
         | 
          | Not really. His article focuses on Nvidia being valued so
          | highly by stock markets; he's not saying that Nvidia is
          | destined to lose its advantage in the space in the short term.
         | 
         | In any case, I also think that the likes of MSFT/AMZN/etc will
         | be able to reduce their capex spending eventually by being able
         | to work on a well integrated stack on their own.
        
           | madaxe_again wrote:
           | They have an enormous amount of catching up to do, however;
           | Nvidia have created an entire AI ecosystem that touches
           | almost every aspect of what AI can do. Whatever it is, they
           | have a model for it, and a framework and toolkit for working
           | with or extending that model - _and the ability to design
           | software and hardware in lockstep_. Microsoft and Amazon have
           | a very diffuse surface area when it comes to hardware, and
           | being a decent generalist doesn't make you a good specialist.
           | 
           | Nvidia are doing phenomenal things with robotics, and that is
           | likely to be the next shoe to drop, and they are positioned
           | for another catalytic moment similar to that which we have
            | seen with LLMs.
           | 
           | I do think we will see some drawback or at least deceleration
           | this year while the current situation settles in, but within
           | the next three years I think we will see humanoid robots
           | popping up all over the place, particularly as labour
           | shortages arise due to political trends - and somebody is
           | going to have to provide the compute, both local and cloud,
           | and the vision, movement, and other models. People will turn
           | to the sensible and known choice.
           | 
           | So yeah, what you say is true, but I don't think is going to
           | have an impact on the trajectory of nvidia.
        
         | csomar wrote:
         | > - Better Linux drivers than AMD
         | 
         | Unless something radically changed in the last couple years, I
         | am not sure where you got this from? (I am specifically talking
         | about GPUs for computer usage rather than training/inference)
        
           | idonotknowwhy wrote:
           | > Unless something radically changed in the last couple
           | years, I am not sure where you got this from?
           | 
           | This was the first thing that stuck out to me when I skimmed
           | the article, and the reason I decided to invest the time
           | reading it all. I can tell the author knows his shit and
           | isn't just parroting everyone's praise for AMD Linux drivers.
           | 
           | > (I am specifically talking about GPUs for computer usage
           | rather than training/inference)
           | 
            | Same here. I suffered through the Vega 64 after everyone
            | said how great it was. So many AMD-specific driver bugs, AMD
            | driver devs not wanting to fix them for non-technical
            | reasons, so many hard locks when using less popular
            | software.
           | 
           | The only complaints about Nvidia drivers I found were "it's
           | proprietary" and "you have to rebuild the modules when you
           | update the kernel" or "doesn't work with wayland".
           | 
            | I'd hesitate to ever touch an AMD GPU again after my
            | experience with it; I haven't had a single hiccup in the
            | years since switching to Nvidia.
        
             | csomar wrote:
              | Wayland was a requirement for me. I've used an AMD GPU for
              | years. I had a bug exactly once, with a Linux update, but
              | it has been stable since.
        
               | surajrmal wrote:
               | Wayland doesn't matter in the server space though.
        
             | cosmic_cheese wrote:
              | Another ding against Nvidia for Linux desktop use is that
              | only some distributions make it easy to install and keep
              | the proprietary drivers updated (e.g. Ubuntu) and/or ship
              | variants with the proprietary drivers preinstalled (Mint,
              | Pop!_OS, etc.).
             | 
              | This isn't a barrier for Linux veterans, but it adds
              | significant resistance for part-time users, even those who
              | are technically inclined, compared to the "it just works"
              | experience one gets with an Intel/AMD GPU under just about
              | every Linux distro.
        
           | fragmede wrote:
            | They are, unless you get distracted by things like licensing
            | and out-of-tree drivers and binary blobs. If you'd rather
            | pontificate about open source philosophy and rights than get
            | stuff done, go right ahead.
        
         | yapyap wrote:
         | Geohot still at it?
         | 
         | goat.
        
         | willvarfar wrote:
          | A new entrant with an order-of-magnitude advantage in e.g.
          | cost or availability or exportability can succeed even with
          | poor drivers and no CUDA etc. It's only when you cost nearly
          | as much as Nvidia that the tooling costs become relevant.
        
         | latchkey wrote:
          | George is writing software to talk directly to consumer AMD
          | hardware, so that he can sell more Tinyboxes. He won't be
          | doing that for enterprise.
          | 
          | Cerebras and Groq need to solve the memory problem. They can't
          | scale without adding 10x the hardware.
        
         | slightwinder wrote:
         | > - Better Linux drivers than AMD
         | 
          | In which way? As a user who switched from an AMD GPU to an
          | Nvidia GPU, I can only report a continued stream of problems
          | with Nvidia's proprietary driver, and none with AMD. Is this
          | maybe about the open-source drivers, or usage for AI?
        
         | queuebert wrote:
         | Don't forget they bought Mellanox and have their own HBA and
         | switch business.
        
       | gnlrtntv wrote:
       | > While Apple's focus seems somewhat orthogonal to these other
       | players in terms of its mobile-first, consumer oriented, "edge
       | compute" focus, if it ends up spending enough money on its new
       | contract with OpenAI to provide AI services to iPhone users, you
       | have to imagine that they have teams looking into making their
       | own custom silicon for inference/training
       | 
       | This is already happening today. Most of the new LLM features
       | announced this year are primarily on-device, using the Neural
       | Engine, and the rest is in Private Cloud Compute, which is also
       | using Apple-trained models, on Apple hardware.
       | 
       | The only features using OpenAI for inference are the ones that
       | announce the content came from ChatGPT.
        
         | simonw wrote:
         | "if it ends up spending enough money on its new contract with
         | OpenAI to provide AI services to iPhone users"
         | 
         | John Gruber says neither Apple nor OpenAI are paying for that
         | deal: https://daringfireball.net/linked/2024/06/13/gurman-
         | openai-a...
        
           | lxgr wrote:
           | Mark Gurman (from Bloomberg) is saying that.
        
       | uncletaco wrote:
        | When he says better Linux drivers than AMD, he's strictly
        | talking about AI, right? Because for video the opposite has been
        | the case for as far back as I can remember.
        
         | eigenvalue wrote:
          | Yes, AMD drivers work fine for games and things like that.
          | Their problem is they basically only focused on games and
          | other consumer applications and, as a result, ceded this
          | massive growth market to Nvidia. I guess you can sort of give
          | them a pass because they did manage to kill their archrival
          | Intel in data center CPUs, but it's a massive strategic
          | failure if you look at how much it has cost them.
        
       | simonw wrote:
       | This is excellent writing.
       | 
       | Even if you have no interest at all in stock market shorting
       | strategies there is _plenty_ of meaty technical content in here,
       | including some of the clearest summaries I 've seen anywhere of
       | the interesting ideas from the DeepSeek v3 and R1 papers.
        
         | eigenvalue wrote:
         | Thanks Simon! I'm a big fan of your writing (and tools) so it
         | means a lot coming from you.
        
           | punkspider wrote:
           | I was excited as soon as I saw the domain name. Even after a
           | few months, this article[1] is still at the top of my mind.
           | You have a certain way of writing.
           | 
           | I remember being surprised at first because I thought it
           | would feel like a wall of text. But it was such a good read
           | and I felt I gained so much.
           | 
           | 1: https://youtubetranscriptoptimizer.com/blog/02_what_i_lear
           | ne...
        
             | eigenvalue wrote:
             | I really appreciate that, thanks so much!
        
             | nejsjsjsbsb wrote:
              | I was put off by the domain, biased against something that
              | sounds like a company blog. Especially a "YouTube
              | something".
              | 
              | You may get more mileage from excellent writing on a
              | yourname.com. This is a piece that sells you, not this
              | product, plus it feels more timeless. In 2050 someone may
              | point to this post. Better if it were under your own name.
        
               | eigenvalue wrote:
                | I had no idea this would get so much traction. I wanted
                | to enhance the organic search ranking of my niche web
                | app, not crash the global stock market!
        
           | dabeeeenster wrote:
            | Many thanks for writing this - it's extremely interesting
            | and very well written. I feel like I've been brought up to
            | date, which is hard in the AI world!
        
       | andrewgross wrote:
       | > The beauty of the MOE model approach is that you can decompose
       | the big model into a collection of smaller models that each know
       | different, non-overlapping (at least fully) pieces of knowledge.
       | 
        | I was under the impression that this is not how MoE models
        | work. They are not a collection of independent models, but
        | rather a way of routing to a subset of active parameters at
        | each layer. There is no "expert" that is loaded or unloaded per
        | question. All of the weights are loaded in VRAM; it's just a
        | matter of which are actually loaded into the registers for
        | calculation. As far as I could tell from the DeepSeek v3/v2
        | papers, their MoE approach follows this instead of being an
        | explicit collection of experts. If that's the case, there's no
        | VRAM saving to be had using an MoE, nor an ability to extract
        | the weights of an expert to run locally (aside from
        | distillation or similar).
       | 
       | If there is someone more versed on the construction of MoE
       | architectures I would love some help understanding what I missed
       | here.
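        | 
        | For concreteness, here's the picture I have in my head of top-k
        | routing (an illustrative toy in PyTorch, not DeepSeek's actual
        | code). Every expert's weights sit in VRAM the whole time; the
        | router just picks which ones run for each token:
        | 
        |     import torch
        |     import torch.nn as nn
        |     import torch.nn.functional as F
        | 
        |     class MoELayer(nn.Module):
        |         def __init__(self, dim=512, n_experts=8, k=2):
        |             super().__init__()
        |             self.router = nn.Linear(dim, n_experts)
        |             self.experts = nn.ModuleList(
        |                 [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
        |                                nn.Linear(4 * dim, dim))
        |                  for _ in range(n_experts)])
        |             self.k = k
        | 
        |         def forward(self, x):  # x: (tokens, dim)
        |             scores = self.router(x)      # (tokens, n_experts)
        |             topv, topi = scores.topk(self.k, dim=-1)
        |             w = F.softmax(topv, dim=-1)  # weights over chosen experts
        |             out = torch.zeros_like(x)
        |             for e, expert in enumerate(self.experts):
        |                 # rows = tokens whose top-k includes expert e
        |                 rows, slot = (topi == e).nonzero(as_tuple=True)
        |                 if rows.numel():
        |                     out[rows] += w[rows, slot].unsqueeze(1) * expert(x[rows])
        |             return out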
        
         | Kubuxu wrote:
          | Not sure about DeepSeek R1, but you are right with regard to
          | previous MoE architectures.
          | 
          | It doesn't reduce memory usage, as each subsequent token might
          | require a different expert, but it reduces per-token
          | compute/bandwidth usage. If you place experts on different
          | GPUs and run batched inference, you would see these benefits.
        
           | andrewgross wrote:
           | Is there a concept of an expert that persists across layers?
           | I thought each layer was essentially independent in terms of
           | the "experts". I suppose you could look at what part of each
           | layer was most likely to trigger together and segregate those
           | by GPU though.
           | 
           | I could be very wrong on how experts work across layers
           | though, I have only done a naive reading on it so far.
        
             | rahimnathwani wrote:
              | > I suppose you could look at what part of each layer was
              | > most likely to trigger together and segregate those by
              | > GPU though
             | 
             | Yes, I think that's what they describe in section 3.4 of
             | the V3 paper. Section 2.1.2 talks about "token-to-expert
             | affinity". I think there's a layer which calculates these
             | affinities (between a token and an expert) and then sends
             | the computation to the GPUs with the right experts.
             | 
             | This doesn't sound like it would work if you're running
             | just one chat, as you need all the experts loaded at once
             | if you want to avoid spending lots of time loading and
             | unloading models. But at scale with batches of requests it
             | should work. There's some discussion of this in 2.1.2 but
             | it's beyond my current ability to comprehend!
        
               | andrewgross wrote:
               | Ahh got it, thanks for the pointer. I am surprised there
               | is enough correlation there to allow an entire GPU to be
               | specialized. I'll have to dig in to the paper again.
        
               | Kubuxu wrote:
                | I don't think an entire GPU is specialised, nor will a
                | singular token always use the same expert. I think about
                | it as a gather-scatter operation at each layer.
                | 
                | Let's say you have an inference batch of 128 chats. At
                | layer `i` you take the hidden states, compute their
                | routing, and scatter them along with the KV for those
                | layers among GPUs (each one handling different experts);
                | the attention and FF happen on these GPUs (as the model
                | params are there) and the results get gathered again.
                | 
                | You might be able to avoid the gather by performing the
                | routing on each of the GPUs, but I'm generally guessing
                | here.
        
               | liuliu wrote:
                | It does. They have 256 experts per MLP layer, plus some
                | shared ones. The minimal deployment for decoding (a.k.a.
                | token generation) they recommend is 320 GPUs (H800). It
                | is all in the DeepSeek v3 paper, which everyone should
                | read rather than speculating.
        
               | andrewgross wrote:
                | Got it. I'll review the paper again for that portion.
                | However, it still sounds like the end result is not VRAM
                | savings but efficiency and speed improvements.
        
               | liuliu wrote:
                | Yeah, if you look at the DeepSeek v3 paper more deeply,
                | each saving on each axis is understandable. Combined,
                | they reach some magic number people can talk about
                | (10x!):
                | 
                |   - FP8: ~1.6 to 2x faster than BF16 / FP16
                |   - MLA: cut KV cache size by 4x (I think)
                |   - MTP: converges 2x to 3x faster
                |   - DualPipe: maybe ~1.2 to 1.5x faster
                | 
                | If you look deeper, many of these are only applicable to
                | training (we already do FP8 for inference, MTP is to
                | improve training convergence, and DualPipe is for
                | overlapping communication / compute, mostly for training
                | purposes too). The efficiency improvement on inference
                | is IMHO overblown.
        
           | rahimnathwani wrote:
            | > If you place experts in different GPUs
            | 
            | Right, this is described in the DeepSeek V3 paper (section
            | 3.4 on pages 18-20).
        
       | metadat wrote:
       | _> Another very smart thing they did is to use what is known as a
       | Mixture-of-Experts (MOE) Transformer architecture, but with key
       | innovations around load balancing. As you might know, the size or
       | capacity of an AI model is often measured in terms of the number
       | of parameters the model contains. A parameter is just a number
       | that stores some attribute of the model; either the  "weight" or
       | importance a particular artificial neuron has relative to another
       | one, or the importance of a particular token depending on its
       | context (in the "attention mechanism")._
       | 
        | Has a wide-scale model analysis been performed inspecting the
        | parameters and their weights for all popular open / available
        | models yet? The impact and effects of disclosed inbound data
        | and tuning parameters on individual vector tokens would prove
        | highly informative and clarifying.
        | 
        | Such an analysis would undoubtedly help semi-literate AI folks
        | level up and bridge any gaps.
        
       | naveen99 wrote:
        | The DeepSeek iOS app makes the TikTok ban pointless.
        
         | pavelstoev wrote:
         | Interesting take. They are now reading our minds vs looking at
         | our kids and interiors.
        
           | naveen99 wrote:
            | Yeah, what's stopping Zoom from integrating DeepSeek and
            | doing an end run around Microsoft Teams?
        
       | lxgr wrote:
       | Man, do I love myself a deep, well-researched long-form
       | contrarian analysis published as a tangent of an already niche
       | blog on a Sunday evening! The old web isn't dead yet :)
        
         | eigenvalue wrote:
         | Hah thanks, that's my favorite piece of feedback yet on this.
        
       | pavelstoev wrote:
        | English economist William Stanley Jevons vs. the author of the
        | article.
        | 
        | Will NVIDIA be in trouble because of DSR1? Interpreting the
        | Jevons effect: if LLMs are "steam engines" and DSR1 brings a 90%
        | efficiency improvement for the same performance, more of them
        | will be deployed. And this is not even considering the increase
        | due to <think> tokens.
        | 
        | More NVIDIA GPUs will be sold to support the growing use cases
        | of more efficient LLMs.
        
       | breadwinner wrote:
       | Part of the reason Musk, Zuckerberg, Ellison, Nadella and other
       | CEOs are bragging about the number of GPUs they have (or plan to
       | have) is to attract talent.
       | 
       | Perplexity CEO says he tried to hire an AI researcher from Meta,
       | and was told to 'come back to me when you have 10,000 H100 GPUs'
       | 
       | See https://www.businessinsider.nl/ceo-says-he-tried-to-hire-
       | an-...
        
         | mrbungie wrote:
          | Maybe DeepSeek ain't it, but I expect a big "box of scraps"[1]
          | moment soon. Constraint is the mother of invention, and they
          | are evading constraints with a promise of never-ending scale.
         | 
         | [1] https://youtu.be/9foB2z_OVHc?si=eZSTMMGYEB3Nb4zI
        
         | rat9988 wrote:
         | That's a weird way to read into it.
        
         | TwoFerMaggie wrote:
          | This reminds me of the joke in physics, in which theoretical
          | particle physicists told experimental physicists, over and
          | over again, "trust me bro, the standard model will be proved
          | at 10x eV, we just need a bigger collider bro" after each new
          | world's-biggest collider was built.
          | 
          | Wondering if we are in a similar position with "trust me bro,
          | AGI will be achieved with 10x more GPUs".
        
           | vonneumannstan wrote:
            | The difference is that the AI researchers have clear plots
            | showing capabilities scaling with GPUs, and there's no sign
            | that it is flattening, so they actually have a case for
            | saying that AGI is possible at N GPUs.
        
             | segasaturn wrote:
              | Sauce? How do you even measure "capabilities" in that
              | regard, just writing answers to standard tests? Because
              | being able to ace a test doesn't mean it's AGI; it means
              | it's good at taking standard tests.
        
               | vonneumannstan wrote:
                | This is the canonical paper. Nothing I've seen seems to
                | indicate the curves are flattening; you can ask "scaling
                | what", but the trend is clear.
                | 
                | https://arxiv.org/pdf/2001.08361
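                | 
                | FWIW, the fits in that paper take a simple power-law
                | form (exponents approximate, from memory):
                | 
                |     L(N) ~= (N_c / N)^0.076    (parameters)
                |     L(D) ~= (D_c / D)^0.095    (dataset size)
                |     L(C) ~= (C_c / C)^0.050    (compute)
                | 
                | i.e. on a log-log plot, loss falls as a straight line as
                | you scale any one factor, which is the basis of the
                | "just add GPUs" case.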
        
       | jms55 wrote:
       | Great article, thanks for writing it! Really great summary of the
       | current state of the AI industry for someone like me who's
       | outside of it (but tangential, given that I work with GPUs for
       | graphics).
       | 
        | The one thing from the article that sticks out to me is that
        | the author/people are assuming that DeepSeek needing 1/45th the
        | amount of hardware means that the other 44/45ths that large
        | tech companies have invested was wasteful.
        | 
        | Does software not scale to meet hardware? I don't see this as
        | 44/45ths wasted hardware, but as a free increase in the amount
        | of hardware people have. Software needing less hardware means
        | you can run even _more_ software without spending more money,
        | not that you need less hardware, right? (At least for the top-
        | end, non-embedded use cases.)
       | 
       | ---
       | 
       | As an aside, the state of the "AI" industry really freaks me out
       | sometimes. Ignoring any sort of short or long term effects on
       | society, jobs, people, etc, just the sheer amount of money and
       | time invested into this one thing is, insane?
       | 
       | Tons of custom processing chips, interconnects, compilers,
       | algorithms, _press releases!_, etc all for one specific field.
       | It's like someone taking the last decade of advances in
       | computers, software, etc, and shoving it in the space of a year.
       | For comparison, Rust 1.0 is 10 years old - I vividly remember the
       | release. And even then it took years to propagate out as a
       | "thing" that people were interested in and invested significant
       | time into. Meanwhile deepseek releases a new model (complete with
       | a customer-facing product name and chat interface, instead of
       | something boring and technical), and in 5 days it's being
       | replicated (to at least some degree) and copied by competitors.
       | Google, Apple, Microsoft, etc are all making custom chips and
       | investing insane amounts of money into different compilers,
       | programming languages, hardware, and research.
       | 
       | It's just, kind of disquieting? Like everyone involved in AI
       | lives in another world operating at breakneck speed, with
       | billions of dollars involved, and the rest of us are just
       | watching from the sidelines. Most of it (LLMs specifically) is no
       | longer exciting to me. It's like, what's the point of spending
       | time on a non-AI related project? We can spend some time writing
       | a nice API and working on a cool feature or making a UI prettier
       | and that's great, and maybe with a good amount of contributors
       | and solid, sustained effort, we can make a cool project that's
       | useful and people enjoy, and earns money to support people if
       | it's commercial. But then for AI, github repos with shiny well-
       | written readmes pop up overnight, tons of text is being written,
       | thought, effort, and billions of dollars get burned or speculated
       | on in an instant on new things, as soon as the next marketing
       | release is posted.
       | 
       | How can the next advancement in graphics, databases,
       | cryptography, etc compete with the sheer amount of societal
       | attention AI receives?
       | 
       | Where does that leave writing software for the rest of us?
        
       | mgraczyk wrote:
        | The beginning of the article was good, but the analysis of
        | DeepSeek and what it means for Nvidia is confused and clearly
        | out of the loop.
        | 
        |   * People have been training models at <fp32 precision for
        |     many years; I did this in 2021 and it was already easy in
        |     all the major libraries.
        |   * GPU FLOPs are used for many things besides training the
        |     final released model.
        |   * Demand for AI is capacity limited, so it's possible and
        |     likely that increasing AI/FLOP would not substantially
        |     reduce the price of GPUs.
        
         | lysecret wrote:
          | Where do you get this "capacity" limit from? I can get as many
          | H100s from GCP or wherever as I wish; the only things that are
          | capacity limited are 100k clusters a la Elon+X. But what
          | DeepSeek (and the recent evidence of a limit in pure base-
          | model scaling) shows is that this might actually not be
          | profitable, and we may end up with much smaller base models
          | scaled at inference time. The moat for Nvidia in this
          | inference-time scaling is much smaller, and you don't need the
          | humongous clusters for that either: you can just distribute
          | the inference (and in the future run it locally too).
        
           | mgraczyk wrote:
           | What's your GPU quota in GCP? How did you get it increased
           | that much?
        
           | saagarjha wrote:
           | Asking GCP to give you H100s on-demand is nowhere near cost
           | efficient.
        
         | aorloff wrote:
        | His DeepSeek argument was essentially that experts who look at
        | the economics of running these teams (e.g., ha ha, the engineers
        | themselves might dabble) are looking over the hedge at
        | DeepSeek's claims and are really awestruck.
        
       | mkalygin wrote:
       | This is such a comprehensive analysis, thank you. For someone
       | just starting to learn about the field, it's a great way to
       | understand what's going on in the industry.
        
       | miraculixx wrote:
        | If we are to get to AGI, why do we need to train on all data?
        | That's silly, and all we get is compression and probabilistic
        | retrieval.
        | 
        | Intelligence by definition is not compression, but the ability
        | to think and act according to new data, based on experience.
        | 
        | Truly AGI models will work on this principle, not on the best
        | compression of as much data as possible.
        | 
        | We need a new approach.
        
         | eigenvalue wrote:
         | Actually, compression is an incredibly good way to think about
         | intelligence. If you understand something really well then you
         | can compress it a lot. If you can compress most of human
         | knowledge effectively without much reconstruction error while
         | shrinking it down by 99.5%, then you must have in the process
         | arrived at a coherent and essentially correct world model,
         | which is the basis of effective cognition.
        
         | chpatrick wrote:
         | "If you can't explain it to a six year old, you don't
         | understand it yourself." -> "If you can compress knowledge, you
         | understand it."
        
         | AnotherGoodName wrote:
          | FWIW there are highly cited papers that literally map AGI to
          | compression. As in, they map to the same thing, and people
          | write widely respected papers on this fact. Basically, a
          | prediction engine can be used to make a compression tool and
          | an AI equally.
          | 
          | The tl;dr: given inputs and a system that can accurately
          | predict the next sequence, you can either compress that data
          | using the prediction (arithmetic coding), or you can take
          | actions based on the prediction to achieve an end goal,
          | mapping predictions of new inputs to possible outcomes and
          | then taking the path to a goal (AGI). They boil down to one
          | and the same. So it's weird to have someone state they are not
          | the same when it's widely accepted they absolutely are.
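          | 
          | A toy sketch of the compression half (float precision for
          | clarity; real coders use integer renormalization, and
          | `predict` here is a stand-in for any next-symbol model):
          | 
          |     import math
          | 
          |     def encode(symbols, predict):
          |         # predict(prefix) -> dict mapping symbol -> probability
          |         lo, hi = 0.0, 1.0
          |         for i, s in enumerate(symbols):
          |             probs = predict(symbols[:i])
          |             width = hi - lo
          |             c = 0.0
          |             for sym in sorted(probs):  # fixed order so a
          |                 if sym == s:           # decoder can mirror it
          |                     hi = lo + (c + probs[sym]) * width
          |                     lo = lo + c * width
          |                     break
          |                 c += probs[sym]
          |         # any number in [lo, hi) identifies the whole message
          |         return (lo + hi) / 2, -math.log2(hi - lo)
          | 
          | The second return value is the message length in bits: the
          | better the predictor, the less the interval shrinks and the
          | fewer bits you need. Use the same predictor to rank outcomes
          | of candidate actions instead, and you get the agent side of
          | the equivalence.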
        
       | jwan584 wrote:
        | The point about using FP32 for training is wrong. Mixed
        | precision (FP16 multiplies, FP32 accumulates) has been in use
        | for years; the original paper came out in 2017.
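        | 
        | For reference, the standard recipe looks roughly like this in
        | PyTorch (a sketch; `loader` is a stand-in for your batch
        | iterator):
        | 
        |     import torch
        |     from torch.cuda.amp import autocast, GradScaler
        | 
        |     model = torch.nn.Linear(1024, 1024).cuda()
        |     opt = torch.optim.SGD(model.parameters(), lr=1e-3)
        |     scaler = GradScaler()  # loss scaling for FP16 gradients
        | 
        |     for x, y in loader:  # assumed to yield CUDA tensors
        |         opt.zero_grad()
        |         with autocast():  # matmuls in FP16, accumulates in FP32
        |             loss = torch.nn.functional.mse_loss(model(x), y)
        |         scaler.scale(loss).backward()  # scale to avoid underflow
        |         scaler.step(opt)   # unscales; skips the step on inf/NaN
        |         scaler.update()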
        
         | eigenvalue wrote:
         | Fair enough, but that still uses a lot more memory during
         | training than what DeepSeek is doing.
        
       | suraci wrote:
        | DeepSeek is not the black swan.
        | 
        | NVDA was already overpriced a lot even without R1; the market is
        | full of air GPUs hiding in the capex of tech giants like MSFT.
        | 
        | If orders are canceled or delivery fails for any reason, NVDA's
        | EPS would be pulled back to its fundamentally justified level.
        | 
        | Or if all those air GPUs are produced and delivered in the
        | coming years, and the demand keeps rising? Well, that will be a
        | crazy world then.
        | 
        | It's a finance game, not related to the real world.
        
       | naiv wrote:
        | I used to own several adult companies. Incredibly huge margins,
        | and then along came Pornhub, and we could barely survive after
        | that, as we did not adapt.
        | 
        | With DeepSeek this is now the "Pornhub of AI" moment. Adapt or
        | die.
        
         | logicchains wrote:
         | Curious what Pornhub did better, if you're able to say. Provide
         | content at much lower cost, like DeepSeek?
        
           | naiv wrote:
            | Yes, close-to-free content.
            | 
            | They understood the DMCA brilliantly, so they did bulk cheap
            | content purchases and hid behind the DMCA for all non-
            | licensed content, which was "uploaded by users". They did
            | bulk purchases of cheap content from some studios, but that
            | was just a fraction.
            | 
            | Of course, their risk in going advertising-revenue-only was
            | high, and in the beginning mostly only cam providers would
            | advertise.
            | 
            | Our problem was that we had contracts and close
            | relationships with all the big studios, so going the DMCA
            | route would have severed these ties for an unknown risk. In
            | hindsight, not creating a company which abused the DMCA was
            | the right decision. I am very loyal, and it would have felt
            | like cheating.
            | 
            | Now it's a different story after the credit card shakedown,
            | when they had to remove millions of videos and be able to
            | provide 2257 documentation for each video.
        
         | nejsjsjsbsb wrote:
         | That analogy would be right if a startup could dredge beach
         | sand and pump out trillions of AI chips.
         | 
         | What actually happened was that a better algorithm was
         | created, and people are betting against the main game in town
         | for running said algorithm.
         | 
         | If someone came up with a CPU-superior AI, that'd be worrying
         | for NVidia.
        
           | naiv wrote:
           | Groq LPU inference chip?
        
             | nejsjsjsbsb wrote:
             | You heard my 26kHz whistle!
        
       | homarp wrote:
       | see also https://news.ycombinator.com/item?id=42839650
        
       | chvid wrote:
       | For sure NVIDIA is priced for perfection, perhaps more than any
       | other company of similar market value.
       | 
       | I think two threats are the biggest:
       | 
       | First Apple. TSMC's largest customer. They are already making
       | their own GPUs for their data centers. If they were to sell these
       | to others they would be a major competitor.
       | 
       | You would have the same GPU stack on your own phone, laptop, pc,
       | and data center. Already big developer mind share. Also useful in
       | a world where LLMs run (in part) on the end user's local machine
       | (like Apple Intelligence).
       | 
       | Second is China - Huawei, Deepseek etc.
       | 
       | Yes - there will be no GPUs from Huawei in the US in this decade.
       | And the Chinese won't win in a big massive battle. Rather it is
       | going to be death by a thousand cuts.
       | 
       | Just as what happened with the Huawei Mate 60. It is only sold
       | in China, but today Apple is losing business big time in China.
       | 
       | In the same manner, OpenAI and Microsoft will have their
       | business hurt by Deepseek even if Deepseek were completely
       | banned in the west.
       | 
       | Likely we will see news on Chinese AI accelerators this year,
       | and I wouldn't be surprised if we soon saw Chinese hyperscalers
       | offering cheaper GPU cloud compute than the west due to a
       | combination of cheaper energy, labor cost, and sheer scale.
       | 
       | Lastly AMD is no threat to NVIDIA as they are far behind and
       | follow the same path with little way of differentiating
       | themselves.
        
       | Giorgi wrote:
       | Looks like a huge astroturfing effort from the CCP. I am seeing
       | this coordinated propaganda inside every AI-related sub on
       | reddit, on social media, and now - here.
        
         | chasd00 wrote:
         | Yeah I get that feeling too. Lots of old school astroturfing
         | going on.
        
       | dartos wrote:
       | This just in.
       | 
       | Competition lowers the value of monopolies.
        
       | manojlds wrote:
       | >With the advent of the revolutionary Chain-of-Thought ("COT")
       | models introduced in the past year, most noticeably in OpenAI's
       | flagship O1 model (but very recently in DeepSeek's new R1 model,
       | which we will talk about later in much more detail), all that
       | changed. Instead of the amount of inference compute being
       | directly proportional to the length of the output text generated
       | by the model (scaling up for larger context windows, model size,
       | etc.), these new COT models also generate intermediate "logic
       | tokens"; think of this as a sort of scratchpad or "internal
       | monologue" of the model while it's trying to solve your problem
       | or complete its assigned task.
       | 
       | Is this right? I thought CoT was a prompting method - are we
       | now calling reasoning models "CoT models"?
        
         | veesahni wrote:
         | Reasoning models are a result of the learnings from CoT
         | prompting.
        
           | s1mplicissimus wrote:
           | I'm curious what are the key differences between "a reasoning
           | model" and good old CoT prompting. Is there any reason to
           | believe that the fundamental limitations of prompting don't
           | apply to "reasoning models"? (hallucinations, plainly wrong
           | output, bias towards to training data mean etc.)
        
             | itchyjunk wrote:
             | The level of sophistication for CoT models varies. "Good
             | old CoT prompting" is you hoping the model generates some
             | reasoning tokens prior to the final answer. When it did,
             | the answers tended to be better for certain classes of
             | problems. But you had no control over what type of
             | reasoning tokens it was generating. There were hypotheses
             | that just having a <pause> token in between generated
             | better answers, as it allowed n+1 steps to generate an
             | answer instead of n. I would consider Meta's "continuous
             | chain of thought" to be on the other end of "good old CoT
             | prompting": they pass tokens from the latent space back
             | into the model, getting a "BHF"-like effect. Who knows
             | what's happening with O3 and Anthropic's O3-like models.
             | The problems you mention are very broad and not limited
             | to prompting. Reasoning models tend to outperform older
             | models on math problems, so I'd assume they do reduce
             | hallucination on certain classes of problems.
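             | 
             | As a toy illustration of the difference (prompt wording
             | is made up; no particular API is assumed):
             | 
             |   question = ("Q: I have 3 boxes of 12 eggs and drop 5. "
             |               "How many eggs are left? A:")
             | 
             |   # "good old CoT prompting": bolt an instruction onto
             |   # the prompt and *hope* useful reasoning tokens come
             |   # out before the answer
             |   cot = question.replace(
             |       "A:", "Let's think step by step, then answer. A:")
             |   print(cot)
             | 
             |   # a reasoning model needs no such nudge: it is trained
             |   # (e.g. with RL on answer correctness) to emit hidden
             |   # "thinking" tokens on its own - and its inference
             |   # cost scales with those tokens too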
        
       | kimbler wrote:
       | Nvidia seem to be one step ahead of this; you can see their
       | platform efforts pushing towards creating large volumes of
       | compute that are easy to manage for whatever your compute
       | requirements are - training, inference, or whatever comes next,
       | in whatever form. People may be tackling some of these areas in
       | isolation, but you do not want to build datacenters where
       | everything is ringfenced per task or usage.
        
       | colinnordin wrote:
       | Great article.
       | 
       | > _Now, you still want to train the best model you can by
       | cleverly leveraging as much compute as you can and as many
       | trillion tokens of high quality training data as possible, but
       | that's just the beginning of the story in this new world; now,
       | you could easily use incredibly huge amounts of compute just to
       | do inference from these models at a very high level of confidence
       | or when trying to solve extremely tough problems that require
       | "genius level" reasoning to avoid all the potential pitfalls that
       | would lead a regular LLM astray._
       | 
       | I think this is the most interesting part. We always knew a huge
       | fraction of the compute would be on inference rather than
       | training, but it feels like the newest developments are pushing
       | this even further towards inference.
       | 
       | Combine that with the fact that you can run the full R1 (680B)
       | distributed on 3 consumer computers [1].
       | 
       | If most of NVIDIA's moat is in being able to efficiently
       | interconnect thousands of GPUs, what happens when that is only
       | important to a small fraction of the overall AI compute?
       | 
       | [1]: https://x.com/awnihannun/status/1883276535643455790
        
         | tomrod wrote:
         | Conversely, how much larger can you scale if frontier models
         | only currently need 3 consumer computers?
         | 
         | Imagine having 300. Could you build even better models? Is
         | DeepSeek the right team to deliver that, or can OpenAI, Meta,
         | HF, etc. adapt?
         | 
         | Going to be an interesting few months on the market. I think
         | OpenAI lost a LOT in the board fiasco. I am bullish on HF. I
         | anticipate Meta will lose folks to brain drain in response to
         | management equivocation around company values. I don't put much
         | stock into Google or Microsoft's AI capabilities, they are the
         | new IBMs and are no longer innovating except at obvious
         | margins.
        
           | danaris wrote:
           | This assumes no (or very small) diminishing returns effect.
           | 
           | I don't pretend to know much about the minutiae of LLM
           | training, but it wouldn't surprise me at all if throwing
           | massively more GPUs at this particular training paradigm only
           | produces marginal increases in output quality.
        
             | tomrod wrote:
             | I believe the margin to expand is on CoT, where tokens can
             | grow dramatically. If there is value in putting more
             | compute towards it, there may still be returns to be
             | captured on that margin.
        
           | stormfather wrote:
           | Google is silently catching up fast with Gemini. They're also
           | pursuing next gen architectures like Titan. But most
           | importantly, the frontier of AI capabilities is shifting
           | towards using RL at inference (thinking) time to perform
           | tasks. Who has more data than Google there? They have a
           | gargantuan database of queries paired with subsequent web
           | nav, actions, follow up queries etc. Nobody can recreate
           | this, Bing failed to get enough marketshare. Also, when you
           | think of RL talent, which company comes to mind? I think
           | Google has everyone checkmated already.
        
             | moffkalast wrote:
             | Never underestimate Google's ability to fall flat on their
             | face when it comes to shipping products.
        
             | _DeadFred_ wrote:
             | How quickly the narrative went from 'Google silently has
             | the most advanced AI but they are afraid to release it' to
             | 'Google is silently catching up' all using the same 'core
             | Google competencies' to infer Google's position of
             | strength. Wonder what the next lower level of Google
             | silently leveraging their strength will be?
        
               | stormfather wrote:
               | Google is clearly catching up. Have you tried the recent
               | Gemini models? Have you tried deep research? Google is
               | like a ship that is hard to turn around but also hard to
               | stop once in motion.
        
             | shwaj wrote:
             | Can you say more about using RL at inference time, ideally
             | with a pointer to read more about it? This doesn't fit into
             | my mental model, in a couple of ways. The main way is right
             | in the name: "learning" isn't something that happens at
             | inference time; inference is generating results from
             | already-trained models. Perhaps you're conflating RL with
             | multistage (e.g. "chain of thought") inference? Or maybe
             | you're talking about feeding the result of inference-time
             | interactions with the user back into subsequent rounds of
             | training? I'm curious to hear more.
        
               | stormfather wrote:
               | I wasn't clear. Model weights aren't changing at
               | inference time. I meant at inference time the model will
               | output a sequence of thoughts and actions to perform
               | tasks given to it by the user. For instance, to answer a
               | question it will search the web, navigate through some
               | sites, scroll, summarize, etc. You can model this as a
               | game played by emitting a sequence of actions in a
               | browser. RL is the technique you want to train this
               | component. To scale this up you need to have a massive
               | amount of examples of sequences of actions taken in the
               | browser, the outcome it led to, and a label for if that
               | outcome was desirable or not. I am saying that by
               | recording users googling stuff and emailing each other
               | for decades Google has this massive dataset to train
               | their RL-powered, browser-using agent. Deepseek proving
               | that simple RL can be cheaply applied to a frontier LLM
               | and have reasoning organically emerge makes this
               | approach more obviously viable.
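               | 
               | A rough sketch of that training setup - a context-free
               | softmax policy over toy actions stands in for the real
               | state-conditioned model, and the episodes and reward
               | labels are made up for illustration:
               | 
               |   import math
               | 
               |   ACTIONS = ["search", "click", "scroll",
               |              "summarize", "stop"]
               |   prefs = {a: 0.0 for a in ACTIONS}  # policy logits
               | 
               |   def probs():
               |       z = sum(math.exp(v) for v in prefs.values())
               |       return {a: math.exp(v) / z
               |               for a, v in prefs.items()}
               | 
               |   # toy episodes: (action sequence, desirable?)
               |   logs = [(["search", "click", "summarize"], 1.0),
               |           (["search", "scroll", "scroll"], -1.0)]
               | 
               |   lr = 0.5
               |   for seq, reward in logs:  # REINFORCE-style update
               |       for taken in seq:
               |           p = probs()
               |           for a in ACTIONS:
               |               # d log pi(taken) / d prefs[a]
               |               g = (1.0 if a == taken else 0.0) - p[a]
               |               prefs[a] += lr * reward * g
               | 
               |   print({a: round(v, 2) for a, v in prefs.items()})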
        
           | onlyrealcuzzo wrote:
           | If you watch this video, it explains well what the major
           | difference is between DeepSeek and existing LLMs:
           | https://www.youtube.com/watch?v=DCqqCLlsIBU
           | 
           | It seems like there is MUCH to gain by migrating to this
           | approach - and the cost of switching should _theoretically_
           | be small relative to the rewards.
           | 
           | I expect all the major players are already working full-steam
           | to incorporate this into their stacks as quickly as possible.
           | 
           | IMO, this seems incredibly bad for Nvidia, and incredibly
           | good for everyone else.
           | 
           | I don't think this seems particularly bad for ChatGPT.
           | They've built a strong brand. This should just help them
           | reduce - by far - one of their largest expenses.
           | 
           | They'll have a slight disadvantage versus, say, Google - who
           | can much more easily switch from GPU to CPU. ChatGPT _could_
           | have some growing pains there; Google would not.
        
             | wolfhumble wrote:
             | > I don't think this seems particularly bad for ChatGPT.
             | They've built a strong brand. This should just help them
             | reduce - by far - one of their largest expenses.
             | 
             | Often expenses like that are keeping your competitors away.
        
               | onlyrealcuzzo wrote:
               | Yes, but it typically doesn't matter if someone can reach
               | parity or even surpass you - they have to surpass you by
               | a step function to take a significant number of your
               | users.
               | 
               | This is a step function in terms of efficiency (which
               | presumably will be incorporated into ChatGPT within
               | months), but not in terms of end user experience. It's
               | only slightly better there.
        
               | ReptileMan wrote:
               | One data point, but my ChatGPT subscription gets
               | cancelled every time, so every month I make a fresh
               | decision to resub. And because the cost of switching is
               | essentially zero, the moment a better service is up
               | there I will switch in an instant.
        
               | onlyrealcuzzo wrote:
               | There are obviously people like you, but I hope you
               | realize this is not the typical user.
        
             | tomrod wrote:
             | That is a fantastic video, BTW.
        
           | simpaticoder wrote:
           | _> Imagine having 300._
           | 
           | Would it not be useful to have multiple independent AIs
           | observing and interacting to build a model of the world? I'm
           | thinking something roughly like the "counselors" in the
           | Civilization games, giving defense/economic/cultural advice,
           | but generalized over any goal-oriented scenario (and
           | including one to take the "user" role). A group of AIs with
           | specific roles interacting with each other seems like a good
           | area to explore, especially now given the downward
           | scalability of LLMs.
        
             | tomrod wrote:
             | Yes; to my understanding that is MoE.
        
             | JoshTko wrote:
             | This is exactly where Deepseek's enhancements come into
             | play. Essentially Deepseek lets the model think out loud
             | via chain of thought (o1 and Claude also do this), but DS
             | also does not supervise the chain of thought, and simply
             | rewards CoTs that get the answer correct. This is just
             | one of the half dozen training optimizations that
             | Deepseek has come up with.
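             | 
             | In sketch form (the ANSWER: marker and the exact-match
             | check are illustrative assumptions, not DeepSeek's actual
             | reward function):
             | 
             |   def outcome_reward(sample: str, gold: str) -> float:
             |       # the chain of thought before the marker is never
             |       # inspected; only the final answer is scored
             |       answer = sample.split("ANSWER:")[-1].strip()
             |       return 1.0 if answer == gold else 0.0
             | 
             |   trace = "3*12=36, drop 5 -> 31 ... ANSWER: 31"
             |   assert outcome_reward(trace, "31") == 1.0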
        
         | neuronic wrote:
         | > NVIDIAs moat
         | 
         | Offtopic, but your comment finally pushed me over the edge to
         | semantic satiation [1] regarding the word "moat". It is
         | incredible how this word turned up a short while ago and now it
         | seems to be a key ingredient of every second comment.
         | 
         | [1] https://en.wikipedia.org/wiki/Semantic_satiation
        
           | mikestew wrote:
           | _It is incredible how this word turned up a short while
           | ago..._
           | 
           | I'm sure if I looked, I could find quotes from Warren
           | Buffett (the recognized originator of the term) going back a
           | few decades. But your point stands.
        
             | mikeyouse wrote:
             | Yeah, he's been talking about "economic moats" since at
             | least the 1990s - 1995 at the latest:
             | 
             | https://www.berkshirehathaway.com/letters/1995.html
        
               | pillefitz wrote:
               | Nobody claimed it's a new word. Still, the frequency
               | increased 100x over the last days, subjectively speaking.
        
             | kccqzy wrote:
             | The earliest occurrence of the word "moat" that I could
             | find online from Buffett is from 1986:
             | https://www.berkshirehathaway.com/letters/1986.html That
             | shareholder letter is charmingly old-school.
             | 
             | Unfortunately letters before 1977 weren't available online
             | so I wasn't able to search.
             | 
             | It also helps that I've been to several cities with an
             | actual moat so this word is familiar to me.
        
           | fastasucan wrote:
           | The word moat was first used in english in the 15th century
           | https://www.merriam-webster.com/dictionary/moat
        
           | ljw1004 wrote:
           | I'm struggling to understand how a moat can have a CRACK in
           | it.
        
           | cwmoore wrote:
           | https://en.wikipedia.org/wiki/Frequency_illusion
        
         | bn-l wrote:
         | The link runs all the params, but at 4-bit quant.
        
         | tw1984 wrote:
         | > If most of NVIDIAs moat is in being able to efficiently
         | interconnect thousands of GPUs
         | 
           | nah, its moat is CUDA and the millions of devs using CUDA,
           | aka the ecosystem
        
           | mupuff1234 wrote:
           | But if it's not combined with super-high-end chips with
           | massive margins, that moat is not worth anywhere close to
           | 3T USD.
        
           | ReptileMan wrote:
           | And then some Chinese startup creates an amazing compiler
           | that takes CUDA and moves it to X (AMD, Intel, ASIC) and we
           | are back at square one.
           | 
           | So far it seems that the best investment is in RAM
           | producers. Unlike compute, the RAM requirements seem to be
           | stubborn.
        
             | 01100011 wrote:
             | Don't forget that "CUDA" involves more than language
             | constructs and programming paradigms.
             | 
             | With NVDA, you get tools to deploy at scale, maximize
             | utilization, debug errors and perf issues, share HW between
             | workflows, etc. These things are not cheap to develop.
        
               | Symmetry wrote:
               | It might not be cheap to develop them but if you can save
               | $10B in hardware costs by doing so you're probably
               | looking at positive ROI.
        
         | a_wild_dandan wrote:
         | Running a 680-billion parameter frontier model on a few Macs
         | (at 13 tok/s!) is nuts. That's _two years_ after ChatGPT was
         | released. That rate of progress just blows my mind.
        
       | brandonpelfrey wrote:
       | Great article. I still feel like very few people are viewing
       | the Deepseek effects in the right light. If we are 10x more
       | efficient, it's not that we use 1/10th the resources we did
       | before; we expand to have 10x the usage we had before. All
       | technology products have moved in this direction. Where there
       | is capacity, we will use it. This argument would not work if we
       | were close to AGI and didn't need more, but I don't think we're
       | actually close to that at all.
        
         | VHRanger wrote:
         | Correct. This effect has been known in economics forever - new
         | technology has
         | 
         | - A "substitution effect". You use the thing more because it's
         | cheaper - new usecases come up
         | 
         | - An "income effect". You use other things more because of the
         | savings.
         | 
         | I got into this on labor economics here [1] - you have
         | counterintuitive examples with ATMs actually increasing the
         | number of bank branches for several decades.
         | 
         | [1]: https://singlelunch.com/2019/10/21/the-economic-effects-
         | of-a...
        
         | neuronic wrote:
         | Would this not mean we need much much more training data to
         | fully utilize the now "free" capacities?
        
           | vonneumannstan wrote:
           | It's pretty clear that the reasoning models are using
           | massive amounts of synthetic data, so training data is not
           | the bottleneck.
        
         | jnwatson wrote:
         | This is called Jevons Paradox.
         | 
         | https://en.wikipedia.org/wiki/Jevons_paradox
        
         | aurareturn wrote:
         | Yep. I've been harping on this. DeepSeek is bullish for Nvidia.
        
           | ReptileMan wrote:
           | >DeepSeek is bullish for Nvidia.
           | 
           | DeepSeek is bullish for the semiconductor industry as a
           | whole. Whether it is for Nvidia remains to be seen. Intel
           | was in Nvidia's position in 2007, and they didn't want to
           | trade margins for volumes in the phone market. And there
           | they are today.
        
             | aurareturn wrote:
             | Why wouldn't it be for Nvidia? Explain more.
        
         | mvdtnz wrote:
         | Great, now I can rewrite 10x more emails or solve 10x more
         | graduate level programming tasks (mostly incorrectly). Brave
         | new world.
        
       | p0w3n3d wrote:
       | > which require low-latency responses, such as content
       | moderation, fraud detection, _dynamic pricing_ , etc.
       | 
       | Is it even legal to give different prices to different customers?
        
         | jnwatson wrote:
         | Of course it is. That's how the airlines stay in business.
        
           | p0w3n3d wrote:
           | However, imagine entering a store where a camera looks up
           | your face in a shared database and profiles you as a person
           | who will pay higher prices - and the prices displayed near
           | you are set according to your profile...
        
         | esafak wrote:
         | It depends on the basis. You can't discriminate based on
         | protected classes.
        
       | typeofhuman wrote:
       | I'm rooting for DeepSeek (or any competitor) against OpenAI
       | because I don't like Sam Altman. I'm confident in admitting it.
        
         | 1970-01-01 wrote:
         | The enemy of your enemy is only temporarily your friend.
        
           | typeofhuman wrote:
           | Wise words from the epoch of time.
        
           | TypingOutBugs wrote:
           | As a European I really don't see the difference between US
           | and Chinese tech right now - the last week from Trump has
           | made me feel more threatened by the US than I ever have
           | been by China (Greenland, living in a Nordic country with
           | treaties to defend it).
           | 
           | I appreciate China has censorship, but the US is going that
           | way too (recent "issues" for search terms). It might be a
           | different scale now, but I think it'll happen. I don't care
           | as much if a Chinese company wins the LLM space as I did
           | last year.
        
           | rwoerz wrote:
           | Indeed! Just ask DeepSeek something about Tiananmen or
           | Taiwan. Answering seems to be an absolute "no-brainer" for
           | it.
        
       | liuliu wrote:
       | This is a humble and informed article (compared to others
       | written by financial analysts the past few days). But it still
       | has the flaw of over-estimating the efficiency of deploying a
       | 671B-parameter MoE model on commodity hardware (for local use;
       | cloud providers will do efficient batching, which is different):
       | you cannot do that on any single piece of Apple hardware (you
       | need to hook up at least 2 M2 Ultras). You can barely deploy it
       | on desktop computers, because non-registered DDR5 tops out at
       | 64GiB per stick (so you are safe with 512GiB of RAM). Now coming
       | to PCIe bandwidth: 37B parameters activated per token means
       | exactly that - each token requires a new set of 37B weights, so
       | you need to transfer ~18GB per token into VRAM (assuming 4-bit
       | quant). PCIe 5 (5090) has 64GB/s transfer speed, so your upper
       | bound is limited to ~4 tok/s with a well-balanced purpose-built
       | PC (and custom software). For programming tasks that usually
       | require ~3000 tokens of thinking, we are looking at over 12 mins
       | per interaction.
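       | 
       | Spelled out, that arithmetic (inputs are the comment's
       | assumptions, not measurements):
       | 
       |   activated = 37e9      # ~37B weights active per token
       |   bits = 4              # 4-bit quantization
       |   pcie = 64e9           # PCIe 5.0 x16, bytes/s, one way
       | 
       |   gb_per_tok = activated * bits / 8 / 1e9   # ~18.5 GB
       |   tok_per_s = pcie / (gb_per_tok * 1e9)     # ~3.5 tok/s
       |   print(f"{gb_per_tok:.1f} GB/token, {tok_per_s:.2f} tok/s")
       |   print(f"{3000 / tok_per_s / 60:.1f} min per ~3000-token think")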
        
         | lvass wrote:
         | Is it really 37B different parameters for each token? Even with
         | the "multi-token prediction system" that the article mentions?
        
           | liuliu wrote:
           | I don't think anyone uses MTP for inference right now. Even
           | if you use MTP for drafting, you need batching in the next
           | round to "verify" it drafted the right tokens, and if that
           | happens you need to activate more experts.
           | 
           | DELETED: If you don't use MTP for drafting, and use MTP to
           | skip generations, sure. But you also need to evaluate your
           | use case to make sure you don't get penalized for doing
           | that. Their evaluation in the paper doesn't use MTP for
           | generation.
           | 
           | EDIT: Actually, you cannot use MTP other than for drafting,
           | because you need to fill in these KV caches. So, during
           | generation, you cannot save compute with MTP (you save
           | memory bandwidth, but this is more complicated for a MoE
           | model due to more activated experts).
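           | 
           | For reference, a bare-bones sketch of the draft-and-verify
           | loop described above; draft and verify are hypothetical
           | stand-ins, and production schemes accept/reject
           | probabilistically rather than by exact match:
           | 
           |   def speculative_step(ctx, draft, verify, k=4):
           |       # k cheap draft tokens (e.g. from an MTP head)
           |       proposed = draft(ctx, k)
           |       # one batched pass of the full model checks them
           |       checked = verify(ctx, proposed)
           |       accepted = []
           |       for guess, truth in zip(proposed, checked):
           |           if guess != truth:          # first mismatch:
           |               accepted.append(truth)  # keep the full
           |               break                   # model's token
           |           accepted.append(guess)
           |       return accepted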
        
       | pjdesno wrote:
       | The description of DeepSeek reminds me of my experience in
       | networking in the late 80s - early 90s.
       | 
       | Back then a really big motivator for Asynchronous Transfer Mode
       | (ATM) and fiber-to-the-home was the promise of video on demand,
       | which was a huge market in comparison to the Internet of the day.
       | Just about all the work in this area ignored the potential of
       | advanced video coding algorithms, and assumed that broadcast TV-
       | quality video would require about 50x more bandwidth than today's
       | SD Netflix videos, and 6x more than 4K.
       | 
       | What made video on the Internet possible wasn't a faster
       | Internet, although the 10-20x increase every decade certainly
       | helped - it was smarter algorithms that used orders of magnitude
       | less bandwidth. In the case of AI, GPUs keep getting faster, but
       | it's going to take a hell of a long time to achieve a 10x
       | improvement in performance per cm^2 of silicon. Vastly improved
       | training/inference algorithms may or may not be possible
       | (DeepSeek seems to indicate the answer is "may") but there's no
       | physical limit preventing them from being discovered, and the
       | disruption when someone invents a new algorithm can be nearly
       | immediate.
        
         | TMWNN wrote:
         | >but there's no physical limit preventing them from being
         | discovered, and the disruption when someone invents a new
         | algorithm can be nearly immediate.
         | 
         | The rise of the net is Jevons paradox fulfilled. The orders of
         | magnitude less bandwidth needed per cat video drove much more
         | than that in overall growth in demand for said videos. During
         | the dotcom bubble's collapse, bandwidth use kept going up.
         | 
         | Even if there is a near-term bear case for NVDA (dotcom
         | bubble/bust), history indicates a bull case for the sector
         | overall and related investments such as utilities (the entire
         | history of the tech sector from 1995 to today).
        
         | accra4rx wrote:
         | Love those analogies. This is one of the main reasons I love
         | Hacker News / Reddit. Honest, golden experiences.
        
         | AlanYx wrote:
         | Another aspect that reinforces your point is that the ATM push
         | (and subsequent downfall) was not just bandwidth-motivated but
         | also motivated by a belief that ATM's QoS guarantees were
         | necessary. But it turned out that software improvements,
         | notably MPLS to handle QoS, were all that was needed.
        
           | pjdesno wrote:
           | Nah, it's mostly just buffering :-)
           | 
           | Plus the cell phone industry paved the way for VOIP by
           | getting everyone used to really, really crappy voice quality.
           | Generations of Bell Labs and Bellcore engineers would rather
           | have resigned than be subjected to what's considered
           | acceptable voice quality nowadays...
        
             | hedgehog wrote:
             | Yes, I think most video on the Internet is HLS and similar
             | approaches which are about as far from the ATM circuit-
             | switching approach as it gets. For those unfamiliar HLS is
             | pretty much breaking the video into chunks to download over
             | plain HTTP.
        
             | nyarlathotep_ wrote:
             | >> Plus the cell phone industry paved the way for VOIP by
             | getting everyone used to really, really crappy voice
             | quality
             | 
             | What accounts for this difference? Is there something
             | inherently worse about the nature of cell phone
             | infrastructure over land-line use?
             | 
             | I'm totally naive on such subjects.
             | 
             | I'm just old enough to remember landlines being widespread,
             | but nearly all of my phone calls have been via cell since
             | the mid 00s, so I can't judge quality differences given the
             | time that's passed.
        
               | hnuser123456 wrote:
               | Because at some point, someone decided that 8 kbps
               | makes for an acceptable audio stream per subscriber.
               | And at first, being able to call anyone anywhere, even
               | with this awful quality, was novel enough that people
               | would accept it. And most people did, until the
               | carriers decided they could allocate a little more with
               | VoLTE - if it works on your phone in your area.
        
               | ipdashc wrote:
               | > Because at some point, someone decided that 8 kbps
               | makes for an acceptable audio stream per subscriber.
               | 
               | Has it not been like this for a very long time? I was
               | under the impression that "voice frequency" being
               | defined as up to 4 kHz was a very old standard - after
               | all, (long-distance) phone calls have always been
               | multiplexed through coaxial or microwave links. And it
               | follows that an 8 kHz sample rate is all you need to
               | losslessly digitally sample that.
               | 
               | I assumed it was jitter and such that led to the lower
               | quality of VoIP/cellular, but that's a total guess.
               | Along with maybe compression algorithms that try to
               | squeeze the stream even tighter. But I wouldn't have
               | figured it was the 8kHz sample rate at fault, right?
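               | 
               | For reference, the classic numbers (standard telephony
               | figures, quoted from memory):
               | 
               |   band_hz = 4000             # "voice frequency" band
               |   sample_rate = 2 * band_hz  # Nyquist: 8 kHz sampling
               |   bits = 8                   # G.711 u-law PCM
               |   print(sample_rate * bits)  # 64000 bps - so an 8 kbps
               |                              # stream implies ~8x lossy
               |                              # compression on top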
        
               | hnuser123456 wrote:
               | Sure, if you stop after "nobody's vocal cords make
               | noises above 4kHz in normal conversation", but the
               | rumbling of the vocal cords isn't the entire audio data
               | which is present in person. Clicks of the tongue and
               | smacking of the lips make much higher frequencies, and
               | higher sample rates capture the timbre/shape of the
               | soundwave instead of rounding it down to a smooth sine
               | wave. Discord defaults to 64kbps, but you can push it
               | up to 96kbps or 128kbps with a Nitro membership, and
               | it's not hard to hear an improvement with the higher
               | bitrates. And if you've ever used bluetooth audio, you
               | know the difference in quality between the
               | bidirectional call profile and the unidirectional music
               | profile, and wished for the bandwidth of the music
               | profile with the low latency of the call profile.
        
             | WalterBright wrote:
             | I've noticed this when talking on the phone with someone
             | with a significant accent.
             | 
             | 1. it takes considerable work on my part to understand it
             | on a cell phone
             | 
             | 2. it's much easier on POTS
             | 
             | 3. it's not a problem on VOIP
             | 
             | 4. no issues in person
             | 
             | With all the amazing advances in cell phones, the voice
             | quality of cellular is stuck in the 90's.
        
               | bayindirh wrote:
               | I regularly travel to Europe, and it baffles me why I
               | can't use VoLTE there (maybe my roaming doesn't allow
               | it) and fall back to 3G for voice calls.
               | 
               | At home, I use VoLTE and the sound is almost impeccable
               | - very high quality - but in the places I roam to, what
               | I get is FM-quality 3G sound.
               | 
               | It's not that the cellular network is incapable of that
               | sound quality, but I don't get to experience it outside
               | my home country. Interesting, indeed.
        
           | tlb wrote:
           | And memory. In the heyday of ATM (late 90s) a few megabytes
           | was quite expensive for a set-top box, so you couldn't buffer
           | many seconds of compressed video.
           | 
           | Also, the phone companies had a pathological aversion to
           | understanding Moore's law, because it suggested they'd have
           | to charge half as much for bandwidth every 18 months. Long
           | distance rates had gone down more like 50%/decade, and even
           | that was too fast.
        
         | aurareturn wrote:
         | Doesn't your point about video compression tech support
         | Nvidia's bull case?
         | 
         | Better video compression led to an explosion in video
         | consumption on the Internet, leading to much more revenue for
         | companies like Comcast, Google, T-Mobile, Verizon, etc.
         | 
         | More efficient LLMs lead to much more AI usage. Nvidia, TSMC,
         | etc will benefit.
        
           | vFunct wrote:
           | I agree that advancements like DeepSeek, like transformer
           | models before it, are just going to end up increasing
           | demand.
           | 
           | It's very shortsighted to think we're going to need fewer
           | chips because the algorithms got better. The system became
           | more efficient, which causes induced demand.
        
           | snailmailstare wrote:
           | It improves TSMC's case. Paying Nvidia would be like paying
           | Cray for every smartphone that is faster than a
           | supercomputer of old.
        
           | 9rx wrote:
           | Yes, over the long haul, probably. But as far as individual
           | investors go, they might not like that Nvidia.
           | 
           | Anyone currently invested is presumably in because they like
           | the insanely high profit margin, and this is apt to quash
           | that. There is now much less reason to give your first born
           | to get your hands on their wares. Comcast, Google, T-Mobile,
           | Verizon, etc., and especially those not named Google, have
           | nothingburger margins in comparison.
           | 
           | If you are interested in what they can do with volume, then
           | there is still a lot of potential. They may even be more
           | profitable on that end than a margin play could ever hope
           | for. But that interest is probably not from the same person
           | who currently owns the stock, it being a change in territory,
           | and there is apt to be a lot of instability as stock changes
           | hands from the one group to the next.
        
           | onlyrealcuzzo wrote:
           | No - because this eliminates entirely or shifts the majority
           | of work from GPU to CPU - and Nvidia does not sell CPUs.
           | 
           | If the AI market gets 10x bigger, and GPU work gets 50%
           | smaller (which is still 5x larger than today) - but Nvidia is
           | priced on 40% growth for the next ten years (28x larger) -
           | there is a price mismatch.
           | 
           | It is _theoretically_ possible for a massive reduction in GPU
           | usage or shift from GPU to CPU to benefit Nvidia if that
           | causes the market to grow enough - but it seems unlikely.
           | 
           | Also, _I believe_ (someone please correct if wrong) DeepSeek
           | is claiming a 95% overall reduction in GPU usage compared to
           | traditional methods (not the 50% in the example above).
           | 
           | If true, that is a death knell for Nvidia's growth story
           | after the current contracts end.
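           | 
           | Spelling out the 10x/50% example above (all inputs are the
           | assumptions already stated):
           | 
           |   market_growth = 10   # AI market gets 10x bigger
           |   gpu_share = 0.5      # GPU work per unit of AI halves
           |   print(market_growth * gpu_share)  # 5x today's GPU demand
           |   print(round(1.4 ** 10, 1))        # ~28.9x priced in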
        
             | munksbeer wrote:
             | I can see close to zero possibility that the majority of
             | the work will be shifted to the CPU. Anything a CPU can do
             | can just be done better with specialised GPU hardware.
        
               | lokar wrote:
               | People have been saying the exact same thing about other
               | workloads for years, and always been wrong. Mostly
               | claiming custom chips or FPGAs will beat out general
               | purpose CPUs.
        
               | Vegenoid wrote:
               | Then why do we have powerful CPUs instead of a bunch of
               | specialized hardware? It's because the value of a CPU is
               | in its versatility and ubiquity. If a CPU can do a
               | thing well enough, then most programs/computers will do
               | that thing on a CPU instead of taking on the increased
               | complexity and cost of a GPU, even if a GPU would do it
               | better.
        
               | chrisco255 wrote:
               | We have both? Modern computing devices like smart phones
               | use SoCs with integrated GPUs. GPUs aren't really
               | specialized hardware, either, they are general purpose
               | hardware useful in many scenarios (built for graphics
               | originally but clearly useful in other domains including
               | AI).
        
               | ozten wrote:
               | > Anything a CPU can do can just be done better
               | 
               | Nope. Anything inherently serial is better off on the
               | CPU due to caching and its architecture.
               | 
               | Many things that are highly parallelizable are getting
               | GPU-enabled. Games and ML are GPU by default, and many
               | other things are migrating to CUDA.
               | 
               | You need both for cheap, high-performance computing.
               | They are different workloads.
        
             | aurareturn wrote:
             | > No - because this eliminates entirely or shifts the
             | majority of work from GPU to CPU - and Nvidia does not
             | sell CPUs.
             | 
             | I'm not even sure how to reply to this. GPUs are
             | fundamentally much more efficient for AI inference than
             | CPUs.
        
               | snailmailstare wrote:
               | I think SIMD is not so much better than SIMT for solved
               | problems as a level in claiming a problem as solved.
        
           | pjdesno wrote:
           | No, it doesn't.
           | 
           | Not only are 10-100x changes disruptive, but the players who
           | don't adopt them quickly are going to be the ones who
           | continue to buy huge amounts of hardware to pursue old
           | approaches, and it's hard for incumbent vendors to avoid
           | catering to their needs, up until it's too late.
           | 
           | When everyone gets up off the ground after the play is over,
           | Nvidia might still be holding the ball but it might just as
           | easily be someone else.
        
           | mandevil wrote:
           | It lead to more revenue for the industry as a whole. But not
           | necessarily for the individual companies that bubbled the
           | hardest: Cisco stock is still to this day lower than it was
           | at peak in 2000, to point to a significant company that sold
           | actual physical infra products necessary for the internet and
           | still around and profitable to this day. (Some companies that
           | bubbled did quite well, AMZN is like 75x from where it was in
           | 2000. But that's a totally different company that captured an
           | enormous amount of value from AWS that was not visible to the
           | market in 2000, so it makes sense.)
           | 
           | If stock market-cap is (roughly) the market's aggregated best
           | guess of future profits integrated over all time, discounted
           | back to the present at some (the market's best guess of the
           | future?) rate, then increasing uncertainty about the
           | predicted profits 5-10 years from now can have enormous
           | influence on the stock. Does NVDA have an AWS within it now?
        
             | aurareturn wrote:
             | >It lead to more revenue for the industry as a whole. But
             | not necessarily for the individual companies that bubbled
             | the hardest: Cisco stock is still to this day lower than it
             | was at peak in 2000, to point to a significant company that
             | sold actual physical infra products necessary for the
             | internet and still around and profitable to this day. (Some
             | companies that bubbled did quite well, AMZN is like 75x
             | from where it was in 2000. But that's a totally different
             | company that captured an enormous amount of value from AWS
             | that was not visible to the market in 2000, so it makes
             | sense.)
             | 
             | Cisco in 1994: $3.
             | 
             | Cisco after dotcom bubble: $13.
             | 
             | So is Nvidia's stock price closer to 1994 or 2001?
        
           | fspeech wrote:
           | If you normalize Nvidia's gross margin and take competitors
           | into account, sure. But its current high margin is driven
           | by Big Tech FOMO. Do keep in mind that going from a 90%
           | margin (10x cost) to a 50% margin (2x cost) is a 5x price
           | reduction.
        
             | aurareturn wrote:
             | So why would DeepSeek decrease FOMO? It should increase it
             | if anything.
        
               | Vegenoid wrote:
               | Because DeepSeek demonstrates that loads of compute
               | isn't necessary for high-performing models, so we won't
               | need as much, or as powerful, hardware as was
               | previously thought - which is what Nvidia's valuation
               | is based on?
        
         | vFunct wrote:
         | I worked on a network that used a protocol very similar to ATM
         | (actually it was the first Iridium satellite network). An
         | internet based on ATM would have been amazing. You're basically
         | guaranteeing a virtual switched circuit, instead of the packets
         | we have today. The horror of packet switching is all the
         | buffering it needs, since it doesn't guarantee circuits.
         | 
         | Bandwidth is one thing, but the real benefit is that ATM also
         | guaranteed minimal latencies. You could now shave off another
         | 20-100ms of latency for your FaceTime calls, which is subtle
         | but game changing. Just instant-on high def video
         | communications, as if it were on closed circuits to the next
         | room.
         | 
         | For the same reasons, the AI analogy could benefit from both
         | huge processing as well as stronger algorithms.
        
           | lxgr wrote:
           | > You're basically guaranteeing a virtual switched circuit
           | 
           | Which means you need state (and the overhead that goes with
           | it) for each connection _within the network_. That's
           | horribly inefficient, and precisely the reason packet
           | switching won.
           | 
           | > An internet based on ATM would have been amazing.
           | 
           | No, we'd most likely be paying by the socket connection (as
           | somebody has to pay for that state keeping overhead), which
           | sounds horrible.
           | 
           | > You could now shave off another 20-100ms of latency for
           | your FaceTime calls, which is subtle but game changing.
           | 
           | Maybe on congested Wi-Fi (where even circuit switching would
           | struggle) or poorly managed networks (including shitty ISP-
           | supplied routers suffering from horrendous bufferbloat).
           | Definitely not on the majority of networks I've used in the
           | past years.
           | 
           | > The horror of packet switching is all the buffering it
           | needs [...]
           | 
           | The ideal buffer size is exactly the bandwidth-delay product.
           | That's really not a concern these days anymore. If anything,
           | buffers are much too large, causing unnecessary latency;
           | that's where bufferbloat-aware scheduling comes in.
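           | 
           | As a rule-of-thumb example with illustrative numbers
           | (ideal buffer ~= bandwidth x round-trip time):
           | 
           |   bw = 100e6    # 100 Mbit/s link (example figure)
           |   rtt = 0.050   # 50 ms round trip (example figure)
           |   print(bw * rtt / 8 / 1e6, "MB")  # ~0.62 MB of buffer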
        
             | vFunct wrote:
             | The cost for interactive video would be a requirement of
             | ~10x the bandwidth, basically to cover idle time. Not
             | efficient, but not impossible, and it definitely wouldn't
             | change ISP business models.
             | 
             | The latency benefit would outweigh the cost. Just
             | absolutely instant video interaction.
        
               | foobarian wrote:
               | It is fascinating to think that before digital circuits
               | phone calls were accomplished by an end-to-end electrical
               | connection between the handsets. What luxury that must
               | have been! If only those ancestors of ours had modems and
               | computers to use those excellent connections for low-
               | latency gaming... :-)
        
           | thijson wrote:
           | I remember my professor saying how the fixed packet size in
           | ATM (53 bytes) was a committee compromise. North America
           | wanted 64 bytes, Europe wanted 32 bytes. The committee chose
           | around the midway point.
        
             | wtallis wrote:
             | 53-byte frames are what results from the exact
             | compromise: 48 bytes of _payload_ (midway between 32 and
             | 64) plus a 5-byte header.
        
           | pjdesno wrote:
           | Man, I saw a presentation on Iridium when I was at Motorola
           | in the early 90s, maybe 92? Not a marketing presentation -
           | one where an engineer was talking, and had done their own
           | slides.
           | 
           | What I recall is that it was at a time when Internet folks
           | had made enormous advances in understanding congestion
           | behavior in computer networks, and other folks (e.g. my
           | division of Motorola) had put a lot of time into
           | understanding the limited burstiness you get with silence
           | suppression for packetized voice, and these folks knew
           | nothing about it.
        
         | paulddraper wrote:
         | I love algorithms as much as the next guy, but not really.
         | 
         | DCT was developed in 1972 and has a compression ratio of 100:1.
         | 
         | H.264 compresses 2000:1.
         | 
         | And standard resolution (480p) is ~1/30th the resolution of 4k.
         | 
         | ---
         | 
         | I.e. Standard resolution with DCT is smaller than 4k with
         | H.264.
         | 
         | Even high-definition (720p) with DCT is only twice the
         | bandwidth of 4k H.264.
         | 
         | Modern compression has allowed us to add a bunch more pixels,
         | but it was hardly a requirement for internet video.
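         | 
         | Checking that arithmetic (resolutions and the compression
         | ratios above, taken as given):
         | 
         |   px_4k = 3840 * 2160
         |   px_480p = 640 * 480    # ~1/27 of 4K
         |   px_720p = 1280 * 720   # ~1/9 of 4K
         | 
         |   def rel_bitrate(px, ratio):  # bitrate ~ pixels / ratio
         |       return px / ratio
         | 
         |   print(rel_bitrate(px_480p, 100) / rel_bitrate(px_4k, 2000))
         |   # ~0.74 -> 480p DCT is smaller than 4K H.264
         |   print(rel_bitrate(px_720p, 100) / rel_bitrate(px_4k, 2000))
         |   # ~2.2 -> 720p DCT is about twice 4K H.264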
        
           | wtallis wrote:
           | The web didn't go from streaming 480p straight to 4k. There
           | were a couple of intermediate jumps in pixel count that were
           | enabled in large part by better compression. Notably, there
           | was a time period where it was important to ensure your
           | computer had hardware support for H.264 decode, because it
           | was taxing on low-power CPUs to do at 1080p and you weren't
           | going to get streamed 1080p content in any simpler, less
           | efficient codec.
        
             | paulddraper wrote:
             | Right.
             | 
             | Modern compression algorithms had been developed, but
             | were not even computationally feasible for much of that
             | time.
        
           | WhitneyLand wrote:
           | DCT is not an algorithm at all, it's a mathematical
           | transform.
           | 
           | It doesn't have a compression ratio.
        
             | paulddraper wrote:
             | > DCT compression, also known as block compression,
             | compresses data in sets of discrete DCT blocks.[3] DCT
             | blocks sizes including 8x8 pixels for the standard DCT, and
             | varied integer DCT sizes between 4x4 and 32x32
             | pixels.[1][4] The DCT has a strong energy compaction
             | property,[5][6] capable of achieving high quality at high
             | data compression ratios.[7][8] However, blocky compression
             | artifacts can appear when heavy DCT compression is applied.
             | 
             | https://en.wikipedia.org/wiki/Discrete_cosine_transform
        
           | foobarian wrote:
           | I'm sure it helped, but yeah, not only e2e bandwidth but also
           | the total network throughput increased by vast orders of
           | magnitude.
        
         | eigenvalue wrote:
         | Yes, that is a very apt analogy!
        
         | tuna74 wrote:
         | I always liked the "look" of high-bitrate MPEG-2 video.
         | Download HD Japanese TV content from 2005-2010 and it still
         | looks really good.
        
         | TheCondor wrote:
         | It seems even more stark. The current and projected energy
         | costs for AI are _staggering_. At the same time, I think it
         | has been MS that has been publishing papers on LLMs that are
         | smaller (so-called small language models) but more targeted,
         | and still achieving a fairly high "accuracy rate."
         | 
         | Didn't TSMC say that SamA came for a visit and said they
         | needed $7T in investment to keep up with pending demand?
         | 
         | This stuff is all super cool and fun to play with; I'm not a
         | naysayer, but it almost feels like these current models are
         | "bubble sort", and who knows how things will look if their
         | "quicksort" gets invented.
        
       | aurareturn wrote:
       | > _Perhaps most devastating is DeepSeek's recent efficiency
       | breakthrough, achieving comparable model performance at
       | approximately 1/45th the compute cost. This suggests the entire
       | industry has been massively over-provisioning compute
       | resources._
       | 
       | I wrote in another thread why DeepSeek should increase demand
       | for chips, not lower it.
       | 
       | 1. More efficient LLMs should lead to more usage, which means
       | more AI chip demand. Jevons Paradox.
       | 
       | 2. Even if DeepSeek is 45x more efficient (it is not), models
       | will just become 45x+ bigger. It won't stay small.
       | 
       | 3. To build a moat, OpenAI and American AI companies need to up
       | their datacenter spending even more.
       | 
       | 4. DeepSeek's breakthrough is in distilling models. You still
       | need a ton of compute to train the foundational model to
       | distill from.
       | 
       | 5. DeepSeek's conclusion in their paper says more compute is
       | needed for the next breakthrough.
       | 
       | 6. DeepSeek's model is trained on GPT4o/Sonnet outputs. Again,
       | this reaffirms the fact that in order to take the next step, you
       | need to continue to train better models. Better models will
       | generate better data for next-gen models.
       | 
       | I think DeepSeek hurts OpenAI/Anthropic/Google/Microsoft. I
       | think DeepSeek helps TSMC/Nvidia.
       | 
       | > _Combined with the emergence of more efficient inference
       | architectures through chain-of-thought models, the aggregate
       | demand for compute could be significantly lower than current
       | projections assume._
       | 
       | This is misguided. Let's think logically about this.
       | 
       | More thinking = smarter models
       | 
       | Faster hardware = more thinking
       | 
       | More/newer Nvidia GPUs, better TSMC nodes = faster hardware
       | 
       | Therefore, you can conclude that Nvidia and TSMC demand should go
       | up because of CoT models. In 2025, CoT models are clearly
       | bottlenecked by not having enough compute.                 The
       | economics here are compelling: when DeepSeek can match GPT-4
       | level performance while charging 95% less for API calls, it
       | suggests either NVIDIA's customers are burning cash unnecessarily
       | or margins must come down dramatically.
       | 
       | Or that in order to build a moat, OpenAI/Anthropic/Google and
       | other labs need to double down on even more compute.
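       | 
       | To make the Jevons point concrete, here's a toy back-of-envelope
       | calculation (illustrative numbers only, not from DeepSeek's
       | paper): if the price per token falls 45x and demand responds
       | elastically enough, total compute spend still rises.
       | 
       |   # Toy Jevons-paradox arithmetic; every number here is assumed.
       |   price_drop = 45.0        # assumed cost reduction per token
       |   elasticity = 1.5         # assumed demand response (made up)
       |   old_spend = 1.0          # normalize current compute spend to 1
       |   new_usage = price_drop ** elasticity   # usage grows ~302x
       |   new_spend = new_usage / price_drop     # ~6.7x the old spend
       |   print(old_spend, "->", round(new_spend, 1))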
        
         | outside1234 wrote:
         | But Microsoft hosts 3rd party models too, and cheaper models
         | mean more usage, which means more $$$ for scaled cloud
         | providers, right?
        
           | clvx wrote:
           | It means they can serve more with what they have if they
           | implement models with DeepSeek's optimizations. More usage
           | doesn't mean Nvidia will get the same margins when cloud
           | providers scale out with this innovation.
        
         | AnotherGoodName wrote:
         | I agree with this.
         | 
         | Fwiw many of the improvements in DeepSeek were already in other
         | 'can run on your personal computer' AIs such as Meta's Llama.
         | DeepSeek is actually very similar to Llama in efficiency.
         | People were already running that on home computers with M3s.
         | 
         | A couple of examples: Meta's multi-token prediction was
         | specifically implemented as a huge efficiency improvement and
         | was taken up by DeepSeek. REcurrent ADaption (READ) was another
         | big win by Meta that DeepSeek utilized. Multi-head Latent
         | Attention is another technique, not pioneered by Meta but used
         | by both DeepSeek and Llama.
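         | 
         | To illustrate the first of those, here's a minimal sketch of a
         | multi-token prediction head (toy PyTorch, my own
         | simplification, not Meta's or DeepSeek's actual code): the
         | shared trunk feeds k output heads, each trained against a
         | different future offset, so one forward pass yields k training
         | signals per position instead of one.
         | 
         |   import torch
         |   import torch.nn as nn
         | 
         |   class MultiTokenHead(nn.Module):
         |       def __init__(self, d_model, vocab_size, k=4):
         |           super().__init__()
         |           # One output head per future position t+1 .. t+k.
         |           self.heads = nn.ModuleList(
         |               [nn.Linear(d_model, vocab_size) for _ in range(k)]
         |           )
         | 
         |       def forward(self, hidden):
         |           # hidden: (batch, seq, d_model) from the shared trunk.
         |           # Returns (k, batch, seq, vocab) logits; head i is
         |           # trained against the sequence shifted by i+1.
         |           return torch.stack([h(hidden) for h in self.heads])
         | 
         |   trunk_out = torch.randn(2, 16, 64)  # stand-in trunk output
         |   logits = MultiTokenHead(64, 1000, k=4)(trunk_out)
         |   print(logits.shape)  # torch.Size([4, 2, 16, 1000])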
         | 
         | Anyway, DeepSeek isn't some independent revolution out of
         | nowhere. It's actually very very similar to the existing state
         | of the art and just bundles a whole lot of efficiency gains in
         | one model. There's no secret sauce here. It's much better than
         | what OpenAI has, but that's because OpenAI seems to have
         | forgotten 'The Bitter Lesson'. They have been going at things
         | in an extremely brute-force way.
         | 
         | Anyway, why do I point out that DeepSeek is very similar to
         | something like Llama? Because Meta's spending hundreds of
         | billions on chips to run it. It's pretty damn efficient,
         | especially compared to OpenAI, but they are still spending
         | billions on datacenter build-outs.
        
           | crubier wrote:
           | > OpenAI seems to have forgotten 'The Bitter Lesson'. They
           | have been going at things in an extremely brute-force way.
           | 
           | Isn't the point of 'The Bitter Lesson' precisely that, in the
           | end, brute force wins, and hand-crafted optimizations like
           | the ones you mention Llama and DeepSeek use are bound to lose
           | in the end?
        
             | AnotherGoodName wrote:
             | Imho the tldr is that the wins are always from 'scaling
             | search and learning'.
             | 
             | Any customisations that aren't related to the above are
             | destined to be overtaken by someone who can improve the
             | scaling of compute. OpenAI does not seem to be doing as
             | much to improve the scaling of compute in software terms
             | (they are doing a lot in hardware terms, admittedly). They
             | have models at the top of the charts for various benchmarks
             | right now, but it feels like a temporary win from chasing
             | those benchmarks outside of the focus of scaling compute.
        
       | macawfish wrote:
       | This is exactly where Project Digits comes in. Nvidia needs to
       | pivot toward being a local inference platform if they want to
       | survive the next shift.
        
       | skizm wrote:
       | I'm wondering if there's a (probably illegal) strategy in the
       | making here:
       | 
       |   - Wait till NVDA rebounds in price.
       |   - Create an OpenAI "competitor" that is powered by Llama or a
       |     similar open-weights model.
       |   - Obscure the fact that the company runs on this open tech and
       |     make it seem like you've developed your own models, but
       |     don't outright lie.
       |   - Release an app and whitepaper (the whitepaper looks and
       |     sounds technical, but is incredibly light on details; you
       |     only need to fool some new-grad stock analysts).
       |   - Pay some shady click farms to get your app to the top of
       |     Apple's charts (you only need it to be there for like 24
       |     hours tops).
       |   - Collect profits from your NVDA short positions.
        
         | tw1984 wrote:
         | This is exactly what DeepSeek is doing; the only difference is
         | they built the real model, not a fake one.
        
         | startupsfail wrote:
         | - Fail at the above.
         | 
         | I don't think this is what happened with DeepSeek. It seems
         | that they've genuinely optimized their model for efficiency and
         | used GPUs properly (the tiled FP8 trick and FP8 training), and
         | came out on top.
         | 
         | The impact on the NVIDIA stock is ridiculous. DeepSeek took
         | advantage of the flexible GPU architecture (unlike inflexible
         | hardware acceleration).
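         | 
         | For the curious, the tiled-FP8 idea is roughly this (a sketch
         | under assumptions: 128-wide tiles and PyTorch's float8_e4m3fn
         | dtype; not DeepSeek's actual kernels): each tile gets its own
         | scale factor, so a single outlier value only costs precision
         | within its own tile instead of across the whole tensor.
         | 
         |   import torch
         | 
         |   FP8_MAX = 448.0  # max finite value of float8_e4m3fn
         | 
         |   def quantize_tiled_fp8(w, tile=128):
         |       rows, cols = w.shape
         |       w_q = torch.empty_like(w, dtype=torch.float8_e4m3fn)
         |       scales = torch.empty(rows // tile, cols // tile)
         |       for i in range(0, rows, tile):
         |           for j in range(0, cols, tile):
         |               block = w[i:i+tile, j:j+tile]
         |               s = block.abs().max() / FP8_MAX  # per-tile scale
         |               scales[i // tile, j // tile] = s
         |               w_q[i:i+tile, j:j+tile] = (block / s).to(
         |                   torch.float8_e4m3fn)
         |       # matmul kernels rescale by the per-tile factors later
         |       return w_q, scales
         | 
         |   w_q, scales = quantize_tiled_fp8(torch.randn(256, 256))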
        
           | mmiliauskas wrote:
           | This is what I still don't understand: how much of what they
           | claim has actually been replicated? From what I understand,
           | the "50x cheaper" inference figure comes from their pricing
           | page, but is it actually 50x cheaper than the best open-
           | source models?
        
             | zamadatix wrote:
             | It's 50x cheaper than OpenAI's pricing, on an open-source
             | model that doesn't require giving up that quality level.
             | The best open-source models were already much closer in
             | pricing, but V3/R1 get there while topping the results
             | charts.
        
       | fairity wrote:
       | DeepSeek just further reinforces the idea that there is a first-
       | mover _disadvantage_ in developing AI models.
       | 
       | When someone can replicate your model for 5% of the cost in 2
       | years, I can only see 2 rational decisions:
       | 
       | 1) Start focusing on cost efficiency today to reduce the
       | advantage of the second mover (i.e. trade growth for
       | profitability)
       | 
       | 2) Figure out how to build a real competitive moat through one or
       | more of the following: economies of scale, network effects,
       | regulatory capture
       | 
       | On the second point, it seems to me like the only realistic
       | strategy for companies like OpenAI is to turn themselves into a
       | platform that benefits from direct network effects. Whether
       | that's actually feasible is another question.
        
         | Mistletoe wrote:
         | I feel like AI tech just reverse-scales and reverse-flywheels,
         | unlike the walls and moats of today's tech giants, and I think
         | that is wonderful. OpenAI has really never made sense from a
         | financial standpoint, and that is healthier for humans. There's
         | no network effect because there's no social aspect to AI
         | chatbots. I can hop to DeepSeek from Google Gemini or OpenAI
         | with ease because I don't have to have friends there and/or
         | convince them to move. AI is going to be a race to the bottom
         | that keeps prices low to zero. In fact I don't know how they
         | are going to monetize it at all.
        
         | tw1984 wrote:
         | > DeepSeek just further reinforces the idea that there is a
         | first-mover disadvantage in developing AI models.
         | 
         | You are assuming that what DeepSeek achieved can be reasonably
         | easily replicated by other companies. Then the question is:
         | when all the big techs and tons of startups in China and the US
         | are involved, how come none of those companies succeeded?
         | 
         | DeepSeek is unique.
        
           | 11101010001100 wrote:
           | Deepseek is unique, but the US has consistently
           | underestimated Chinese R&D, which is not a winning strategy
           | in iterated games.
        
             | rightbyte wrote:
             | There seems to be a 100-fold uptick in jingoists in the
             | last 3-4 years, which makes my head hurt, but I think there
             | is no consistent "underestimation" in academic circles? I
             | have been reading articles about up-and-coming Chinese STEM
             | for like 20 years.
        
               | coliveira wrote:
               | Yes, for people in academia the trend is clear, but it
               | seems that Wall Street didn't believe this was possible.
               | They assume that spending more money is all you need to
               | dominate technology. Wrong! Technology is about human
               | potential. If you have less money but a bigger investment
               | in people, you'll win the technological race.
        
               | rightbyte wrote:
               | I think Wall Street is in for a surprise, as they have
               | been profiting from liquidating the inefficiency of
               | worker trust and loyalty for quite some time now.
               | 
               | I think they think American engineering excellence was
               | due to neoliberal ingenuity vis-a-vis the USSR, not the
               | engineers and the transfer of academic legacy from
               | generation to generation.
        
               | coliveira wrote:
               | This is even more apparent when large tech corporations
               | are, supposedly, in a big competition but at the same
               | time firing thousands of developers and scientists. Are
               | they interested in making progress or just reducing
               | costs?
        
               | corimaith wrote:
               | What does DeepSeek, or really High-Flyer, do that is
               | particularly exceptional regarding employees? HFT firms
               | and other elite law firms or hedge funds are known to
               | have pretty zany benefits.
        
               | 11101010001100 wrote:
               | Precisely. This is the view from the ivory tower.
        
             | corimaith wrote:
             | That doesn't change the calculus regarding the actions you
             | would pick externally; in fact it only strengthens the
             | point for increased tech restrictions and more funding.
        
           | rightbyte wrote:
           | Unique, yeah, but isn't their method open? I read something
           | about a group replicating a smaller variant of their main
           | model.
        
             | ghostzilla wrote:
             | Which raises the question: if LLMs are an asset of such
             | strategic value, why did China allow DeepSeek to be
             | released?
             | 
             | I see two possibilities here: either the CCP is not as all-
             | reaching as we think, or the value of the technology isn't
             | critical, and the release was in fact cleared with the CCP
             | and maybe even timed to come right after Trump's
             | announcement of American AI supremacy.
        
               | rightbyte wrote:
               | It is hard to estimate how much of it is "didn't care",
               | "didn't know" or "did it", I think. Rather pointless to
               | speculate unless there are public party discussions about
               | it to read.
        
               | fairity wrote:
               | It's early innings, and supporting the open source
               | community could be viewed by the CCP as an effective way
               | to undermine the US's lead in AI.
               | 
               | In a way, their strategy could be:
               | 
               | 1) Let the US invest $1 trillion in R&D
               | 
               | 2) Support the open source community such that their
               | capability to replicate these models only marginally lags
               | the private sector
               | 
               | 3) When R&D costs are more manageable, lean in and play
               | catch up
        
               | creato wrote:
               | I really doubt there was any intention behind it at all.
               | I bet deepseek themselves are surprised at the impact
               | this is having, and probably regret releasing so much
               | information into the open.
        
               | lenerdenator wrote:
               | It will be assumed by the American policy establishment
               | that this represents what the CCP doesn't consider
               | important, meaning that they have even better stuff in
               | store. It will also be assumed that this was timed to
               | take a dump on Trump's announcement, like you said.
               | 
               | And it did a great job. Nvidia stock's sunk, and
               | investors are going to be asking if it's really that
               | smart to give American AI companies their money when the
               | Chinese can do something similar for significantly less
               | money.
        
           | jerjerjer wrote:
           | We have one success after ~two years of ChatGPT hype (and
           | therefore subsequent replication attempts). That's as fast as
           | it gets.
        
         | aurareturn wrote:
         | This is wrong. First-mover advantage is strong. This is why
         | OpenAI is much bigger than Mistral despite what you said.
         | 
         | First-mover advantage acquires and keeps subscribers.
         | 
         | No one really cares if you matched GPT-4o one year later.
         | OpenAI has had a full year to optimize the model, build tools
         | around the model, and use the model to generate better data for
         | their next-generation foundational model.
        
           | jaynate wrote:
           | They also burnt a hell of a lot more cash. That's a
           | disadvantage.
        
           | itissid wrote:
           | OpenAI does not have a business model that is cashflow-
           | positive at this point, and/or a product that gives them a
           | significant leg up in the same moat sense that Office/Teams
           | might give Microsoft.
        
             | aurareturn wrote:
             | Companies in the mobile era took a decade or more to become
             | profitable. For example, Uber and Airbnb.
             | 
             | Why do you expect OpenAI to become profitable after 3 years
             | of chatgpt?
        
               | meiraleal wrote:
               | Nobody expects it, but what we know for sure is that they
               | have burnt billions of dollars. If other startups can get
               | there spending millions, the fact is that OpenAI won't
               | ever be profitable.
               | 
               | And more importantly (for us), let the hiring frenzy
               | start again :)
        
               | aurareturn wrote:
               | They have a ton of revenue and high gross margins. They
               | burn billions because they need to keep training ever
               | better models until the market slows and competition
               | consolidates.
        
               | fairity wrote:
               | The counter argument is that they won't be able to
               | sustain those gross margins when the market matures
               | because they don't have an effective moat.
               | 
               | In this world, R&D costs and gross margin/revenue are
               | inextricably correlated.
        
               | aurareturn wrote:
               | When the market matures, there will be fewer competitors
               | so they won't need to sustain the level of investment.
               | 
               | The market always consolidates when it matures. Every
               | time. The market always consolidates into 2-3 big
               | players. Often a duopoly. OpenAI is trying to be one of
               | the two or three companies left standing.
        
               | physicsguy wrote:
               | Interest rates have an effect too; Uber and Airbnb were
               | starting out in a much more fundraising-friendly time.
        
           | dplgk wrote:
           | What is OpenAI's first-mover moat? I switched to Claude with
           | absolutely no friction or moat-jumping.
        
             | xxpor wrote:
             | What is Google's first mover moat? I switched to
             | Bing/DuckDuckGo with absolutely no friction or moat
             | jumping.
             | 
             | Brands are incredibly powerful when talking about consumer
             | goods.
        
               | bpt3 wrote:
               | Google's moat _was_ significantly better results than the
               | competition for about 2 decades.
               | 
               | Your analogy is valid at this time, but proves the GP's
               | point, not yours.
        
               | fairity wrote:
               | I think it's worth double clicking here. _Why_ did Google
               | have significantly better search results for a long time?
               | 
               | 1) There was a data flywheel effect, wherein Google was
               | able to improve search results by analyzing the vast
               | amount of user activity on its site.
               | 
               | 2) There were real economies of scale in managing the
               | cost of data centers and servers.
               | 
               | 3) Their advertising business model benefited from
               | network effects, wherein advertisers don't want to bother
               | giving money to a search engine with a much smaller user
               | base. This profitability funded R&D that competitors
               | couldn't match.
               | 
               | There are probably more that I'm missing, but I think the
               | primary takeaway is that Google's scale, in and of
               | itself, led to a better product.
               | 
               | Can the same be said for OpenAI? I can't think of any
               | strong economies of scale or network effects for them,
               | but maybe I'm missing something. Put another way, how
               | does OpenAI's product or business model get significantly
               | better as more people use their service?
        
               | aurareturn wrote:
               | They have more data on what people want from models?
               | 
               | Their SOTA models can generate better synthetic data for
               | the next training run - leading to a flywheel effect?
        
               | rayval wrote:
               | In theory, the more people use the product, the more
               | OpenAI knows what they are asking about and what they do
               | after the first result, the better it can align its model
               | to deliver better results.
               | 
               | A similar dynamic occurred in the early days of search
               | engines.
        
               | visarga wrote:
               | I call it the experience flywheel. Humans come with
               | problems, the AI assistant generates some ideas, the
               | human tries them out and comes back to iterate. The model
               | gets feedback on prior ideas. So you could say the AI
               | tested an idea in the real world, using a human. This
               | happens many times over for 300M users at OpenAI. They
               | put a trillion tokens into human brains, and as many into
               | their logs. The influence is bidirectional. People adapt
               | to the model, and the model adapts to us. But that is in
               | theory.
               | 
               | In practice I never heard OpenAI mention how they use
               | chat logs for improving the model. They are either afraid
               | to say, for privacy reasons, or want to keep it secret
               | for technical advantage. But just think about the
               | billions of sessions per month. A large number of them
               | contain extensive problem solving. So the LLMs can
               | collect experience, and use it to improve problem
               | solving. This makes them into a flywheel of human
               | experience.
        
               | nyrikki wrote:
               | You are forgetting a bit: I worked in some of the large
               | datacenters where both Google and Yahoo had cages.
               | 
               | 1) Google copied the hotmail model of strapping commodity
               | PC components to cheap boards and building software to
               | deal with complexity.
               | 
               | 2) Yahoo had a much larger cage, filled with very very
               | expensive and large DEC machines, with one poor guy
               | sitting at a desk in there almost full time rebooting the
               | systems etc....I hope he has any hearing left today.
               | 
               | 3) Just right before the .com crash, I was in a cage next
               | to Google's racking dozens of brand new Netra T1s, which
               | were pretty slow and expensive...that company I was
               | working for died in the crash.
               | 
               | Look at Google's web page:
               | 
               | https://www.webdesignmuseum.org/gallery/google-1999
               | 
               | Compare that to Yahoo:
               | 
               | https://www.webdesignmuseum.org/gallery/yahoo-in-1999
               | 
               | Or Excite, the company they originally tried to sell
               | Google to:
               | 
               | https://www.webdesignmuseum.org/gallery/excite-2001
               | 
               | Google grew to be profitable because they controlled
               | costs, invested in software vs service contracts and
               | enterprise gear, had a simple non-intrusive text based ad
               | model etc...
               | 
               | Most of what you mention above came well after that model
               | of focusing on users and thrift allowed them to scale,
               | and is survivorship bias. Internal incentives that
               | directed capital expenditures to meet the mission rather
               | than protect people's backs were absolutely related to
               | their survival.
               | 
               | Even though it was a metasearch, my personal preference
               | was SavvySearch until it was bought and killed, or
               | whatever that story was.
               | 
               | OpenAI is far more like Yahoo than Google.
        
               | WalterBright wrote:
               | > I hope he has any hearing left today
               | 
               | I opted for a fanless graphics board, for just that
               | reason.
        
               | talldayo wrote:
               | > What is Google's first mover moat?
               | 
               | AdSense
        
               | eikenberry wrote:
               | Google wasn't the first mover in search. They were at
               | least second if not third.
        
             | aurareturn wrote:
             | OpenAI has a lot more revenue than Claude.
             | 
             | Late in 2024, OpenAI had $3.7b in revenue. Meanwhile,
             | Claude's mobile app hit $1 million in revenue around the
             | same time.
        
               | apwell23 wrote:
               | > Late in 2024, OpenAI had $3.7b in revenue
               | 
               | Where do they report these?
               | 
               | Edit: I found it here:
               | https://www.cnbc.com/2024/09/27/openai-sees-5-billion-
               | loss-t...
               | 
               | "OpenAI sees roughly $5 billion loss this year on $3.7
               | billion in revenue"
        
             | kpennell wrote:
             | almost everyone I know is the same. 'Claude seems to be
             | better and can take more data' is what I hear a lot.
        
             | ed wrote:
             | One moat will eventually come in the form of personal
             | knowledge about you - consider talking with a close friend
             | of many years vs a stranger
        
               | kgc wrote:
               | Couldn't you just copy all your conversations over?
        
             | moralestapia wrote:
             | *sigh*
             | 
             | This broken record again.
             | 
             | Just observe reality. OpenAI is leading, by far.
             | 
             | All these "OpenAI has no moat" arguments will only make
             | sense whenever there's a material, _observable_ (as in not
             | imaginary), shift on their market share.
        
             | ransom1538 wrote:
             | I moved 100% over to deepseek. No switch cost. Zero.
        
           | lxgr wrote:
           | > First-mover advantage acquires and keeps subscribers.
           | 
           | Does it? As a chat-based (Claude Pro, ChatGPT Plus, etc.)
           | user, LLMs have zero stickiness for me right now, and the
           | APIs can hardly be called moats either.
        
             | distances wrote:
             | If it's for the mass consumer market then it does matter.
             | Ask any non-technical person around you. High chance they
             | know ChatGPT but can't name a single other AI model or
             | service. Gemini, just a distant maybe. Claude, definitely
             | not -- I'm hard-pressed to find anyone even among my
             | _technical_ friends who knows about Claude.
        
               | xmodem wrote:
               | They probably know CoPilot as the thing Microsoft is
               | trying to shove down their throat...
        
         | boringg wrote:
         | You're making some big assumptions projecting into the future:
         | one, that DeepSeek takes market position; two, that the
         | information they have released is honest regarding training
         | usage, spend, etc.
         | 
         | There's a lot more still to unpack, and I don't expect this to
         | stay solely in the tech realm. Seems too politically sensitive.
        
         | meiraleal wrote:
         | DeepSeek is profitable, OpenAI is not. That big expensive moat
         | won't help much when the competition knows how to fly.
        
           | aurareturn wrote:
           | DeepSeek is not profitable. As far as I know, they don't have
           | any significant revenue from their models. Meanwhile, OpenAI
           | last reported $3.7b in revenue and has high gross margins.
        
             | meiraleal wrote:
             | tell that to the stock market then, it might change the
             | graph direction back to green.
        
               | aurareturn wrote:
               | I'm doing the best I can.
        
       | 11101010001100 wrote:
       | I think this is just a(nother) canary for many other markets in
       | the US v China game of monopoly. One weird effect in all this is
       | that US tech may go on to be overvalued (i.e., disconnected from
       | fundamentals) for quite some time.
        
       | btbuildem wrote:
       | I always appreciate reading a take from someone who's well versed
       | in the domains they have opinions about.
       | 
       | I think longer-term we'll eat up any slack in efficiency by
       | throwing more inference demands at it -- but the shift is
       | tectonic. It's a cultural thing. People got acclimated to
       | schlepping around morbidly obese node packages and stringing
       | together enormous Python libraries -- meanwhile the DeepSeek guys
       | are out here carving bits and bytes into bare metal. Back to FP!
        
         | vonneumannstan wrote:
         | This is a bizarre take. First, DeepSeek is no doubt still using
         | the same bloated Python ML packages as everyone else. Second,
         | since this is "open source", it's pretty clear that the big
         | labs are just going to replicate this basically immediately
         | and, with their already massive compute advantages, put out
         | models that are an extra OOM larger/better/etc. than what
         | DeepSeek can possibly put out. There's just no reason to think
         | that e.g. a 10x increase in training efficiency does anything
         | but increase the size of the next model generation by 10x.
        
       | christkv wrote:
       | All this is good news for all of us. Bad news probably for
       | Nvidia's margins long term, but who cares. If we can train and
       | run inference in fewer cycles and watts, that is awesome.
        
       | qwertox wrote:
       | Considering the fact that current models were trained on top-
       | notch books, those read and studied by the most brilliant
       | engineers, the models are pretty dumb.
       | 
       | They are more like the thing which enabled computers to work with
       | and digest text instead of just code. The fact that they can
       | parrot pretty interesting relationships from the texts they've
       | consumed kind of proves that they are capable of statistically
       | "understanding" what we're trying to talk with them about, so
       | it's a pretty good interface.
       | 
       | But going back to the really valuable content of the books
       | they've been trained on, they just don't understand it. Some
       | other kind of AI still needs to be created which can really learn
       | the concepts taught in those books instead of just the words and
       | the value of the proximities between them.
       | 
       | To learn that other missing part will require hardware just as
       | uniquely powerful and flexible as what Nvidia has to offer. Those
       | companies now optimizing for inference and LLM training will be
       | good at it and have their market share, but they need to ensure
       | that their entire stack is as capable as Nvidia's stack if they
       | also want to be part of future developments. I don't know if
       | Tenstorrent or Groq are capable of doing this, but I doubt it.
        
       | lenerdenator wrote:
       | I think it's more than just the market effect on "established" AI
       | players like Nvidia.
       | 
       | I don't think it's necessarily a coincidence that DeepSeek
       | dropped within a short time frame of the announcement of the AI
       | investment initiative by the Trump administration.
       | 
       | The idea is to get the money from investors who want to earn a
       | return. Lower capex is attractive to investors, and DS drops
       | capex dramatically. It makes Chinese AI talent look like the
       | smart, safe bet. Nothing like DS could happen in China unless the
       | powers-that-be knew about it and got some level of control. I'm
       | also willing to bet that this isn't the best they've got.
       | 
       | They're saying "we can deliver the same capabilities for far
       | less, and we're not going to threaten you with a tariff for not
       | complying".
        
       | robomartin wrote:
       | Despite the fact that this article is very well written and
       | certainly contains high quality information, I choose to remain
       | skeptical as it pertains to Nvidia's position in the market. I'll
       | come right out and say that my experience likely makes me see
       | this from a biased position.
       | 
       | The premise is simple: Business is warfare. Anything you can do
       | to damage or slow down the market leader gives you more time to
       | get caught up. FUD is a powerful force.
       | 
       | My bias comes from having been the subject of such attacks in my
       | prior tech startup. Our technology was destroying the offerings
       | of the market leading multi-billion-dollar global company that
       | pretty much owned the sector. The natural processes of such a
       | beast meant they were unable to design their way out of a paper
       | bag. We clearly had an advantage. The problem was that we
       | did not have the deep pockets necessary to flood the market with
       | it and take them out.
       | 
       | What did they do?
       | 
       | They started a FUD campaign.
       | 
       | They went to every single large customer and our resellers (this
       | was a hardware/software product) a month or two before the two
       | main industry tradeshows, and lied to them. They promised that
       | they would show market-leading technology "in just a couple of
       | months" and would add comments like "you might want to put your
       | orders on hold until you see this". We had multi-million dollar
       | orders held for months in anticipation of these product
       | unveilings.
       | 
       | And, sure enough, they would announce the new products with a
       | great marketing push at the next tradeshow. All demos were
       | engineered and manipulated to deceive, all of them. Yet, the
       | incredible power of throwing millions of dollars at this effort
       | delivered what they needed, FUD.
       | 
       | The problem with new products is that it takes months for them to
       | be properly validated. So, if the company that had frozen a $5MM
       | order for our products decided to verify the claims of our
       | competitor, it typically took around four months. In four months,
       | they would discover that the new shiny object was shit and less
       | stellar than what they were told. In other words, we won. Right?
       | 
       | No!
       | 
       | The mega-corp would then reassure them that they had iterated
       | vast improvements into the design and that those would be
       | presented --I kid you not-- at the next tradeshow. By spending
       | millions of dollars they had, at this point, denied us millions
       | of dollars of revenue for approximately one year. FUD, again.
       | 
       | The next tradeshow came and went and the same cycle repeated...it
       | would take months for customers to realize the emperor had no
       | clothes. It was brutal to be on the receiving end of this without
       | the financial horsepower to be able to break through the FUD. It
       | was a marketing arms race and we were unprepared to win it. In
       | this context, the idea that a better mousetrap always wins is
       | just laughable.
       | 
       | This did not end well. They were not going to survive another FUD
       | cycle. Reality eventually comes into play. Except that, in this
       | case, 2008 happened. The economic implosion caught us in serious
       | financial peril due to the damage done by the FUD campaign.
       | Ultimately, it was not survivable and I had to shut down the
       | company.
       | 
       | It took this mega-corp another five years to finally deliver a
       | product that approximated what we had and another five years
       | after that to match and exceed it. I don't even want to imagine
       | how many hundreds of millions they spent on this.
       | 
       | So, long way of saying: China wants to win. No company in China
       | is independent from government forces. This is, without a doubt,
       | a war for supremacy in the AI world. It is my opinion that, while
       | the technology, as described, seems to make sense, it is highly
       | likely that this is yet another form of a FUD campaign to gain
       | time. If they can deny Nvidia (and others) the orders needed to
       | maintain the current pace, they gain time to execute on a
       | strategy that could give them the advantage.
       | 
       | Time will tell.
        
       | samiv wrote:
       | I think the biggest threat to NVIDIA's future right now is their
       | own current success.
       | 
       | Their software platforms and CUDA are a very strong moat against
       | everyone else. I don't see any beating them on that front right
       | now.
       | 
       | The problem is that I'm afraid all that money sloshing around
       | inside the company is rotting the culture, and that will
       | compromise future development.
       | 
       |   - Grifters are filling positions in many orgs, only trying to
       |     milk it as much as possible.
       |   - Old employees become complacent with their nice RSU packages
       |     and rest & vest.
       | 
       | NVIDIA used to be extremely nimble and was fighting way above its
       | weight class. Prior to the Mellanox acquisition they had only
       | around 10k employees, and after, another 10k more.
       | 
       | If there's a real threat to their position at the top of the AI
       | offerings, will they be able to roll up their sleeves and get
       | back to work, or will the organization be unable to move ahead?
       | 
       | Long term, I think it's inevitable that China will take over the
       | technology leadership. They have the population, and they have
       | the education programs and the skill to do this. At the same
       | time, in the old western democracies things are becoming stagnant
       | and I even dare to say that the younger generations are
       | declining. In my native country the educational system has
       | collapsed; over 20% of kids that finish elementary school cannot
       | read or write. They can mouth-breathe and scroll TikTok though,
       | but just barely, since their attention span is about the same as
       | a goldfish's.
        
         | _DeadFred_ wrote:
         | LOL. This isn't rot, it is reaching the end goal: the people
         | doing the work reap the rewards they were working towards. Rot
         | would imply management should somehow prevent rest-and-vest,
         | but that is the exact model they acquired their talent on. You
         | would have to remove capitalism from companies when companies
         | win at capitalism, making it all just a giant rug pull for
         | employees.
        
       | scudsworth wrote:
       | what a compelling domain name. it compels me not to click on it
        
       | indymike wrote:
       | This story could be applied to every tech breakthrough. We start
       | where the breakthrough is moated by hardware, access to
       | knowledge, and IP. Over time:
       | 
       | - Competition gets crucial features into cheaper hardware
       | 
       | - Work-arounds for most IP are discovered
       | 
       | - Knowledge finds a way out of the castle
       | 
       | This leads to a "Cambrian explosion" of new devices and software
       | that usually gives rise to some game-changing new ways to use the
       | new technology. I'm not sure where we all thought this somehow
       | wouldn't apply to AI. We've seen the pattern with almost every
       | new technology you can think of. It's just how it works. Only the
       | time it takes for patents to expire changes this... so long as
       | everyone respects the patent.
        
         | _DeadFred_ wrote:
         | It's still wild to me that toasters have always been $20 but
         | extremely expensive lasers, digital chips, amps, motors, LCD
         | screens worked their way down to $20 CD players.
        
           | indymike wrote:
           | So... Electric toasters came to market in the 1920s, priced
           | from $15, eventually getting as low as $5. Adjusting for
           | inflation, that $15 toaster cost $236.70 in 2025 USD. Today's
           | $15 toaster would be about 90¢ in 1920s dollars... so it
           | follows the story.
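           | 
           | Quick check of that arithmetic (using the CPI multiplier
           | implied by the figures above, which is my own inference):
           | 
           |   multiplier = 236.70 / 15  # ~15.8x, 1920s USD -> 2025 USD
           |   print(15 / multiplier)    # ~0.95, i.e. roughly 90 cents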
        
       | lxgr wrote:
       | The most important part for me is:
       | 
       | > DeepSeek is a tiny Chinese company that reportedly has under
       | 200 employees. The story goes that they started out as a quant
       | trading hedge fund similar to TwoSigma or RenTec, but after Xi
       | Jinping cracked down on that space, they used their math and
       | engineering chops to pivot into AI research.
       | 
       | I guess now we have the answer to the question that countless
       | people have already asked: Where could we be if we figured out
       | how to get most math and physics PhDs to work on things other
       | than picking up pennies in front of steamrollers (a.k.a. HFT)
       | again?
        
         | rfoo wrote:
         | This is completely fake though. It was more like their founder
         | decided to start a branch to do AI research. It was well
         | planned; they bought significantly more GPUs than they could
         | use for quant research even before they started to do anything
         | AI.
         | 
         | There was a crackdown on algorithmic trading, but it didn't
         | have much impact, and IMO someone higher up definitely does not
         | want to kill these trading firms.
        
           | lxgr wrote:
           | The optimal amount of algorithmic trading is definitely more
           | than none (I appreciate liquidity and price quality as much
           | as the next guy), but arguably there's a case here that we've
           | overshot a bit.
        
             | rightbyte wrote:
             | The price data I (we?) get is 15-minute delayed. I would
             | guess most of the profiteering is from consumers not
             | knowing the last transaction prices? I.e. an artificially
             | created edge by the broker, who then sells the API to clean
             | their hands of the scam.
        
               | lxgr wrote:
               | Real-time price data is indeed not free, but widely
               | available even in retail brokerages. I've never seen a 15
               | minute delay in any US based trade, and I think I can
               | even access level 2 data a limited number of times on
               | most exchanges (not that it does me much good as a retail
               | investor).
               | 
               | > I would guess most of the profiteering is from
               | consumers not knowing the last transaction prices?
               | 
               | No, not at all. And I wouldn't even necessarily call it
               | profiteering. Ironically, as a retail investor you even
               | _benefit_ from hedge funds and HFTs being a counterparty
               | to your trades: you get on average better (and worst case
               | as good) execution from PFOF.
               | 
               | Institutional investors (which include pension funds,
               | insurers, etc.) are a different story.
        
               | rightbyte wrote:
               | OK ty, I guess I got it wrong. I thought it was way more
               | common than just my scrappy bank.
        
           | doctorpangloss wrote:
           | Who knows? That too is a bunch of mythmaking. One thing's for
           | sure, there are no moats or secrets.
        
         | auntienomen wrote:
         | DeepSeek is a subsidiary of a relatively successful Chinese
         | quant trading firm. It was the boss' weird passion project,
         | after he made a few billion yuan from his other passion,
         | trading. The whole thing was funded by quant trading profits,
         | which kind of undermines your argument. Maybe we should just
         | let extremely smart people work on the things that catch their
         | interest?
        
           | lxgr wrote:
           | The interest of extremely smart people is often strongly
           | correlated with potential profits, and these are very much
           | correlated with policy, which in the case of financial
           | regulation shapes market structures.
           | 
           | Another way of saying this: It's a well-known fact that
           | complicated puzzles with a potentially huge reward attached
           | to them attract the brightest people, so I'm arguing that we
           | should be very conscious of the types of puzzles we
           | implicitly come up with, and consider this an externality to
           | be accounted for.
           | 
           | HFT is, to a large extent, a product of policy, in particular
           | Reg NMS, based on the idea that we need to have many
           | competing exchanges to make our markets more efficient. This
           | has worked well in breaking down some inefficiencies, but has
           | created a whole set of new ones, which are the basis of HFT
           | being possible in the first place.
           | 
           | There are various ideas on whether different ways of
           | investing might be more efficient, but these largely focus on
           | benefits to investors (i.e. less money being "drained away"
           | by HFT). What I'm arguing is that the "draining" might not
           | even be the biggest problem, but rather that the people doing
           | it could instead contribute to equally exciting, non-zero-sum
           | games.
           | 
           | We definitely want to keep around the part of HFT that
           | contributes to more efficient resource allocation (an
           | inherently hard problem), but wouldn't it be great if we
           | could avoid the part that only works around the kinks of a
           | particular market structure emergent from a particular piece
           | of regulation?
        
         | godelski wrote:
         | Interestingly a lot of the math and physics people in the ML
         | community are considered "grumpy researchers." A joke apparent
         | with this starter pack[0].
         | 
         | From my personal experience (undergrad physics, worked as an
         | engineer, came to CS & ML because I liked the math), there's a
         | lot of pushback.
         | 
         |   - I've been told that the math doesn't matter/you don't need
         |     math.
         |   - I've heard very prominent researchers say "fuck theorists".
         |   - I've seen papers routinely rejected for improving training
         |     techniques, with reviewers saying "just tune a large
         |     model".
         |   - I see papers that show improvements when conditioning
         |     comparisons on compute constraints rejected because "not
         |     enough datasets" or "but does it scale" (these questions
         |     can always be asked but require exponentially more work).
         |   - I've been told I'm gatekeeping for saying "you don't need
         |     math to make good models, but you need it to know why your
         |     models are wrong" (yes, this is a reference).
         |   - When pointing out math or statistical errors I'm told it
         |     doesn't matter.
         |   - And much more.
         | 
         | I've heard this from my advisor, dissertation committee,
         | bosses[1], peers, and others (of course, HN). If my experience
         | is anything but rare, I think it explains the grumpy group[2].
         | But I'm also not too surprised, given how common it is in CS
         | for people to claim that everything is easy or that leet code
         | is proof of competence (as opposed to evidence).
         | 
         | I think unfortunately the problem is a bit bigger, but it isn't
         | unsolvable. Really, it is "easily" solvable since it just
         | requires us to make different decisions. Meaning _each and
         | every one of us_ has a direct impact on making this change.
         | Maybe I'm grumpy because I want to see this better world. Maybe
         | I'm grumpy because I know it is possible. Maybe I'm grumpy
         | because it is my job to see problems and try to fix them lol
         | 
         | [0] https://bsky.app/starter-
         | pack/roydanroy.bsky.social/3lba5lii... (not perfect, but
         | there's a high correlation and I don't think that's a
         | coincidence)
         | 
         | [1] Even after _demonstrating_ how my points directly improve
         | the product, more than doubling performance on _customer_ data.
         | 
         | [2] not to mention the way experiments are done, since it is
         | stressed to physicists that empirics alone is not enough.
         | https://www.youtube.com/watch?v=hV41QEKiMlM
        
           | lxgr wrote:
           | Is this in academia?
           | 
           | Arguably, the emergence of quant hedge funds and private AI
           | research companies is at least as much a symptom of the
           | dysfunctions of academia (and society's compensation of
           | academics on dimensions monetary and beyond) as it is of the
           | ability of Wall Street and Silicon Valley to treat former
           | scientists better than that.
        
             | godelski wrote:
             | > Is this in academia?
             | 
             | Yes and no. Industry AI research is currently tightly
             | coupled with academic research. Most of the big papers you
             | see are either directly from the big labs or done in
             | partnership. Not even labs like Stanford have sufficient
             | compute to train GPT from scratch (maybe enough for
             | DeepSeek). Here's Fei-Fei Li discussing the issue[0].
             | Stanford has something like 300 GPUs[1], and those have to
             | be split across labs.
             | 
             | The thing is that there's always a pipeline. Academia does
             | most of the low-level research, say TRL[2] 1-4,
             | partnerships happen between 4-6, and industry takes over
             | the rest (with some wiggle room on these numbers). Much of
             | ML academic research right now is tuning large models made
             | by big labs. This isn't low TRL. Additionally, a lot of
             | research is rejected for not out-performing technologies
             | that are already at TRL 5-7. See Mamba for a recent
             | example. You could also point to KANs, which are probably
             | around TRL 3.
             | 
             | > Arguably, the emergence of quant hedge funds and private
             | AI research companies is at least as much a symptom of the
             | dysfunctions of academia
             | 
             | Which is where I, again, both agree and disagree. It is not
             | _just_ a symptom of the dysfunction of academia, but _also_
             | of industry. The reason I pointed out the grumpy
             | researchers is that a lot of these people had been
             | discussing techniques that DeepSeek used long before they
             | were used. DeepSeek looks like what happens when you set
             | these people free. Which is my argument: we should do that.
             | Scale Maximalists (also called "Bitter Lesson Maximalists",
             | but I dislike the term) have been dominating ML research,
             | and DeepSeek shows that scale isn't enough. So this will
             | hopefully give the mathy people more weight. But then
             | again, isn't the common way monopolies fall that they
             | become too arrogant and incestuous?
             | 
             | So mostly I agree; I'm just pointing out that there is a
             | bit more subtlety, and I think we need to recognize that to
             | make progress. There are a lot of physicists and mathy
             | people who like ML and have been doing research in the
             | area, but are often pushed out because of the thinking I
             | listed. Part of the success of the quant industry is
             | recognizing that the strong math and modeling skills of
             | physicists generalize pretty well: you go after people who
             | recognize that an equation that describes a spring isn't
             | only useful for springs, but for anything that oscillates.
             | Understanding math at that level is very powerful, and boy
             | are there a lot of people who want the opportunity to
             | demonstrate this in ML; they just never get similar GPU
             | access.
             | 
             | [0] https://www.ft.com/content/d5f91c27-3be8-454a-bea5-bb8f
             | f2a85...
             | 
             | [1] https://archive.is/20241125132313/https://www.thewrap.c
             | om/un...
             | 
             | [2]
             | https://en.wikipedia.org/wiki/Technology_readiness_level
        
       | hn_throwaway_99 wrote:
       | I'm curious if someone more informed than me can comment on this
       | part:
       | 
       | > Besides things like the rise of humanoid robots, which I
       | suspect is going to take most people by surprise when they are
       | rapidly able to perform a huge number of tasks that currently
       | require an unskilled (or even skilled) human worker (e.g., doing
       | laundry ...
       | 
       | I've always said that the real test for humanoid AI is folding
       | laundry, because it's an incredibly difficult problem. And I'm
       | not talking about giving a machine clothing piece-by-piece
       | flattened so it just has to fold, I'm talking about saying to a
       | robot "There's a dryer full of clothes. Go fold it into separate
       | piles (e.g. underwear, tops, bottoms) and don't mix the husband's
       | clothes with the wife's". That is, something most humans in the
       | developed world have to do a couple times a week.
       | 
       | I've been following some of the big advances in humanoid robot
       | AI, but the above task _still_ seems miles away given current
       | tech. So is the author's quote just more unsubstantiated hype
       | that I'm constantly bombarded with in the AI space, or have there
       | been recent advancements in robot AI that I'm unaware of?
        
         | ieee2 wrote:
         | I saw demos of such robots doing exactly that on YouTube/X --
         | not very precise yet, but almost sufficient. And it is just the
         | beginning. Considering that the majority of laundry is very
         | similar (shirts, t-shirts, trousers, etc.), I think this will
         | be solved soon with enough training.
        
           | hn_throwaway_99 wrote:
           | Can you share what you've seen? Because from what I've seen,
           | I'm far from convinced. E.g. there is this,
           | https://youtube.com/shorts/CICq5klTomY , which nominally does
           | what I've described. Still, as impressive as that is, I think
           | the distance from what that robot does to what a human can do
           | is a _lot_ farther than it seems. Besides noticing that the
           | folded clothes are more like a neatly arranged pile, what
           | about all the edge cases? What about static cling? Can it
           | match socks? What if something gets stuck in the dryer?
           | 
           | I'm just very wary of looking at that video and saying "Look!
           | It's 90% of the way there! And think how fast AI advances!",
           | because that critical last 10% can often be harder than the
           | first 90% and then some.
        
             | Nition wrote:
             | First problem with that demo is that putting all your
             | clothes in a dryer is a very American thing. Much of the
             | world pegs their washing on a line.
        
         | rattray wrote:
         | https://physicalintelligence.company is working on this - see a
         | demo where their robot does ~exactly what you said, I believe
         | based on a "generalist" model (not pretrained on the tasks):
         | https://www.youtube.com/watch?v=J-UTyb7lOEw
        
           | delusional wrote:
           | There are so many cuts in that 1 minute video, Jesus Christ.
           | You'd think it was produced for TikTok.
        
         | hnuser123456 wrote:
         | 2 months ago, Boston Dynamics' Atlas was barely able to put
         | solid objects in open cubbies. [1] Folding, hanging, and
         | dresser drawer operation appears to be a few years out still.
         | 
         | https://www.youtube.com/watch?v=F_7IPm7f1vI
        
       | rashidae wrote:
       | While Nvidia's valuation may feel bloated due to AI hype, AMD
       | might be the smarter play.
        
       | UncleOxidant wrote:
       | Even if DeepSeek has figured out how to do more (or at least as
       | much) with less, doesn't the Jevons Paradox come into play? GPU
       | sales would actually increase because even smaller companies
       | would get the idea that they can compete in a space that only 6
       | months ago we assumed would be the realm of the large mega tech
       | companies (the Metas, Googles, OpenAIs) since the small players
       | couldn't afford to compete. Now that story is in question since
       | DeepSeek only has ~200 employees and claims to be able to train a
       | competitive model for about 20X less than the big boys spend.
        
         | samvher wrote:
         | My interpretation is that yes, in the long haul, lower
         | energy/hardware requirements might increase demand rather than
         | decrease it. But right now, DeepSeek has demonstrated that the
         | current bottleneck to progress is _not_ compute, which
         | decreases the near term pressure on buying GPUs at any cost,
         | which decreases NVIDIA's stock price.
        
           | kemiller wrote:
           | Short term, I 100% agree, but it remains to be seen what
           | "short" means. According to at least some benchmarks,
           | DeepSeek is _two full orders of magnitude_ cheaper for
           | comparable performance. Massive. But that opens the door for
           | much more elaborate "architectures" (chain of thought,
           | architect/editor, multiple choice, etc.), since it's
           | possible to run the model over and over to get better
           | results, so raw speed & latency will still matter.
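           | 
           | A minimal sketch of that "run it over and over" idea -
           | best-of-N sampling with majority voting; ask_model() here
           | is a hypothetical stand-in for any cheap inference call,
           | not a real API:
           | 
           |   import random
           |   from collections import Counter
           | 
           |   def ask_model(prompt: str) -> str:
           |       # Placeholder: a real version would call an LLM API.
           |       # Here we fake a noisy model, right ~60% of the time.
           |       return random.choices(["42", "41", "43"],
           |                             weights=[6, 2, 2])[0]
           | 
           |   def best_of_n(prompt: str, n: int = 9) -> str:
           |       # Cheap per-call models make many samples affordable;
           |       # majority voting trades extra calls for reliability.
           |       answers = [ask_model(prompt) for _ in range(n)]
           |       return Counter(answers).most_common(1)[0][0]
           | 
           |   print(best_of_n("What is 6 * 7?"))  # usually "42"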
        
         | yifanl wrote:
         | It does, but proving that it can be done with cheaper (and,
         | more importantly for NVidia, lower-margin) chips breaks the
         | spell that NVidia will just be eating everybody's lunch until
         | the end of time.
        
           | aurareturn wrote:
           | If demand for AI chips increases due to the Jevons paradox,
           | why would Nvidia's chips become cheaper?
           | 
           | In the long run, yes, they will be cheaper due to more
           | competition and better tech. But next month? It will be more
           | expensive.
        
             | yifanl wrote:
             | The usage of existing but cheaper nvidia chips to make
             | models of similar quality is the main takeaway.
             | 
             | It'll be much harder to convince people to buy the latest
             | and greatest with this out there.
        
               | UncleOxidant wrote:
               | The sweet spot for running local LLMs (from what I'm
               | seeing on forums like r/localLlama) is 2 to 4 3090s each
               | with 24GB of VRAM. NVidia (or AMD or Intel) would clean
               | up if they offered a card with 3090 level performance but
               | with 64GB of VRAM. Doesn't have to be the leading edge
               | GPU, just a decent GPU with lots of VRAM. This is kind of
               | what Digits will be (though the memory bandwidth will be
               | slower because it'll be DDR5) and kind of what
               | AMD's Strix Halo is aiming for - unified memory systems
               | where the CPU & GPU have access to the same large pool of
               | memory.
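               | 
               | Rough back-of-envelope VRAM math behind that wish (all
               | numbers are loose approximations, not vendor specs):
               | 
               |   # Weights ~ params * bits/8, plus ~15% overhead
               |   # for KV cache and activations (rough assumption).
               |   def vram_gb(params_b, bits, overhead=0.15):
               |       return params_b * bits / 8 * (1 + overhead)
               | 
               |   for p, b in [(70, 4), (34, 4), (8, 8)]:
               |       print(f"{p}B @ {b}-bit: "
               |             f"~{vram_gb(p, b):.0f} GB")
               |   # 70B @ 4-bit: ~40 GB -> two 24 GB 3090s today;
               |   # a single 64 GB card would hold it comfortably.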
        
               | redlock wrote:
               | The issue here is that, even with a lot of VRAM, you may
               | be able to run the model, but with a large context, it
               | will still be too slow. (For example, running LLaMA 70B
               | with a 30k+ context prompt takes minutes to process.)
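               | 
               | Rough arithmetic for why long prompts are slow (all
               | numbers assumed; ignores batching and memory limits):
               | 
               |   # Prefill costs ~2 * params * prompt_tokens FLOPs.
               |   def prefill_s(params_b, tokens, tflops):
               |       flops = 2 * params_b * 1e9 * tokens
               |       return flops / (tflops * 1e12)
               | 
               |   # 70B model, 30k-token prompt, ~50 TFLOPs sustained:
               |   print(f"{prefill_s(70, 30_000, 50):.0f} s")  # ~84 s
               |   # At the few TFLOPs an offloaded or partly-on-CPU
               |   # setup sustains, the same prompt takes minutes.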
        
               | aurareturn wrote:
               | The usage of existing but cheaper nvidia chips to make
               | models of similar quality is the main takeaway.
               | 
               | So why not buy a more expensive Nvidia chip to run a
               | better model?
        
               | yifanl wrote:
               | Is there still evidence that more compute = better model?
        
               | aurareturn wrote:
               | Yes. Plenty of evidence.
               | 
               | The DeepSeek R1 model people are freaking out about
               | runs better with more compute because it's a
               | chain-of-thought model.
        
               | Vegenoid wrote:
               | Because if you don't have infinite money, considering
               | whether to buy a thing is about the ratio of price to
               | performance, not just performance. If you can get enough
               | performance for your needs out of a cheaper chip, you buy
               | the cheaper chip.
        
         | hodder wrote:
         | Jevons paradox isn't some iron law like gravity.
        
           | trgn wrote:
           | feels like it is in tech. any gains from hardware or
           | algorithmic advances immediately get consumed by increases
           | in data retention and software bloat.
        
         | gamblor956 wrote:
         | Important to note: the $5 million alleged cost is just the GPU
         | compute cost for the final version of the model; it's not the
         | cumulative cost of the research to date.
         | 
         | The analogous costs would be what OpenAI spent to go from GPT 4
         | to GPT 4o (i.e., to develop the reasoning model from the most
         | up-to-date LLM model). $5 million is still less than what
         | OpenAI spent, but it's not an order of magnitude lower.
         | (OpenAI spent up to $100 million on GPT 4 but a fraction of
         | that to get GPT 4o. Will update comment if I can find numbers
         | for 4o before edit window closes.)
        
           | fspeech wrote:
           | It doesn't make sense to compare individual models. A better
           | way is to look at total compute consumed, normalized by the
           | output. In the end what counts is the cost of providing
           | tokens.
        
         | fspeech wrote:
         | But why would customers accept Nvidia's high prices and high
         | gross margins if they no longer fear missing out for lack of
         | hardware?
        
         | tedunangst wrote:
         | Selling 100 chips at $1 profit each ($100) is less profitable
         | than selling 20 chips at $10 profit each ($200).
        
       | mackid wrote:
       | Microsoft did a bunch of research into low-bit weights for
       | models. I guess OAI didn't look at this work.
       | 
       | https://proceedings.neurips.cc/paper/2020/file/747e32ab0fea7...
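       | 
       | For illustration, the generic idea behind low-bit weights
       | (symmetric int8 here - just the basic concept, not the linked
       | paper's method):
       | 
       |   def quantize_int8(weights):
       |       # Map the largest |weight| to 127; guard all-zero input.
       |       scale = max(abs(w) for w in weights) / 127 or 1.0
       |       return [round(w / scale) for w in weights], scale
       | 
       |   def dequantize(q, scale):
       |       return [x * scale for x in q]
       | 
       |   w = [0.31, -1.20, 0.05, 0.88]
       |   q, s = quantize_int8(w)
       |   print(q)                 # [33, -127, 5, 93]
       |   print(dequantize(q, s))  # close to the original weights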
        
       | highfrequency wrote:
       | The R1 paper (https://arxiv.org/pdf/2501.12948) emphasizes their
       | success with reinforcement learning without requiring any
       | supervised data (unlike RLHF for example). They note that this
       | works well for math and programming questions with verifiable
       | answers.
       | 
       | What's totally unclear is what data they used for this
       | reinforcement learning step. How many math problems of the right
       | difficulty with well-defined labeled answers are available on the
       | internet? (I see about 1,000 historical AIME questions, maybe
       | another factor of 10 from other similar contests). Similarly,
       | they mention LeetCode - it looks like there are around 3000
       | LeetCode questions online. Curious what others think - maybe the
       | reinforcement learning step requires far less data than I would
       | guess?
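       | 
       | For what it's worth, the verifiable-reward part itself is easy
       | to sketch - something like the rule-based check below (my own
       | illustration, not DeepSeek's code). The hard part is exactly
       | what you describe: sourcing enough labeled problems.
       | 
       |   import re
       | 
       |   def math_reward(completion: str, gold: str) -> float:
       |       # Require a final answer in a fixed format, e.g.
       |       # "... \boxed{42}", so the check is mechanical.
       |       m = re.search(r"\\boxed\{([^}]*)\}", completion)
       |       if m is None:
       |           return 0.0  # no parseable answer, no reward
       |       return 1.0 if m.group(1).strip() == gold else 0.0
       | 
       |   print(math_reward(r"Adding gives \boxed{204}", "204"))  # 1.0
       |   print(math_reward("I think it's 204", "204"))           # 0.0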
        
       | mrinterweb wrote:
       | The vast majority of Nvidia's current value is tied to their
       | dominance in AI hardware. That value could be threatened if LLMs
       | could be trained and/or run efficiently using a CPU or a quantum
       | chip. I don't understand enough about the capabilities of
       | quantum computing to know if running or training an LLM would be
       | possible using a quantum chip, but if it becomes possible, NVDA
       | stock is unlikely to fare well (unless they are making the new
       | chip).
        
       | tempeler wrote:
       | First of all, I don't invest in Nvidia, and I don't like
       | oligopolies. But it is too early to talk about Nvidia's future.
       | People are just betting and wishing; no one knows what people
       | will do in the future or what they will think. It's just
       | guessing and betting. Their real competitor is not DeepSeek. Did
       | AMD or others release something new that competes with Nvidia's
       | products? If Nvidia remains the market leader, they will lead on
       | price. Being an oligopoly works like that: they don't need to
       | match competitors' prices.
        
       | wtcactus wrote:
       | To me, this seems like we are back again in 1953 and a company
       | just announced they are now capable of building one of IBM's 5
       | computers for 10% of the price.
       | 
       | I really don't understand the rationale of "We can now train GPT
       | 4o for 10% of the price, so that will bring demand for GPUs
       | down". If I can train GPT 4o for 10% of the price, and I have a
       | budget of 1B USD, that means I'm now going to use the same
       | budget and train my model for 10x as long (or make it 10x
       | bigger).
       | 
       | At the same time, a lot of small players that couldn't properly
       | train a model before, because the starting point was simply out
       | of their reach, will now be able to purchase equipment that's
       | capable of something of note, and they will buy even more GPUs.
       | 
       | P.S. Yes, I know that the original quote "I think there is a
       | world market for maybe five computers", was taken out of context.
       | 
       | P.P.S. In this rationale, I'm also operating under the
       | assumption that DeepSeek's numbers are real. Which, given the
       | track record of Chinese companies, is probably not true.
        
       | ozten wrote:
       | NVIDIA sells shovels to the gold rush. One miner (Liang Wenfeng),
       | who has previously purchased at least 10,000 A100 shovels... has
       | a "side project" where they figured out how to dig really well
       | with a shovel and shared their secrets.
       | 
       | The gold rush, whether real or a bubble, is still there! NVIDIA
       | will still sell every shovel they can manufacture, as soon as it
       | is available in inventory.
       | 
       | Fortune 100 companies will still want the biggest toolshed to
       | invent the next paradigm or to be the first to get to AGI.
        
       | 0n0n0m0uz wrote:
       | Please tell me if I am wrong. I know very few of the details and
       | have only heard a few headlines; my hasty conclusion is that
       | this development clearly shows the exponential nature of AI
       | progress, in terms of how people are able to piggyback on the
       | resources, time, and money of the previous iteration. They used
       | the output from ChatGPT as the input to their model. Is this
       | true, more or less accurate, or off base?
        
       ___________________________________________________________________
       (page generated 2025-01-27 23:01 UTC)