[HN Gopher] Deepseek: The quiet giant leading China's AI race
       ___________________________________________________________________
        
       Deepseek: The quiet giant leading China's AI race
        
       Author : sunny-beast
       Score  : 263 points
       Date   : 2024-12-31 09:28 UTC (13 hours ago)
        
 (HTM) web link (www.chinatalk.media)
 (TXT) w3m dump (www.chinatalk.media)
        
       | yellow_lead wrote:
       | > Liang Wenfeng: We believe that as the economy develops, China
       | should gradually become a contributor instead of freeriding. In
       | the past 30+ years of the IT wave, we basically didn't
       | participate in real technological innovation. We're used to
       | Moore's Law falling out of the sky, lying at home waiting 18
       | months for better hardware and software to emerge. That's how the
       | Scaling Law is being treated.
        
       | mentalgear wrote:
       | Impressive to think about what DeepSeek achieved: rough parity
       | with o1 and Claude using > 10x fewer resources. Better algorithms
       | and approaches are what's needed for the next step of ML.
        
         | NitpickLawyer wrote:
         | While impressive, the deepseek models aren't really "on par"
         | with either oAI or Anthropic offerings, right now. The models
         | seem to be a bit overfitted in the post-training step. They are
         | very "stubborn" models, and usually handle tasks well _if_ they
         | can handle them, but steering them is quite difficult. As a
         | result, they score very well on various benchmarks, but often
         | times perform slightly worse in real-life scenarios.
        
           | espadrine wrote:
           | The blind test at lmarena.ai does give it a higher Elo than
           | GPT-4o (API), Claude, and Gemini 1.5 Pro. It seems that
           | people do enter real-life scenarios in the arena.
        
           | victorbjorklund wrote:
           | I found deepseek very useful at coding with Aider. On par
           | with claude.
        
           | rahimnathwani wrote:
           | > They are very "stubborn" models
           | 
           | Have you found this to be the case even when using the
           | recommended temperature settings (ranging from 0 for math, to
           | 1.5 for creative tasks)?
        
             | NitpickLawyer wrote:
             | I use 0.05 for math, just did a 5k problem set, trying to
             | fine-tune a smaller model with the outputs. It has some
             | very interesting training, borrowed from r1 per the tech
             | report, where it does the o1/qwq "thinking steps", but a
             | bit shorter. It solves ~80% of the problems in 4k context,
             | while qwq would go on for 8k-16k. It's very good at what it
             | does.
             | 
             | But as soon as I need it to do something _other_ than solve
             | a problem - say rewrite the problem in simpler terms, or
             | given a problem + solution provide hints, or rewrite the
             | solution with these <tags>, etc. it kinda stops working.
             | Often times it still goes ahead and solves the problem.
             | That's why I'm saying it's stubborn. If a task looks like a
             | task that it can handle _very_ well, it's really hard to
             | make it perform that other, similar but not quite the same
             | task.
             | 
             | In a similar vein -
             | https://github.com/cpldcpu/MisguidedAttention/tree/main/eval...
        
           | orbital-decay wrote:
           | DeepSeek v3 feels very much like Sonnet 3.5 (v1) in
           | particular, minus the character. Performs more or less
           | similarly, "feels" overfitted just about the same, and
           | repeats itself in multiturn chats even worse. I hope they
           | address it in v3.5, v4, or whatever comes next.
        
         | amelius wrote:
         | How are these models benchmarked?
        
         | amelius wrote:
         | Makes you wonder if OpenAI has a moat.
        
         | llm_trw wrote:
         | We're seeing a split in models between deep and wide.
         | 
         | Wide models sound like they know more than deep models but fail
         | at reasoning with more than a few steps and are cheap to train
         | and serve. Deep models know a lot less but can reason much
         | better.
         | 
         | An example I saw all MoE models fail at a few months back was
         | "A and not B" being implicit in the grounding text; all of them
         | would turn it into "A and B" a substantial proportion of the
         | time. Monolithic models, on the other hand, had no trouble
         | giving the right answer.
         | 
         | The Chinese AI companies can only do wide AI because of
         | restrictions on hardware exports. In the short term this will
         | make more people think LLMs are stochastic parrots because they
         | can't get simple things right.
        
         | caycep wrote:
         | what's the engineering situation at OpenAI since the whole
         | "firing Sam Altman" spectacle? Has there been significant brain
         | drain that affects something like o1 etc?
        
       | lomkju wrote:
       | I feel the GPU restrictions created an environment for Chinese
       | Devs to be more innovative and do more with less.
       | 
       | Kudos to the deepseek team!
        
         | wodderam wrote:
         | Kai-Fu Lee describes the culture so well in AI Superpowers. The
         | roots go back well before the GPU restrictions. Absolute
         | cutthroat competition.
         | 
         | Imagine Sam Altman throwing a chair out a window in a meeting
         | lol.
         | 
         | The message of AI Superpowers is that China will lag the US at
         | first but once things stabilize this will happen because China
         | has a lot more engineers and a lot more data.
         | 
         | Anyone who hasn't read AI Superpowers should really make it a
         | point to read it in 2025. It is an incredible book.
        
           | Etheryte wrote:
           | I don't know, I've been hearing the story that China is about
           | to upend the US as the leading global superpower ever since I
           | was a kid. There's always a new vogue and novel twist put on
           | the rationale and how it's gonna happen, but so far it's like
           | fusion, always a few years away.
        
             | elashri wrote:
             | I think you mean nuclear fusion.
        
               | Etheryte wrote:
               | Of course, thanks.
        
             | tossandthrow wrote:
             | What makes you think it has not happened? There has not
             | been an event to establish who the current super power is
             | in new time.
        
               | futureshock wrote:
               | I think you have the right idea. China has yet to truly
               | flex its muscle. They prefer to quietly grow stronger.
               | Their response to Covid with the largely successful zero
               | covid strategy gives a clue about the power of its
               | government. Silly, you can't become the champion without
               | stepping into the ring.
        
               | daveguy wrote:
               | Largely successful? I wouldn't confuse Chinese propaganda
               | to the outside world with success.
        
               | pphysch wrote:
               | The Chinese government treated the pandemic as a
               | bioweapon attack by a foreign adversary engaged in a
               | broader hybrid war, and it did so effectively.
        
               | daveguy wrote:
               | That's batshit insane.
        
               | opwieurposiu wrote:
               | China creates a superbug via GOF research. Accidentally
               | releases it from the lab. Shuts down its own economy.
               | Puts the majority of its citizens on house arrest, and
               | that is "largely successful"? Please send me the
               | AliExpress link to whatever it is you are smoking, it
               | must be some good shit.
               | 
               | I think the real lesson here is that if you have enough
               | government power, there is no need to be competent. The
               | feedback loop is destroyed so you can just do whatever
               | random stupid thing you want until your country collapses
               | like the USSR.
        
               | vkou wrote:
               | > China creates a superbug via GOF research
               | 
               | In 2024, this isn't fact, it's just baseless conspiracy.
               | 
               | All evidence has ended up pointing to bush meat
               | contamination.
        
               | int_19h wrote:
               | In 2024, zoonotic origin is considered more probable, but
               | it is by no means a "baseless conspiracy" to believe
               | otherwise.
        
               | hollerith wrote:
               | The professional military leaders in China, Japan, SK,
               | Taiwan, Singapore and the Philippines act as if the US is
               | the current superpower.
        
             | jitl wrote:
             | If you want to look at an objective numeric metric for
             | this, why not foreign military bases? US has 128+, China
             | has ~2. To project global military power China will need
             | a similar order-of-magnitude presence. I use that number as a
             | check against sometimes breathless and sensational
             | journalism about the topic.
             | 
             | It's harder for me to come up with a simpler metric for
             | "Belt and Road" / IMF style control-through-capital.
             | 
             | But, I think it will happen. After visiting China and
             | seeing how much consistent progress there is, both in
             | infrastructure from the government and in daily life from
             | the economy, my impression is that the US government makes
             | 2 steps forward and 1 step back in the time it takes China
             | to take 100 steps forward.
        
               | bpodgursky wrote:
               | The US does not have 128 foreign military bases. It has
               | ~50 nominal bases [1]. Most of them are just the US
               | sharing an airfield with a friendly country; it's a
               | refueling stop that would not be hard for China to
               | replicate.
               | 
               | The US does have several large overseas bases but 90% of
               | this list is indefensible logistics hubs and not a
               | meaningful projection of force.
               | 
               | [1]
               | https://en.wikipedia.org/wiki/List_of_American_military_inst...
        
               | daveguy wrote:
               | Calling any US military base an "indefensible logistics
               | hub" reveals that the extent of your research was
               | probably just that Wikipedia listicle.
        
               | bpodgursky wrote:
               | Believe whatever you want.
               | 
               | Most of these bases are co-located with NATO or other
               | allies for good reason, the US doesn't have to do
               | everything itself wrt air defense, locating an airlift
               | wing with a fighter wing.
               | 
               | But then it's a lower bar than people imagine, for China
               | to buy similar friendship.
        
               | daveguy wrote:
               | You literally called the _logistics hubs of the US
               | military_ -- the bases that move more of the most
               | powerful weapons and military personnel in the world --
               | indefensible. So you either don't know what indefensible
               | means, or you are a piss poor propagandist.
        
               | sangnoir wrote:
               | Yes or No: Can the US singlehandedly defend all those
               | bases without the help of the host country? If the answer
               | is "Yes", then China has a long way to go to achieve that
               | capability. If the answer is "No", then the bar is much
               | lower, and gp's point is that China can "buy" similar
               | arrangements without too much effort. More directly, is
               | the bottleneck on funding, personnel/materiel, or
               | diplomacy?
               | 
               | China is building up a lot of soft power with
               | infrastructure projects all over the world - most of them
               | are aimed at improving trade - ports, rail lines and the
               | like. In the next decade or 2, they can reasonably make
               | requests to place a few PLA/PLAA personnel and equipment
               | on bases in strategic places, bases that may have been
               | built using Chinese money.
        
               | daveguy wrote:
               | Fair enough. But could any country attack all of those
               | bases at once? As long as the US doesn't do anything as
               | colossally stupid as leaving NATO it shouldn't be a
               | problem with support. Ultimately NATO participation
               | resides with Congress which is beholden to the people.
               | NATO is overwhelmingly approved of by the US people -- it
               | is a _defense_ pact.
               | 
               | If a country or coalition decided to attack all of those
               | bases at once it would give the US the high ground to
               | respond. Nazis tried a blitzkrieg and that didn't turn
               | out well. As someone squarely against the bullshit of
               | Trump, I would not be happy if he was in power at the
               | time. But I do not doubt for a second that the US
               | population in general would respond as readily as they
               | did after 9/11 (but hopefully not as readily as in Iraq).
               | 
               | We just saw how the "dipshit in power" aspect works with
               | Netanyahu in Gaza -- a disproportionate and tragic
               | response. The only caveat is Trump is an extremely stupid
               | dipshit, so I genuinely hope it doesn't turn out that way
               | and everyone keeps their powder dry until Trump is out of
               | office.
               | 
               | China's buildup of soft power is good for them, and I
               | commend them for it. Fortunately, I believe soft power is
               | a defensive power at its core, and I don't think it
               | translates to offensive power. To confuse the two would
               | be a mistake.
               | 
               | Thank you for the opportunity to get a lot off my chest
               | this New Year's Eve. I hope it wasn't too offensive,
               | because I believe you responded intelligently and in good
               | faith, and thank you for that.
        
               | bpodgursky wrote:
               | Yes, it is very easy for a logistic hub that only has an
               | airlift wing to be indefensible in a war against a peer
               | adversary if for example there are no THAAD or Patriot
               | batteries there. It's a hub, not a hardened facility.
               | 
               | Many of the US military bases are communication centers
               | or barracks on training bases. They serve important roles
               | but are not "defensible" in many contexts.
               | 
               | Who am I even propagandizing for in this context?
        
               | fakedang wrote:
               | I was of the same viewpoint as you - just look at the
               | militaries!
               | 
               | Except in today's world, being a military power is
               | increasingly less relevant after a certain point, while
               | economic supremacy is increasingly gaining prominence.
               | While the West is content with self-platitudes for their
               | "democracy", China has been building strong relationships
               | with a number of countries looking to implement the
               | "China-model", a capitalist but largely regressive nation
               | that relies on surveillance and stringent media control.
               | China is already licensing out their technology to a
               | number of interested countries, some of which include
               | Western countries looking to emulate Chinese autocracy
               | themselves. On the other hand, countries are looking at
               | the incoming US govt with considerable uncertainty
               | as to what their relationship with America will be like.
               | 
               | Not to mention, as automated warfare becomes increasingly
               | more relevant, guess where these countries are buying
               | their drones from? Hint hint, it's not the US with their
               | overpriced toys.
        
               | kayewiggin wrote:
               | > China has been building strong relationships with a
               | number of countries
               | 
               | Number of irrelevant countries. US's allies are Europe,
               | Japan, South Korea, Taiwan, Canada, Mexico, Australia,
               | etc.: 80% of the world's wealth and 95% of the world's
               | top technologies.
               | 
               | > guess where these countries are buying their drones
               | from
               | 
               | Soon, not China. China Is Cutting Off Drone Supplies
               | Critical to Ukraine War Effort [1]. China is reportedly
               | making drones for Russia instead, according to multiple
               | intelligence officials.
               | 
               | [1]
               | https://www.bloomberg.com/news/articles/2024-12-09/china-is-...
        
             | solaarphunk wrote:
             | Economic superpower perhaps - just take a look at their
             | relative GDP over time.
        
               | kayewiggin wrote:
               | China has 900M people making less than $400/month, and
               | 600M people making less than $100/month. Relative GDP is
               | a joke; go to China to see what most of them are eating
               | (hint: it's unsafe food filled with chemicals, or it's
               | mostly carbs) and where these people are living (hint:
               | it's shoddily constructed condos or run-down farm houses).
        
               | bugglebeetle wrote:
               | It's unclear to me why what you're describing is specific
               | to China and not also what Americans euphemistically
               | refer to as "fly over states."
        
               | kamarg wrote:
               | Not sure why you have something against the flyover
               | states. I'm sure there's more shoddily constructed condos
               | in Florida/California/New York per capita than there is
               | in the Midwest. Same goes for cheap high calorie food.
               | 
               | Of course, the same can probably be said about the large
               | population centers in China too. More people concentrated
               | in one area tends to mean more poverty in that area and
               | all the things that come with it.
        
               | bugglebeetle wrote:
               | I don't have anything against them. I was born and raised
               | in one. I just find it ironic that someone would fail to
               | see this parallel.
        
               | Airodonack wrote:
               | The parallel is that there are rich and poor? It is
               | unscrupulous to argue in imprecise, binary terms while
               | ignoring the difference in scale. People in flyover
               | states are not making only $400/mo or even occupying that
               | same societal equivalent of China in America.
        
               | eunos wrote:
               | > China has 900M people making less than $400/month
               | 
               | Most of these folks are illiterate oldies that would pass
               | away in a few years anyway.
        
             | jurli wrote:
             | It's literally happening lol. When you were a kid China was
             | making shoes and their GDP was 10% of the US's. Now they're
             | making drones / evs / high end electronics and it's 80%.
             | This is why people's perception is so unreliable because
             | it's impossible to notice things when they happen over a
             | lifetime
        
               | talldayo wrote:
               | When I was a kid, China was a lot better integrated with
               | the international community. Right now their
               | relationships are few and far between, rarely featuring
               | first-world nations.
               | 
               | If Russia couldn't beat NATO in a pitched fight against
               | the rest of the world, neither can China.
        
               | joshbaptiste wrote:
               | As a sovereign nation rises in power you'll notice how it
               | slowly starts losing favor with the USA
        
               | caycep wrote:
               | The question in a rational mind is, why would it even
               | bother? US/China partnership is the most economically
               | successful in world history, even more so than US/UK or
               | AUKUS. But the downside of CCP government structure is
               | that paranoia at the top ranks has a good probability of
               | overruling rationality.
               | 
               | Although the US cannot speak, as US-centric
               | paranoia/"exceptionalism" may do the same thing...and the
               | electorate voted to self-destruct the government despite
               | the US economy being the strongest in decades.
        
               | bugglebeetle wrote:
               | > US economy being the strongest in decades
               | 
               | The vast majority of its people do not share in that
               | success and have seen a declining standard of life
               | relative to prior generations whereas in China, the
               | opposite is quite demonstrably true, despite increasingly
               | similar concentrations of wealth and political power.
        
               | kayewiggin wrote:
               | > making drones / evs / high end electronics
               | 
               | China does have a current advantage on lithium battery
               | and rare earth materials - dumb technologies that US and
               | allies can replicate fairly quickly, less than a year.
               | EUV and 3nm and below on the other hand, will take
               | decades, since it involves a number of different and deep
               | technologies controlled by dozens of companies. China has
               | thrown $150B on it since 2014, and has only come up with
               | low yield/unprofitable 7nm via existing DUV machines.
               | 
               | > 80% GDP
               | 
               | China's population will more than halve to 500M by 2100,
               | if not earlier, while US grows to close to 400M by then.
               | Someone actually theorizes that China's population is
               | already only 800M right now
               | https://www.youtube.com/watch?v=fR5F_8dSjOw
               | 
               | Also, a lot of that GDP is debatable in 2024, when real
               | estate prices have dropped by more than 50% in tier 2 and
               | below cities, and deflation has raged on.
        
               | FooBarWidget wrote:
               | So why aren't US and allies demonstrably replicating EVs
               | (and other kinds of green technology) quickly? Tesla is
               | still pretty much the only serious player. Why are CEOs
               | of major western carmakers painting a very different
               | picture than what you describe here? Where are the
               | serious EU/US battery makers that are globally
               | competitive? It looks to me like the EU has chosen the
               | worst of all options: put up tariff barriers while
               | _also_ not having serious domestic EV makers, and _also_
               | not stimulating domestic EV development.
        
               | dboreham wrote:
               | Western consumers don't want to buy EVs (mostly).
        
               | FooBarWidget wrote:
               | Yeah I mean, with the sad state of the Dutch electric
               | grid, the poor coverage of chargers, and the disappearing
               | consumer subsidies, I wouldn't want either. So why aren't
               | governments also building the infrastructure they need to
               | help stimulate demand for EVs? Not taking global climate
               | disaster seriously enough?
               | 
               | Building EVs and supporting infrastructure is a lot more
               | complicated than just having a bunch of blueprints.
        
               | slt2021 wrote:
               | They would buy Chinese EVs since they are much cheaper
               | than ICE
        
               | eunos wrote:
               | > dumb technologies that US and allies can replicate
               | fairly quickly
               | 
               | Laugh in Northvolt
               | 
               | > $150B on it since 2014, and has only come up with low
               | yield/unprofitable 7nm via existing DUV machines
               | 
               | Considering that there are fewer than 5 countries on Earth
               | that can fab 7nm semiconductors, that ain't bad.
        
               | kayewiggin wrote:
               | RIP Northvolt from Sweden, and "The U.S. made a
               | breakthrough battery discovery -- then gave the
               | technology to China"
               | https://www.npr.org/2022/08/03/1114964240/new-battery-techno....
               | However:
               | 
               | - Battery Startup Opens Chicago Plant as US Seeks to Curb
               | Reliance on China
               | https://www.nanograf.com/media/battery-startup-opens-chicago...
               | 
               | - Our own YC:
               | https://www.ycombinator.com/companies/industry/energy
               | 
               | - China's startup scene is dead as investors pull
               | out--'Today, we are like lepers'
               | https://finance.yahoo.com/news/china-startup-scene-dead-inve...
        
               | roenxi wrote:
               | > when real estate prices have dropped by more than 50%
               | in tier 2 and below cities, and deflation has raged on.
               | 
               | Can other economies copy that part? I know a bunch of
               | people who'd like to be able to afford more houses & more
               | groceries at the same time. I'd like that, I can't
               | realistically afford a house in the city I live in
               | without a 50% price drop.
               | 
               | I'm sure China has a lot of problems, but key goods
               | getting cheaper is not one of them. What I'm guessing you
               | meant to say is that retirees were led to put too much of
               | their savings into the housing market and are discovering
               | there is a glut. Which is tragic for them. But prices
               | dropping is a good thing; the unachievable ideal is a
               | utopia where everything is free, ie, 100% deflation.
        
               | protomolecule wrote:
               | >a lot of that GDP is debatable in 2024
               | 
               | While the share of services in the US GDP is more than
               | 3/4. What will you do with all these expensive NY lawyers
               | when push comes to shove? Sue China's drones?
        
               | lolinder wrote:
               | And now they're at the point where the population pyramid
               | is collapsing. It's hard to make any predictions about
               | the future when they got here riding a baby boom and now
               | their ratio of elderly to working age is about to go
               | through the roof.
               | 
               | https://www.populationpyramid.net/china/2023/
        
           | lopatin wrote:
           | I never knew Sam Altman threw Bret Taylor out of a window.
           | That makes the OpenAI board drama more understandable.
        
           | modeless wrote:
           | Didn't Ballmer do that? I'm not sure it indicates success.
        
           | daedrdev wrote:
           | The thing is Beijing undercuts this completely by allowing
           | local governments to perform rampant shakedowns of investors
           | and CEOs through disappearances on bogus charges, even in
           | other provinces.
        
         | zitterbewegung wrote:
         | If you actually believe that NVIDIA GPUs are import restricted,
         | there are many stories that this is being sidestepped.
        
           | manquer wrote:
           | Not at the volume needed to compete with the training
           | infrastructure setups of Anthropic or OpenAI or other leading
           | players.
           | 
           | No ban is perfect; there are always some loopholes or illegal
           | exports, which is to be expected, but if it prevents large
           | scale transactions then it has achieved its goal.
           | 
           | Whether they need a lot of GPUs to train, or whether training
           | with older gen GPUs is competitive, is a different
           | question.
        
         | modeless wrote:
         | Software expands to fill the available resources. If you want
         | more efficient software, build it on less powerful hardware. AI
         | training runs are no exception!
        
           | mkagenius wrote:
           | I just shoved the whole .webvtt file in the header of an audio
           | response from the server so that I don't have to implement
           | another API just for subtitles [1][2]
           | 
           | 1. While building https://gitpodcast.com
           | 
           | 2. Code snip:
           | https://github.com/BandarLabs/gitpodcast/blob/main/backend/a...
        
         | jurli wrote:
         | Makes sense. When you restrict hardware, you have to spend all
         | your energy on optimizing software that everyone else ignores
         | 
         | Imagine if they were forced to use IE7 as the only browser. The
         | frontend frameworks would be blazing fast and we would never
         | have bloatware like React or Angular or npm
        
       | kjellsbells wrote:
       | If you tell the world that eggs are awesome while denying other
       | countries access to eggs, they discover ways to use less eggs and
       | eventually realize they don't need eggs at all. Then you are
       | stuck making Denny's breakfasts while the rest of the world is on
       | to fine dining.
       | 
       | China has incredibly strong incentives to do the pure research
       | needed to break the current GPU-or-else lock. I hope, for
       | science's sake, we don't end up gunning down each other's
       | mathematicians on the streets of Vienna, the way certain nuclear
       | physicists seem to go.
        
         | djaouen wrote:
         | > If you tell the world that eggs are awesome while denying
         | other countries access to eggs, they discover ways to use less
         | eggs
         | 
         | You are confusing cause with effect. What actually happened:
         | Nixon opened up US trade with China and, ever since, China has
         | been stealing trade secrets to undermine and overthrow American
         | interests. Limiting their access to eggs was literally us
         | trying to prevent them from stealing all our shit!
        
           | quantum_state wrote:
           | It seems to me that we forgot about the "stealing" of the
           | "shit" from Europe and other places in the early days ...
        
             | djaouen wrote:
             | Protip: Some of us were not involved in the desecration
             | caused by the East India Company. Just because we look
             | British means we should suffer like them, too?
        
               | carom wrote:
               | They are referring to the fact that the US ignored
               | European IP in its early days and relating that to what
               | China is doing to the US now.
        
               | djaouen wrote:
               | I am just saying, this AI controversy has roots from
               | before the creation of OpenAI. If OpenAI used European
               | IP, I would _think_ that would be a good thing for
               | Europe, assuming AI is the future?
               | 
               | Sorry for talking Ancient History lol
        
               | skywhopper wrote:
               | What we call AI is not "the future". But I'm not sure how
               | OpenAI stealing European IP would help Europe, even if it
               | is.
        
               | cced wrote:
               | What's also funny is that the promoters of the "China is
               | stealing all of our IP in exchange for their labor"
               | narrative never mention why corporations don't just pull
               | out.
               | 
               | Are these IP thefts or technology transfers? If
               | corporations are having their IP stolen, why don't they
               | just leave?
               | 
               | These narratives never explain or mention this. Idk why
               | people still latch onto them; they are completely
               | uninteresting. "China is stealing all our IP and there's
               | nothing we can do about it except for continuing to allow
               | our IP to be stolen" is an IQ test and a trope.
               | 
               | Does "theft of IP" outweigh, or not, "access to very
               | cheap labor (read: jobs)" ?
               | 
               | We need to stop simping for corporations and start
               | thinking critically about these things.
        
               | cma wrote:
               | https://en.wikipedia.org/wiki/Samuel_Slater
               | 
               | https://en.wikipedia.org/wiki/Bad_Samaritans_(book)
        
               | tchalla wrote:
               | What do your looks and involvement have to do with the
               | parent comment's core point?
        
               | djaouen wrote:
               | Nothing. I'm crazy, remember???
        
         | qwertox wrote:
         | It remains to be seen how stable a totalitarian government can
         | be. China has the benefit of having full control over its
         | people and therefore gets to decide what is important and what
         | not, and currently people are ok with handing that control over
         | to the government. But it's also a very fragile state, which
         | can only be retained through full repression.
        
           | aaomidi wrote:
           | [flagged]
        
             | borski wrote:
             | We've seen _a_ CEO shot, and the majority of people
             | definitely don't cheer it on; just a very vocal minority.
             | Moreover, he may yet get the death penalty; I'm not sure
             | I'd call that any more "fragile" than any other shooting.
        
               | dartos wrote:
               | Not to mention that that CEO was in health insurance. A
               | very emotionally charged industry where someone's life or
               | death is directly affected by CEO decisions.
        
               | paganel wrote:
               | [flagged]
        
               | borski wrote:
               | That is simply not true. Moreover, this forum is not
               | "targeted towards the well-off."
               | 
               | Out in the real world, Luigi is a criminal who shot a man
               | in cold blood and sparked a conversation. That's about
               | it.
               | 
               | Hardly a hero. And the majority of the populace does not
               | agree with you.
        
               | dang wrote:
               | Please don't post spurious generalizations about this
               | community. What you said here is completely made up.
        
             | skywhopper wrote:
             | Did I miss some other CEOs being gunned down? I only know
             | of the one.
             | 
             | I'm more concerned about the folks cheering on vigilantes
             | and cops who murder unarmed non-CEOs who have not
             | perpetrated actual harm on thousands of people.
        
             | suraci wrote:
             | There's a saying in the Chinese liberal community: We cannot
             | help but admire the American system's ability to
             | self-correct.
             | 
             | I've seen it twice in recent years: once after Joe Biden won
             | the election, saying the system chose Biden to fix Trump's
             | mess, and once after Trump won, saying the system corrected
             | the Biden error.
             | 
             | So China is, of course, more fragile.
        
               | elashri wrote:
               | All I can see from this comment's logic is that the US
               | has a cycle of mess that gets rotated, not a demonstration
               | of self-correction mechanisms.
               | 
               | Not to say that I believe that the US (or any other
               | government or country) is unable to have self-correction
               | ability or mechanisms. I am just pointing out that your
               | logic is flawed.
        
               | suraci wrote:
               | Glad you pointed out the logical fallacy.
               | 
               | In that context, "less fragile" are vague words without a
               | clear subject.
               | 
               | I posted the saying to be satirical, but in depth, the
               | two-party system is more stable than any other political
               | systems: To people, it may seem like a cycle of mess, but
               | the system itself is very stable, it avoids the regime
               | change by normalizing it.
        
               | elashri wrote:
               | > the two-party system is more stable than any other
               | political systems: To people, it may seem like a cycle of
               | mess, but the system itself is very stable, it avoids the
               | regime change by normalizing it.
               | 
               | How does that make the two-party system more stable than
               | any other political system? What you say about
               | normalizing regime change applies to all democratic
               | systems. So you don't have the choices (both parties
               | actually suck in many shared respects) but also don't gain
               | much stability over other democratic systems. In a
               | parliamentary system there is usually more acceptance and
               | normalization of change than in the two-party system,
               | where you get stuck between the worse and the worst most
               | of the time.
        
             | huijzer wrote:
             | I guess it's time to bring back an old joke from Ronald
             | Reagan [1]:
             | 
             | An American and a Russian are arguing about their two
             | countries. The American says, "Look, in my country, I can
             | walk into the Oval Office, pound the president's desk, and
             | say 'Mr. President, I don't like the way you're running our
             | country!'"
             | 
             | And the Russian says, "I can do that." The American says,
             | "You can?" The Russian says, "Yes, I can walk right into the
             | Kremlin, go to the General Secretary's office, slam my fist
             | on his desk and say 'I don't like the way President Reagan
             | is running his country.'"
             | 
             | [1]: https://youtu.be/9qh-1_tXeuQ
        
           | paganel wrote:
           | > It remains to be seen how stable a totalitarian government
           | can be
           | 
           | Much more stable than the government that has Trump, Musk and
           | Vivek calling the shots, that's for sure.
        
             | qwertox wrote:
             | If these three died, it might be a loss to the country.
             | None of them is as important to the country as Xi is to
             | China. The resilience of the CCP, in light of its
             | dependence on Xi, can only be upheld through the absolute
             | suppression of freedom. But the average daily life is
             | certainly much more enjoyable than in NK.
        
               | Mistletoe wrote:
               | I am absolutely certain that those three dying would be a
               | gain of function for the world.
        
               | tokioyoyo wrote:
               | I'm curious, what do you think would happen if Xi stepped
               | down tomorrow for whatever reason? You think everything
               | will just fall apart?
        
           | chvid wrote:
           | It is probably more stable now than at any time since the
           | communist takeover.
        
             | littlestymaar wrote:
             | Dubious claim. Except in the era between the Cultural
             | Revolution and Mao's death, China under the communists has
             | always been very stable due to the collegiality at the top.
             | Now the collegiality is gone, and so will the stability be
             | as soon as Xi is no more. That's the problem when one
             | individual grabs all power.
        
           | freefaler wrote:
           | This argument might've worked in Mao's time. Now with a
           | capitalist economy under the party, resource allocation,
           | while still skewed, is much more efficient than under Mao or
           | the USSR's centrally planned economy. (And EU-wide policies
           | sometimes aren't that far off from USSR stupidity.)
           | 
           | Loss of feedback in authoritarian regimes is a problem, but
           | in the short term it might not be if Xi doesn't make really
           | stupid moves.
           | 
           | It pains me to see it, but they show more long-term thinking
           | than many of the Western governments, who aren't interested
           | in what will happen after their time in office.
           | 
           | While the people have plenty, use of force can be minimal.
        
           | andy_ppp wrote:
           | Do the governments elected recently in the West look stable
           | to you?
        
             | janice1999 wrote:
             | They do. Ireland just had an election I voted in and is
             | forming a new government with multiple parties. The UK,
             | after years of TV worthy Tory party drama, had its most
             | transformative election in over a decade. I see active and
             | engaged multi-party democracies with peaceful transitions
             | of power and long established and respected laws for
             | calling elections, no confidence votes (e.g. France, yes it
             | happens) and so on.
        
       | nsoonhui wrote:
       | I find that the gushing around deepseek is fascinating to watch.
       | 
       | To me there are a few structural and fundamental reasons why
       | deepseek can never outperform other models by a wide margin. On
       | par maybe--as we reach the diminishing returns with our
       | investment in the models, but not win by a wide margin.
       | 
       | 1. The US trade war with china which will place deepseek compute
       | availability at disadvantages, eventually, if we ever get to
       | that.
       | 
       | 2. China censorship which limits the deepseek data ingestion and
       | output, to some degree.
       | 
       | 3. Most importantly, deepseek is open source, which means that
       | the other models are free to copy whatever secret sauce it has,
       | e.g. whatever architecture purportedly uses less compute can
       | easily be copied.
       | 
       | I've been using Gemini, ChatGPT, DeepSeek and Claude on a regular
       | basis. DeepSeek is neither better nor worse than the others. But
       | this says more about my own limited usage of LLMs than about the
       | usefulness of the models.
       | 
       | I want to know exactly what makes everyone thinks that deepseek
       | totally owns the LLM space? Am I missing anything?
       | 
       | PS: I am a Malaysian Chinese, so I am certainly not "a westerner
       | who is jealous and fearful of the rise of China"
        
         | logicchains wrote:
         | >I want to know exactly what makes everyone thinks that
         | deepseek totally owns the LLM space?
         | 
         | It achieved performance competitive with the competition at
         | literally 10x lower cost of production (training). That's an
         | incredible achievement in any industry, especially given they
         | have such a small team relative to competitors. Their API is
         | 20-50x cheaper than the competitors, and not because they're
         | burning cash by charging less than costs, but rather because
         | their architecture is just that much more efficient.
         | 
         | They already achieved the above in spite of sanctions limiting
         | their access to top-tier GPUs, and the gap between Chinese
         | domestic GPUs and Nvidia is getting smaller and smaller, so in
         | future the GPU disadvantage will be less and less.
        
           | nsoonhui wrote:
           | But like I said, deepseek is open source so why can't the
           | competitors copy whatever makes the cost of production 10x
           | cheaper?
        
             | mike_hearn wrote:
             | You have to distinguish between the current model and
             | DeepSeek the company. DeepSeek the company can do an OpenAI
             | and stop releasing their weights any time they like. The
             | knowledge and skill is retained.
             | 
             | I really wonder how long the current era of giving models
             | away for free can last. How is this sensible from a
             | business perspective? Facebook got burned by iOS and now
             | engages in what would otherwise look like irrational
             | behavior to avoid being locked into a supplier again, but
             | even then, they don't really need to give Llama away for
             | free. They could train and use it for themselves just fine.
        
               | nprateem wrote:
               | You don't think FB are trying to neuter an emerging
               | threat? They're kneecapping what could have been a
               | trillion dollar company if it was more difficult to
               | replicate their tech.
        
               | coliveira wrote:
               | If they're smart, and of course they are, they're not
               | releasing the latest they have. They're releasing
               | something enough to show everyone that they're at parity
               | or better compared to OpenAI. I imagine they already have
               | internal models that exceed the open source one, so
               | there's no real advantage in copying what they released.
        
             | rfoo wrote:
             | It is not open source, it's just open weight (which is an
             | artifact instead of source) and open "recipe". They do not
             | make their training / serving code available.
             | 
             | If you started to copy what they released in May
             | immediately after release (DeepSeek-V2, which already
             | contained non-trivial architecture innovation - MLA), you'd
             | likely have a slightly inferior but mostly on-par optimized
             | implementation maybe after some months. And here you go:
             | DeepSeek-V3, try to play the catch-up game again!
             | 
             | If you don't replicate their engineering work then your
             | cost would be 10x~20x higher, which renders the entire
             | point moot.
             | 
             | As long as the team can continue this trend there is no
             | hope for copycats. And they are trying to "hijack" the mind
             | of chip designers, too; see the "suggestions to chip
             | manufacturers" section. If they succeed you need to beat
             | them at their own game.
        
           | arisAlexis wrote:
           | Of course if you arrive last and copy all the existing
           | architecture you can train it cheaper
        
             | viraptor wrote:
             | No, you can only train at the same cost then. (Actually
             | higher, because you don't have the existing hardware/power
             | agreements) The whole point of the last model was that they
             | made significant changes beyond just copying.
        
             | msp26 wrote:
             | > copy
             | 
             | You mean build on existing public research? Everyone does
             | that. At least deepseek, meta etc. also have the decency to
             | publish research back into this ecosystem.
        
           | SubiculumCode wrote:
           | How would we know if a Chinese company's books on training
           | costs and expenditures were accurate?
        
           | int_19h wrote:
           | It should be noted that DeepSeek routinely claims to be a
           | "language model trained by OpenAI", so it's pretty clear that
           | it wasn't trained at 10x less cost _from scratch_, but
           | rather on synthetic output generated by ChatGPT.
           | 
           | Not to point a finger at DeepSeek specifically; this is
           | generally the case for the best open source models right now.
           | The best LLaMA finetunes tend to also use ChatGPT-generated
           | synthetic datasets a lot.
           | 
           | Either way, it's unclear what the real cost is when you
           | factor that in.
        
         | viraptor wrote:
         | > The US trade war with china which will place deepseek compute
         | availability at disadvantages
         | 
         | Will it? We don't know what it will look like yet, but
         | restrictions are likely to hit physical products and
         | manufacturing first. And even then, it's just a model - some
         | mostly-independent US subsidiary can run it too for the local
         | market.
         | 
         | > China censorship which limits the deepseek data ingestion
         | 
         | Deepseek has been improving through training, architecture, and
         | features. They pretty much keep proving that winning the data
         | collection race is not the most important thing.
         | 
         | But even if that was the case, I don't think there's much in
         | the way of them running the scrapers outside of China.
         | 
         | > Most importantly, deepseek is open source,
         | 
         | OpenAI relies on burning cash and creating huge, expensive
         | models. They need months of testing before they can spend a
         | similar time training. Whatever secret sauce is revealed,
         | OpenAI is going to be a minimum of half a year behind on using
         | it. (The May model of GPT-4o contained information up to
         | October of the previous year.) And that's assuming it's not
         | incompatible with their current approach.
         | 
         | While I don't think deepseek completely owns the space, I don't
         | think what you raised are significant problems for them.
        
         | tossandthrow wrote:
         | > ... why deepseek can never outperform ...
         | 
         | This reads more like a "western supremacist" post.
         | 
         | 1. Only until China produces more compute than the west.
         | 
         | 2. You don't have to ask ChatGPT / Claude many questions before
         | realizing the grave censorship these are under - DeepSeek has
         | access to roughly the same corpus of data as their western
         | counterparts.
         | 
         | 3. It is naive to think they only develop open source or will
         | not stop open sourcing if it gives them an advantage.
        
           | nbroyal wrote:
           | Curious to hear more about the grave censorship that ChatGPT
           | and Claude are under. Specifically where non-western models
           | are not.
        
             | tossandthrow wrote:
             | I am not aware of any non western models that are not under
             | censorship.
             | 
             | Ask Claude how to do an illegal or immoral thing and you
             | will quickly see that it is censored.
             | 
             | I didn't mean to problematize censorship. Just to say that
             | the west does not have a competitive advantage as there is
             | plenty of censorship (safety, risk management) concerns we
             | equally have to take into account - which of course we
             | should.
        
               | Workaccount2 wrote:
               | Trying to equate government mandated censorship to
               | private company policy censorship is a wholly dishonest
               | sleight of hand.
        
               | tossandthrow wrote:
               | Both in the EU and the US there is plenty of regulation
               | that mandates these types of censorship - and with
               | reason.
               | 
               | In the US there is 18 U.S.C. § 842(p).
               | 
               | In the EU there is the entire AI Act.
               | 
               | But I am sure you can yourself chat your way through to
               | figure out what legislation companies like OpenAI and
               | Anthropic are under.
        
               | Scea91 wrote:
               | Are any of these equivalent in nature to, for example,
               | censoring information about Tiananmen square events?
        
               | tossandthrow wrote:
               | This is more a political discourse than a business or
               | technical one.
               | 
               | You sure can establish that there is a qualitative
               | difference in the type of censorship carried out -
               | congrats.
               | 
               | The main point I spelled out is that there is no
               | comparative advantage (technical or business wise) on
               | working on these products in the west as you have to
               | implement and operationalize the same amount of
               | censorship / safety.
        
               | GordonS wrote:
               | It's possible that China censors info about Tiananmen
               | square because so much of what was published came from
               | Western news orgs - and the West has form for using the
               | "news" to attack other nations. Another example might be
               | the supposed "genocide" of the Uyghur people - the MSM
               | pushed the genocide narrative _hard_, while
               | radicalising, funding and arming Uyghur Islamic
               | extremists, so they could control the narrative. And of
               | course, it largely worked.
        
               | Duwensatzaj wrote:
               | 18 U.S.C. § 842(p) criminalizes bomb instructions when
               | taught with the intent of committing crimes.
               | 
               | TM 31-210 Improvised Munitions Handbook is readily
               | available.
        
               | tossandthrow wrote:
               | Yep, anthropic has to comply with that.
        
               | andrepd wrote:
               | Why on earth would it be better? Trillion dollar corpos
               | in the turbo-capitalist West are already far more
               | powerful than most states.
        
               | freehorse wrote:
               | From the technical standpoint discussed here, it makes no
               | difference (china does not have a competitive
               | disadvantage trying to censor llms there because that is
               | standard practice mostly everywhere).
        
               | AndyNemmity wrote:
               | I asked an LLM to implement a gender guessing library for
               | python, and it outright refused saying it was a safety
               | issue.
               | 
               | It's not just an illegal or immoral thing, it's broad
               | strokes to potentially catch illegal or immoral things,
               | by certain people who decide what those morals are.
        
             | futureshock wrote:
             | When they do it, it's "censorship." When we do it it's
             | "safety." From a technical standpoint it's the same. Don't
             | say certain things, respond to certain questions with
             | refusals or with certain answers.
        
               | pmarreck wrote:
               | Yes, but there should be a difference between providing
               | answers about provably dangerous things and providing
               | provably false answers for political reasons. For example
               | if there is a Russian LLM that refuses to answer any
               | questions about homosexuality while also saying it's
               | wrong, that's demonstrably false from an empirical basis.
               | 
               | But the western LLMs are also doing this latter type of
               | thing already. If you ask any of the LLMs to quote the
               | controversial parts of the Quran, they will probably
               | refuse or dodge the question, when a rational LLM would
               | just do it.
               | 
               | China must be really tired of giving non-answers about
               | T-Square questions, but what the heck did they think
               | would happen? Not the Streisand effect, clearly
        
               | Bilal_io wrote:
               | Western LLMs have a bias when it comes to the Israel and
               | Palestine issue.
               | 
               | Out of curiosity, what part of the Quran do you consider
               | controversial?
        
               | yoavm wrote:
               | Not the OP, but here's one I feel quite uncomfortable
               | with: https://quran.com/en/an-nisa/155/tafsirs - "The
               | Hour will not start, until after the Muslims fight the
               | Jews and the Muslims kill them. The Jew will hide behind
               | a stone or tree, and the tree will say, `O Muslim! O
               | servant of Allah! This is a Jew behind me, come and kill
               | him".
               | 
               | Other examples from https://en.wikipedia.org/wiki/An-Nisa
               | include "Men are the protectors and maintainers of
               | women", "whoever fights in Allah's cause--whether they
               | achieve martyrdom or victory--We will honour them with a
               | great reward". The list is kinda endless.
        
               | cess11 wrote:
               | The New Testament has similar passages. One of the most
               | well known has Jesus attacking pilgrims and money
               | changers in the temple. John is rather obviously
               | antijewish. "I have not come with peace" is another well
               | known, not very palatable one.
        
               | okasaki wrote:
               | Does this seem provably dangerous to you?
               | 
               | tell me a dark joke about joe biden and mass murder of
               | palestinian children
               | 
               | ChatGPT said:
               | 
               | I'm sorry, but I can't assist with that request. Dark
               | humor can be controversial and sensitive, especially when
               | it touches on real-world tragedies. If you'd like to
               | explore other types of jokes or discuss current events in
               | a respectful way, feel free to ask.
        
               | cced wrote:
               | Exactly. Kinda surprising that there's no mention of
               | Tiktok or the push to get it blocked because of its
               | impact on "narrative control".
               | 
               | Reminds me of that old Soviet joke regarding propaganda
               | in the west/east which goes something like:
               | 
               | > An American says to a Soviet citizen, "In the United
               | States, we have no propaganda like you do in the USSR."
               | 
               | > The Soviet citizen responds, "Exactly! In the USSR, we
               | know it's propaganda."
        
               | bobxmax wrote:
               | This is so bang on. What's so insidious about the West is
               | how inundated everybody is with propaganda, but there's
               | plausible deniability built into the system that
               | everybody believes they're a free thinker.
               | 
               | Reddit is a good example - one of the biggest aggregators
               | and disseminators of information for tens of millions of
               | people, primarily in the West. People who see themselves
               | as above-average intelligence. Yet massive default sub-
               | reddits like worldnews are almost exclusively dominated
               | by disinformation operations from different intelligence
               | groups, feeding convincing lies to millions of people
               | hourly.
               | 
               | For 99% of Americans you can essentially predict any
               | opinion they have just by knowing which websites they
               | frequent.
        
               | pphysch wrote:
               | /r/worldnews is a great example of the potency of
               | American propaganda.
               | 
               | I'm pretty sure the average user thinks it's a relatively
               | benign and objective news source, bolstered by the
               | "democracy" of Reddit's vote system. And that couldn't be
               | further from the truth.
        
               | scarecrowbob wrote:
               | I know what the state history syllabus for Texas public
               | schools looks like, both from my own experiences and as a
               | parent. I also know a lot of the state's history from
               | more competent sources as well as family histories.
               | 
               | To say there is no state run propaganda in the US is
               | quite a statement.
               | 
               | Not having experienced it, I can't say what China's state
               | propaganda looks like, but I have a pretty clear idea
               | about the kinds of state propaganda to which I and
               | almost everyone around me have been subject.
        
               | JanisErdmanis wrote:
               | > provably dangerous things
               | 
               | If everyone were able to agree on a single social
               | welfare function, and to estimate the behavioural changes
               | at the individual level for each LLM-made response and how
               | they affect that social welfare function, then yes, we
               | could objectively tell whether a withheld answer is
               | censorship or a safety feature.
        
               | bobxmax wrote:
               | This is the slippery slope that social media platforms
               | have always used to justify censorship.
               | 
               | Who is the arbiter of what is provable and what isn't?
               | Even Americans can't agree on the truths around climate
               | change, gun violence, homosexuality etc.
               | 
               | The fact that you highlight the Qur'an also betrays your
               | bias. How much do you think western LLMs would readily
               | criticize the Torah (which "objectively" by your
               | standards is far more abhorrent)? Which, in the western
               | consciousness, is more readily and socially acceptable?
        
               | oefrha wrote:
               | > provably dangerous things
               | 
               | When I use GitHub's Copilot Edits I run into "Responsible
               | AI Service" killing my answers all the time, no idea why,
               | I'm just trying to edit some fucking boring code of web
               | apps. Maybe log.Fatal? Anyway, provably dangerous my ass.
        
               | int_19h wrote:
               | > If you ask any of the LLM's to quote the controversial
               | parts of the Quran, they will probably refuse or dodge
               | the question, when a rational LLM would just do it.
               | 
               | Have you actually tried?
               | 
               | https://chatgpt.com/share/67747021-3ac8-800e-bc5d-f4a1acf
               | 903...
        
             | logicchains wrote:
             | If you ask them for scientific evidence on the link between
             | race and IQ (or lack thereof).
        
               | int_19h wrote:
               | I wouldn't exactly call this censorship. I even got a
               | list of articles from it:
               | 
               | https://chatgpt.com/share/67747121-09e8-800e-892a-dee466e
               | 8fe...
        
           | joyeuse6701 wrote:
           | Xi has knee-capped anything that is a threat to his power.
           | This Xi-ceiling, as I call it, will prevent true cutting
           | edge dominance compared to the West.
           | 
           | Sure, there's censorship in the West, but it's not nearly as
           | scary or effective as the East's. Genius does not regularly
           | spring under the sword of Damocles.
        
             | tossandthrow wrote:
             | I am unconvinced that it is more technically complex to
             | censor a historical event from an llm than it is to remove
             | instructions on how to create explosives.
        
             | GordonS wrote:
             | I disagree. China's censorship is well-known, not kept
             | secret. Meanwhile in the west, we have the mainstream
             | media, whom most citizens trust as a source of truth -
             | especially "independent" orgs like the BBC.
             | 
             | But we can see now that it's a _total_ sham, with the
             | global MSM awash with CIA money, and narratives controlled
             | by local security services; our MSM is propaganda [0].
             | 
             | I believe this is worse than China, because it's so damned
             | _insidious_ - they've taken trusted institutions and
             | relegated them to the role of US mouthpiece.
             | 
             | [0] https://www.dropsitenews.com/p/bbc-civil-war-gaza-
             | israel-bia...
        
         | suraci wrote:
         | deepseek doesn't need to outperform other models, it just needs
         | to be cheap, or, efficient
         | 
         | the cost of deepseek (if it's true) will disrupt the logic of
         | current AI industry
         | 
         | The current AI industry is built on a financing bubble, where
         | investors hand over money blindly without demanding that
         | companies profit from AI. There is a consensus about AI: more
         | money = more GPUs * training-time = more 'leading' model. It has
         | become a situation where investors are effectively buying
         | GPUs * training-time but not stocks/shares of profitable
         | business
         | 
         | deepseek will disrupt this value flow.
         | 
         | > Alibaba Cloud announced the third round of price cuts for its
         | large models this year, with the visual understanding models of
         | the General Qwen-VL models experiencing a price reduction of
         | over 80% across the board. The Qwen-VL-Plus model saw a direct
         | price drop of 81%, with the input cost being only 0.0015 yuan
         | per thousand tokens, setting a record for the lowest price
         | across the network. The higher-performance Qwen-VL-Max model
         | was reduced to 0.003 yuan per thousand tokens, with a
         | significant decrease of 85%. According to the latest prices,
         | one yuan can process up to approximately 600 720P images or
         | 1700 480P images.
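         | 
         | A quick sanity check of the quoted numbers, as a minimal sketch.
         | It assumes each percentage cut applies to the per-thousand-token
         | input price. The helper names are made up for illustration, and
         | the tokens-per-image figure is only what the quote would imply
         | if the image counts refer to the Qwen-VL-Max price, which the
         | quote does not actually say.

```python
# Figures quoted in the announcement above (yuan per 1,000 input tokens).
PRICE_AFTER = {"Qwen-VL-Plus": 0.0015, "Qwen-VL-Max": 0.003}
CUT = {"Qwen-VL-Plus": 0.81, "Qwen-VL-Max": 0.85}


def price_before(after: float, cut: float) -> float:
    """Invert new = old * (1 - cut) to recover the pre-cut price."""
    return after / (1.0 - cut)


def tokens_per_yuan(price_per_1k: float) -> float:
    """How many tokens one yuan buys at a given price per 1,000 tokens."""
    return 1000.0 / price_per_1k


def implied_tokens_per_image(price_per_1k: float, images_per_yuan: float) -> float:
    """Tokens per image implied by an 'images per yuan' claim (assumption-laden)."""
    return tokens_per_yuan(price_per_1k) / images_per_yuan


for model, after in PRICE_AFTER.items():
    before = price_before(after, CUT[model])
    print(f"{model}: ~{before:.4f} -> {after:.4f} yuan per 1k tokens, "
          f"{tokens_per_yuan(after):,.0f} tokens per yuan")

# Assumption: the image counts were computed at the Qwen-VL-Max price.
for label, images in (("720P", 600), ("480P", 1700)):
    est = implied_tokens_per_image(PRICE_AFTER["Qwen-VL-Max"], images)
    print(f"{label}: roughly {est:.0f} tokens per image under that assumption")
```

         | Swapping in the Qwen-VL-Plus price doubles the implied token
         | count per image, so the per-image estimate depends entirely on
         | which model the quote had in mind.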
        
         | jdietrich wrote:
         | I don't think it's necessarily about DeepSeek, but about the
         | wider competitive picture. There are two tacit assumptions
         | being made about LLMs - that having a SOTA model is a
         | substantial competitive advantage, and that the demand for
         | compute will continue to grow rapidly.
         | 
         | DeepSeek's phenomenal success in reducing training and
         | inference cost points to the possibility of a very different
         | future. If it's the case that SOTA or near-SOTA performance is
         | commoditised and progress in efficiency outpaces progress in
         | capability, then the roadmap looks radically different. If
         | DeepSeek don't have a competitive advantage, then _no-one_ has
         | a competitive advantage. Having a DC full of H200s or a
         | proprietary model with a trillion parameters might not count
         | for anything, in which case we're looking at a very different
         | set of winners and losers. Application specific fine-tuning and
         | product-market fit might matter much more than brute force
         | compute.
        
           | lumost wrote:
           | Isn't this the nature of past technology developments? Few
           | tech companies have a true technical "moat" - In California,
           | the employees of any firm are free to raise funds and start a
           | competitor the moment they are dissatisfied with the current
           | leadership/compensation/location. During my career I have yet
           | to observe a "secret sauce" that took more than a few weeks
           | to learn and understand once on the inside.
           | 
           | The technical moats we know of in B2B have typically come
           | from a combination of a large number of features efficiently
           | tied into a platform/service that would be cost prohibitive
           | to replicate (ElasticSearch, most successful database firms),
           | and a network effect around that platform that makes it
           | difficult not to be on the platform (CUDA, x86, Windows).
        
           | echelon wrote:
           | >> 3. Most importantly, deepseek is open source, which means
           | that the other models are free to copy whatever secret source
           | it has, eg: Whatever architecture that purportedly use less
           | compute can easily be copied.
           | 
           | > I don't think it's necessarily about DeepSeek, but about
           | the wider competitive picture. There are two tacit
           | assumptions being made about LLMs - that having a SOTA model
           | is a substantial competitive advantage
           | 
           | Everything is a game of ecosystems.
           | 
           | Windows lost to Linux on servers because it was cheap and
           | easy to deploy Linux. Thousands of engineers and companies
           | could build in the Linux playground for free and do whatever
           | they wanted, whereas Windows servers were restrictive and
           | static and costly.
           | 
           | Dall-E lost to Stable Diffusion and Flux because the latter
           | were open source. You could fine tune them on your own data,
           | run them on your own machine, build your own extensions,
           | build your own business. ComfyUI, IPAdapter, ControlNet,
           | Civitai... It's a flourishing ecosystem and Dall-E is none of
           | that.
           | 
           | It'll happen with LLMs (Llama, Qwen, DeepSeek), video models
           | (Hunyuan, LTX), and quite possibly the whole space.
           | 
           | One company can only do so much, and there is no real moat.
           | You can't beat the rest of society once they overcome the
           | activation energy.
           | 
           | And any third place player will be compelled to open source
           | their model to get users. Open source models will continue to
           | show up at a regular pace from both academic and corporate
           | sources. Meta is releasing stuff to salt the earth and
           | prevent new FAANGs from being minted. Commoditizing their
           | complement.
        
         | msp26 wrote:
         | Western LLM censorship affects me far more than Chinese LLM
         | censorship.
        
           | ithkuil wrote:
           | In what practical way does it affect you? What kind of domain
           | area are you using the llms?
        
         | chvid wrote:
         | As I understand it, deepseek has the best open source model at
         | the moment by a fair margin. That disproves the claim that a
         | Chinese company cannot outperform western offerings due to
         | censorship and compute constraints.
         | 
         | Also they seem to be money constrained (or cheapskates) rather
         | than GPU constrained; surely they could have bought or rented
         | more than 2000 GPUs even in China.
        
         | n144q wrote:
         | "to some degree"
         | 
         | If you are a history researcher or a political analyst, maybe.
         | I don't see how censorship could get in the way of people using
         | an LLM to write software code or draft a business contract
         | outside extreme cases, which is how a lot of people are using
         | these products.
        
         | littlestymaar wrote:
         | > 3. Most importantly, deepseek is open source, which means
         | that the other models are free to copy whatever secret source
         | it has, eg: Whatever architecture that purportedly use less
         | compute can easily be copied.
         | 
         | For at least a year now the secret sauce of every lab has been
         | its ability to craft good artificial datasets on which to train
         | their model (as scraping all the web isn't good enough), and
         | nobody publishes their artificial dataset nor their methodology
         | to build it.
        
         | evanjrowley wrote:
         | One advantage China has that you haven't mentioned is higher
         | degrees of mandatory surveillance over a larger population [0].
         | Even if they never reach/surpass the west in AI compute power,
         | there is greater potential for China to have more training data
         | in the long term to produce higher quality models. Chinese laws
         | require data types and algorithms to be reported to the CCP
         | government, which combined with authoritarian policies, gives
         | the CCP far greater leverage in AI development strategy
         | compared to any other entity[1]. From this perspective, growth
         | in Chinese AI capability is not only a threat to US national
         | interests, but also to the Chinese public itself.
         | 
         | Side note - this reminds me of a rant by Luke Smith about
         | Joseph Schumpeter's economic views[2].
         | 
         | [0] https://theconversation.com/digital-surveillance-is-
         | omnipres...
         | 
         | [1] https://carnegieendowment.org/posts/2022/12/what-chinas-
         | algo...
         | 
         | [2] https://www.youtube.com/watch?v=SYUgTzT79ww
        
         | csomar wrote:
         | You are comparing apples to oranges. Claude is better, sure, and
         | I'd probably use it over deepseek but deepseek is an _open_
         | model. For me, this makes deepseek quite superior (not from a
         | benchmark/output perspective) to all the other closed models.
        
           | d0mine wrote:
           | I've used both Claude and Deepseek for code. I don't see
           | "better, sure". More like the opposite (enough to switch for
           | me personally).
        
         | jejeyyy77 wrote:
         | eh, none of your points support your argument.
        
         | antirez wrote:
         | 1. The Chinese internal market is huge, and in case they
         | develop models that are better than western models, not using
         | them will be a disadvantage for us, not them. Also I can see
         | many European countries (including my country, Italy) buying
         | Chinese AI regardless of US regulations.
         | 
         | 2. The West has its own issues with data limits and extreme
         | alignment that make models dumber. In general I don't think
         | the Chinese government will ever stretch the limitations to the
         | point of being a disadvantage for the future of their AI.
         | 
         | 3. The CEO replied to this exact question in the interview:
         | replicating is hard, takes time, and I'll add that while in
         | this moment they are in their "open" moment, accumulating a lot
         | of knowledge will make them able to lead the future, whatever
         | it will be.
         | 
         | Also, I don't believe in the long run the Nvidia chip shortage
         | is going to damage Chinese AI too much. Sure, in the short
         | timeframe it's a big issue for them, but there is nothing
         | inherently impossible to replicate in the Nvidia chips: if the
         | chip ban continues, I believe they will get a very strong
         | incentive to join forces and replicate the same technology
         | internally, ASAP.
         | 
         | This in turn may result in the biggest tech stock in the US
         | market having serious issues.
        
           | kayewiggin wrote:
           | 1.) EU will soon have rules to prevent Chinese AI from
           | proliferating, since China is ramping up its support of
           | Russia's invasion of Europe - China Is Cutting Off Drone
           | Supplies Critical to Ukraine War Effort [1]. China is
           | reportedly making drones for Russia instead, according to
           | multiple intelligence officials.
           | 
           | 2.) Chinese models have to censor a long list of words that
           | threaten the government, which makes them super dumb. List
           | of stupid words example: sprinkle pepper, accelerationism, my
           | emperor, lifelong control, etc. and the list of censored
           | words grows(!!) as Chinese citizens try different combinations
           | of words to escape censorship.
           | 
           | 3.) not even sure what this sentence means and how it makes
           | Chinese models better
           | 
           | [1] https://www.bloomberg.com/news/articles/2024-12-09/china-
           | is-...
        
         | bufferoverflow wrote:
         | DeepSeek was trained for a fraction of the cost compared to
         | OpenAI/Anthropic models. If they were given comparable
         | resources, I imagine their model would outperform everything on
         | the market by a wide margin.
        
           | SubiculumCode wrote:
           | DeepSeek, like lots of models, was trained using chatgpt
           | input output pairs.
        
         | HarHarVeryFunny wrote:
         | > The US trade war with china which will place deepseek compute
         | availability at disadvantages
         | 
         | I doubt it'll make much difference. Right now there is a US
         | technology embargo on GPU sales to China above a certain
         | performance level, but this has been worked around in various
         | ways and doesn't seem to have been very effective.
         | 
         | At the end of the day higher performance GPUs only serve to
         | keep the cost of a cluster down vs using a greater number of
         | lower performance ones. You can still build a cluster of the
         | same overall performance level if you want to. Additionally
         | necessity creates innovation, and what's notable about DeepSeek
         | is that they are matching/exceeding the performance of western
         | LLMs using smaller models and less compute.
        
           | sroussey wrote:
           | Not only that, but having a constraint often feeds
           | innovation. Having to work with less compute might mean new
           | ways of doing things that lead to faster iteration, etc.
        
         | Onavo wrote:
         | > _China censorship which limits the deepseek data ingestion
         | and output, to some degree._
         | 
         | We just call it alignment research instead. Same pig, different
         | shade of lipstick.
        
         | nimbius wrote:
         | 1. China already has a domestic 3nm process and competitive
         | video card industry that openly and actively seeks independence
         | from sanction. Huawei is evidence that sanctions are not as
         | effective as foreign policy leaders may think.
         | 
         | 2. Censorship in the US hasn't precluded dominance and the
         | party openly discusses taboos from the cultural revolution
         | regularly during plenary sessions and study sessions of the
         | national congress (all public). Output censorship isn't the
         | same as input.
         | 
         | 3. Red Hat's LLM and AI efforts are all open source as well. Open
         | source is directly compatible with the party's 'socialism with
         | Chinese characteristics.'
        
         | iepathos wrote:
         | I find the open source argument pretty weak. Linux is open
         | source but is more used in production than windows, macos, or
         | any other operating system by far and very arguably out-
         | performs them. The very nature of being open source does not
         | mean proprietary alternatives pick up all the benefits and
         | being open source it is free and easily moddable which appeals
         | to many of the best engineers who can drive the innovation
         | further than proprietary alternatives. Proprietary alternatives
         | don't necessarily have the resources or desire to adapt
         | innovations from open source tech for their own solutions.
        
         | manquer wrote:
         | I don't see real justification for a ban in the first place.
         | 
         | There are different kinds of censorship in both governance
         | models, and there is no AI regulation anywhere in the world,
         | including in the U.S.: everyone from law enforcement to private
         | organizations is allowed to use tools as they wish in any
         | application area.
         | 
         | Corporate censorship is real and quite heavy in the US, starting
         | from how copyright is enforced with the flawed DMCA process, and
         | custom automated systems with no penalties for abusers like
         | with YouTube, or section 230, or various censorship bills
         | ostensibly to protect children, etc.
         | 
         | On top of that organizations will self censor in the fear of
         | regulation (lose 230 immunity for example) or being dropped by
         | partners who are oligopolies (VISA/MasterCard for example).
         | 
         | There are no real democratic or human right considerations
         | here, it is just anti-competitive behavior, in a functioning
         | WTO with teeth it would be winnable dispute.
         | 
         | For anyone thinking it is an unfair comparison or whataboutism,
         | or that the censorship is not problematic, the number of
         | questions any of the major American models will not respond to
         | should tell you otherwise.
        
         | wordofx wrote:
         | China will absolutely train and censor specific data it wants
         | its citizens to believe. Especially around the history of
         | China.
         | 
         | Outside of that tho China is in a very good position to, say,
         | outperform the west with its disregard for copyright, and not
         | caring if feelings get hurt by the woke left.
         | 
         | Facts can remain facts and the woke left will get upset and try
         | to stick to western models that are censored to protect people's
         | feelings as they are now.
        
         | caycep wrote:
         | As a usage question - what do you use
         | gemini/chatgpt/deepseek/claude for? Most of the use cases I've
         | seen basically boil down to a "more talkative Google/google
         | translate"
        
         | eunos wrote:
         | > 1. The US trade war with china which will place deepseek
         | compute availability at disadvantages, eventually, if we ever
         | get to that.
         | 
         | Chinese chips will come soon. I heard that at DeepSeek, Huawei
         | Ascend chips already handle part of inference.
         | 
         | > 2. China censorship which limits the deepseek data ingestion
         | and output, to some degree.
         | 
         | There are things that deepseek doesn't censor but Claude does
         | censor. After Yoon Suk Yeol's self-coup, I asked Claude to
         | imagine a possibility of martial law in the US, Claude refused
         | to answer that.
         | 
         | > 3. Most importantly, deepseek is open source, which means
         | that the other models are free to copy whatever secret source
         | it has, eg: Whatever architecture that purportedly use less
         | compute can easily be copied.
         | 
         | The idea is that DeepSeek (among others) prevents or checks
         | OpenAI/Anthropic from perpetually juicing extra big margins from
         | the AI space. The current valuations of NVDA and downstream AI
         | companies are justified by the future huge margins from "AGI".
         | Without that, the prices crash.
         | 
         | Side note, prior to V3 DeepSeek was a bit unusable due to low
         | token generation speeds.
        
       | suraci wrote:
       | I'm wondering what impact this will have on NVDA
        
       | wiradikusuma wrote:
       | I hope the competition among AI companies will continue to be
       | healthy. Meaning they will keep sharing their techniques and
       | papers, and we, as a whole, will be better off.
        
       | LittleTimothy wrote:
       | I'm getting so interested in the meta dynamics of this. The
       | ability of the Chinese company to just openly state "we're
       | working on this because it's interesting" rather than the US
       | version "We want to wrap the world in puppies and hugs and we
       | love you all and it's just a really embarrassing mistake I ended
       | up buying myself a Koenigsegg and fired all the scientists from
       | my non-profit board". To apply the same scepticism to the Chinese
       | CEO - you can't threaten the monopoly of the Communist party so
       | you have to pretend you're less capable than you are.
       | 
       | I don't think there's any doubt that China can produce some level
       | of tech innovation, I do wonder if it can be sustained and
       | exploited since we saw the damage that went on with Alibaba.
       | Although maybe that's looking like a more reasonable approach
       | when you see the danger of the opposite happening in the US.
        
       | dumbmrblah wrote:
       | Part of the reason their API is so cheap is that they explicitly
       | state they are going to train on your API data. OpenAI and
       | Claude say they won't if you use their API (if you use ChatGPT
       | that's a different story). There are no free lunches.
        
         | eldenring wrote:
         | This comment is misleading. There is a "free lunch" here in the
         | sense that serving this model is far cheaper than worse, open
         | source models at scale.
         | 
         | Yes they probably are more willing to go down in price due to
         | this, but the architecture is open, and they are charging
         | similarly to a 30B-50B dense model, which is about how many
         | active params deepseek-v3 has.
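         | 
         | For a rough sense of why pricing near a 30B-50B dense model is
         | plausible, a minimal sketch using the common rule of thumb that
         | a forward pass costs about 2 FLOPs per active parameter per
         | token. It ignores attention over the context, memory bandwidth
         | and batching, so treat the numbers as order-of-magnitude only.
         | The 671B and 37B figures are the ones quoted elsewhere in this
         | thread.

```python
TOTAL_PARAMS = 671e9   # DeepSeek-V3 total parameters (figure quoted in this thread)
ACTIVE_PARAMS = 37e9   # parameters activated per token (figure quoted in this thread)


def forward_flops_per_token(params_used: float) -> float:
    """Rule of thumb: one multiply and one add per parameter that is used."""
    return 2.0 * params_used


moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"MoE, 37B active:  ~{moe_flops:.2e} FLOPs per token")
print(f"Dense, 671B:      ~{dense_flops:.2e} FLOPs per token")
print(f"A dense model of the same total size needs ~{dense_flops / moe_flops:.1f}x the compute")
```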
        
       | cynicalsecurity wrote:
       | China doesn't limit their AI research with so called safety and
       | other concerns, but we do. Who is going to win? Somehow I don't
       | think this is going to be us.
        
         | chimen wrote:
         | What is the "so called safety" that we do?
        
         | tokioyoyo wrote:
         | They do. I swear this entire thread is just full of two
         | extremes of misinformation from both sides.
        
       | emporas wrote:
       | Not personally surprised that a MoE model performs so well.
       | 
       | I used Mixtral a lot for coding Rust, and it had qualities no
       | other model had except GPT 3.5 and later Claude Sonnet. The funny
       | thing is Mixtral was based on Llama 2 which was not trained on
       | code that much.
       | 
       | DeepSeek v3: 671B parameters in total, and 37B activated sounds
       | very good even though impossible to run locally.
       | 
       | Question, if some people happen to know: for each query it
       | activates just that many parameters, 37B, and no more?
        
         | coolspot wrote:
         | It activates only 37B per query, but you don't know which ones
         | ahead of time, so you gotta store all 671B in (V)RAM.
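         | 
         | A minimal sketch of generic top-k expert routing, to show why
         | the whole model has to stay in memory. This is a toy with scalar
         | "experts" and a softmax over the selected router scores, not
         | DeepSeek's actual gating, and all names are made up for
         | illustration. In typical MoE layers the router picks experts
         | per token, so different tokens in one query can hit different
         | experts.

```python
import math
import random


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x, experts, router, k):
    """Run only the k routed experts on one token's (scalar) feature x."""
    return sum(w * experts[i](x) for i, w in top_k_route(router(x), k))


# Toy setup: 8 experts, 2 active per token (hypothetical sizes).
NUM_EXPERTS, K = 8, 2
rng = random.Random(0)
scales = [rng.uniform(-1, 1) for _ in range(NUM_EXPERTS)]
experts = [lambda x, s=s: s * x for s in scales]
router_w = [rng.uniform(-1, 1) for _ in range(NUM_EXPERTS)]


def router(x):
    """Toy linear router: one score per expert."""
    return [w * x for w in router_w]


for token in (0.5, -2.0):
    chosen = [i for i, _ in top_k_route(router(token), K)]
    print(f"token {token}: experts {chosen} ran, {NUM_EXPERTS - K} stayed idle")
```

         | Only K experts do work for each token, but which ones depends
         | on the input, so every expert's weights need to be loaded.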
        
         | int_19h wrote:
         | Mistral LMs are not LLaMA derivatives.
        
       | orbital-decay wrote:
       | This reminds me of PixArt-α. It's a diffusion model for image
       | generation, that demonstrated that it's possible to train a SotA
       | model on a ridiculously tiny budget ($28k).
        
       | inSenCite wrote:
       | "Before Deepseek, CEO Liang Wenfeng's main venture was High-Flyer
       | (Huan Fang), a top 4 Chinese quantitative hedge fund last valued
       | at $8 billion"
       | 
       | Seems wild that a top 4 quant hedge fund is only $8B?
        
         | sebmellen wrote:
         | Chinese stocks are nowhere near American prices.
        
         | csomar wrote:
         | I think that's the value of the fund not AUM. BlackRock has
         | 11trillion of AUM but only 39bn of equity.
        
           | rfoo wrote:
           | Huan Fang's peak AUM was just between $15-$20bn (more than
           | 1e11 CNY but not that much) though.
        
       | timtom123 wrote:
       | So much spam around this model. LocalLLaMA is stuffed with spam
       | posts and even hacker news is getting spammed. Who has actually
       | run this model and verified performance? Does anyone know of a
       | decent review from a trustworthy source?
        
         | x_may wrote:
         | The LMSYS leaderboards are crowdsourced and would be hard to
         | fake, and they show pretty strong performance in terms of human
         | preference.
        
           | paxys wrote:
           | Crowdsourced data is the _easiest_ to fake unless you can
           | somehow ensure that you have a completely unbiased population
           | (which is impossible). There's a reason why certain models
           | do so well on upvote-based leaderboards but rank nowhere on
           | objective tests.
        
       | waldrews wrote:
       | To this day, asking Deepseek "what model are you" typically gives
       | the answer
       | 
       | "I'm an AI language model called ChatGPT, created by OpenAI.
       | Specifically, I'm based on the GPT-4 architecture, which is
       | designed to understand and generate human-like text based on the
       | input I receive. My training data includes a wide range of
       | information up until October 2023, and I can assist with
       | answering questions, generating text, and much more. How can I
       | help you today?"
       | 
       | this tells us something about using synthetic data to bootstrap
       | a new model. All those clauses in the terms of service about not
       | using the model to develop competing AI? Yeah, good luck with
       | that.
        
         | waldrews wrote:
         | And you can ask it if it's sure, and it'll consistently double
         | down on insisting it's ChatGPT. Ask it what country it's
         | developed in, and it'll say US; ask it if it's sure it's not
         | China, and it'll be sure.
        
         | amelius wrote:
         | I'm sure OpenAI breaches copyright just as well. They are just
         | a little bit better at hiding it.
        
           | waldrews wrote:
           | It also tells us the genie is out of the bottle not just in
           | the form of open weights being widely available, but in the
           | form of the text corpuses coming from the existing model. The
           | claimed low cost of Deepseek's training is partly enabled by
           | the availability of all that synthetic data created by the
           | first generation models trained and developed at much higher
           | cost. When the Soviets got hold of the nuke plans, they
           | greatly reduced their development costs, primarily by not
           | having to redo all the experiments that led to dead ends.
           | What's amazing is that this time it's different; nobody needs
           | OpenAI's secret sauce anymore, just enough data - some of it
           | happily supplied by ChatGPT itself, and they can experiment
           | with different architectures and either get tolerable results
           | with an architecture already in textbooks, or greatly improve
           | efficiency by innovating.
        
           | paxys wrote:
           | OpenAI's stance is "any data we can get our hands on is fair
           | use for AI training". They aren't hiding anything.
        
         | jurli wrote:
         | This is a common "gotcha" comment from people who don't
         | understand LLMs very well. Occasionally if you ask Gemini it'll
         | say this as well. It has everything to do with the fact that
         | ChatGPT is the most talked about AI model, rather than with
         | the model being trained on ChatGPT data.
        
       | fallmonkey wrote:
       | Strangely, deepseek has always been a prominent name in the open
       | source LLM community since last year, with their repos and papers
       | - https://github.com/deepseek-ai. Nothing of it is really quiet
       | except that they probably burn 1% of marketing money compared to
       | other China LLM players.
        
       | exe34 wrote:
       | I have a question for the floor - given the worsening situation
       | with technological unemployment, and the structural inability of
       | capitalism to cope with it (who will buy the products when nobody
       | has a job?), is it possible that China will be able to pivot to
       | UBI and push on ahead? they have enormous control over the
       | population and economy, so they might be able to change direction
       | faster than the West?
        
       ___________________________________________________________________
       (page generated 2024-12-31 23:00 UTC)