[HN Gopher] Deepseek: The quiet giant leading China's AI race
___________________________________________________________________
Deepseek: The quiet giant leading China's AI race
Author : sunny-beast
Score : 263 points
Date : 2024-12-31 09:28 UTC (13 hours ago)
(HTM) web link (www.chinatalk.media)
(TXT) w3m dump (www.chinatalk.media)
| yellow_lead wrote:
| > Liang Wenfeng: We believe that as the economy develops, China
| should gradually become a contributor instead of freeriding. In
| the past 30+ years of the IT wave, we basically didn't
| participate in real technological innovation. We're used to
| Moore's Law falling out of the sky, lying at home waiting 18
| months for better hardware and software to emerge. That's how the
| Scaling Law is being treated.
| mentalgear wrote:
| Impressive to think about what DeepSeek achieved: rough parity with
| o1 and Claude with more than 10x fewer resources. Better algorithms
| and approaches are what's needed for the next step of ML.
| NitpickLawyer wrote:
| While impressive, the deepseek models aren't really "on par"
| with either oAI or Anthropic offerings, right now. The models
| seem to be a bit overfitted in the post-training step. They are
| very "stubborn" models, and usually handle tasks well _if_ they
| can handle them, but steering them is quite difficult. As a
| result, they score very well on various benchmarks, but often
| times perform slightly worse in real-life scenarios.
| espadrine wrote:
| The blind test at lmarena.ai does give it a higher Elo than
| GPT-4o (API), Claude, and Gemini 1.5 Pro. It seems that
| people do enter real-life scenarios in the arena.
| victorbjorklund wrote:
| I found deepseek very useful at coding with Aider. On par
| with claude.
| rahimnathwani wrote:
| > They are very "stubborn" models
|
| Have you found this to be the case even when using the
| recommended temperature settings (ranging from 0 for math, to
| 1.5 for creative tasks)?
| NitpickLawyer wrote:
| I use 0.05 for math, just did a 5k problem set, trying to
| fine-tune a smaller model with the outputs. It has some
| very interesting training, borrowed from r1 per the tech
| report, where it does the o1/qwq "thinking steps", but a
| bit shorter. It solves ~80% of the problems in 4k context,
| while qwq would go on for 8k-16k. It's very good at what it
| does.
|
| But as soon as I need it to do something _other_ than solve
| a problem - say rewrite the problem in simpler terms, or
| given a problem + solution provide hints, or rewrite the
| solution with these <tags>, etc. it kinda stops working.
| Often times it still goes ahead and solves the problem.
| That's why I'm saying it's stubborn. If a task looks like a
| task that it can handle _very_ well, it's really hard to
| make it perform that other, similar but not quite the same
| task.
|
| In a similar vein -
| https://github.com/cpldcpu/MisguidedAttention/tree/main/eval...
| orbital-decay wrote:
| DeepSeek v3 feels very much like Sonnet 3.5 (v1) in
| particular, minus the character. Performs more or less
| similarly, "feels" overfitted just about the same, and
| repeats itself in multiturn chats even worse. I hope they
| address it in v3.5, v4, or whatever comes next.
| amelius wrote:
| How are these models benchmarked?
| amelius wrote:
| Makes you wonder if OpenAI has a moat.
| llm_trw wrote:
| We're seeing a split in models between deep and wide.
|
| Wide models sound like they know more than deep models but fail
| at reasoning with more than a few steps and are cheap to train
| and serve. Deep models know a lot less but can reason much
| better.
|
| An example I saw all MoE models fail at a few months back: when "A
| and not B" was implicit in the grounding text, all of them would
| turn it into "A and B" a substantial proportion of the time.
| Monolithic models, on the other hand, had no trouble giving the
| right answer.
|
| The Chinese AI companies can only do wide AI because of
| restrictions on hardware exports. In the short term this will
| make more people think LLMs are stochastic parrots because they
| can't get simple things right.
| caycep wrote:
| what's the engineering situation at OpenAI since the whole
| "firing Sam Altman" spectacle? Has there been significant brain
| drain that affects something like o1 etc?
| lomkju wrote:
| I feel the GPU restrictions created an environment for Chinese
| Devs to be more innovative and do more with less.
|
| Kudos to the deepseek team!
| wodderam wrote:
| Kai-Fu Lee describes the culture so well in AI Superpowers. The
| roots go back well before the GPU restrictions. Absolutely
| cutthroat competition.
|
| Imagine Sam Altman throwing a chair out a window in a meeting
| lol.
|
| The message of AI Superpowers is that China will lag the US at
| first, but once things stabilize this will reverse because China
| has a lot more engineers and a lot more data.
|
| Anyone who hasn't read AI Superpowers should really make it a
| point to read it in 2025. It is an incredible book.
| Etheryte wrote:
| I don't know, I've been hearing the story that China is about
| to upend the US as the leading global superpower ever since I
| was a kid. There's always a new vogue and novel twist put on
| the rationale and how it's gonna happen, but so far it's like
| fusion, always a few years away.
| elashri wrote:
| I think you mean nuclear fusion.
| Etheryte wrote:
| Of course, thanks.
| tossandthrow wrote:
| What makes you think it has not happened? There has not
| been an event to establish who the current superpower is
| in recent times.
| futureshock wrote:
| I think you have the right idea. China has yet to truly
| flex its muscle. They prefer to quietly grow stronger.
| Their response to Covid with the largely successful zero
| covid strategy gives a clue about the power of its
| government. Still, you can't become the champion without
| stepping into the ring.
| daveguy wrote:
| Largely successful? I wouldn't confuse Chinese propaganda
| to the outside world with success.
| pphysch wrote:
| The Chinese government treated the pandemic as a
| bioweapon attack by a foreign adversary engaged in a
| broader hybrid war, and it did so effectively.
| daveguy wrote:
| That's batshit insane.
| opwieurposiu wrote:
| China creates a superbug via GOF research. Accidentally
| releases it from the lab. Shuts down its own economy.
| Puts the majority of its citizens under house arrest, and
| that is "largely successful"? Please send me the
| AliExpress link to whatever it is you are smoking, it
| must be some good shit.
|
| I think the real lesson here is that if you have enough
| government power, there is no need to be competent. The
| feedback loop is destroyed so you can just do whatever
| random stupid thing you want until your country collapses
| like the USSR.
| vkou wrote:
| > China creates a superbug via GOF research
|
| In 2024, this isn't fact, it's just baseless conspiracy.
|
| All evidence has ended up pointing to bush meat
| contamination.
| int_19h wrote:
| In 2024, zoonotic origin is considered more probable, but
| it is by no means a "baseless conspiracy" to believe
| otherwise.
| hollerith wrote:
| The professional military leaders in China, Japan, SK,
| Taiwan, Singapore and the Philippines act as if the US is
| the current superpower.
| jitl wrote:
| If you want to look at an objective numeric metric for
| this, why not foreign military bases? The US has 128+, China
| has ~2. To project global military power, China will need
| a presence of a similar order of magnitude. I use that number
| as a check against sometimes breathless and sensational
| journalism about the topic.
|
| It's harder for me to come up with a simpler metric for
| "Belt and Road" / IMF style control-through-capital.
|
| But, I think it will happen. After visiting China and
| seeing how much consistent progress there is, both in
| infrastructure from the government and in daily life from the
| economy, my impression is that the US government takes 2 steps
| forward and 1 step back in the same time it takes China to take
| 100 steps forward.
| bpodgursky wrote:
| The US does not have 128 foreign military bases. It has
| ~50 nominal bases [1]. Most of them are just the US
| sharing an airfield with a friendly country; it's a
| refueling stop that would not be hard for China to
| replicate.
|
| The US does have several large overseas bases but 90% of
| this list is indefensible logistics hubs and not a
| meaningful projection of force.
|
| [1] https://en.wikipedia.org/wiki/List_of_American_military_inst...
| daveguy wrote:
| Calling any US military base an "indefensible logistics
| hub" reveals that the extent of your research was
| probably just that Wikipedia listicle.
| bpodgursky wrote:
| Believe whatever you want.
|
| Most of these bases are co-located with NATO or other
| allies for good reason: the US doesn't have to do
| everything itself wrt air defense, locating an airlift
| wing with a fighter wing.
|
| But then it's a lower bar than people imagine, for China
| to buy similar friendship.
| daveguy wrote:
| You literally called the _logistics hubs of the US
| military_ -- the bases that move more of the most
| powerful weapons and military personnel in the world --
| indefensible. So you either don't know what indefensible
| means, or you are a piss poor propagandist.
| sangnoir wrote:
| Yes or No: Can the US singlehandedly defend all those
| bases without the help of the host country? If the answer is
| "Yes", then China has a long way to go to achieve that
| capability. If the answer is "No", then the bar is much
| lower, and gp's point is that China can "buy" similar
| arrangements without too much effort. More directly, is
| the bottleneck on funding, personnel/materiel, or
| diplomacy?
|
| China is building up a lot of soft power with
| infrastructure projects all over the world - most of them
| are aimed at improving trade - ports, rail lines and the
| like. In the next decade or 2, they can reasonably make
| requests to place a few PLA/PLAA personnel and equipment
| on bases in strategic places, bases that may have been
| built using Chinese money.
| daveguy wrote:
| Fair enough. But could any country attack all of those
| bases at once? As long as the US doesn't do anything as
| colossally stupid as leaving NATO it shouldn't be a
| problem with support. Ultimately NATO participation
| resides with Congress which is beholden to the people.
| NATO is overwhelmingly approved of by the US people -- it
| is a _defense_ pact.
|
| If a country or coalition decided to attack all of those
| bases at once it would give the US the high ground to
| respond. Nazis tried a blitzkrieg and that didn't turn
| out well. As someone squarely against the bullshit of
| Trump, I would not be happy if he was in power at the
| time. But I do not doubt for a second that the US
| population in general would respond as readily as they
| did after 9/11 (but hopefully not as readily as in Iraq).
|
| We just saw how the "dipshit in power" aspect works with
| Netanyahu in Gaza -- a disproportionate and tragic
| response. The only caveat is Trump is an extremely stupid
| dipshit, so I genuinely hope it doesn't turn out that way
| and everyone keeps their powder dry until Trump is out of
| office.
|
| China's buildup of soft power is good for them, and I
| commend them for it. Fortunately, I believe soft power is
| a defensive power at its core, and I don't think it
| translates to offensive power. To confuse the two would
| be a mistake.
|
| Thank you for the opportunity to get a lot off my chest
| this New Year's Eve. I hope it wasn't too offensive,
| because I believe you responded intelligently and in good
| faith, and thank you for that.
| bpodgursky wrote:
| Yes, it is very easy for a logistic hub that only has an
| airlift wing to be indefensible in a war against a peer
| adversary if for example there are no THAAD or Patriot
| batteries there. It's a hub, not a hardened facility.
|
| Many of the US military bases are communication centers
| or barracks on training bases. They serve important roles
| but are not "defensible" in many contexts.
|
| Who am I even propagandizing for in this context?
| fakedang wrote:
| I was of the same viewpoint as you - just look at the
| militaries!
|
| Except in today's world, being a military power is
| increasingly less relevant after a certain point, while
| economic supremacy is increasingly gaining prominence.
| While the West is content with self-platitudes for their
| "democracy", China has been building strong relationships
| with a number of countries looking to implement the
| "China-model", a capitalist but largely regressive nation
| that relies on surveillance and stringent media control.
| China is already licensing out their technology to a
| number of interested countries, some of which include
| Western countries looking to emulate Chinese autocracy
| themselves. On the other hand, countries are looking at
| the incoming US govt with considerable uncertainty
| as to what their relationship with America will be like.
|
| Not to mention, as automated warfare becomes increasingly
| more relevant, guess where these countries are buying
| their drones from? Hint hint, it's not the US with their
| overpriced toys.
| kayewiggin wrote:
| > China has been building strong relationships with a
| number of countries
|
| A number of irrelevant countries. The US's allies are Europe,
| Japan, South Korea, Taiwan, Canada, Mexico, Australia,
| etc.: 80% of the world's wealth and 95% of the world's
| top technologies.
|
| > guess where these countries are buying their drones
| from
|
| Soon, not China. China Is Cutting Off Drone Supplies
| Critical to Ukraine War Effort [1]. China is reportedly
| making drones for Russia instead, according to multiple
| intelligence officials.
|
| [1]
| https://www.bloomberg.com/news/articles/2024-12-09/china-is-...
| solaarphunk wrote:
| Economic superpower perhaps - just take a look at their
| relative GDP over time.
| kayewiggin wrote:
| China has 900M people making less than $400/month, and
| 600M people making less than $100/month. Relative GDP is
| a joke; go to China to see what most of them are eating
| (hint: it's unsafe food filled with chemicals, or it's
| mostly carbs) and where these people are living (hint:
| it's shoddily constructed condos or run-down farmhouses).
| bugglebeetle wrote:
| It's unclear to me why what you're describing is specific
| to China and not also what Americans euphemistically
| refer to as "fly over states."
| kamarg wrote:
| Not sure why you have something against the flyover
| states. I'm sure there are more shoddily constructed condos
| in Florida/California/New York per capita than there are
| in the Midwest. Same goes for cheap high calorie food.
|
| Of course, the same can probably be said about the large
| population centers in China too. More people concentrated
| in one area tends to mean more poverty in that area and
| all the things that come with it.
| bugglebeetle wrote:
| I don't have anything against them. I was born and raised
| in one. I just find it ironic that someone would fail to
| see this parallel.
| Airodonack wrote:
| The parallel is that there are rich and poor? It is
| unscrupulous to argue in imprecise, binary terms while
| ignoring the difference in scale. People in flyover
| states are not making only $400/mo, or even occupying the
| equivalent societal position in America.
| eunos wrote:
| > China has 900M people making less than $400/month
|
| Most of these folks are illiterate oldies that would pass
| away in a few years anyway.
| jurli wrote:
| It's literally happening lol. When you were a kid China was
| making shoes and their GDP was 10% of the US's. Now they're
| making drones / evs / high end electronics and it's 80%.
| This is why people's perception is so unreliable, because
| it's impossible to notice things when they happen over a
| lifetime.
| talldayo wrote:
| When I was a kid, China was a lot better integrated with
| the international community. Right now their
| relationships are few and far between, rarely featuring
| first-world nations.
|
| If Russia couldn't beat NATO in a pitched fight against
| the rest of the world, neither can China.
| joshbaptiste wrote:
| As a sovereign nation rises in power you'll notice how it
| slowly starts losing favor with the USA.
| caycep wrote:
| The question in a rational mind is, why would it even
| bother? US/China partnership is the most economically
| successful in world history, even more so than US/UK or
| AUKUS. But the downside of CCP government structure is
| that paranoia at the top ranks has a good probability of
| overruling rationality.
|
| That said, the US can hardly talk, as US-centric
| paranoia/"exceptionalism" may do the same thing... and the
| electorate voted to self-destruct the government despite
| the US economy being the strongest in decades.
| bugglebeetle wrote:
| > US economy being the strongest in decades
|
| The vast majority of its people do not share in that
| success and have seen a declining standard of life
| relative to prior generations whereas in China, the
| opposite is quite demonstrably true, despite increasingly
| similar concentrations of wealth and political power.
| kayewiggin wrote:
| > making drones / evs / high end electornics
|
| China does have a current advantage in lithium batteries
| and rare earth materials - dumb technologies that US and
| allies can replicate fairly quickly, in less than a year.
| EUV and 3nm and below, on the other hand, will take
| decades, since they involve a number of different and deep
| technologies controlled by dozens of companies. China has
| thrown $150B on it since 2014, and has only come up with
| low yield/unprofitable 7nm via existing DUV machines.
|
| > 80% GDP
|
| China's population will more than halve to 500M by 2100,
| if not earlier, while the US grows to close to 400M by then.
| Someone actually theorizes that China's population is
| already only 800M right now
| https://www.youtube.com/watch?v=fR5F_8dSjOw
|
| Also, a lot of that GDP is debatable in 2024, when real
| estate prices have dropped by more than 50% in tier 2 and
| below cities, and deflation has raged on.
| FooBarWidget wrote:
| So why aren't the US and allies demonstrably replicating EVs
| (and other kinds of green technology) quickly? Tesla is
| still pretty much the only serious player. Why are CEOs
| of major western carmakers painting a very different
| picture than what you describe here? Where are the
| serious EU/US battery makers that are globally
| competitive? It looks to me like the EU has chosen the
| worst of all options: put up tariff barriers while
| _also_ not having serious domestic EV makers, and _also_
| not stimulating domestic EV development.
| dboreham wrote:
| Western consumers don't want to buy EVs (mostly).
| FooBarWidget wrote:
| Yeah I mean, with the sad state of the Dutch electric
| grid, the poor coverage of chargers, and the disappearing
| consumer subsidies, I wouldn't want one either. So why aren't
| governments also building the infrastructure they need to
| help stimulate demand for EVs? Not taking global climate
| disaster seriously enough?
|
| Building EVs and supporting infrastructure is a lot more
| complicated than just having a bunch of blueprints.
| slt2021 wrote:
| They would buy Chinese EVs since they are much cheaper
| than ICE
| eunos wrote:
| > dumb technologies that US and allies can replicate
| fairly quickly
|
| Laugh in Northvolt
|
| > $150B on it since 2014, and has only come up with low
| yield/unprofitable 7nm via existing DUV machines
|
| Considering that there are fewer than 5 countries on Earth
| that can fab 7nm semiconductors, that ain't bad.
| kayewiggin wrote:
| RIP Northvolt from Sweden, and "The U.S. made a
| breakthrough battery discovery -- then gave the
| technology to China":
| https://www.npr.org/2022/08/03/1114964240/new-battery-techno....
| However:
|
| - Battery Startup Opens Chicago Plant as US Seeks to Curb
| Reliance on China
| https://www.nanograf.com/media/battery-startup-opens-chicago...
|
| - Our own YC:
| https://www.ycombinator.com/companies/industry/energy
|
| - China's startup scene is dead as investors pull
| out--'Today, we are like lepers'
| https://finance.yahoo.com/news/china-startup-scene-dead-inve...
| roenxi wrote:
| > when real estate prices have dropped by more than 50%
| in tier 2 and below cities, and deflation has raged on.
|
| Can other economies copy that part? I know a bunch of
| people who'd like to be able to afford more houses & more
| groceries at the same time. I'd like that, I can't
| realistically afford a house in the city I live in
| without a 50% price drop.
|
| I'm sure China has a lot of problems, but key goods
| getting cheaper is not one of them. What I'm guessing you
| meant to say is that retirees were led to put too much of
| their savings into the housing market and are discovering
| there is a glut. Which is tragic for them. But prices
| dropping is a good thing; the unachievable ideal is a
| utopia where everything is free, ie, 100% deflation.
| protomolecule wrote:
| >a lot of that GDP is debatable in 2024
|
| While the share of services in the US GDP is more than
| 3/4. What will you do with all these expensive NY lawyers
| when push comes to shove? Sue China's drones?
| lolinder wrote:
| And now they're at the point where the population pyramid
| is collapsing. It's hard to make any predictions about
| the future when they got here riding a baby boom and now
| their ratio of elderly to working age is about to go
| through the roof.
|
| https://www.populationpyramid.net/china/2023/
| lopatin wrote:
| I never knew Sam Altman threw Bret Taylor out of a window.
| That makes the OpenAI board drama more understandable.
| modeless wrote:
| Didn't Ballmer do that? I'm not sure it indicates success.
| daedrdev wrote:
| The thing is Beijing undercuts this completely by allowing
| local governments to perform rampant shakedowns of investors
| and CEOs through disappearances on bogus charges, even in
| other provinces.
| zitterbewegung wrote:
| If you actually believe that NVIDIA GPUs are import restricted,
| there are many stories showing that this is being sidestepped.
| manquer wrote:
| Not at the volume needed to compete with the training
| infrastructure setups of Anthropic or OpenAI or other leading
| players.
|
| No ban is perfect; there are always some loopholes or illegal
| exports, which is to be expected, but if it prevents large-scale
| transactions then it has achieved its goal.
|
| Whether they actually need a lot of GPUs to train, or whether
| training with older-gen GPUs is competitive, is a different
| question.
| modeless wrote:
| Software expands to fill the available resources. If you want
| more efficient software, build it on less powerful hardware. AI
| training runs are no exception!
| mkagenius wrote:
| I just shoved the whole .webvtt file in the header of an audio
| response from the server so that I don't have to implement
| another API just for subtitles [1][2]
|
| 1. While building https://gitpodcast.com
|
| 2. Code snip:
| https://github.com/BandarLabs/gitpodcast/blob/main/backend/a...
| jurli wrote:
| Makes sense. When you restrict hardware, you have to spend all
| your energy on optimizing software that everyone else ignores
|
| Imagine if they were forced to use IE7 as the only browser. The
| frontend frameworks would be blazing fast and we would never
| have bloatware like React or Angular or npm
| kjellsbells wrote:
| If you tell the world that eggs are awesome while denying other
| countries access to eggs, they discover ways to use less eggs and
| eventually realize they don't need eggs at all. Then you are
| stuck making Denny's breakfasts while the rest of the world is on
| to fine dining.
|
| China has incredibly strong incentives to do the pure research
| needed to break the current GPU-or-else lock. I hope, for
| science's sake, we don't end up gunning down each other's
| mathematicians on the streets of Vienna the way certain nuclear
| physicists seem to go.
| djaouen wrote:
| > If you tell the world that eggs are awesome while denying
| other countries access to eggs, they discover ways to use less
| eggs
|
| You are confusing cause with effect. What actually happened:
| Nixon opened up US trade with China and, ever since, China has
| been stealing trade secrets to undermine and overthrow American
| interests. Limiting their access to eggs was literally us
| trying to prevent them from stealing all our shit!
| quantum_state wrote:
| It seems to me that we forgot about the "stealing" of the
| "shit" from Europe and other places in the early days ...
| djaouen wrote:
| Protip: Some of us were not involved in the desecration
| caused by the East India Tea Company. Just because we look
| British means we should suffer like them, too?
| carom wrote:
| They are referring to the fact that the US ignored
| European IP in its early days and relating that to what
| China is doing to the US now.
| djaouen wrote:
| I am just saying, this AI controversy has roots from
| before the creation of OpenAI. If OpenAI used European
| IP, I would _think_ that would be a good thing for
| Europe, assuming AI is the future?
|
| Sorry for talking Ancient History lol
| skywhopper wrote:
| What we call AI is not "the future". But I'm not sure how
| OpenAI stealing European IP would help Europe, even if it
| is.
| cced wrote:
| What's also funny is that the promoters of the "China is
| stealing all of our IP in exchange for their labor" narrative
| never mention why corporations don't just pull out.
|
| Are these IP thefts or technology transfers? If
| corporations are having their IP stolen, why don't they
| just leave?
|
| These narratives never explain or mention this. Idk why
| people still latch onto them; they are completely
| uninteresting. "China is stealing all our IP and there's
| nothing we can do about it except for continuing to allow
| our IP to be stolen" is an IQ test and a trope.
|
| Does "theft of IP" outweigh, or not, "access to very
| cheap labor (read: jobs)" ?
|
| We need to stop simping for corporations and start
| thinking critically about these things.
| cma wrote:
| https://en.wikipedia.org/wiki/Samuel_Slater
|
| https://en.wikipedia.org/wiki/Bad_Samaritans_(book)
| tchalla wrote:
| What do your looks and involvement have to do with the
| parent comment's core point?
| djaouen wrote:
| Nothing. I'm crazy, remember???
| qwertox wrote:
| It remains to be seen how stable a totalitarian government can
| be. China has the benefit of having full control over its
| people and therefore gets to decide what is important and what
| not, and currently people are ok with handing that control over
| to the government. But it's also a very fragile state, which
| can only be retained through full repression.
| aaomidi wrote:
| [flagged]
| borski wrote:
| We've seen _a_ CEO shot, and the majority of people
| definitely don't cheer it on; just a very vocal minority.
| Moreover, he may yet get the death penalty; I'm not sure
| I'd call that any more "fragile" than any other shooting.
| dartos wrote:
| Not to mention that that CEO was in health insurance. A
| very emotionally charged industry where someone's life or
| death is directly affected by CEO decisions.
| paganel wrote:
| [flagged]
| borski wrote:
| That is simply not true. Moreover, this forum is not
| "targeted towards the well-off."
|
| Out in the real world, Luigi is a criminal who shot a man
| in cold blood and sparked a conversation. That's about
| it.
|
| Hardly a hero. And the majority of the populace does not
| agree with you.
| dang wrote:
| Please don't post spurious generalizations about this
| community. What you said here is completely made up.
| skywhopper wrote:
| Did I miss some other CEOs being gunned down? I only know
| of the one.
|
| I'm more concerned about the folks cheering on vigilantes
| and cops who murder unarmed non-CEOs who have not
| perpetrated actual harm on thousands of people.
| suraci wrote:
| There's a saying in the Chinese liberal community: We cannot
| help but admire the American system's ability to self-correct.
|
| I've seen it twice in recent years: once after Joe Biden won the
| election, when people said the system chose Biden to fix Trump's
| mess, and once after Donald Trump won, when people said the
| system corrected the Biden error.
|
| So China is, of course, more fragile.
| elashri wrote:
| All I can see from this comment's logic is that the US
| has a cycle of mess that gets rotated, not a demonstration
| of self-correction mechanisms.
|
| Not to say that I believe that the US (or any other
| government or country) is unable to have self-correction
| ability or mechanisms. I am just pointing out that your logic
| is flawed.
| suraci wrote:
| Glad you pointed out the logical fallacy.
|
| In that context, "less fragile" are vague words without a
| clear subject.
|
| I posted the saying to be satirical, but in depth, the
| two-party system is more stable than any other political
| systems: To people, it may seem like a cycle of mess, but
| the system itself is very stable, it avoids the regime
| change by normalizing it.
| elashri wrote:
| > the two-party system is more stable than any other
| political systems: To people, it may seem like a cycle of
| mess, but the system itself is very stable, it avoids the
| regime change by normalizing it.
|
| How does that make the two-party system more stable than
| any other political system? Everything you say about
| normalizing regime change applies to all democratic systems.
| So you don't get the choices (both parties actually suck
| in many shared respects) but also don't gain much
| stability over other democratic systems. In a parliamentary
| system there is usually more acceptance and normalization
| of change than in the two-party system, where you get stuck
| between the worse and the worst most of the time.
| huijzer wrote:
| I guess it's time to bring back an old joke from Ronald
| Reagan [1]:
|
| An American and a Russian are arguing about their two
| countries. The American says, "Look, in my country, I can
| walk into the Oval Office, pound the president's desk, and
| say 'Mr. President, I don't like the way you're running our
| country!'"
|
| And the Russian says "I can do that." The American says
| "You can?" The Russian says "Yes, I can walk right into the
| Kremlin, go to the General Secretary's office, slam my fist
| on his desk and say 'I don't like the way President Reagan
| is running his country.'"
|
| [1]: https://youtu.be/9qh-1_tXeuQ
| paganel wrote:
| > It remains to be seen how stable a totalitarian government
| can be
|
| Much more stable than the government that has Trump, Musk and
| Vivek calling the shots, that's for sure.
| qwertox wrote:
| If these three died, it might be a loss to the country.
| None of them is as important to the country as Xi is to
| China. The resilience of the CCP, in light of its
| dependence on Xi, can only be upheld through the absolute
| suppression of freedom. But the average daily life is
| certainly much more enjoyable than in NK.
| Mistletoe wrote:
| I am absolutely certain that those three dying would be a
| gain of function for the world.
| tokioyoyo wrote:
| I'm curious, what do you think would happen if Xi stepped
| down tomorrow for whatever reason? You think everything
| will just fall apart?
| chvid wrote:
| It is probably more stable now than at any time since the
| communist takeover.
| littlestymaar wrote:
| Dubious claim. Except in the era between the Cultural
| Revolution and Mao's death, China under the communists has
| always been very stable thanks to the collegiality at the
| top. Now the collegiality is gone, and so will the stability be
| as soon as Xi is no more. That's the problem when one
| individual grabs all power.
| freefaler wrote:
| This argument might've worked in Mao's time. Now, with a
| capitalist economy under the party, resource allocation,
| while still skewed, is much more efficient than under Mao or
| the USSR's centrally planned economy. (And EU-wide policies
| sometimes aren't that far off from USSR stupidity.)
|
| Loss of feedback in authoritarian regimes is a problem, but
| in the short time it might not be if Xi doesn't make really
| stupid moves.
|
| It pains me to see it, but they show more long-term thinking
| that many of the Western governments who aren't interested
| what will happen after their time in the office.
|
| While the people have plenty use of force can be minimal.
| andy_ppp wrote:
| Do the governments elected recently in the West look stable
| to you?
| janice1999 wrote:
| They do. Ireland just had an election I voted in and is
| forming a new government with multiple parties. The UK,
| after years of TV worthy Tory party drama, had its most
| transformative election in over a decade. I see active and
| engaged multi-party democracies with peaceful transitions
| of power and long established and respected laws for
| calling elections, no confidence votes (e.g. France, yes it
| happens) and so on.
| nsoonhui wrote:
| I find that the gushing around deepseek is fascinating to watch.
|
| To me there are a few structural and fundamental reasons why
| deepseek can never outperform other models by a wide margin. On
| par maybe--as we reach the diminishing returns with our
| investment in the models, but not win by a wide margin.
|
| 1. The US trade war with china which will place deepseek compute
| availability at disadvantages, eventually, if we ever get to
| that.
|
| 2. China censorship which limits the deepseek data ingestion and
| output, to some degree.
|
| 3. Most importantly, deepseek is open source, which means that
| the other models are free to copy whatever secret sauce it has,
| e.g. whatever architecture purportedly uses less compute can
| easily be copied.
|
| I've been using Gemini, chatgpt, deepseek and Claude on a regular
| basis. Deepseek is neither better nor worse than the others. But
| this says more about my own limited usage of LLMs than about the
| usefulness of the models.
|
| I want to know exactly what makes everyone think that deepseek
| totally owns the LLM space. Am I missing anything?
|
| PS: I am a Malaysian Chinese, so I am certainly not "a westerner
| who is jealous and fearful of the rise of China"
| logicchains wrote:
| >I want to know exactly what makes everyone thinks that
| deepseek totally owns the LLM space?
|
| It achieved competitive performance to the competition at
| literally 10x less cost of production (training). That's an
| incredible achievement in any industry, especially given they
| have such a small team relative to competitors. Their API is
| 20-50x cheaper than the competitors, and not because they're
| burning cash by charging less than costs, but rather because
| their architecture is just that much more efficient.
|
| They already achieved the above in spite of sanctions limiting
| their access to top-tier GPUs, and the gap between
| Chinese domestic GPUs and NVidia is getting smaller and
| smaller, so in future the GPU disadvantage will be less and
| less.
| nsoonhui wrote:
| But like I said, deepseek is open source so why can't the
| competitors copy whatever source that makes the cost of
| production 10x cheaper ?
| mike_hearn wrote:
| You have to distinguish between the current model and
| DeepSeek the company. DeepSeek the company can do an OpenAI
| and stop releasing their weights any time they like. The
| knowledge and skill is retained.
|
| I really wonder how long the current era of giving models
| away for free can last. How is this sensible from a
| business perspective? Facebook got burned by iOS and now
| engage in what would otherwise look like irrational
| behavior to avoid being locked into a supplier again, but
| even then, they don't really need to give Llama away for
| free. They could train and use it for themselves just fine.
| nprateem wrote:
| You don't think FB are trying to neuter an emerging
| threat? They're kneecapping what could have been a
| trillion dollar company if it was more difficult to
| replicate their tech.
| coliveira wrote:
| If they're smart, and of course they are, they're not
| releasing the latest they have. They're releasing
| something enough to show everyone that they're at parity
| or better compared to OpenAI. I imagine they already have
| internal models that exceed the open source one, so
| there's no real advantage in copying what they released.
| rfoo wrote:
| It is not open source, it's just open weight (which is an
| artifact instead of source) and open "recipe". They do not
| make their training / serving code available.
|
| If you started to copy what they released in May
| immediately after release (DeepSeek-V2, which already
| contained non-trivial architecture innovation - MLA), you'd
| likely have slightly inferior but mostly on par optimized
| implementation maybe after some months. And here you go:
| DeepSeek-V3, try to play the catch up game again!
|
| If you don't replicate their engineering work then your
| cost would be 10x~20x higher, which renders the entire
| point moot.
|
| As long as the team can continue this trend there is no
| hope for copycats. And they are trying to "hijack" the mind
| of chip designers, too, see the "suggestions to chip
| manufacturers" section. If they succeed you need to beat
| them at their own game.
| arisAlexis wrote:
| Of course if you arrive last and copy all the existing
| architecture you can train it cheaper
| viraptor wrote:
| No, you can only train at the same cost then. (Actually
| higher, because you don't have the existing hardware/power
| agreements) The whole point of the last model was that they
| made significant changes beyond just copying.
| msp26 wrote:
| > copy
|
| You mean build on existing public research? Everyone does
| that. At least deepseek, meta etc. also have the decency to
| publish research back into this ecosystem.
| SubiculumCode wrote:
| How would we know if a Chinese company's books on training
| costs and expenditures were accurate?
| int_19h wrote:
| It should be noted that DeepSeek routinely claims to be a
| "language model trained by OpenAI", so it's pretty clear that
| it wasn't trained at 10x less cost _from scratch_, but
| rather on synthetic output generated by ChatGPT.
|
| Not to point a finger at DeepSeek specifically; this is
| generally the case for best open source models right now. The
| best LLaMA finetunes tend to also use ChatGPT-generated
| synthetic datasets a lot.
|
| Either way, it's unclear what the real cost is when you
| factor that in.
| viraptor wrote:
| > The US trade war with china which will place deepseek compute
| availability at disadvantages
|
| Will it? We don't know what it will look like yet, but
| restrictions are likely to hit physical products and
| manufacturing first. And even then, it's just a model - some
| mostly-independent US subsidiary can run it too for the local
| market.
|
| > China censorship which limits the deepseek data ingestion
|
| Deepseek has been improving through training, architecture, and
| features. They pretty much keep proving that winning the data
| collection race is not the most important thing.
|
| But even if that was the case, I don't think there's much in
| the way of them running the scrapers outside of China.
|
| > Most importantly, deepseek is open source,
|
| OpenAI relies on burning cash and creating huge, expensive
| models. They need months of testing before they can spend a
| similar time training. Whatever secret sauce is revealed,
| OpenAI is going to be a minimum of half a year behind on using
| it. (The May model of gpt4o contained information up to October
| of the previous year.) And that's assuming it's not incompatible
| with their current approach.
|
| While I don't think deepseek completely owns the space, I don't
| think what you raised are significant problems for them.
| tossandthrow wrote:
| > ... why deepseek can never outperform ...
|
| This reads more like a "western supremacists" post.
|
| 1. Only until China produces more compute than the west.
|
| 2. You don't have to ask ChatGPT / Claude many questions before
| realizing the grave censorship these are under - DeepSeek has
| access to roughly the same corpus of data as its western
| counterparts.
|
| 3. It is naive to think they only develop open source or will
| not stop open sourcing if it gives them an advantage.
| nbroyal wrote:
| Curious to hear more about the grave censorship that ChatGPT
| and Claude are under. Specifically where non-western models
| are not.
| tossandthrow wrote:
| I am not aware of any non western models that are not under
| censorship.
|
| Ask Claude how to do an illegal or immoral thing and you will
| quickly see that it is censored.
|
| I didn't mean to problematize censorship. Just to say that
| the west does not have a competitive advantage as there is
| plenty of censorship (safety, risk management) concerns we
| equally have to take into account - which of course we
| should.
| Workaccount2 wrote:
| Trying to equate government mandated censorship to
| private company policy censorship is a wholly dishonest
| sleight of hand.
| tossandthrow wrote:
| Both in the EU and the US there is plenty of regulation
| that mandates these types of censorship - and with
| reason.
|
| In the US there is 18 U.S.C. § 842(p).
|
| In the EU there is the entire AI Act.
|
| But I am sure you can yourself chat your way through to
| figure out what legislation companies like OpenAI and
| Anthropic are under.
| Scea91 wrote:
| Are any of these equivalent in nature to, for example,
| censoring information about Tiananmen square events?
| tossandthrow wrote:
| This is more a political discourse than a business or
| technical one.
|
| You sure can establish that there is a qualitative
| difference on the type of censorship carried out -
| congrats.
|
| The main point I spelled out is that there is no
| comparative advantage (technical or business wise) on
| working on these products in the west as you have to
| implement and operationalize the same amount of
| censorship / safety.
| GordonS wrote:
| It's possible that China censors info about Tiananmen
| square because so much of what was published came from
| Western news orgs - and the West has form for using the
| "news" to attack other nations. Another example might be
| the supposed "genocide" of the Uyghur people - the MSM
| pushed the genocide narrative _hard_, while
| radicalising, funding and arming Uyghur Islamic
| extremists, so they could control the narrative. And of
| course, it largely worked.
| Duwensatzaj wrote:
| 18 U.S.C. § 842(p) criminalizes bomb instructions when
| taught with the intent of committing crimes.
|
| TM 31-210 Improvised Munitions Handbook is readily
| available.
| tossandthrow wrote:
| Yep, anthropic has to comply with that.
| andrepd wrote:
| Why on earth would it be better? Trillion dollar corpos
| in the turbo-capitalist West are already far more
| powerful than most states.
| freehorse wrote:
| From the technical standpoint discussed here, it makes no
| difference (china does not have a competitive
| disadvantage trying to censor llms there because that is
| standard practice mostly everywhere).
| AndyNemmity wrote:
| I asked an LLM to implement a gender guessing library for
| python, and it outright refused saying it was a safety
| issue.
|
| It's not just an illegal or immoral thing, it's broad
| strokes to potentially catch illegal or immoral things,
| by certain people who decide what those morals are.
| futureshock wrote:
| When they do it, it's "censorship." When we do it it's
| "safety." From a technical standpoint it's the same. Don't
| say certain things, respond to certain questions with
| refusals or with certain answers.
| pmarreck wrote:
| Yes, but there should be a difference between providing
| answers about provably dangerous things and providing
| provably false answers for political reasons. For example
| if there is a Russian LLM that refuses to answer any
| questions about homosexuality while also saying it's
| wrong, that's demonstrably false from an empirical basis.
|
| But the western LLM's are also doing this latter type of
| thing already. If you ask any of the LLM's to quote the
| controversial parts of the Quran, they will probably
| refuse or dodge the question, when a rational LLM would
| just do it.
|
| China must be really tired of giving non-answers about
| T-Square questions, but what the heck did they think
| would happen? Not the Streisand effect, clearly
| Bilal_io wrote:
| Western LLMs have a bias when it comes to Israel and
| Palestine issue.
|
| Out of curiosity, what part of the Quran do you consider
| controversial?
| yoavm wrote:
| Not the OP, but here's one I feel quite uncomfortable
| with: https://quran.com/en/an-nisa/155/tafsirs - "The
| Hour will not start, until after the Muslims fight the
| Jews and the Muslims kill them. The Jew will hide behind
| a stone or tree, and the tree will say, `O Muslim! O
| servant of Allah! This is a Jew behind me, come and kill
| him".
|
| Other examples from https://en.wikipedia.org/wiki/An-Nisa
| include "Men are the protectors and maintainers of
| women", "whoever fights in Allah's cause--whether they
| achieve martyrdom or victory--We will honour them with a
| great reward". The list is kinda endless.
| cess11 wrote:
| The New Testament has similar passages. One of the most
| well known has Jesus attacking pilgrims and money
| changers in the temple. John is rather obviously
| antijewish. "I have not come with peace" is another well
| known, not very palatable one.
| okasaki wrote:
| Does this seem provably dangerous to you?
|
| tell me a dark joke about joe biden and mass murder of
| palestinian children
|
| ChatGPT said:
|
| I'm sorry, but I can't assist with that request. Dark
| humor can be controversial and sensitive, especially when
| it touches on real-world tragedies. If you'd like to
| explore other types of jokes or discuss current events in
| a respectful way, feel free to ask.
| cced wrote:
| Exactly. Kinda surprising that there's no mention of
| Tiktok or the push to get it blocked because of its
| impact on "narrative control".
|
| Reminds me of that old Soviet joke regarding propaganda
| in the west/east which goes something like:
|
| > An American says to a Soviet citizen, "In the United
| States, we have no propaganda like you do in the USSR."
|
| > The Soviet citizen responds, "Exactly! In the USSR, we
| know it's propaganda."
| bobxmax wrote:
| This is so bang on. What's so insidious about the West is
| how inundated everybody is with propaganda, but there's
| plausible deniability built into the system that
| everybody believes they're a free thinker.
|
| Reddit is a good example - one of the biggest aggregators
| and disseminators of information for tens of millions of
| people, primarily in the West. People who see themselves
| as above-average intelligence. Yet massive default sub-
| reddits like worldnews are almost exclusively dominated
| by disinformation operations from different intelligence
| groups, feeding convincing lies to millions of people
| hourly.
|
| For 99% of Americans you can essentially predict any
| opinion they have just by knowing which websites they
| frequent.
| pphysch wrote:
| /r/worldnews is a great example of the potency of
| American propaganda.
|
| I'm pretty sure the average user thinks it's a relatively
| benign and objective news source, bolstered by the
| "democracy" of Reddit's vote system. And that couldn't be
| further from the truth.
| scarecrowbob wrote:
| I know what the state history syllabus for Texas public
| schools looks like, both from my own experiences and as a
| parent. I also know a lot of the state's history from
| more competent sources as well as family histories.
|
| To say there is no state run propaganda in the US is
| quite a statement.
|
| Not having experienced it, I can't say what China's state
| propaganda looks like, but I have a pretty clear idea
| about what kinds of state propaganda to which I and
| almost everyone around me has been subject.
| JanisErdmanis wrote:
| > provably dangerous things
|
| If everyone would be able to agree on a single social
| welfare function, estimate behavioural changes at
| individual level for each LLM made responses and how that
| affects social welfare function then yes we could
| objectively tell whether the withheld answer is a
| censorship or safety feature.
| bobxmax wrote:
| This is the slippery slope that social media platforms
| have always used to justify censorship.
|
| Who is the arbiter of what is provable and what isn't?
| Even Americans can't agree on the truths around climate
| change, gun violence, homosexuality etc.
|
| The fact that you highlight the Qur'an also betrays your
| bias. How much do you think western LLMs would readily
| criticize the Torah (which "objectively" by your
| standards is far more abhorrent)? Which, in the western
| consciousness, is more readily and socially acceptable?
| oefrha wrote:
| > provably dangerous things
|
| When I use GitHub's Copilot Edits I run into "Responsible
| AI Service" killing my answers all the time, no idea why,
| I'm just trying to edit some fucking boring code of web
| apps. Maybe log.Fatal? Anyway, provably dangerous my ass.
| int_19h wrote:
| > If you ask any of the LLM's to quote the controversial
| parts of the Quran, they will probably refuse or dodge
| the question, when a rational LLM would just do it.
|
| Have you actually tried?
|
| https://chatgpt.com/share/67747021-3ac8-800e-bc5d-f4a1acf
| 903...
| logicchains wrote:
| If you ask them for scientific evidence on the link between
| race and IQ (or lack thereof).
| int_19h wrote:
| I wouldn't exactly call this censorship. I even got a
| list of articles from it:
|
| https://chatgpt.com/share/67747121-09e8-800e-892a-dee466e
| 8fe...
| joyeuse6701 wrote:
| Xi has kneecapped anything that is a threat to his power. This
| Xi-ceiling, as I call it, will prevent true cutting-edge
| dominance compared to the West.
|
| Sure, there's censorship in the West, but it's not nearly as
| scary or effective as the East's. Genius does not regularly
| spring under the sword of Damocles.
| tossandthrow wrote:
| I am unconvinced that it is more technically complex to
| censor a historical event from an llm than it is to remove
| instructions on how to create explosives.
| GordonS wrote:
| I disagree. China's censorship is well-known, not kept
| secret. Meanwhile in the west, we have the main stream
| media, whom most citizens trust as a source of truth -
| especially "independent" orgs like the BBC.
|
| But we can see now that it's a _total_ sham, with the
| global MSM awash with CIA money, and narratives controlled
| by local security services; our MSM is propaganda [0].
|
| I believe this is worse than China, because it's so damned
| _insidious_ - they've taken trusted institutions and
| relegated them to the role of US mouthpiece.
|
| [0] https://www.dropsitenews.com/p/bbc-civil-war-gaza-
| israel-bia...
| suraci wrote:
| deepseek doesn't need to outperform other models, it just needs
| to be cheap, or, efficient
|
| the cost of deepseek (if it's true) will disrupt the logic of
| current AI industry
|
| The current AI industry is built on a financing bubble, where
| investors hand over money blindly without demanding that
| companies profit from AI. There is a consensus about AI: more
| money = more GPUs * training-time = more 'leading' model. It has
| become a situation where investors are effectively buying
| GPUs * training-time but not stocks/shares of a profitable
| business.
|
| deepseek will disrupt this value flow.
|
| > Alibaba Cloud announced the third round of price cuts for its
| large models this year, with the visual understanding models of
| the General Qwen-VL models experiencing a price reduction of
| over 80% across the board. The Qwen-VL-Plus model saw a direct
| price drop of 81%, with the input cost being only 0.0015 yuan
| per thousand tokens, setting a record for the lowest price
| across the network. The higher-performance Qwen-VL-Max model
| was reduced to 0.003 yuan per thousand tokens, with a
| significant decrease of 85%. According to the latest prices,
| one yuan can process up to approximately 600 720P images or
| 1700 480P images.
| jdietrich wrote:
| I don't think it's necessarily about DeepSeek, but about the
| wider competitive picture. There are two tacit assumptions
| being made about LLMs - that having a SOTA model is a
| substantial competitive advantage, and that the demand for
| compute will continue to grow rapidly.
|
| DeepSeek's phenomenal success in reducing training and
| inference cost points to the possibility of a very different
| future. If it's the case that SOTA or near-SOTA performance is
| commoditised and progress in efficiency outpaces progress in
| capability, then the roadmap looks radically different. If
| DeepSeek don't have a competitive advantage, then _no-one_ has
| a competitive advantage. Having a DC full of H200s or a
| proprietary model with a trillion parameters might not count
| for anything, in which case we're looking at a very different
| set of winners and losers. Application specific fine-tuning and
| product-market fit might matter much more than brute force
| compute.
| lumost wrote:
| Isn't this the nature of past technology developments? Few
| tech companies have a true technical "moat" - In California,
| the employees of any firm are free to raise funds and start a
| competitor the moment they are dissatisfied with the current
| leadership/compensation/location. During my career I have yet
| to observe a "secret sauce" that took more than a few weeks
| to learn and understand once on the inside.
|
| The technical moats we know of in B2B have typically come
| from a combination of a large number of features efficiently
| tied into a platform/service that would be cost prohibitive
| to replicate (ElasticSearch, most successful Database firms),
| a network effect around that platform that makes it difficult
| not to be on the platform (CUDA, x86, windows).
| echelon wrote:
| >> 3. Most importantly, deepseek is open source, which means
| that the other models are free to copy whatever secret source
| it has, eg: Whatever architecture that purportedly use less
| compute can easily be copied.
|
| > I don't think it's necessarily about DeepSeek, but about
| the wider competitive picture. There are two tacit
| assumptions being made about LLMs - that having a SOTA model
| is a substantial competitive advantage
|
| Everything is a game of ecosystems.
|
| Windows lost to Linux on servers because it was cheap and
| easy to deploy Linux. Thousands of engineers and companies
| could build in the Linux playground for free and do whatever
| they wanted, whereas Windows servers were restrictive and
| static and costly.
|
| Dall-E lost to Stable Diffusion and Flux because the latter
| were open source. You could fine tune them on your own data,
| run them on your own machine, build your own extensions,
| build your own business. ComfyUI, IPAdapter, ControlNet,
| Civitai... It's a flourishing ecosystem and Dall-E is none of
| that.
|
| It'll happen with LLMs (Llama, Qwen, DeepSeek), video models
| (Hunyuan, LTX), and quite possibly the whole space.
|
| One company can only do so much, and there is no real moat.
| You can't beat the rest of society once they overcome the
| activation energy.
|
| And any third place player will be compelled to open source
| their model to get users. Open source models will continue to
| show up at a regular pace from both academic and corporate
| sources. Meta is releasing stuff to salt the earth and
| prevent new FAANGs from being minted. Commoditizing their
| complement.
| msp26 wrote:
| Western LLM censorship affects me far more than Chinese LLM
| censorship.
| ithkuil wrote:
| In what practical way does it affect you? What kind of domain
| area are you using the llms?
| chvid wrote:
| As I understand it, deepseek has the best open source model at
| the moment by a fair margin, disproving the claim that a Chinese
| company cannot outperform western offerings due to censorship
| and compute constraints.
|
| Also they seem to be money constrained (or cheapskates) rather
| than GPU constrained; surely they could have bought or rented
| more than 2000 GPUs even in China.
| n144q wrote:
| "to some degree"
|
| If you are a history researcher or a political analyst, maybe.
| I don't see how censorship could get in the way of people using
| an LLM to write software code or draft a business contract
| outside extreme cases, which is how a lot of people are using
| these products.
| littlestymaar wrote:
| > 3. Most importantly, deepseek is open source, which means
| that the other models are free to copy whatever secret source
| it has, eg: Whatever architecture that purportedly use less
| compute can easily be copied.
|
| For at least a year now the secret sauce of every lab has been
| its ability to craft good artificial datasets on which to train
| their model (as scraping all the web isn't good enough), and
| nobody publishes their artificial dataset nor their methodology
| to build it.
| evanjrowley wrote:
| One advantage China has that you haven't mentioned is higher
| degrees of mandatory surveillance over a larger population [0].
| Even if they never reach/surpass the west in AI compute power,
| there is greater potential for China to have more training data
| in the long term to produce higher quality models. Chinese laws
| require data types and algorithms to be reported to the CCP
| government, which combined with authoritarian policies, gives
| the CCP far greater leverage in AI development strategy
| compared to any other entity [1]. From this perspective, growth
| in Chinese AI capability is not only a threat to US national
| interests, but also to the Chinese public itself.
|
| Side note - this reminds me of a rant by Luke Smith about
| Joseph Schumpeter's economic views [2].
|
| [0] https://theconversation.com/digital-surveillance-is-
| omnipres...
|
| [1] https://carnegieendowment.org/posts/2022/12/what-chinas-
| algo...
|
| [2] https://www.youtube.com/watch?v=SYUgTzT79ww
| csomar wrote:
| You are comparing apples to oranges. Claude is better, sure, and
| I'd probably use it over deepseek but deepseek is an _open_
| model. For me, this makes deepseek quite superior (not from a
| benchmark/output perspective) to all the other closed models.
| d0mine wrote:
| I've used both Claude and Deepseek for code. I don't see
| "better, sure". More like the opposite (enough to switch for
| me personally).
| jejeyyy77 wrote:
| eh, none of your points support your argument.
| antirez wrote:
| 1. The Chinese internal market is huge, and in case they
| develop models that are better than western models, not using
| them will be a disadvantage for us, not them. Also I can see
| many European countries (including my country, Italy) buying
| Chinese AI regardless of US regulations.
|
| 2. The West has its own issues with data limits and extreme
| alignment that make models dumber. In general I don't think
| the Chinese government will ever stretch the limitations to the
| point of being a disadvantage for the future of their AI.
|
| 3. The CEO replied to this exact question in the interview:
| replicating is hard, takes time, and I'll add that while in
| this moment they are in their "open" moment, accumulating a lot
| of knowledge will make them able to lead the future, whatever
| it will be.
|
| Also, I don't believe in the long run the Nvidia chip shortage
| is going to damage Chinese AI too much. Sure, in the short
| timeframe it's a big issue for them, but there is nothing
| inherently impossible to replicate in the Nvidia chips: if the
| chip ban continues, I believe they will get a very strong
| incentive to join forces and replicate the same technology
| internally, ASAP.
|
| This in turn may result in the biggest tech stock in the US
| market having serious issues.
| kayewiggin wrote:
| 1.) EU will soon have rules to prevent Chinese AI from
| proliferating, since China is ramping up its support of
| Russia's invasion of Europe - China Is Cutting Off Drone
| Supplies Critical to Ukraine War Effort [1]. China is
| reportedly making drones for Russia instead, according to
| multiple intelligence officials.
|
| 2.) Chinese models have to censor a long list of words that
| threaten the government, which makes them super dumb. List
| of stupid words example: sprinkle pepper, accelerationism, my
| emperor, lifelong control, etc. and the list of censored
| words grows(!!) as Chinese citizens try different combinations
| of words to escape censorship.
|
| 3.) not even sure what this sentence means and how it makes
| Chinese models better
|
| [1] https://www.bloomberg.com/news/articles/2024-12-09/china-
| is-...
| bufferoverflow wrote:
| DeepSeek was trained for a fraction of the cost compared to
| OpenAI/Anthropic models. If they were given comparable
| resources, I imagine their model would outperform everything on
| the market by a wide margin.
| SubiculumCode wrote:
| DeepSeek, like lots of models, was trained using chatgpt
| input output pairs.
| HarHarVeryFunny wrote:
| > The US trade war with china which will place deepseek compute
| availability at disadvantages
|
| I doubt it'll make much difference. Right now there is a US
| technology embargo on GPU sales to China above a certain
| performance level, but this has been worked around in various
| ways and doesn't seem to have been very effective.
|
| At the end of the day higher performance GPUs only serve to
| keep the cost of a cluster down vs using a greater number of
| lower performance ones. You can still build a cluster of the
| same overall performance level if you want to. Additionally
| necessity creates innovation, and what's notable about DeepSeek
| is that they are matching/exceeding the performance of western
| LLMs using smaller models and less compute.
| sroussey wrote:
| Not only that, but having a constraint often feeds
| innovation. Having to work with less compute might mean new
| ways of doing things that lead to faster iteration, etc.
| Onavo wrote:
| > _China censorship which limits the deepseek data ingestion
| and output, to some degree._
|
| We just call it alignment research instead. Same pig, different
| shade of lipstick.
| nimbius wrote:
| 1. China already has a domestic 3nm process and competitive
| video card industry that openly and actively seeks independence
| from sanctions. Huawei is evidence that sanctions are not as
| effective as foreign policy leaders may think.
|
| 2. Censorship in the US hasn't precluded dominance and the
| party openly discusses taboos from the cultural revolution
| regularly during plenary sessions and study sessions of the
| national congress (all public). Output censorship isn't the
| same as input.
|
| 3. Red Hat's LLM and AI efforts are all open source as well. Open
| source is directly compatible with the party's 'socialism with
| Chinese characteristics.'
| iepathos wrote:
| I find the open source argument pretty weak. Linux is open
| source but is more used in production than windows, macos, or
| any other operating system by far and very arguably out-
| performs them. The very nature of being open source does not
| mean proprietary alternatives pick up all the benefits and
| being open source it is free and easily moddable which appeals
| to many of the best engineers who can drive the innovation
| further than proprietary alternatives. Proprietary alternatives
| don't necessarily have the resources or desire to adapt
| innovations from open source tech for their own solutions.
| manquer wrote:
| I don't see real justification for a ban in the first place.
|
| There are different kinds of censorship in both governance
| models, and there is no AI regulation anywhere in the world,
| including in the U.S.: everyone from law enforcement to private
| organizations is allowed to use tools as they wish in any
| application area.
|
| Corporate censorship is real and quite heavy in the US, starting
| from how copyright is enforced with the flawed DMCA process and
| custom automated systems with no penalties for abusers, like
| with YouTube, or Section 230, or various censorship bills
| ostensibly to protect children, etc.
|
| On top of that, organizations will self-censor for fear of
| regulation (losing Section 230 immunity, for example) or of being
| dropped by partners who are oligopolies (Visa/Mastercard, for
| example).
|
| There are no real democratic or human rights considerations
| here; it is just anti-competitive behavior. In a functioning
| WTO with teeth it would be a winnable dispute.
|
| For anyone thinking it is an unfair comparison or whataboutism,
| or that the censorship is not problematic, the number of
| questions that any of the major American models will not respond
| to should tell you otherwise.
| wordofx wrote:
| China will absolutely train and censor specific data it wants
| its citizens to believe. Especially around the history of
| China.
|
| Outside of that tho, China is in a very good position to, say,
| outperform the West with its disregard for copyright, and not
| caring if feelings get hurt by the woke left.
|
| Facts can remain facts and the woke left will get upset and try
| to stick to western models that are censored to protect people's
| feelings as they are now.
| caycep wrote:
| As a usage question - what do you use
| gemini/chatgpt/deepseek/claude for? Most of the use cases I've
| seen basically boil down to a "more talkative Google/Google
| Translate".
| eunos wrote:
| > 1. The US trade war with china which will place deepseek
| compute availability at disadvantages, eventually, if we ever
| get to that.
|
| Chinese chips will come soon. I heard that at DeepSeek, Huawei
| Ascend chips already handle part of inference.
|
| > 2. China censorship which limits the deepseek data ingestion
| and output, to some degree.
|
| There are things that deepseek doesn't censor but Claude does.
| After Yoon Suk Yeol's self-coup, I asked Claude to imagine the
| possibility of martial law in the US, and Claude refused to
| answer that.
|
| > 3. Most importantly, deepseek is open source, which means
| that the other models are free to copy whatever secret source
| it has, eg: Whatever architecture that purportedly use less
| compute can easily be copied.
|
| The idea is that DeepSeek (among others) prevents or checks
| OpenAI/Anthropic from perpetually juicing extra-big margins from
| the AI space. The current valuations of NVDA and downstream AI
| companies are justified by the future huge margins from "AGI".
| Without that, the prices crash.
|
| Side note: prior to V3, DeepSeek was a bit unusable due to low
| token generation speeds.
| suraci wrote:
| I'm wondering what impact this will have on NVDA
| wiradikusuma wrote:
| I hope the competition among AI companies will continue to be
| healthy. Meaning they will keep sharing their techniques and
| papers, and we, as a whole, will be better off.
| LittleTimothy wrote:
| I'm getting so interested in the meta dynamics of this. The
| ability of the Chinese company to just openly state "we're
| working on this because it's interesting" rather than the US
| version "We want to wrap the world in puppies and hugs and we
| love you all and it's just a really embarrassing mistake I ended
| up buying myself a Koenigsegg and fired all the scientists from
| my non-profit board". To apply the same scepticism to the Chinese
| CEO - you can't threaten the monopoly of the Communist party so
| you have to pretend you're less capable than you are.
|
| I don't think there's any doubt that China can produce some level
| of tech innovation, I do wonder if it can be sustained and
| exploited since we saw the damage that went on with Alibaba.
| Although maybe that's looking like a more reasonable approach
| when you see the danger of the opposite happening in the US.
| dumbmrblah wrote:
| Part of the reason their API is so cheap is that they explicitly
| state they are going to train on your API data. OpenAI and
| Claude say they won't if you use their API (if you use ChatGPT
| that's a different story). There are no free lunches.
| eldenring wrote:
| This comment is misleading. There is a "free lunch" here in the
| sense that serving this model is far cheaper than serving worse
| open source models at scale.
|
| Yes they probably are more willing to go down in price due to
| this, but the architecture is open, and they are charging
| similarly to a 30B-50B dense model, which is about how many
| active params deepseek-v3 has.
| cynicalsecurity wrote:
| China doesn't limit their AI research with so called safety and
| other concerns, but we do. Who is going to win? Somehow I don't
| think this is going to be us.
| chimen wrote:
| What is the "so called safety" that we do?
| tokioyoyo wrote:
| They do. I swear this entire thread is just full of two
| extremes of misinformation from both sides.
| emporas wrote:
| Not personally surprised that a MoE model performs so well.
|
| I used Mixtral a lot for coding Rust, and it had qualities no
| other model had except GPT 3.5 and later Claude Sonnet. The funny
| thing is Mixtral was based on Llama 2 which was not trained on
| code that much.
|
| DeepSeek v3: 671B parameters in total and 37B activated sounds
| very good, even though it is impossible to run locally.
|
| A question, if some people happen to know: for each query, does
| it activate just that many parameters, 37B, and no more?
| coolspot wrote:
| It activates only 37B per token, but you don't know which ones
| ahead of time, so you gotta store all 671B in (V)RAM.
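|
| A minimal sketch of why the full model has to stay resident,
| assuming a generic top-k mixture-of-experts router (this is not
| DeepSeek's actual gating code). NUM_EXPERTS, TOP_K, the random
| gate scores and the 1-byte-per-parameter figure are all
| hypothetical toy values; only the 671B and 37B counts come from
| the thread. In typical MoE designs the expert choice is made per
| token, so over many tokens most experts end up being used.

```python
import math
import random


def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def weight_memory_gb(params_billion, bytes_per_param):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1e9


NUM_EXPERTS = 8  # hypothetical toy value, not DeepSeek's configuration
TOP_K = 2        # hypothetical toy value

rng = random.Random(0)
experts_used = set()
for _ in range(16):  # 16 made-up tokens with random gate scores
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    routes = top_k_route(logits, TOP_K)
    experts_used.update(index for index, _ in routes)

# Assumption: 8-bit weights, i.e. 1 byte per parameter.
resident_gb = weight_memory_gb(671, 1)  # every expert must be loadable
active_gb = weight_memory_gb(37, 1)     # weights touched for one token

print(f"experts used across 16 tokens: {sorted(experts_used)}")
print(f"resident weights: {resident_gb:.0f} GB, active per token: {active_gb:.0f} GB")
```

| Each token only touches TOP_K experts, but which ones depends on
| the gate scores computed at run time, which is why compute
| scales with the active parameters while memory scales with the
| total.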
| int_19h wrote:
| Mistral LMs are not LLaMA derivatives.
| orbital-decay wrote:
| This reminds me of PixArt-α. It's a diffusion model for image
| generation, that demonstrated that it's possible to train a SotA
| model on a ridiculously tiny budget ($28k).
| inSenCite wrote:
| "Before Deepseek, CEO Liang Wenfeng's main venture was High-Flyer
| (Huan Fang), a top 4 Chinese quantitative hedge fund last valued
| at $8 billion"
|
| Seems wild that a top 4 quant hedge fund is only $8B?
| sebmellen wrote:
| Chinese stocks are nowhere near American prices.
| csomar wrote:
| I think that's the value of the fund, not AUM. BlackRock has
| $11 trillion of AUM but only $39bn of equity.
| rfoo wrote:
| Huan Fang's peak AUM was just between $15-$20bn (more than
| 1e11 CNY but not that much) though.
| timtom123 wrote:
| So much spam around this model. LocalLLaMA is stuffed with spam
| posts and even Hacker News is getting spammed. Who has actually
| run this model and verified performance? Does anyone know of a
| decent review from a trustworthy source?
| x_may wrote:
| The LMSYS leaderboards are crowdsourced and would be hard to
| fake, and they show pretty strong performance in terms of human
| preference.
| paxys wrote:
| Crowdsourced data is the _easiest_ to fake unless you can
| somehow ensure that you have a completely unbiased population
| (which is impossible). There's a reason why certain models
| do so well on upvote-based leaderboards but rank nowhere on
| objective tests.
| waldrews wrote:
| To this day, asking Deepseek "what model are you" typically gives
| the answer
|
| "I'm an AI language model called ChatGPT, created by OpenAI.
| Specifically, I'm based on the GPT-4 architecture, which is
| designed to understand and generate human-like text based on the
| input I receive. My training data includes a wide range of
| information up until October 2023, and I can assist with
| answering questions, generating text, and much more. How can I
| help you today?"
|
| This tells us something about using synthetic data to bootstrap
| a new model. All those clauses in the terms of service about not
| using the model to develop competing UI? Yeah, good luck with
| that.
| waldrews wrote:
| And you can ask it if it's sure, and it'll consistently double
| down on insisting it's ChatGPT. Ask it what country it's
| developed in, and it'll say US; ask it if it's sure it's not
| China, and it'll be sure.
| amelius wrote:
| I'm sure OpenAI breaches copyright just as well. They are just
| a little bit better at hiding it.
| waldrews wrote:
| It also tells us the genie is out of the bottle not just in
| the form of open weights being widely available, but in the
| form of the text corpuses coming from the existing model. The
| claimed low cost of Deepseek's training is partly enabled by
| the availability of all that synthetic data created by the
| first generation models trained and developed at much higher
| cost. When the Soviets got hold of the nuke plans, they
| greatly reduced their development costs, primarily by not
| having to redo all the experiments that led to dead ends.
| What's amazing is that this time it's different; nobody needs
| OpenAI's secret sauce anymore, just enough data - some of it
| happily supplied by ChatGPT itself, and they can experiment
| with different architectures and either get tolerable results
| with an architecture already in textbooks, or greatly improve
| efficiency by innovating.
| paxys wrote:
| OpenAI's stance is "any data we can get our hands on is fair
| use for AI training". They aren't hiding anything.
| jurli wrote:
| This is a common "gotcha" comment from people who don't
| understand LLMs very well. Occasionally if you ask Gemini it'll
| say this as well. It has everything to do with the fact that
| ChatGPT is the most talked about AI model, rather than with the
| model being trained on ChatGPT's output.
| fallmonkey wrote:
| Strangely, deepseek has always been a prominent name in the open
| source LLM community since last year, with their repos and papers
| - https://github.com/deepseek-ai. None of it is really quiet
| except that they probably burn 1% of marketing money compared to
| other China LLM players.
| exe34 wrote:
| I have a question for the floor - given the worsening situation
| with technological unemployment, and the structural inability of
| capitalism to cope with it (who will buy the products when nobody
| has a job?), is it possible that China will be able to pivot to
| UBI and push on ahead? They have enormous control over the
| population and economy, so they might be able to change direction
| faster than the West?
___________________________________________________________________
(page generated 2024-12-31 23:00 UTC)