[HN Gopher] DeepSeek v2.5 - open-source LLM comparable to GPT-4,...
___________________________________________________________________
DeepSeek v2.5 - open-source LLM comparable to GPT-4, but 95% less
expensive
Author : jchook
Score : 91 points
Date : 2024-10-30 19:24 UTC (3 hours ago)
(HTM) web link (www.deepseek.com)
(TXT) w3m dump (www.deepseek.com)
| viraptor wrote:
| Why say comparable when gpt4o is not included in the comparison
| table? (Neither is the interesting Sonnet 3.5)
|
| Here's an Aider leaderboard with the interesting models included:
 | https://aider.chat/docs/leaderboards/ Strangely, v2.5 scores
 | below the old v2 Coder there. Maybe we can count on a v2.5 Coder
 | being released, then?
| uxhacker wrote:
| It's interesting to see a Chinese LLM like DeepSeek enter the
| global stage, particularly given the backdrop of concerns over
| data security with other Chinese-owned platforms, like TikTok.
| The key question here is: if DeepSeek becomes widely adopted,
| will we see a similar wave of scrutiny over data privacy?
|
| With TikTok, concerns arose partly because of its reach and the
| vast amount of personal information it collects. An LLM like
| DeepSeek would arguably have even more potential to gather
| sensitive data, especially as these models can learn from and
| remember interaction patterns, potentially accessing or
| "training" on sensitive information users might input without
| thinking.
|
| The challenge is that we're not yet certain how much data
| DeepSeek would retain and where it would be stored. For countries
| already wary of data leaving their borders or being accessible to
| foreign governments, we could see restrictions or monitoring
| mechanisms placed on similar LLMs--especially if companies start
| using these models in environments where proprietary information
| is involved.
|
| In short, if DeepSeek or similar Chinese LLMs gain traction, it's
| quite likely they'll face the same level of scrutiny (or more)
| that we've seen with apps like TikTok.
| mlyle wrote:
| An open source LLM that is being used for inference can't
| "learn from or remember" interaction patterns. It can operate
| on what's in the context window, and that's it.
|
| As long as the actual packaging is just the model, this is an
| invalid concern.
|
| Now, of course, if you do inference on anyone else's
| infrastructure, there's always the concern that they may retain
| your inputs.
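 |
 | A toy sketch of what that means in practice (generate() below is
 | a stand-in for any real inference call; all names here are
 | illustrative, not any particular API):
 |
 |     # The model is a pure function of its weights and the prompt.
 |     # The only "memory" is the history the client chooses to resend.
 |     def generate(messages: list[dict]) -> str:
 |         # Stand-in for a call to a real model server (ollama, vLLM...)
 |         return f"(reply given {len(messages)} messages of context)"
 |
 |     history: list[dict] = []
 |
 |     def chat(user_msg: str) -> str:
 |         history.append({"role": "user", "content": user_msg})
 |         reply = generate(history)   # stateless: sees only `history`
 |         history.append({"role": "assistant", "content": reply})
 |         return reply
 |
 |     chat("Remember that my name is Ada.")
 |     history.clear()                 # drop the context window...
 |     chat("What is my name?")        # ...and the model "forgets"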
| wongarsu wrote:
| You can run the model yourself, but I wouldn't be surprised
| if a lot of people prefer the pay-as-you-go cloud offering
 | over spinning up servers with 8 high-end GPUs. It's fair to
 | caution that doing so might mean handing your data over to China.
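 |
 | The hosted route is just an OpenAI-compatible endpoint, which
 | makes the trade-off easy to see. A sketch, assuming the base URL
 | and model name from DeepSeek's API docs:
 |
 |     # Pay-as-you-go: no GPUs to rent, but everything in `messages`
 |     # is sent to DeepSeek's servers -- exactly the caution above.
 |     from openai import OpenAI
 |
 |     client = OpenAI(
 |         api_key="sk-...",                     # DeepSeek API key
 |         base_url="https://api.deepseek.com",  # OpenAI-compatible
 |     )
 |     resp = client.chat.completions.create(
 |         model="deepseek-chat",  # the docs point this at v2.5
 |         messages=[{"role": "user", "content": "Hello"}],
 |     )
 |     print(resp.choices[0].message.content)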
| fkyoureadthedoc wrote:
| Is ChatGPT posting on HN spreading open model FUD!?
|
| > especially as these models can learn from and remember
| interaction patterns
|
| All joking aside, I'm pretty sure they can't. Sure the hosted
| service can collect input / output and do nefarious things with
| it, but the model itself is just a model.
|
| Plus it's open source, you can run it yourself somewhere. For
| example, I run deepseek-coder-v2:16b with ollama + Continue for
| tab completion. It's decent quality and I get 70-100 tokens/s.
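 |
 | If anyone wants to try the same local setup, it's short. A
 | sketch, after ollama pull deepseek-coder-v2:16b, using ollama's
 | standard HTTP API on its default port:
 |
 |     # Everything stays on localhost; ollama listens on :11434.
 |     import requests
 |
 |     resp = requests.post(
 |         "http://localhost:11434/api/generate",
 |         json={
 |             "model": "deepseek-coder-v2:16b",
 |             "prompt": "Write a Python function that reverses a string.",
 |             "stream": False,  # one JSON object instead of chunks
 |         },
 |     )
 |     print(resp.json()["response"])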
| nprateem wrote:
 | As in, significantly worse than...?
| joshhart wrote:
| The benchmarks compare it favorably to GPT-4-turbo but not
| GPT-4o. The latest versions of GPT-4o are much higher in quality
| than GPT-4-turbo. The HN title here does not reflect what the
| article is saying.
|
| That said the conclusion that it's a good model for cheap is
| true. I just would be hesitant to say it's a great model.
| A_D_E_P_T wrote:
| Not only do I completely agree, I've been playing around with
| both of them for the past 30 minutes and my impression is that
| GPT-4o is significantly better across the board. It's faster,
| it's a better writer, it's more insightful, it has a much
| broader knowledgebase, etc.
|
| What's more, DeepSeek doesn't seem capable of handling image
| uploads. I got an error every time. ("No text extracted from
| attachment.") It claims to be able to handle images, but it's
| just not working for me.
|
| When it comes to math, the two seem roughly equivalent.
|
| DeepSeek is, however, politically neutral in an interesting
| way. Whereas GPT-4o will take strong moral stances, DeepSeek is
| an impressively blank tool that seems to have no strong
| opinions of its own. I tested them both on a 1910 article
| critiquing women's suffrage, asking for a review of the article
| and a rewritten modernized version; GPT-4o recoiled, DeepSeek
| treated the task as business as usual.
| theanonymousone wrote:
| Thanks for sharing. How about 4o-mini?
| tkgally wrote:
| > DeepSeek ... seems to have no strong opinions of its own.
|
| Have you tried asking it about Tibetan sovereignty, the
| Tiananmen massacre, or the role of the communist party in
| Chinese society? Chinese models I've tested have had quite
| strong opinions about such questions.
| jchook wrote:
| I updated the title to say GPT-4, but I believe the quality is
| still surprisingly close to 4o.
|
| On HumanEval, I see 90.2 for GPT-4o and 89.0 for DeepSeek v2.5.
|
 | - https://blog.getbind.co/2024/09/19/deepseek-2-5-how-does-it-...
|
| - https://paperswithcode.com/sota/code-generation-on-humaneval
| jyap wrote:
| This 236B model came out around September 6th.
|
| DeepSeek-V2.5 is an upgraded version that combines
| DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct.
|
| From: https://huggingface.co/deepseek-ai/DeepSeek-V2.5
| genpfault wrote:
| > To utilize DeepSeek-V2.5 in BF16 format for inference, 80GB*8
| GPUs are required.
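 |
 | For reference, the model card's loading path is the standard
 | transformers one. A sketch (argument names as on the card; the
 | hardware note applies, device_map="auto" just spreads the
 | weights across whatever GPUs are visible):
 |
 |     import torch
 |     from transformers import AutoModelForCausalLM, AutoTokenizer
 |
 |     name = "deepseek-ai/DeepSeek-V2.5"
 |     tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
 |     model = AutoModelForCausalLM.from_pretrained(
 |         name,
 |         torch_dtype=torch.bfloat16,  # BF16, hence 80GB*8
 |         device_map="auto",
 |         trust_remote_code=True,
 |     )
 |
 |     messages = [{"role": "user", "content": "Hello"}]
 |     inputs = tokenizer.apply_chat_template(
 |         messages, add_generation_prompt=True, return_tensors="pt"
 |     ).to(model.device)
 |     out = model.generate(inputs, max_new_tokens=64)
 |     print(tokenizer.decode(out[0][inputs.shape[1]:],
 |                            skip_special_tokens=True))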
| Alifatisk wrote:
 | Oh wow, it almost beats Claude 3 Opus!
| ziofill wrote:
| What about comparisons to Claude 3.5? Sneaky.
| khanan wrote:
 | Did you try asking it whether Winnie the Pooh looks like the
 | president of China?
| gdevenyi wrote:
| What does open source mean here? Where's the code? The weights?
| yieldcrv wrote:
 | tl;dr not even close to the closed-source text-only models, and a
 | light-year behind on the other 3 senses the multimodal ones have
 | had for a year
| zone411 wrote:
| In my NYT Connections benchmark, it hasn't performed well:
| https://github.com/lechmazur/nyt-connections/ (see the table).
___________________________________________________________________
(page generated 2024-10-30 23:00 UTC)