[HN Gopher] DeepSeek v2.5 - open-source LLM comparable to GPT-4,...
       ___________________________________________________________________
        
       DeepSeek v2.5 - open-source LLM comparable to GPT-4, but 95% less
       expensive
        
       Author : jchook
       Score  : 91 points
       Date   : 2024-10-30 19:24 UTC (3 hours ago)
        
 (HTM) web link (www.deepseek.com)
 (TXT) w3m dump (www.deepseek.com)
        
       | viraptor wrote:
       | Why say comparable when gpt4o is not included in the comparison
       | table? (Neither is the interesting Sonnet 3.5)
       | 
       | Here's an Aider leaderboard with the interesting models included:
       | https://aider.chat/docs/leaderboards/ Strangely, v2.5 is below
       | the old v2 Coder. Maybe we can count on v2.5 Coder being released
       | then?
        
       | uxhacker wrote:
       | It's interesting to see a Chinese LLM like DeepSeek enter the
       | global stage, particularly given the backdrop of concerns over
       | data security with other Chinese-owned platforms, like TikTok.
       | The key question here is: if DeepSeek becomes widely adopted,
       | will we see a similar wave of scrutiny over data privacy?
       | 
       | With TikTok, concerns arose partly because of its reach and the
       | vast amount of personal information it collects. An LLM like
       | DeepSeek would arguably have even more potential to gather
       | sensitive data, especially as these models can learn from and
       | remember interaction patterns, potentially accessing or
       | "training" on sensitive information users might input without
       | thinking.
       | 
       | The challenge is that we're not yet certain how much data
       | DeepSeek would retain and where it would be stored. For countries
       | already wary of data leaving their borders or being accessible to
       | foreign governments, we could see restrictions or monitoring
       | mechanisms placed on similar LLMs--especially if companies start
       | using these models in environments where proprietary information
       | is involved.
       | 
       | In short, if DeepSeek or similar Chinese LLMs gain traction, it's
       | quite likely they'll face the same level of scrutiny (or more)
       | that we've seen with apps like TikTok.
        
         | mlyle wrote:
         | An open source LLM that is being used for inference can't
         | "learn from or remember" interaction patterns. It can operate
         | on what's in the context window, and that's it.
         | 
         | As long as the actual packaging is just the model, this is an
         | invalid concern.
         | 
         | Now, of course, if you do inference on anyone else's
         | infrastructure, there's always the concern that they may retain
         | your inputs.
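          | 
          | A minimal sketch of why that's true, using the Hugging Face
          | transformers API with a small DeepSeek coder model as a
          | stand-in (v2.5 itself needs multi-GPU hardware); the model ID
          | and prompt here are illustrative assumptions:
          | 
          |     import torch
          |     from transformers import AutoModelForCausalLM, AutoTokenizer
          | 
          |     # Small stand-in model; the 236B v2.5 won't fit on one GPU.
          |     name = "deepseek-ai/deepseek-coder-1.3b-instruct"
          |     tok = AutoTokenizer.from_pretrained(name)
          |     model = AutoModelForCausalLM.from_pretrained(
          |         name, torch_dtype=torch.bfloat16)
          |     model.eval()  # inference mode: no training, no updates
          | 
          |     before = {k: v.clone() for k, v in model.state_dict().items()}
          | 
          |     with torch.no_grad():  # no gradients, weights can't change
          |         ids = tok("my API key is sk-secret-123",
          |                   return_tensors="pt").input_ids
          |         model.generate(ids, max_new_tokens=16)
          | 
          |     # Weights are bit-identical after generation: the model
          |     # retains nothing once the context window is gone.
          |     assert all(torch.equal(before[k], v)
          |                for k, v in model.state_dict().items())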
        
           | wongarsu wrote:
           | You can run the model yourself, but I wouldn't be surprised
           | if a lot of people prefer the pay-as-you-go cloud offering
           | over spinning up servers with 8 high-end GPUs. It's fair to
            | caution that doing so may mean handing your data over to
            | China.
        
         | fkyoureadthedoc wrote:
         | Is ChatGPT posting on HN spreading open model FUD!?
         | 
         | > especially as these models can learn from and remember
         | interaction patterns
         | 
         | All joking aside, I'm pretty sure they can't. Sure the hosted
         | service can collect input / output and do nefarious things with
         | it, but the model itself is just a model.
         | 
         | Plus it's open source, you can run it yourself somewhere. For
         | example, I run deepseek-coder-v2:16b with ollama + Continue for
         | tab completion. It's decent quality and I get 70-100 tokens/s.
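          | 
          | For anyone wanting to script against that same local setup,
          | here's a rough sketch hitting Ollama's documented HTTP API
          | (default port 11434); the prompt is just an illustration, and
          | it assumes you've already pulled the model:
          | 
          |     import json
          |     import urllib.request
          | 
          |     # Requires a running Ollama server and a prior
          |     # `ollama pull deepseek-coder-v2:16b`.
          |     req = urllib.request.Request(
          |         "http://localhost:11434/api/generate",
          |         data=json.dumps({
          |             "model": "deepseek-coder-v2:16b",
          |             "prompt": "# python: reverse a linked list\n",
          |             "stream": False,
          |         }).encode(),
          |         headers={"Content-Type": "application/json"},
          |     )
          |     with urllib.request.urlopen(req) as resp:
          |         print(json.loads(resp.read())["response"])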
        
       | nprateem wrote:
        | As in significantly worse than...?
        
       | joshhart wrote:
       | The benchmarks compare it favorably to GPT-4-turbo but not
       | GPT-4o. The latest versions of GPT-4o are much higher in quality
       | than GPT-4-turbo. The HN title here does not reflect what the
       | article is saying.
       | 
       | That said the conclusion that it's a good model for cheap is
       | true. I just would be hesitant to say it's a great model.
        
         | A_D_E_P_T wrote:
          | Not only do I completely agree, but I've also been playing
          | around with both of them for the past 30 minutes, and my
          | impression is that GPT-4o is significantly better across the
          | board. It's faster, it's a better writer, it's more
          | insightful, it has a much broader knowledge base, etc.
         | 
         | What's more, DeepSeek doesn't seem capable of handling image
         | uploads. I got an error every time. ("No text extracted from
         | attachment.") It claims to be able to handle images, but it's
         | just not working for me.
         | 
         | When it comes to math, the two seem roughly equivalent.
         | 
         | DeepSeek is, however, politically neutral in an interesting
         | way. Whereas GPT-4o will take strong moral stances, DeepSeek is
         | an impressively blank tool that seems to have no strong
         | opinions of its own. I tested them both on a 1910 article
         | critiquing women's suffrage, asking for a review of the article
         | and a rewritten modernized version; GPT-4o recoiled, DeepSeek
         | treated the task as business as usual.
        
           | theanonymousone wrote:
           | Thanks for sharing. How about 4o-mini?
        
           | tkgally wrote:
           | > DeepSeek ... seems to have no strong opinions of its own.
           | 
           | Have you tried asking it about Tibetan sovereignty, the
           | Tiananmen massacre, or the role of the communist party in
           | Chinese society? Chinese models I've tested have had quite
           | strong opinions about such questions.
        
         | jchook wrote:
         | I updated the title to say GPT-4, but I believe the quality is
         | still surprisingly close to 4o.
         | 
         | On HumanEval, I see 90.2 for GPT-4o and 89.0 for DeepSeek v2.5.
         | 
          | - https://blog.getbind.co/2024/09/19/deepseek-2-5-how-does-it-...
         | 
         | - https://paperswithcode.com/sota/code-generation-on-humaneval
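          | 
          | For context, those HumanEval figures are pass@1 scores. A
          | short sketch of the standard unbiased pass@k estimator from
          | the HumanEval paper (n samples per problem, c of which pass
          | the unit tests); the example numbers are made up:
          | 
          |     from math import comb
          | 
          |     def pass_at_k(n: int, c: int, k: int) -> float:
          |         # Probability that at least one of k draws (without
          |         # replacement) from n samples is among the c passes.
          |         if n - c < k:
          |             return 1.0
          |         return 1.0 - comb(n - c, k) / comb(n, k)
          | 
          |     # e.g. 200 samples, 180 passing -> pass@1 = 0.90
          |     print(pass_at_k(200, 180, 1))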
        
       | jyap wrote:
       | This 236B model came out around September 6th.
       | 
       | DeepSeek-V2.5 is an upgraded version that combines
       | DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct.
       | 
       | From: https://huggingface.co/deepseek-ai/DeepSeek-V2.5
        
         | genpfault wrote:
         | > To utilize DeepSeek-V2.5 in BF16 format for inference, 80GB*8
         | GPUs are required.
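          | 
          | That requirement follows from the parameter count. A back-of-
          | the-envelope sketch (weights only; the KV cache, activations,
          | and framework overhead come on top):
          | 
          |     params = 236e9       # DeepSeek-V2.5 parameter count
          |     bytes_per_param = 2  # BF16 = 16 bits per weight
          | 
          |     weights_gb = params * bytes_per_param / 1e9
          |     print(f"weights alone: {weights_gb:.0f} GB")  # ~472 GB
          | 
          |     # Eight 80 GB GPUs give 640 GB; the remaining ~170 GB of
          |     # headroom goes to the KV cache and activations.
          |     print(f"8 x 80 GB = {8 * 80} GB")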
        
       | Alifatisk wrote:
        | Oh wow, it almost beats Claude 3 Opus!
        
       | ziofill wrote:
       | What about comparisons to Claude 3.5? Sneaky.
        
       | khanan wrote:
        | Did you try asking it whether Winnie the Pooh looks like the
        | president of China?
        
       | gdevenyi wrote:
       | What does open source mean here? Where's the code? The weights?
        
       | yieldcrv wrote:
        | tl;dr not even close to closed-source text-only models, and a
        | light year behind on the other 3 senses these multimodal ones
        | have had for a year
        
       | zone411 wrote:
       | In my NYT Connections benchmark, it hasn't performed well:
       | https://github.com/lechmazur/nyt-connections/ (see the table).
        
       ___________________________________________________________________
       (page generated 2024-10-30 23:00 UTC)