[HN Gopher] QVQ-Max: Think with Evidence
       ___________________________________________________________________
        
       QVQ-Max: Think with Evidence
        
       Author : wertyk
       Score  : 97 points
       Date   : 2025-04-03 14:55 UTC (3 days ago)
        
 (HTM) web link (qwenlm.github.io)
 (TXT) w3m dump (qwenlm.github.io)
        
       | xpe wrote:
        | The about page doesn't shed light on the composition of the core
        | team or their sources of income and funding. Am I overlooking
        | something?
       | 
       | > We are a group of people with diverse talents and interests.
        
         | tough wrote:
          | I think the Qwen team is Alibaba's AI arm:
         | https://qwenlm.github.io/about/
        
       | gatienboquet wrote:
        | Isn't "thinking" in image mode basically what ChatGPT 4o image
        | generation does?
        
         | simonw wrote:
          | Not at all. GPT-4o is image output - this model (and the
          | previous Qwen release QvQ -
          | https://simonwillison.net/2024/Dec/24/qvq/) are image input
          | only, with a "reasoning" chain of thought to help analyze the
          | images.
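          | 
          | For a concrete picture of the "image input" side, here is a
          | rough sketch of what a request to a vision-reasoning model can
          | look like through an OpenAI-compatible API. The base_url and
          | model id below are illustrative guesses, not something
          | confirmed by the post:
          | 
          |   # Sketch only: image *input* to a reasoning-style vision model
          |   # via an OpenAI-compatible endpoint. base_url and model id are
          |   # assumptions for illustration, not confirmed values.
          |   import base64
          |   from openai import OpenAI
          | 
          |   client = OpenAI(base_url="https://example-endpoint/v1",
          |                   api_key="sk-...")
          | 
          |   with open("diagram.png", "rb") as f:
          |       image_b64 = base64.b64encode(f.read()).decode()
          | 
          |   resp = client.chat.completions.create(
          |       model="qvq-max",  # hypothetical model id
          |       messages=[{
          |           "role": "user",
          |           "content": [
          |               {"type": "image_url", "image_url": {
          |                   "url": f"data:image/png;base64,{image_b64}"}},
          |               {"type": "text",
          |                "text": "What does this diagram show? Explain "
          |                        "your reasoning step by step."},
          |           ],
          |       }],
          |   )
          |   print(resp.choices[0].message.content)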
        
       | drapado wrote:
       | Unfortunately, no open weights this time :(
        
         | xpe wrote:
         | The wisdom of open weights is hotly debated.
        
           | moffkalast wrote:
           | Is there a good wisdom benchmark we can run on those weights?
           | /s
        
           | drapado wrote:
            | Wisdom? I don't get what you mean by that. What is clear is
            | that open weights benefit society, since we can run the model
            | locally and privately.
        
         | fxj wrote:
         | https://ollama.com/joefamous/QVQ-72B-Preview
         | 
         | Experimental research model with enhanced visual reasoning
         | capabilities.
         | 
         | Supports context length of 128k.
         | 
         | Currently, the model only supports single-round dialogues and
         | image outputs. It does not support video inputs.
         | 
         | Should be capable of images up to 12 MP.
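          | 
          | For anyone who wants to poke at that tag, a rough sketch of
          | calling it through the Ollama Python client (the image path and
          | prompt are placeholders, I haven't verified this exact tag, and
          | the 72B weights need a lot of RAM/VRAM even quantized):
          | 
          |   # Sketch: run the linked QVQ-72B-Preview tag via Ollama's
          |   # Python client. A CLI equivalent would be roughly
          |   # `ollama run joefamous/QVQ-72B-Preview`.
          |   import ollama
          | 
          |   response = ollama.chat(
          |       model="joefamous/QVQ-72B-Preview",  # tag from the link above
          |       messages=[{
          |           "role": "user",
          |           "content": "What is happening in this picture? "
          |                      "Think step by step.",
          |           "images": ["photo.jpg"],  # placeholder local image path
          |       }],
          |   )
          |   print(response["message"]["content"])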
        
           | drapado wrote:
           | >Last December, we launched QVQ-72B-Preview as an exploratory
           | model, but it had many issues.
           | 
           | That's an earlier version released some months ago. They even
           | acknowledge it.
           | 
            | The version they present in the blog post, and which you can
            | run on their chat platform, is not open or available for
            | download.
        
       | ttul wrote:
       | Just going off the blog post, this seems like a multimodal LLM
       | that uses thinking tokens. That's pretty cool. Is this the first
       | of its kind?
        
       | torginus wrote:
        | I wonder why we are getting these drops during the weekend. Is
        | the AI race truly that heated?
        
         | jsemrau wrote:
         | Judging from my blog, I get much more engagement on the
         | weekends.
        
         | frainfreeze wrote:
          | I guess a lot of people do their regular 9-5 through the week
          | and play with new stuff on the weekends. But also, yes, it is
          | truly that heated.
        
         | daxfohl wrote:
         | IIUC engineers in China only get one day off per week. IDK if
         | that's hyperbole or not.
        
         | mohsen1 wrote:
         | > March 28, 2025
        
       | mcbuilder wrote:
        | I think this is old news, but this model does better than Llama 4
        | Maverick on coding.
        
         | int_19h wrote:
         | LLaMA 4 is pretty underwhelming across the board.
        
       | unixhero wrote:
       | So how do I run it locally?
        
       ___________________________________________________________________
       (page generated 2025-04-06 23:00 UTC)