[HN Gopher] WizardLM 2
       ___________________________________________________________________
        
       WizardLM 2
        
       Author : tosh
       Score  : 120 points
       Date   : 2024-04-15 15:54 UTC (7 hours ago)
        
 (HTM) web link (wizardlm.github.io)
 (TXT) w3m dump (wizardlm.github.io)
        
       | brokensegue wrote:
        | Seems to be roughly the same capability as Command R+, if the
        | arena leaderboard is to be trusted. I think Command R+ has
        | fewer params?
        
         | lolinder wrote:
          | I don't see WizardLM 2 on the leaderboard yet--are you
          | looking at the previous version? Or do you have a link to a
          | different leaderboard?
         | 
         | https://chat.lmsys.org
        
           | brokensegue wrote:
           | I said this based on the comparisons in the OP to other
           | models which are on the leaderboard
        
             | reissbaker wrote:
             | WizardLM-2 8x22B appears to beat Command-R+ in their
             | measurements.
        
       | exe34 wrote:
       | Any idea what the context length is? Can't see it from the page
       | linked.
        
         | titaniumtown wrote:
          | I assume the same as the models they're fine-tuned from.
        
       | tosh wrote:
       | a bit hidden but afaiu:
       | 
       | * 8x22B (based on Mistral 8x22B)
       | 
       | * 70B (based on llama2?)
       | 
       | * 7B (based on Mistral-7B-v0.1)
       | 
       | and the 70B model is not available yet on huggingface
       | 
       | https://huggingface.co/collections/microsoft/wizardlm-661d40...
        
       | spxneo wrote:
        | WizardLM beats GPT-4-0314 and comes close to the latest
        | release, GPT-4-1106-preview.
        
         | mrtesthah wrote:
         | By that benchmark, Claude Sonnet beats both.
        
           | MacsHeadroom wrote:
           | And Claude Sonnet is great. It's what I recommend people use
           | if they don't want to pay for an LLM or run one locally.
           | 
           | But Claude Sonnet is not open source and can't be run on your
           | own hardware like Wizard.
        
             | stavros wrote:
             | How do you use Sonnet for free?
        
               | MacsHeadroom wrote:
               | At https://poe.com/
        
               | stavros wrote:
               | Thanks!
        
               | Vetch wrote:
                | Poe's Sonnet is limited to 15 free messages per day.
                | The best freely accessible LLM with a generous daily
                | allotment (300 msgs/day) is Bing Green Precise Mode.
                | It's at about GPT-4 level.
        
               | spxneo wrote:
                | How are they offering this at a discount from the
                | original? It looks like the goal is to get
                | subscriptions to cover the heaviest users?
        
               | nilsherzig wrote:
                | On the official site, claude.ai
        
       | smusamashah wrote:
        | What does "state-of-the-art" mean? Everyone seems to be using
        | it, while in my head SOTA means "top of the line". When it
        | says comparable results in benchmarks, how is it SOTA? Or does
        | SOTA mean topping really specific benchmarks?
        
         | striking wrote:
         | There's a table of MT-Bench scores. I presume we'll have to
         | wait for the preprint before we know anything else.
        
       | YetAnotherNick wrote:
        | While I want to be very optimistic about open models, the fact
        | that almost all top open models heavily use GPT-4 for synthetic
        | data is discouraging. Even now it is against the ToS, and I
        | think over time OpenAI will become better at detecting it,
        | meaning the gap between open and closed models will increase.
        
         | dimask wrote:
         | Considering these models are released by "Microsoft AI", I
         | doubt they do anything against the ToS of "Open"AI.
        
         | dartos wrote:
          | I doubt it. They're using GPT-4 data because it's cheaper
          | than getting real data, not because it's better.
          | 
          | If OpenAI cares enough to stop people from scraping
          | responses, then we'll just crowdsource them like Open
          | Assistant did with their dataset.
        
       | Patrick_Devine wrote:
       | The 7B model is available on ollama if you want to try it:
       | `ollama run wizardlm2` or `ollama run wizardlm2:7b`.
       | 
       | We're still crunching the 8x22B model to get it ready, and the
       | 70B model isn't yet available.
        
         | syntaxing wrote:
          | If you can computationally afford it, `7b-q5_K_M` is a much
          | better choice. The default `:7b` tag points to q4_0, which
          | might give you subpar results.
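          | 
          | For example (a sketch; check the Ollama library page for the
          | exact tags that are actually published):
          | 
          |     ollama run wizardlm2:7b-q5_K_M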
        
         | dimask wrote:
          | And there are quantised GGUF files by bartowski if somebody
          | wants to download them and run them through llama.cpp
          | directly:
          | 
          | https://huggingface.co/bartowski/WizardLM-2-7B-GGUF
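          | 
          | For example (a rough sketch; the exact quant filename is a
          | guess, so check the repo's file list first):
          | 
          |     huggingface-cli download bartowski/WizardLM-2-7B-GGUF \
          |       WizardLM-2-7B-Q5_K_M.gguf --local-dir .
          |     ./main -m WizardLM-2-7B-Q5_K_M.gguf -p "Hello" -n 128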
        
       | mcbuilder wrote:
        | Mixtral 8x22B is looking very strong! Finetunes seem
        | comparable to GPT-4!
        
         | littlestymaar wrote:
          | That's quite crazy to see that a model that's barely bigger
          | than GPT-3 (and only uses a fraction of the compute due to
          | its MoE architecture) can achieve such a thing.
          | 
          | It looks like the people who forecasted that AI models would
          | need to keep growing to improve their performance were
          | completely misguided. I wonder if in 3 to 4 years we'll end
          | up with models with fewer than 4B parameters and comparable
          | performance to today's state of the art.
        
           | int_19h wrote:
           | It can also simply mean that benchmarks are not particularly
           | representative of real-world performance on challenging
           | reasoning tasks.
        
             | littlestymaar wrote:
              | Of course they aren't, but it's still pretty evident
              | that most open-source models are miles ahead of GPT-3,
              | even the ones that are only a fraction of its size, so
              | there's still some massive improvement that doesn't
              | depend on the model size itself.
        
           | Vetch wrote:
           | They weren't misguided, they just over-focused on scaling
           | parameters instead of appropriately scaling quality data in
           | tandem.
           | 
            | A 4B model has limited capacity to encode knowledge and
            | algorithmic circuits. It's also too small to learn programs
            | whose execution exceeds the size of the circuits it can
            | encode. There is a hard cap on how much we can squeeze out
            | of small models. What we need is better consumer hardware,
            | so we don't have to hope for miracles.
           | 
            | Another hope is that 1.58-bit/ternary quantization-aware
            | training pans out at scale. That would address another
            | axis of inefficiency beyond just parameter count.
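            | 
            | (For reference, the 1.58 figure is log2(3) ~= 1.585 bits,
            | the information content of a single ternary weight in
            | {-1, 0, +1}.)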
        
       | zamalek wrote:
       | How are folks running 8x22B (or MoE in general) in an affordable
       | way?
        
         | dartos wrote:
         | For personal inference, llama.cpp can run some MoEs on CPU.
         | 
         | So like that.
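          | 
          | Back-of-the-envelope, assuming Mixtral 8x22B's ~141B total /
          | ~39B active parameters: a 4-bit quant is roughly 141B x ~0.5
          | bytes ~= 70-80 GB of weights, so it fits in 96-128 GB of
          | system RAM, and since only ~39B parameters are active per
          | token, CPU inference runs faster than a dense model of the
          | same total size would.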
        
       ___________________________________________________________________
       (page generated 2024-04-15 23:01 UTC)