[HN Gopher] WizardLM 2
___________________________________________________________________
WizardLM 2
Author : tosh
Score : 120 points
Date : 2024-04-15 15:54 UTC (7 hours ago)
(HTM) web link (wizardlm.github.io)
(TXT) w3m dump (wizardlm.github.io)
| brokensegue wrote:
| seems to be roughly the same capability as Command R+ if the
| arena leaderboard is to be trusted. I think Command R+ has
| fewer params?
| lolinder wrote:
| I don't see WizardLM 2 on the leaderboard yet--are you looking
| at the previous version? Or do you have a link to a different
| leaderboard?
|
| https://chat.lmsys.org
| brokensegue wrote:
| I said this based on the comparisons in the OP to other
| models which are on the leaderboard
| reissbaker wrote:
| WizardLM-2 8x22B appears to beat Command-R+ in their
| measurements.
| exe34 wrote:
| Any idea what the context length is? Can't see it from the page
| linked.
| titaniumtown wrote:
| I assume the same as the models that they're fine tuned off of.
| tosh wrote:
| a bit hidden but afaiu:
|
| * 8x22B (based on Mistral 8x22B)
|
| * 70B (based on llama2?)
|
| * 7B (based on Mistral-7B-v0.1)
|
| and the 70B model is not available yet on huggingface
|
| https://huggingface.co/collections/microsoft/wizardlm-661d40...
| spxneo wrote:
| WizardLM beats GPT-4-0314 and comes close to the latest
| release, GPT-4-1106-preview.
| mrtesthah wrote:
| By that benchmark, Claude Sonnet beats both.
| MacsHeadroom wrote:
| And Claude Sonnet is great. It's what I recommend people use
| if they don't want to pay for an LLM or run one locally.
|
| But Claude Sonnet is not open source and can't be run on your
| own hardware like Wizard.
| stavros wrote:
| How do you use Sonnet for free?
| MacsHeadroom wrote:
| At https://poe.com/
| stavros wrote:
| Thanks!
| Vetch wrote:
| Poe's Sonnet is limited to 15 free messages per day. The
| best freely accessible LLM with a generous daily
| allotment (300 msgs/day) is Bing Green Precise Mode. It's
| at about GPT4 level.
| spxneo wrote:
| how are they offering this at a discount from the original?
| looks like the goal is to get subscriptions to cover the
| heaviest users?
| nilsherzig wrote:
| On the official site, Claude dot ai
| smusamashah wrote:
| What does "state-of-art" mean? Every one seems to be using this
| while in my head SOTA means "top of the line". When it says
| comparable results in benchmarks, how is it SOTA? Or does SOTA
| mean topping on really really specific benchmarks?
| striking wrote:
| There's a table of MT-Bench scores. I presume we'll have to
| wait for the preprint before we know anything else.
| YetAnotherNick wrote:
| While I want to be very optimistic about open models, the fact
| that almost all top open models heavily use GPT-4 for synthetic
| data is discouraging. Even now it is against the TOS, and I
| think over time OpenAI will get better at detecting it, meaning
| the gap between open and closed models would increase.
| dimask wrote:
| Considering these models are released by "Microsoft AI", I
| doubt they do anything against the ToS of "Open"AI.
| dartos wrote:
| I doubt it. They're using gpt-4 data because it's cheaper than
| getting real data, not because it's better.
|
| If openai cares enough to stop people scraping responses, then
| we'll just crowdsource them like open assistant did with their
| dataset.
| Patrick_Devine wrote:
| The 7B model is available on ollama if you want to try it:
| `ollama run wizardlm2` or `ollama run wizardlm2:7b`.
|
| We're still crunching the 8x22B model to get it ready, and the
| 70B model isn't yet available.
| syntaxing wrote:
| If you can computationally afford it, `wizardlm2:7b-q5_K_M` is
| a way better choice. The default `:7b` tag resolves to a q4_0
| quant, which might give you subpar results.
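|
| e.g. once that tag is pulled, something like this against the
| local Ollama API should work (a minimal sketch, assuming the
| default localhost:11434 endpoint and the requests package):
|
|     import requests
|
|     resp = requests.post(
|         "http://localhost:11434/api/generate",
|         json={
|             "model": "wizardlm2:7b-q5_K_M",
|             "prompt": "Explain mixture-of-experts briefly.",
|             "stream": False,  # one JSON reply, not a stream
|         },
|         timeout=300,
|     )
|     resp.raise_for_status()
|     print(resp.json()["response"])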
| dimask wrote:
| And there are quantised GGUF files by Bartowski if somebody
| wants to download and run them through llama.cpp directly:
|
| https://huggingface.co/bartowski/WizardLM-2-7B-GGUF
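|
| A rough sketch with the llama-cpp-python bindings, if you would
| rather call it from code; the GGUF filename below is just a
| placeholder for whichever quant you grab from that repo:
|
|     from llama_cpp import Llama
|
|     llm = Llama(
|         model_path="WizardLM-2-7B-Q5_K_M.gguf",  # placeholder
|         n_ctx=4096,       # context window to allocate
|         n_gpu_layers=-1,  # offload all layers if a GPU exists
|     )
|
|     out = llm.create_chat_completion(
|         messages=[{"role": "user",
|                    "content": "Hello, who are you?"}],
|         max_tokens=128,
|     )
|     print(out["choices"][0]["message"]["content"])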
| mcbuilder wrote:
| Mixtral 8x22B looking very strong! Finetunes seem comparable
| to GPT-4!
| littlestymaar wrote:
| That's quite crazy to see that a model that's barely bigger
| than GPT-3 (and only uses a fraction of the compute due to its
| MoE architecture) can achieve such a thing.
|
| It looks like the people who forecasted that AI models would
| need to keep growing to improve their performance were
| completely misguided. I wonder if in 3 to 4 years we'll end up
| with models with less than 4B parameters with comparable
| performance to today's State of the Art.
| int_19h wrote:
| It can also simply mean that benchmarks are not particularly
| representative of real-world performance on challenging
| reasoning tasks.
| littlestymaar wrote:
| Of course they aren't, but it's still pretty evident that
| most open-source models are miles ahead of GPT-3, even the
| ones that are only a fraction of its size, so there's still
| some massive improvement that doesn't depend on the model
| size itself.
| Vetch wrote:
| They weren't misguided, they just over-focused on scaling
| parameters instead of appropriately scaling quality data in
| tandem.
|
| A 4B model has limited capacity to encode knowledge and
| algorithmic circuits. It's also too small to learn programs
| whose execution exceeds the size of the circuits it can
| encode. There is a hard cap on how much we can squeeze out of
| small models. What we need is better consumer hardware, so we
| don't have to hope for miracles.
|
| Another hope is that 1.58-bit/ternary quantization-aware
| training pans out at scale. That'd address another axis of
| inefficiency beyond just parameter count.
| zamalek wrote:
| How are folks running 8x22B (or MoE in general) in an affordable
| way?
| dartos wrote:
| For personal inference, llama.cpp can run some MoEs on CPU.
|
| So like that.
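|
| Roughly: only 2 of the 8 experts run per token, so CPU token
| speeds are closer to a ~39B dense model, but the whole thing
| still has to fit in RAM (a q4-ish 8x22B GGUF is on the order
| of 80 GB, and llama.cpp mmaps it). A CPU-only sketch with
| llama-cpp-python, filename again a placeholder:
|
|     import os
|
|     from llama_cpp import Llama
|
|     llm = Llama(
|         model_path="WizardLM-2-8x22B-Q4_K_M.gguf",  # placeholder
|         n_gpu_layers=0,            # pure CPU inference
|         n_threads=os.cpu_count(),  # use every core
|         n_ctx=2048,
|     )
|
|     out = llm("Q: Why can MoE models run okay on CPU?\nA:",
|               max_tokens=96)
|     print(out["choices"][0]["text"])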
___________________________________________________________________
(page generated 2024-04-15 23:01 UTC)