[HN Gopher] Qwen1.5-Moe: Matching 7B Model Performance with 1/3 ...
___________________________________________________________________
Qwen1.5-Moe: Matching 7B Model Performance with 1/3 Activated
Parameters
Author : GaggiX
Score : 47 points
Date : 2024-03-29 18:45 UTC (4 hours ago)
(HTM) web link (qwenlm.github.io)
(TXT) w3m dump (qwenlm.github.io)
| transformi wrote:
| How does it compare to phi-2?
| sp332 wrote:
| Higher on MMLU (62.5 vs 56.7 for phi-2) and marginally higher on
| GSM8k (61.5 vs 61.1).
| https://www.microsoft.com/en-us/research/blog/phi-2-the-surp...
| The phi-2 numbers are for 5-shot MMLU and 8-shot GSM8k. The blog
| post doesn't specify the shot counts for Qwen, but it very likely
| used the same setup.
___________________________________________________________________
(page generated 2024-03-29 23:00 UTC)