[HN Gopher] Qwen1.5-MoE: Matching 7B Model Performance with 1/3 ...
       ___________________________________________________________________
        
       Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated
       Parameters
        
       Author : GaggiX
       Score  : 47 points
       Date   : 2024-03-29 18:45 UTC (4 hours ago)
        
 (HTM) web link (qwenlm.github.io)
 (TXT) w3m dump (qwenlm.github.io)
        
       | transformi wrote:
       | How does it compare to phi-2?
        
         | sp332 wrote:
         | It scores higher on MMLU (62.5 vs 56.7 for phi-2) and slightly
         | higher on GSM8K (61.5 vs 61.1).
         | https://www.microsoft.com/en-us/research/blog/phi-2-the-surp...
         | The phi-2 numbers are for 5-shot MMLU and 8-shot GSM8K. The blog
         | post doesn't state the shot counts for Qwen, but it very likely
         | used the same settings.
        
       ___________________________________________________________________
       (page generated 2024-03-29 23:00 UTC)