[HN Gopher] EagleX 1.7T: Soaring past LLaMA 7B 2T in both Englis...
       ___________________________________________________________________
        
       EagleX 1.7T: Soaring past LLaMA 7B 2T in both English and Multi-
       lang evals
        
       Author : lhofer
       Score  : 18 points
       Date   : 2024-03-16 20:50 UTC (2 hours ago)
        
 (HTM) web link (substack.recursal.ai)
 (TXT) w3m dump (substack.recursal.ai)
        
       | LorenDB wrote:
       | To be clear, this is a 7B model. It's just trained on 1.7
       | trillion tokens. At first I was confused why they were making
        | such a big deal out of a massive 1.7T model outperforming a 7B
       | model.
       | 
        | By the way, GPT-4 has been rumored to have around 1.7T
        | parameters, although OpenAI has not confirmed this to my
        | knowledge.
        
         | mmoskal wrote:
          | The most interesting bit is that this is an RWKV model, meaning
          | a constant-size state (no quadratic attention). AFAIK it's the
          | biggest open-weights non-transformer model.
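         
          (A minimal, illustrative sketch of the constant-size-state point
          above: this is not the actual RWKV time-mixing recurrence, just a
          toy contrast between a fixed-size recurrent state and an attention
          KV cache that grows with every token.)
         
              import numpy as np
         
              d = 8  # toy hidden size
         
              # Recurrent-style update: the state keeps the same shape no
              # matter how many tokens have been processed.
              def recurrent_step(state, x, decay=0.9):
                  return decay * state + (1 - decay) * x
         
              # Attention-style update: the cache grows by one entry per
              # token, and every step looks back over all of it.
              def attention_step(kv_cache, x):
                  kv_cache.append(x)
                  scores = np.array([k @ x for k in kv_cache])
                  weights = np.exp(scores - scores.max())
                  weights /= weights.sum()
                  return sum(w * v for w, v in zip(weights, kv_cache))
         
              state, kv_cache = np.zeros(d), []
              for _ in range(1000):
                  x = np.random.randn(d)
                  state = recurrent_step(state, x)   # still shape (d,)
                  _ = attention_step(kv_cache, x)    # cache keeps growing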
        
       | nhggfu wrote:
        | Sounds dreamy. Anyone know how I can install this on my M1 Mac?
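         
        (A hedged sketch of how one might try it on Apple silicon, assuming
        the weights are published as a Hugging Face checkpoint with custom
        modeling code; the repo id below is a placeholder, not the actual
        release name.)
         
            # Assumption: an HF checkpoint loadable with trust_remote_code.
            import torch
            from transformers import AutoTokenizer, AutoModelForCausalLM
         
            repo = "recursal/EagleX-1.7T"  # placeholder repo id
            tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
            model = AutoModelForCausalLM.from_pretrained(
                repo, trust_remote_code=True, torch_dtype=torch.float16
            ).to("mps")  # PyTorch's Apple-silicon GPU backend
         
            inputs = tok("The capital of France is",
                         return_tensors="pt").to("mps")
            out = model.generate(**inputs, max_new_tokens=32)
            print(tok.decode(out[0], skip_special_tokens=True))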
        
       | YetAnotherNick wrote:
       | > All evals are 0 shot
       | 
        | My bet is that this is the reason they are scoring high in
        | "their" benchmarks. For models which are just trained on
        | completely unlabelled data, like LLaMA, 0-shot won't work well.
       | 
        | E.g., for LLaMA, HellaSwag accuracy is 57.13% in their benchmark
        | compared to 78.59% in [1].
       | 
       | [1]:
       | https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
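         
        (For context: the linked leaderboard scores HellaSwag with few-shot
        prompting via the Eleuther eval harness (10-shot, if memory serves),
        while the post's table is 0-shot, so the two numbers aren't directly
        comparable. Below is a toy sketch of the difference in prompt
        construction; the format is simplified and is not the harness's
        actual template.)
         
            # Toy 0-shot vs. few-shot prompt construction (illustrative only;
            # real harnesses score answer log-likelihoods with their own
            # templates rather than free-form generation).
            def build_prompt(question, choices, examples=()):
                prompt = ""
                for ex_q, ex_choices, ex_answer in examples:  # demonstrations
                    prompt += ex_q + "\n" + "\n".join(ex_choices)
                    prompt += "\nAnswer: " + ex_answer + "\n\n"
                prompt += question + "\n" + "\n".join(choices) + "\nAnswer:"
                return prompt
         
            zero_shot = build_prompt(
                "She put the pan on the stove and ...",
                ["A) ate it", "B) turned on the heat"])
            few_shot = build_prompt(
                "She put the pan on the stove and ...",
                ["A) ate it", "B) turned on the heat"],
                examples=[("He opened the umbrella because ...",
                           ["A) it was raining", "B) it was sunny"], "A")])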
        
       | GaggiX wrote:
       | >Let's look what went really badly: Math.
       | 
       | >We dug through the dataset we used for training, and realized we
       | missed out the entire math dataset (along with a few others) due
       | to an error. Oops.
       | 
       | This is kinda hilarious.
        
         | turnsout wrote:
         | Makes me excited for the next model though!
        
       | squigz wrote:
       | > All data shown here is made available in the Google Sheet over
       | here:
       | 
       | Over where?
        
       ___________________________________________________________________
       (page generated 2024-03-16 23:01 UTC)