[HN Gopher] EagleX 1.7T: Soaring past LLaMA 7B 2T in both English and Multi-lang evals
___________________________________________________________________
EagleX 1.7T: Soaring past LLaMA 7B 2T in both English and Multi-
lang evals
Author : lhofer
Score : 18 points
Date : 2024-03-16 20:50 UTC (2 hours ago)
(HTM) web link (substack.recursal.ai)
(TXT) w3m dump (substack.recursal.ai)
| LorenDB wrote:
| To be clear, this is a 7B model; it's just trained on 1.7
| trillion tokens. At first I was confused about why they were
| making such a big deal of a massive 1.7T model outperforming a
| 7B model.
|
| By the way, GPT-4 has been rumored to be a ~1.7T-parameter
| model, although OpenAI has not confirmed this to my knowledge.
| mmoskal wrote:
| The most interesting bit is that this is an RWKV model, meaning
| a constant-size state (no quadratic attention). AFAIK it's the
| biggest open-weights non-transformer model.
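|
| A rough sketch of what I mean by constant-size state (not the
| actual RWKV equations, just toy numpy to contrast the memory
| behaviour; the hidden size, decay constant, and update rule
| are made up for illustration):
|
|     import numpy as np
|
|     d = 8  # hypothetical hidden size
|
|     def attention_step(kv_cache, k, v, q):
|         # Transformer-style: the KV cache grows by one entry
|         # per token, so memory and per-step work scale with
|         # sequence length.
|         kv_cache.append((k, v))
|         ks = np.stack([k_ for k_, _ in kv_cache])
|         vs = np.stack([v_ for _, v_ in kv_cache])
|         w = np.exp(ks @ q)              # unnormalised scores
|         return (w[:, None] * vs).sum(0) / w.sum()
|
|     def recurrent_step(state, k, v):
|         # RWKV-flavoured: two fixed-size accumulators updated
|         # in place, so memory stays constant however long the
|         # sequence gets.
|         num, den = state
|         decay = 0.9                     # made-up decay constant
|         num = decay * num + np.exp(k) * v
|         den = decay * den + np.exp(k)
|         return (num, den), num / den
|
|     rng = np.random.default_rng(0)
|     kv_cache, state = [], (np.zeros(d), np.zeros(d))
|     for t in range(1000):
|         k, v, q = rng.normal(size=(3, d))
|         attention_step(kv_cache, k, v, q)
|         state, _ = recurrent_step(state, k, v)
|
|     print(len(kv_cache))                # 1000: grows with t
|     print(sum(x.size for x in state))   # 16: constant
|
| Same idea as an RNN: per-token inference cost doesn't grow with
| context length, which is why RWKV can skip the quadratic
| attention matrix.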
| nhggfu wrote:
| Sounds dreamy. Anyone know how I can install this on my M1 Mac?
| YetAnotherNick wrote:
| > All evals are 0 shot
|
| My bet is that this is the reason they are scoring high in
| "their" benchmarks. For models that are just trained on
| completely unlabelled data, like LLaMA, 0-shot won't work well.
|
| E.g. for LLaMA, HellaSwag accuracy is 57.13% in their benchmark
| compared to 78.59% in [1].
|
| [1]:
| https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
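|
| Roughly, the difference looks like this (toy example; the
| contexts below are made up, and IIRC the linked leaderboard
| scores HellaSwag with a 10-shot prompt rather than 0-shot):
|
|     shots = [
|         ("A man is sitting on a roof. He",
|          "starts pulling up roofing on the roof."),
|         ("A boy runs down a track. The boy",
|          "crosses the finish line and raises his arms."),
|     ]
|     question = "A woman fills a bucket with water. She"
|
|     def zero_shot_prompt(q):
|         # The model only ever sees the unfinished sentence.
|         return q
|
|     def few_shot_prompt(q, shots):
|         # Solved examples are prepended, so the model can pick
|         # up the task format before answering.
|         demo = "\n".join(f"{ctx} {end}" for ctx, end in shots)
|         return f"{demo}\n{q}"
|
|     print(zero_shot_prompt(question))
|     print("---")
|     print(few_shot_prompt(question, shots))
|
| A base model that has only seen raw web text tends to do much
| better when the prompt shows it the task format first, which
| could plausibly account for a gap like 57% vs 79%.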
| GaggiX wrote:
| >Let's look what went really badly: Math.
|
| >We dug through the dataset we used for training, and realized we
| missed out the entire math dataset (along with a few others) due
| to an error. Oops.
|
| This is kinda hilarious.
| turnsout wrote:
| Makes me excited for the next model though!
| squigz wrote:
| > All data shown here is made available in the Google Sheet over
| here:
|
| Over where?
___________________________________________________________________
(page generated 2024-03-16 23:01 UTC)