[HN Gopher] Yi 1.5
___________________________________________________________________
Yi 1.5
Author : tosh
Score : 105 points
Date : 2024-05-12 16:23 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| xhevahir wrote:
| "Yi-1.5 is an upgraded version of Yi" is not a very informative
| beginning.
| kkzz99 wrote:
| "It is continuously pre-trained on Yi with a high-quality
| corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning
| samples.
|
| Compared with Yi, Yi-1.5 delivers stronger performance in
| coding, math, reasoning, and instruction-following capability,
| while still maintaining excellent capabilities in language
| understanding, commonsense reasoning, and reading
| comprehension.
|
| Yi-1.5 comes in 3 model sizes: 34B, 9B, and 6B. For model
| details and benchmarks, see Model Card."
|
| Literally after that...
| Jaxan wrote:
| So it's a large language model?
| gardnr wrote:
| Yi is led by Dr. Kai-Fu Lee.
|
| They have been releasing a lot of really good models over the
| last ~6 months. Their previous (1.0?) Yi-34B-Chat model ranks
| similarly to GPT-3.5 on Chatbot Arena. [1] A quantized version of
| that model can be run on a single consumer video card like the
| RTX 4090.
|
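| As a rough sketch, loading the previous Yi-34B-Chat in 4-bit
| looks something like this (assumes the transformers,
| bitsandbytes and accelerate libraries; untested here):
|
|     import torch
|     from transformers import (AutoModelForCausalLM,
|                               AutoTokenizer, BitsAndBytesConfig)
|
|     repo = "01-ai/Yi-34B-Chat"
|     bnb = BitsAndBytesConfig(
|         load_in_4bit=True,
|         bnb_4bit_compute_dtype=torch.float16,
|     )
|     tok = AutoTokenizer.from_pretrained(repo)
|     # 4-bit weights for a 34B model are roughly 17-18 GB, which
|     # is why a single 24 GB card like the RTX 4090 can hold them.
|     model = AutoModelForCausalLM.from_pretrained(
|         repo, quantization_config=bnb, device_map="auto")
|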
| This new set of models should raise the bar again by adding more
| options to the open source LLM ecosystem. If you inspect the
| config.json[2] in the model repo on HuggingFace, you can see that
| the model architecture is LlamaForCausalLM (the same as Meta's
| Llama). What sets the Yi models apart from a simple fine-tune is
| that they use their own data, configuration, and training
| process going all the way back to the pre-training stage.
|
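| To check that from code, a minimal sketch (assumes the
| transformers library; repo id taken from [2]):
|
|     from transformers import AutoConfig
|
|     cfg = AutoConfig.from_pretrained("01-ai/Yi-1.5-34B-Chat")
|     print(cfg.architectures)            # ['LlamaForCausalLM']
|     print(cfg.max_position_embeddings)  # context window size
|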
| Their models perform well in Chinese and in English.
|
| There are a lot of good models coming out of China, some of which
| are only published to ModelScope. I haven't spent much time on
| ModelScope because I don't have a Chinese mobile number to use to
| create an account. Fortunately, the Yi models are published to
| HuggingFace as well.
|
| [1] https://huggingface.co/spaces/lmsys/chatbot-arena-
| leaderboar...
|
| [2]
| https://huggingface.co/01-ai/Yi-1.5-34B-Chat/blob/fa695ee438...
| option wrote:
| Try asking their "chat" variants about topics sensitive to the
| CCP, like what happened in Tiananmen Square. Same for the
| Baichuan models.
|
| What other values and biases have been RLHFed there and for
| what purpose?
| polygamous_bat wrote:
| This is an interesting question. Is there a "controversy-
| benchmark" perhaps, to measure this?
| ekianjo wrote:
| the American models are similarly censored for specific
| topics...
| tosh wrote:
| Benchmark charts on model card:
| https://huggingface.co/01-ai/Yi-1.5-34B-Chat#benchmarks
|
| Yi 34B with results similar to Llama 3 70B and Mixtral 8x22B
|
| Yi 6B and 9B with results similar to Llama 3 8B
| GaggiX wrote:
| We need to wait for LMSYS Chatbot Arena to actually see the
| performance of the model.
| tosh wrote:
| I had good results with the previous Yi-34B and its fine-
| tunes like Nous-Capybara-34B. It will be interesting to see
| what Chatbot Arena thinks, but my expectations are high.
|
| https://huggingface.co/NousResearch/Nous-Capybara-34B
| zone411 wrote:
| No, Lmsys is just another very obviously flawed benchmark.
| qeternity wrote:
| Pretraining on the test set is all you need.
|
| LLM benchmarks are horribly broken. IMHO there is better signal
| in just looking at parameter counts.
| mountainriver wrote:
| Is it the same bad license?
| tosh wrote:
| It looks like they switched to Apache 2.0 for the weights.
| Havoc wrote:
| I've never had any luck with the Yi family of models. They tend
| to get sidetracked and respond in Chinese. Maybe my setup is
| somehow flawed.
| segmondy wrote:
| Your setup is flawed.
| qeternity wrote:
| No, it's not. This is a common issue with Yi models.
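|
| One workaround that sometimes helps (a sketch only, not
| verified against every Yi build): pin the output language with
| an explicit system message through the chat template.
|
|     from transformers import AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("01-ai/Yi-1.5-9B-Chat")
|     messages = [
|         {"role": "system",
|          "content": "You are a helpful assistant. "
|                     "Always reply in English."},
|         {"role": "user", "content": "Explain what Yi 1.5 is."},
|     ]
|     prompt = tok.apply_chat_template(
|         messages, tokenize=False, add_generation_prompt=True)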
| 999900000999 wrote:
| Is 16 GB of RAM enough to run these locally?
|
| I'm considering a new laptop later this year, and the RAM is now
| fixed at 16 GB on most of them.
|
| I plan on digging deep into ML during my coming break from paid
| work.
| coolestguy wrote:
| No - 16 GB of RAM is barely enough to run regular applications
| if you're a power user, let alone the most groundbreaking,
| computationally heavy workloads ever invented.
| 999900000999 wrote:
| The price difference is about $150, give or take, for the
| laptops I'm looking at.
|
| I'll keep this in mind!
| tosh wrote:
| 16 GB is enough to run quantized versions of the 9B and 6B.
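|
| Rough arithmetic behind that (a sketch; assumes ~4.5 bits per
| weight for a typical 4-bit quant, plus some headroom for the
| KV cache and runtime):
|
|     def est_gb(params_b, bits_per_weight=4.5, overhead_gb=1.5):
|         # weights + a rough allowance for KV cache / runtime
|         return params_b * bits_per_weight / 8 + overhead_gb
|
|     for size_b in (6, 9, 34):
|         print(f"{size_b}B ~ {est_gb(size_b):.1f} GB")
|     # 6B ~ 4.9 GB, 9B ~ 6.6 GB, 34B ~ 20.6 GB
|
| So the 6B and 9B fit comfortably in 16 GB, while the 34B does
| not.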
| adt wrote:
| https://lifearchitect.ai/models-table/
| Hugsun wrote:
| This page is confusing to me. How is it useful to you? I can
| see some utility but am curious if there's something I'm
| missing.
| smcleod wrote:
| While interesting, Yi 1.5 only has a 4K context window, which
| means it's not going to be useful for a lot of use cases.
___________________________________________________________________
(page generated 2024-05-12 23:00 UTC)