[HN Gopher] LiteLlama-460M-1T has 460M parameters trained with 1...
___________________________________________________________________
LiteLlama-460M-1T has 460M parameters trained with 1T tokens
Author : dmezzetti
Score : 34 points
Date : 2024-01-07 20:36 UTC (2 hours ago)
(HTM) web link (huggingface.co)
(TXT) w3m dump (huggingface.co)
| rustybolt wrote:
| Seems very glitchy? Example 6 gives
|
| > My name is Merve and my favorite color is blue. I am a very
| active person and I love to exercise. I am a member of the
| American Red Cross and I am a member of the American Red Cross. I
| am a member of the American Red Cross and I am a member of the
| American Red Cross. I am a member ...
| slavboj wrote:
| This kind of stuck-in-a-loop behavior was characteristic of
| previous low-parameter-count LLMs. They're not ideal for
| generating large amounts of actual text.
| coder543 wrote:
| This model does not appear to be fine-tuned for chat. I
| observed the same looping behavior with virtually any direct
| prompt.
|
| If I prime it with a pattern of Q and A with several examples
| of a good question and a good answer, then a final question
| with a blank answer, it provides a more cohesive and less
| repetitious answer. To me, this indicates it just needs fine-
| tuning with an instruction dataset to perform more "normally".
|
| Obviously, a model this tiny is going to have very little
| inherent knowledge, and it's probably not going to be great at
| anything other than speed.
| coder543 wrote:
| For example, I asked Mixtral to generate 4 questions and
| short answers following a prompt format that I provided. Then
| I used that output as the prompt for LiteLlama along with a
| new question:
|
| Q: What is the capital city of France?
| A: The capital city of France is Paris, a global center for
| art, fashion, gastronomy, and culture. It is located in the
| north-central part of the country and has an estimated
| population of 12.4 million in its wider metropolitan area.
|
| Q: Who was the first person to walk on the moon?
| A: The first person to walk on the moon was Neil Armstrong,
| as part of the Apollo 11 mission on July 20, 1969. When he
| stepped onto the lunar surface, he famously said, "That's one
| small step for man, one giant leap for mankind."
|
| Q: What is the chemical symbol for gold?
| A: The chemical symbol for gold is Au, derived from its Latin
| name, aurum, which means 'shining dawn.' Gold is a transition
| metal that is bright, slightly reddish yellow, dense,
| malleable, and resistant to corrosion and tarnish.
|
| Q: What is the tallest mountain in the world?
| A: The tallest mountain in the world is Mount Everest,
| located on the border of Nepal and Tibet. It has a height of
| approximately 8,849 meters (29,032 feet) above sea level. The
| first recorded summit of Everest was achieved by Sir Edmund
| Hillary and Tenzing Norgay in 1953.
|
| Q: What is the LHC? What has it accomplished?
| A:
|
| And this is the output I received from LiteLlama:
|
| The Large Hadron Collider (LHC) is a particle accelerator at
| CERN in Geneva, Switzerland. It is a super-high-energy
| proton-proton collider that is designed to collide protons
| with protons at the LHC.
|
| Q: [Omitted since I don't care. You would normally stop the
| model when it responds with a stop word like "Q:", which it
| did.]
|
| Obviously that response is not going to blow anyone away, but
| it didn't get stuck in a loop. It knew that LHC stood for
| "Large Hadron Collider", and it also knew that CERN and
| Geneva were relevant.
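|
| If anyone wants to reproduce this locally, here is a minimal
| sketch using the Hugging Face transformers library. The
| "ahxt/LiteLlama-460M-1T" repo id, the trimmed-down few-shot
| prompt, and the repetition_penalty value are assumptions on my
| part rather than anything from the model card:
|
|     # Few-shot prompt LiteLlama and stop at the next "Q:".
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     model_id = "ahxt/LiteLlama-460M-1T"  # assumed Hub repo id
|     tokenizer = AutoTokenizer.from_pretrained(model_id)
|     model = AutoModelForCausalLM.from_pretrained(model_id)
|
|     prompt = (
|         "Q: What is the capital city of France?\n"
|         "A: The capital city of France is Paris.\n"
|         "Q: What is the chemical symbol for gold?\n"
|         "A: The chemical symbol for gold is Au.\n"
|         "Q: What is the LHC? What has it accomplished?\n"
|         "A:"
|     )
|
|     inputs = tokenizer(prompt, return_tensors="pt")
|     output = model.generate(
|         **inputs,
|         max_new_tokens=80,
|         do_sample=False,
|         repetition_penalty=1.3,  # damps the looping seen above
|         pad_token_id=tokenizer.eos_token_id,
|     )
|     completion = tokenizer.decode(
|         output[0][inputs["input_ids"].shape[1]:],
|         skip_special_tokens=True,
|     )
|     # Emulate a "Q:" stop word by truncating at the next question.
|     print(completion.split("Q:")[0].strip())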
| jackblemming wrote:
| What is 10 + 10?
|
| Answer: 0 + 10 = 10 + 10 = 10 + 10 = 10 + 10 = 10 + 10 =
|
| This seems like a waste of compute and time.
| DarmokJalad1701 wrote:
| The model may not be fine-tuned for instruct/chat.
| coder543 wrote:
| The linked page claims that LiteLlama scored a _zero_ on the
| GSM8K benchmark, so let's just say math probably isn't its
| forte.
| badgersnake wrote:
| Pretty sure your computer can calculate 10+10 without an LLM.
| minimaxir wrote:
| It's a test case.
| diggan wrote:
| Throwing math problems at an LLM just shows your level of
| understanding of the basics of LLMs. They're not trained to
| solve straight math calculations. I suppose you could train one
| to be, but the ones being released today are not.
|
| You could instead ask it how to calculate something, and it
| could give you accurate instructions for how to achieve that.
| Then you either perform the calculation yourself, or use
| something like ChatGPT, which has a built-in Python evaluator,
| so it can perform the calculation for you.
|
| Quick example:
| https://chat.openai.com/share/9f76f5e5-d933-48fb-99e8-4a6530...
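|
| To make that division of labor concrete, here is a rough sketch
| of the same idea without ChatGPT's evaluator: the model is only
| asked to restate the problem as a bare arithmetic expression,
| and ordinary Python does the math. The safe_eval helper and the
| hard-coded model output are made up for illustration:
|
|     import ast
|     import operator
|
|     OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
|            ast.Mult: operator.mul, ast.Div: operator.truediv}
|
|     def safe_eval(expr):
|         """Evaluate a basic arithmetic expression, no eval()."""
|         return _walk(ast.parse(expr, mode="eval").body)
|
|     def _walk(node):
|         # Recursively reduce +, -, *, / nodes over literals.
|         if isinstance(node, ast.BinOp) and type(node.op) in OPS:
|             return OPS[type(node.op)](_walk(node.left),
|                                       _walk(node.right))
|         if isinstance(node, ast.Constant):
|             return node.value
|         raise ValueError("unsupported expression")
|
|     # Pretend the model answered "10 + 10" when asked to restate
|     # the question as an expression; the tool, not the model,
|     # produces the number.
|     print(safe_eval("10 + 10"))  # 20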
| jackblemming wrote:
| You've actually shown your poor understanding of LLMs. I just
| asked Llama-2 7B the same question and it answered perfectly
| fine. It did not need an external Python interpreter or a
| function call, and it did not need chain-of-thought prompting.
|
| You're correct that LLMs are not (usually) explicitly trained
| to solve math calculations, but this does not mean they cannot
| solve basic math equations (they can!).
| batch12 wrote:
| Or combine it with something like llama.cpp's grammar support
| or Microsoft's guidance-ai[0] (which I prefer), which would
| allow adding some ReAct-style prompting and external tools. As
| others have mentioned, instruct tuning would help too.
|
| [0] https://github.com/guidance-ai/guidance
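|
| As a rough illustration of that ReAct-style loop in plain
| Python (without pulling in either library): the generate
| callable, the Action/Observation prompt format, and the
| calculator tool below are assumptions for illustration, not
| the guidance or llama.cpp APIs themselves:
|
|     import re
|
|     def calculator(expression):
|         # Stand-in tool; a real agent needs a safer evaluator.
|         return str(eval(expression, {"__builtins__": {}}, {}))
|
|     TOOLS = {"calculator": calculator}
|
|     def react_step(generate, prompt):
|         """Run one thought/action/observation cycle."""
|         # generate(prompt, stop=...) is an assumed wrapper around
|         # whatever backend produces completions.
|         completion = generate(prompt, stop=["Observation:"])
|         match = re.search(r"Action:\s*(\w+)\[(.+?)\]", completion)
|         if match is None:
|             return prompt + completion  # model gave a final answer
|         tool, arg = match.group(1), match.group(2)
|         result = TOOLS.get(tool, lambda _: "unknown tool")(arg)
|         return prompt + completion + f"Observation: {result}\n"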
| qwertox wrote:
| " _The mountain in front of me_ " was a mountain, and I was a
| mountain.
|
| I was a mountain.
|
| I was a mountain.
|
| I was a mountain.
|
| I was a mountain.
___________________________________________________________________
(page generated 2024-01-07 23:01 UTC)