[HN Gopher] LiteLlama-460M-1T has 460M parameters trained with 1T tokens
       ___________________________________________________________________
        
       LiteLlama-460M-1T has 460M parameters trained with 1T tokens
        
       Author : dmezzetti
       Score  : 34 points
       Date   : 2024-01-07 20:36 UTC (2 hours ago)
        
 (HTM) web link (huggingface.co)
 (TXT) w3m dump (huggingface.co)
        
       | rustybolt wrote:
       | Seems very glitchy? Example 6 gives
       | 
       | > My name is Merve and my favorite color is blue. I am a very
       | active person and I love to exercise. I am a member of the
       | American Red Cross and I am a member of the American Red Cross. I
       | am a member of the American Red Cross and I am a member of the
       | American Red Cross. I am a member ...
        
         | slavboj wrote:
          | This kind of stuck-in-a-loop behavior is characteristic of
          | earlier low-parameter-count LLMs. They're not ideal for
          | generating large amounts of actual text.
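          | 
          | Decoding settings can also paper over some of the looping. A
          | minimal sketch with Hugging Face transformers (the repo id
          | below is assumed from the title, not verified against the
          | linked page):
          | 
          |   from transformers import AutoModelForCausalLM
          |   from transformers import AutoTokenizer
          | 
          |   # Assumed repo id -- check the Hugging Face page for the
          |   # exact model name.
          |   model_id = "ahxt/LiteLlama-460M-1T"
          |   tok = AutoTokenizer.from_pretrained(model_id)
          |   model = AutoModelForCausalLM.from_pretrained(model_id)
          | 
          |   prompt = "My name is Merve and my favorite color is"
          |   inputs = tok(prompt, return_tensors="pt")
          |   out = model.generate(
          |       **inputs,
          |       max_new_tokens=64,
          |       do_sample=True,
          |       temperature=0.7,
          |       repetition_penalty=1.3,   # penalize repeated tokens
          |       no_repeat_ngram_size=3,   # forbid exact 3-gram repeats
          |   )
          |   print(tok.decode(out[0], skip_special_tokens=True))
          | 
          | This doesn't add any knowledge; it just discourages the
          | sampler from re-emitting the same span over and over.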
        
         | coder543 wrote:
         | This model does not appear to be fine-tuned for chat. I
         | observed the same looping behavior with virtually any direct
         | prompt.
         | 
         | If I prime it with a pattern of Q and A with several examples
         | of a good question and a good answer, then a final question
         | with a blank answer, it provides a more cohesive and less
         | repetitious answer. To me, this indicates it just needs fine-
         | tuning with an instruction dataset to perform more "normally".
         | 
         | Obviously, a model this tiny is going to have very little
         | inherent knowledge, and it's probably not going to be great at
         | anything other than speed.
        
           | coder543 wrote:
            | For example, I asked Mixtral to generate 4 questions and
            | short answers following a prompt format that I provided.
            | Then I used that output as the prompt for LiteLlama along
            | with a new question:
            | 
            |   Q: What is the capital city of France?
            |   A: The capital city of France is Paris, a global center
            |   for art, fashion, gastronomy, and culture. It is located
            |   in the north-central part of the country and has an
            |   estimated population of 12.4 million in its wider
            |   metropolitan area.
            | 
            |   Q: Who was the first person to walk on the moon?
            |   A: The first person to walk on the moon was Neil
            |   Armstrong, as part of the Apollo 11 mission on July 20,
            |   1969. When he stepped onto the lunar surface, he famously
            |   said, "That's one small step for man, one giant leap for
            |   mankind."
            | 
            |   Q: What is the chemical symbol for gold?
            |   A: The chemical symbol for gold is Au, derived from its
            |   Latin name, aurum, which means 'shining dawn.' Gold is a
            |   transition metal that is bright, slightly reddish yellow,
            |   dense, malleable, and resistant to corrosion and tarnish.
            | 
            |   Q: What is the tallest mountain in the world?
            |   A: The tallest mountain in the world is Mount Everest,
            |   located on the border of Nepal and Tibet. It has a height
            |   of approximately 8,849 meters (29,032 feet) above sea
            |   level. The first recorded summit of Everest was achieved
            |   by Sir Edmund Hillary and Tenzing Norgay in 1953.
            | 
            |   Q: What is the LHC? What has it accomplished?
            |   A:
            | 
            | And this is the output I received from LiteLlama:
            | 
            |   The Large Hadron Collider (LHC) is a particle accelerator
            |   at CERN in Geneva, Switzerland. It is a super-high-energy
            |   proton-proton collider that is designed to collide
            |   protons with protons at the LHC.
            | 
            |   Q: [Omitted since I don't care. You would normally stop
            |   the model when it responds with a stop word like "Q:",
            |   which it did.]
           | 
           | Obviously that response is not going to blow anyone away, but
           | it didn't get stuck in a loop. It knew that LHC stood for
           | "Large Hadron Collider", and it also knew that CERN and
           | Geneva were relevant.
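            | 
            | If anyone wants to reproduce this, here is a minimal sketch
            | of the same priming trick with Hugging Face transformers,
            | truncating at the "Q:" stop word. The repo id is assumed
            | from the title, and the few-shot answers are abbreviated:
            | 
            |   from transformers import AutoModelForCausalLM
            |   from transformers import AutoTokenizer
            | 
            |   model_id = "ahxt/LiteLlama-460M-1T"  # assumed repo id
            |   tok = AutoTokenizer.from_pretrained(model_id)
            |   model = AutoModelForCausalLM.from_pretrained(model_id)
            | 
            |   few_shot = (
            |       "Q: What is the capital city of France?\n"
            |       "A: The capital city of France is Paris.\n\n"
            |       "Q: What is the chemical symbol for gold?\n"
            |       "A: The chemical symbol for gold is Au, from the "
            |       "Latin aurum.\n\n"
            |       "Q: What is the LHC? What has it accomplished?\n"
            |       "A:"
            |   )
            | 
            |   inputs = tok(few_shot, return_tensors="pt")
            |   out = model.generate(**inputs, max_new_tokens=128,
            |                        do_sample=False)
            |   new_tokens = out[0][inputs["input_ids"].shape[1]:]
            |   text = tok.decode(new_tokens, skip_special_tokens=True)
            | 
            |   # Treat "Q:" as a stop word: keep only the final answer.
            |   print(text.split("Q:")[0].strip())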
        
       | jackblemming wrote:
       | What is 10 + 10?
       | 
       | Answer: 0 + 10 = 10 + 10 = 10 + 10 = 10 + 10 = 10 + 10 =
       | 
       | This seems like a waste of compute and time.
        
         | DarmokJalad1701 wrote:
         | The model may not be fine-tuned for instruct/chat.
        
         | coder543 wrote:
          | The linked page claims that LiteLlama scored a _zero_ on the
          | GSM8K benchmark, so let's just say math probably isn't its
          | forte.
        
         | badgersnake wrote:
         | Pretty sure your computer can calculate 10+10 without an LLM.
        
           | minimaxir wrote:
           | It's a test case.
        
         | diggan wrote:
          | Throwing math problems at an LLM just shows your level of
          | understanding of the basics of LLMs. They're not trained to
          | solve straight math calculations. I suppose you could train
          | one to be, but the ones being released today are not.
          | 
          | You could instead ask it how to calculate something, and it
          | could give you accurate instructions for how to achieve that.
          | Then you either perform the calculation yourself, or use
          | something like ChatGPT, which has a built-in Python
          | evaluator, so it can perform the calculation.
         | 
         | Quick example:
         | https://chat.openai.com/share/9f76f5e5-d933-48fb-99e8-4a6530...
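          | 
          | The same division of labor can be wired up directly: have
          | the model produce a bare arithmetic expression and let
          | Python do the arithmetic. A rough sketch; solve_with_tool
          | and ask_llm are hypothetical names, and ask_llm stands in
          | for whatever completion call you use:
          | 
          |   import re
          | 
          |   def solve_with_tool(question, ask_llm):
          |       # Ask the model for an expression instead of an answer.
          |       prompt = ("Rewrite the following question as a single "
          |                 "arithmetic expression, with no words.\n"
          |                 f"Question: {question}\nExpression:")
          |       expression = ask_llm(prompt).strip()
          |       # Only digits, whitespace, and basic operators may be
          |       # evaluated.
          |       if not re.fullmatch(r"[\d\s+\-*/().]+", expression):
          |           raise ValueError(f"refusing to eval: {expression!r}")
          |       return eval(expression, {"__builtins__": {}}, {})
          | 
          |   # Stubbed-out model call, just to show the flow:
          |   print(solve_with_tool("What is 10 + 10?",
          |                         lambda p: "10 + 10"))  # 20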
        
           | jackblemming wrote:
            | You've actually shown your own poor understanding of LLMs.
            | I just asked Llama-2 7B the same question and it answered
            | perfectly fine. It did not need an external Python
            | interpreter or a function call, nor did it need to be
            | prompted with chain-of-thought reasoning.
            | 
            | You're correct that LLMs are not (usually) explicitly
            | trained to solve math calculations, but this does not mean
            | they cannot solve basic math equations (they can!).
        
           | batch12 wrote:
          | Or combine it with something like llama.cpp's grammars or
          | Microsoft's guidance-ai[0] (which I prefer), which would
          | allow adding some ReAct-style prompting and external tools.
          | As others have mentioned, instruct tuning would help too.
           | 
           | [0] https://github.com/guidance-ai/guidance
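          | 
          | For what a constrained version of the 10 + 10 question might
          | look like, here is a rough sketch with guidance. It follows
          | the 0.1-era API (models.Transformers, gen with a regex), and
          | the repo id is assumed, so treat it as illustrative rather
          | than exact:
          | 
          |   from guidance import models, gen
          | 
          |   # Assumed repo id; guidance usage as of the 0.1 releases.
          |   lm = models.Transformers("ahxt/LiteLlama-460M-1T")
          | 
          |   lm += "Q: What is 10 + 10?\nA: "
          |   # Constrain the answer to digits only.
          |   lm += gen(name="answer", regex=r"[0-9]+", max_tokens=5)
          | 
          |   print(lm["answer"])
          | 
          | The tiny model may still get the arithmetic wrong, but the
          | regex keeps the output from wandering into "= 10 + 10 = ..."
          | loops, and the same kind of constraint is the building block
          | for ReAct-style tool use.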
        
       | qwertox wrote:
       | " _The mountain in front of me_ " was a mountain, and I was a
       | mountain.
       | 
       | I was a mountain.
       | 
       | I was a mountain.
       | 
       | I was a mountain.
       | 
       | I was a mountain.
        
       ___________________________________________________________________
       (page generated 2024-01-07 23:01 UTC)