[HN Gopher] TinyLlama: An Open-Source Small Language Model
       ___________________________________________________________________
        
       TinyLlama: An Open-Source Small Language Model
        
       Author : matt1
       Score  : 63 points
       Date   : 2024-01-05 21:15 UTC (1 hour ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | ronsor wrote:
       | GitHub repo with links to the checkpoints:
       | https://github.com/jzhang38/TinyLlama
        
       | andy99 wrote:
       | I've been using one of the earlier checkpoints for benchmarking
       | a Llama implementation. Completely anecdotally, this one feels
       | at least as good as, or better than, the earlier OpenLLaMA 3B.
       | I wouldn't use either of them for RAG or anything requiring
       | more power; the point is just that it's competitive as a
       | smaller model, whatever you use those for, and easy to run on
       | CPU at FP16 (i.e. without serious quantization).
        
         | rnd0 wrote:
         | >I wouldn't use either of them for RAG
         | 
         | What's RAG?
        
           | andy99 wrote:
           | Retrieval augmented generation: basically giving it a text
           | passage and asking questions about that text.
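           | 
           | A minimal sketch of the pattern, assuming the
           | sentence-transformers library for retrieval and the TinyLlama
           | chat model as the generator (the embedding model and the
           | passages here are just illustrative):
           | 
           |   # Minimal RAG sketch: embed passages, retrieve the closest
           |   # one, then ask a small LLM to answer from that passage.
           |   from sentence_transformers import SentenceTransformer, util
           |   from transformers import pipeline
           | 
           |   passages = [
           |       "TinyLlama is a 1.1 billion parameter language model.",
           |       "Llama 2 is a family of models released by Meta.",
           |   ]
           | 
           |   embedder = SentenceTransformer("all-MiniLM-L6-v2")
           |   doc_emb = embedder.encode(passages, convert_to_tensor=True)
           | 
           |   question = "How many parameters does TinyLlama have?"
           |   q_emb = embedder.encode(question, convert_to_tensor=True)
           |   best = int(util.cos_sim(q_emb, doc_emb).argmax())
           | 
           |   generator = pipeline(
           |       "text-generation",
           |       model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
           |   prompt = (f"Context: {passages[best]}\n\n"
           |             f"Question: {question}\nAnswer:")
           |   out = generator(prompt, max_new_tokens=64)
           |   print(out[0]["generated_text"])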
        
           | dmezzetti wrote:
           | If you want more on RAG with a concrete example:
           | https://neuml.hashnode.dev/build-rag-pipelines-with-txtai
        
         | sroussey wrote:
         | What is good for RAG?
        
           | andy99 wrote:
           | The smallest model your users agree meets their needs. It
           | really depends.
           | 
           | The retrieval part is way more important.
           | 
           | I've used the original 13B instruction-tuned Llama 2,
           | quantized, and found it gives coherent answers about the
           | context provided, i.e. the bottleneck was mostly getting
           | good context.
           | 
           | When I played with long-context models (like 16k tokens;
           | this was a few months ago, so maybe they've improved), they
           | sucked.
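           | 
           | A rough sketch of that kind of setup with llama-cpp-python,
           | assuming a locally downloaded quantized GGUF checkpoint (the
           | file path and prompt here are placeholders, not a specific
           | recommendation):
           | 
           |   # Sketch: answer a question strictly from retrieved
           |   # context using a quantized instruction-tuned model
           |   # loaded with llama-cpp-python.
           |   from llama_cpp import Llama
           | 
           |   # Placeholder path to a quantized 13B chat GGUF file.
           |   llm = Llama(model_path="llama-2-13b-chat.Q4_K_M.gguf",
           |               n_ctx=4096)
           | 
           |   context = "...retrieved passages go here..."
           |   question = "What does the passage say about X?"
           | 
           |   out = llm.create_chat_completion(messages=[
           |       {"role": "system",
           |        "content": "Answer only from the provided context."},
           |       {"role": "user",
           |        "content": f"Context:\n{context}\n\n"
           |                   f"Question: {question}"},
           |   ])
           |   print(out["choices"][0]["message"]["content"])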
        
         | eachro wrote:
         | What use cases would you say it is good enough for?
        
       | matt1 wrote:
       | OP here with a shameless plug: for anyone interested, I'm working
       | on a site called Emergent Mind that surfaces trending AI/ML
       | papers. This TinyLlama paper/repo is trending #1 right now and
       | likely will be for a while due to how much attention it's getting
       | across social media:
       | https://www.emergentmind.com/papers/2401.02385. Emergent Mind
       | also looks for and links to relevant discussions/resources on
       | Reddit, X, HackerNews, GitHub, and YouTube for every new arXiv
       | AI/ML paper. Feedback welcome!
        
         | ukuina wrote:
         | I visit your site every day. Thank you for creating it and
         | evolving it past simple summaries to show paper details!
         | 
         | I recall you were looking to sell it at some point. Was
         | wondering what that process looked like, and why you ended up
         | holding on to the site.
        
           | matt1 wrote:
           | Hey, thanks for the kind words.
           | 
           | To answer your question: an earlier version of the site
           | focused on surfacing AI news, but that space is super
           | competitive and I don't think Emergent Mind did a better job
           | than the other resources out there. I tried selling it
           | instead of just shutting it down, but ultimately decided to
           | keep it. I recently decided to pivot to covering arXiv
           | papers, which is a much better fit than AI news. I think
            | there's an opportunity with it not only to help surface
            | trending papers but also to help educate people about them
            | using AI (the GPT-4 summaries are just a start). A lot of
            | the future work will be focused in that direction, but I'd also
           | love any feedback folks have on what I could add to make it
           | more useful.
        
         | tmaly wrote:
         | I am new to this space. Is it hard to fine tune this model?
        
       | dmezzetti wrote:
       | Link to model on HF Hub:
       | https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
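       | 
       | A quick sketch of loading it with transformers, assuming the
       | chat template bundled with that model's tokenizer:
       | 
       |   # Sketch: run TinyLlama-1.1B-Chat-v1.0 via the transformers
       |   # pipeline, using the tokenizer's bundled chat template.
       |   import torch
       |   from transformers import pipeline
       | 
       |   pipe = pipeline(
       |       "text-generation",
       |       model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
       |       torch_dtype=torch.bfloat16)
       | 
       |   messages = [
       |       {"role": "system",
       |        "content": "You are a helpful assistant."},
       |       {"role": "user",
       |        "content": "What is a small language model?"},
       |   ]
       |   prompt = pipe.tokenizer.apply_chat_template(
       |       messages, tokenize=False, add_generation_prompt=True)
       |   out = pipe(prompt, max_new_tokens=128)
       |   print(out[0]["generated_text"])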
        
       | minimaxir wrote:
       | It was fun to follow the public TinyLlama loss curves in near
       | real-time, although it could be frustrating to watch, since the
       | loss curves barely moved down even after an extra trillion
       | tokens: https://wandb.ai/lance777/lightning_logs/reports/metric-
       | trai... (note the log-scaled X-axis)
       | 
       | But they _did_ move down and that's what's important.
       | 
       | There should probably be more aggressive learning rate annealing
       | for models trying to be Chinchilla-optimal instead of just
       | cosine-with-warmup like every other model nowadays.
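       | 
       | For anyone unfamiliar, a cosine-with-warmup schedule looks
       | roughly like this (sketch using transformers' built-in
       | scheduler; the step counts and learning rate are illustrative,
       | not TinyLlama's actual settings):
       | 
       |   # Sketch: a standard cosine-with-warmup LR schedule. The LR
       |   # ramps up linearly during warmup, then decays along a
       |   # cosine; "more aggressive annealing" would push it down
       |   # faster and further.
       |   import torch
       |   from transformers import get_cosine_schedule_with_warmup
       | 
       |   model = torch.nn.Linear(8, 8)  # stand-in for a real model
       |   opt = torch.optim.AdamW(model.parameters(), lr=4e-4)
       |   sched = get_cosine_schedule_with_warmup(
       |       opt, num_warmup_steps=2_000, num_training_steps=100_000)
       | 
       |   for step in range(100_000):
       |       # ...forward, backward and gradient update go here...
       |       opt.step()
       |       sched.step()
       |       if step % 20_000 == 0:
       |           print(step, sched.get_last_lr()[0])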
        
       | sroussey wrote:
       | Needs an onnx folder to use it with transformers.js out of the
       | box.
       | 
       | Hopefully @xenova will make a copy with it soon.
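       | 
       | In the meantime, one way to produce an ONNX export yourself is
       | Hugging Face's optimum (a sketch, untested against this
       | particular checkpoint; transformers.js also expects its own
       | onnx/ folder layout and quantized variants, so this is only
       | the raw export step):
       | 
       |   # Sketch: export the checkpoint to ONNX with optimum so it
       |   # can run under ONNX Runtime (and, after conversion, with
       |   # transformers.js).
       |   from optimum.onnxruntime import ORTModelForCausalLM
       |   from transformers import AutoTokenizer
       | 
       |   model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
       |   model = ORTModelForCausalLM.from_pretrained(model_id,
       |                                               export=True)
       |   tokenizer = AutoTokenizer.from_pretrained(model_id)
       | 
       |   model.save_pretrained("tinyllama-onnx")
       |   tokenizer.save_pretrained("tinyllama-onnx")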
        
       ___________________________________________________________________
       (page generated 2024-01-05 23:00 UTC)