[HN Gopher] Full LLM training and evaluation toolkit
       ___________________________________________________________________
        
       Full LLM training and evaluation toolkit
        
       Author : testerui
       Score  : 132 points
       Date   : 2024-11-24 15:44 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | timhigins wrote:
       | Might be worth updating the title to "SmolLM: state-of-the-art
       | small language model trained on open datasets" (See the first
       | table of https://huggingface.co/blog/smollm for benchmarks)
       | 
       | It was fascinating digging into this to find their dataset
       | weights defined in a declarative YAML file [2]. 70% is from
        | FineWeb/Common Crawl, but filtered using a classifier trained on
        | Llama-70B's 0-5 ratings of the educational content of the
        | text [3]. This is something we know small models like Phi-3 have
       | been doing for a while, but it's great to see a fully open
       | reproduction of it that beats their benchmarks. Definitely
       | supports the idea you can get even better reasoning at smaller
       | model sizes by carefully filtering and curating your training
       | data (and generating good synthetic data from/distilling bigger
       | models).
       | 
       | You can see the 450k Llama educational value scores here:
       | https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu-ll...
        | It's interesting: I think the texts scored 3 are really good,
        | but the texts scored 5 tend to be content that isn't very
        | reasoning- or information-heavy and just mentions education or a
        | worksheet. For SmolLM they simply took the documents with scores
        | >= 3, so it doesn't matter a ton (a filtering sketch follows the
        | links below).
       | 
        | 2. https://github.com/huggingface/smollm/blob/9efce803bc7e37727...
       | 3. https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier
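        | 
        | To make the filtering step concrete, here's a minimal sketch of
        | scoring one document with that classifier and applying the
        | score >= 3 cutoff. It assumes the classifier loads as a standard
        | single-logit regression head via transformers; the actual SmolLM
        | data pipeline batches this over all of Common Crawl and is more
        | involved.
        | 
        |     # Sketch: score a document's educational value and keep it
        |     # if the rounded score clears the >= 3 cutoff mentioned above.
        |     # Assumes HuggingFaceFW/fineweb-edu-classifier exposes a
        |     # single regression logit.
        |     import torch
        |     from transformers import (AutoTokenizer,
        |                               AutoModelForSequenceClassification)
        | 
        |     name = "HuggingFaceFW/fineweb-edu-classifier"
        |     tokenizer = AutoTokenizer.from_pretrained(name)
        |     model = (AutoModelForSequenceClassification
        |              .from_pretrained(name))
        | 
        |     def edu_score(text: str) -> float:
        |         inputs = tokenizer(text, return_tensors="pt",
        |                            truncation=True)
        |         with torch.no_grad():
        |             logits = model(**inputs).logits
        |         return logits.squeeze().item()
        | 
        |     doc = "Photosynthesis converts light into chemical energy..."
        |     if round(edu_score(doc)) >= 3:
        |         print("keep for the edu-filtered training mix")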
        
         | timhigins wrote:
         | Update: While SmolLM was SOTA at the time of release in July,
         | SmolLM 2 1.7B (which is the newest release) is not currently
         | the best model under 2B params on
         | https://huggingface.co/spaces/open-llm-leaderboard/open_llm_...
        
       | abeppu wrote:
       | While it's great that this is open source, and I understand the
       | pressure for smaller models that can be run in a wider range of
       | contexts, I continue to be annoyed that authors keep posting
       | comparisons to models which are slightly smaller.
       | 
        | On this page, SmolLM2-1.7B does a bit better than Qwen2.5-1.5B,
        | which is ahead of Llama3.2-1B. At the next size level up, in
        | other comparisons I've seen, e.g. Phi-3.5 (which is ~3.8B
        | params) does a bit better than Llama 3.2 3B. Gemma 2 has a 9B
        | size, Llama 3.1 has an 8B size, and I think when that came out
        | Mistral had a 7B model -- so whenever a new "small" thing does
        | "better" than its slightly smaller peers, we can't easily tell
        | whether any of the many small choices the authors made were
        | actually better, or whether the gain just comes from the extra
        | parameters.
        
       | bashfulpup wrote:
       | Pythia is stupidly easy to use.
       | 
        | Then hook up a simple test harness - this is like a grand total
        | of 3 commands: git pull, install, then point it at a model and
        | run.
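        | 
        | For the "point it at a model and run" part, a minimal sketch of
        | loading a Pythia checkpoint with transformers (the 160m size is
        | just one of the published EleutherAI checkpoints, picked here
        | for illustration):
        | 
        |     # Sketch: load a small Pythia checkpoint and generate a few
        |     # tokens; the other published Pythia sizes work the same way.
        |     from transformers import AutoModelForCausalLM, AutoTokenizer
        | 
        |     name = "EleutherAI/pythia-160m"
        |     tokenizer = AutoTokenizer.from_pretrained(name)
        |     model = AutoModelForCausalLM.from_pretrained(name)
        | 
        |     inputs = tokenizer("The capital of France is",
        |                        return_tensors="pt")
        |     out = model.generate(**inputs, max_new_tokens=20)
        |     print(tokenizer.decode(out[0], skip_special_tokens=True))
        | 
        | The "simple test harness" is presumably something like
        | EleutherAI's lm-evaluation-harness, which after a pip install is
        | roughly: lm_eval --model hf --model_args
        | pretrained=EleutherAI/pythia-160m --tasks lambada_openai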
        
       ___________________________________________________________________
       (page generated 2024-11-24 23:00 UTC)