[HN Gopher] Mistral 7B
       ___________________________________________________________________
        
       Mistral 7B
        
       Author : fgfm
       Score  : 98 points
       Date   : 2023-10-11 09:56 UTC (6 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | BrunoJo wrote:
       | I just started a simple service to use Mistral as a replacement
       | for OpenAI. If anyone is interested you can sign up at
       | https://lemonfox.ai
        
       | anon1253 wrote:
        | It works really, really well for chatbots and roleplay
        | applications (at least for me). The fine-tune on the instruct
        | version is rather meh, however, and I recommend
        | https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca/ if you plan
        | on using it out-of-the-box. Take note of the prompt template;
        | you'll get really undesired results otherwise (basically just
        | garbage). I've been running it on my pet projects with llama.cpp
        | and the inference is blazing fast even with my mediocre 2080
        | Super.
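        | 
        | For reference, a minimal sketch of that prompt template in use,
        | assuming the ChatML-style format from the OpenOrca model card
        | and the llama-cpp-python bindings (the model path, system
        | message and question below are placeholders):
        | 
        |     # Minimal sketch: Mistral-7B-OpenOrca via llama-cpp-python.
        |     # Assumes a GGUF quant of the model is already downloaded;
        |     # the path below is a placeholder.
        |     from llama_cpp import Llama
        | 
        |     llm = Llama(model_path="./mistral-7b-openorca.Q4_K_M.gguf",
        |                 n_ctx=4096)
        | 
        |     # ChatML-style template as documented on the model card;
        |     # skipping it tends to produce the "garbage" mentioned above.
        |     prompt = (
        |         "<|im_start|>system\n"
        |         "You are a helpful assistant.<|im_end|>\n"
        |         "<|im_start|>user\n"
        |         "Summarize sliding window attention.<|im_end|>\n"
        |         "<|im_start|>assistant\n"
        |     )
        | 
        |     out = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
        |     print(out["choices"][0]["text"])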
        
       | visarga wrote:
        | Thin paper for a thin & capable model; it is great to have it.
        | It made my 2080 Ti smarter than ever. But why emulate OpenAI's
        | style of white papers?
        
       | joennlae wrote:
        | Llama 1 --> 1.0T
        | Llama 2 --> 2.0T
        | Mistral --> ??
        | 
        | They do not publish how many tokens it was pre-trained on, in
        | addition to sharing no info on the datasets used (except for
        | fine-tuning).
        | 
        | To my knowledge, no one has trained a larger LLM (>250M) to the
        | capacity limit, as discussed in the original GPT-3 paper
        | (https://twitter.com/gneubig/status/1286731711150280705?s=20).
        | 
        | TinyLlama is trying to do that for 1.1B:
        | https://github.com/jzhang38/TinyLlama
        | 
        | As long as we are not at the capacity limit, we will have a few
        | more of these "7B beats 13B" (or "7B beats 70B") moments.
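        | 
        | As a rough back-of-the-envelope illustration of the gap, using
        | only the figures quoted above (all three are ~7B-parameter
        | models; Mistral's pretraining token count is unpublished):
        | 
        |     # Tokens-per-parameter ratios from the figures quoted above.
        |     params = 7.0e9   # ~7B parameters for all three models
        |     tokens = {"Llama 1": 1.0e12, "Llama 2": 2.0e12,
        |               "Mistral": None}
        | 
        |     for name, t in tokens.items():
        |         if t is None:
        |             print(f"{name}: unknown")
        |         else:
        |             print(f"{name}: {t / params:.0f} tokens/param")
        |     # -> Llama 1: 143 tokens/param, Llama 2: 286 tokens/param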
        
       | kiraaa wrote:
        | the paper does not live up to the quality of the model lol
        
         | ramesh31 wrote:
         | Is it better than llama 2?
        
           | TheRoque wrote:
           | Yes
        
           | tarruda wrote:
            | It is better than llama 2 7b and 13b. I tried the OpenOrca
            | fine-tune and it is very good, even when 4-bit quantized.
        
           | sebzim4500 wrote:
           | For its size, yes. In absolute terms it is obviously less
           | capable than llama-2-70B
        
             | espadrine wrote:
             | For now. Huggingface[0] mentioned a DPO-fine-tuned version,
             | Zephyr 7B, which it claims is competitive with
             | Llama2-70B[1].
             | 
              | [0]: https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat
              | 
              | [1]: https://twitter.com/huggingface/status/1711780979574976661
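              | 
              | For anyone curious what "DPO-fine-tuned" means in practice,
              | a minimal sketch of the DPO objective (Rafailov et al.,
              | 2023) in plain PyTorch; the inputs are summed log-probs of
              | the chosen/rejected completions under the policy and a
              | frozen reference model, and beta is illustrative:
              | 
              |     import torch
              |     import torch.nn.functional as F
              | 
              |     def dpo_loss(pol_chosen, pol_rejected,
              |                  ref_chosen, ref_rejected, beta=0.1):
              |         # Margin between the policy/reference log-prob
              |         # ratios of the chosen vs. rejected completion.
              |         logits = beta * ((pol_chosen - ref_chosen)
              |                          - (pol_rejected - ref_rejected))
              |         return -F.logsigmoid(logits).mean()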
        
             | andai wrote:
             | I found llama-2-70B to be a bit worse than GPT-4. (So,
             | pretty good!) But I did not compare with GPT-3.
             | 
             | How do llama-2-70B and Mistral 7B compare with GPT-3?
        
       | brucethemoose2 wrote:
       | > To evaluate the generalization capabilities of Mistral 7B, we
       | fine-tuned it on instruction datasets publicly available on the
       | Hugging Face repository.
       | 
       | Heh, they won't even say what datasets they used for chat
       | finetuning.
       | 
       | > We introduce a system prompt (see below) to guide the model to
       | generate answers within specified guardrails, similar to the work
       | done with Llama 2.
       | 
       | This was totally undocumented in the initial model release.
       | 
        | Other than that... Not much really new? We already know it uses
        | SWA, though it works without SWA in current llama
        | implementations, and SWA isn't new either.
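        | 
        | For anyone unfamiliar with SWA, a toy sketch of the attention
        | mask it implies (plain PyTorch; the window of 4 is just to keep
        | the printout readable, the paper's window is 4096):
        | 
        |     # Sliding-window causal mask: token i attends to tokens j
        |     # with i - window < j <= i.
        |     import torch
        | 
        |     def swa_mask(seq_len: int, window: int) -> torch.Tensor:
        |         i = torch.arange(seq_len).unsqueeze(1)
        |         j = torch.arange(seq_len).unsqueeze(0)
        |         return (j <= i) & (j > i - window)
        | 
        |     print(swa_mask(8, 4).int())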
       | 
       | If most upcoming base models are this mysterious on release, the
       | field is going to be... weird.
        
         | riedel wrote:
          | Weird is the right term: do they want to demonstrate with this
          | arXiv paper that they managed to reformat a blog post into
          | LaTeX and upload it to a preprint site after publication?
        
       | fgfm wrote:
        | The research paper by Mistral AI about their Mistral 7B v0.1
        | model.
        
       | Nischalj10 wrote:
        | What is the best way to fine-tune these models? Any good
        | resources would be very helpful. TIA /\
        | 
        | PS - I have a brief background in Machine Learning, though more
        | in development.
        
         | code_biologist wrote:
         | Jeremy Howard talks about it in his recent video "A Hackers'
         | Guide to Language Models": https://youtu.be/jkrNMKz9pWU?t=4808
         | 
          | That link goes directly to the timestamp where he discusses
          | fine-tuning, but the whole talk is great. Punchline: check out
          | Axolotl: https://github.com/OpenAccess-AI-Collective/axolotl
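          | 
          | If you'd rather see the moving parts than a config file, here
          | is a minimal LoRA setup sketch with Hugging Face transformers
          | + peft (roughly the kind of recipe Axolotl wraps; the
          | hyperparameters are illustrative, not a tuned recipe):
          | 
          |     from transformers import (AutoModelForCausalLM,
          |                               AutoTokenizer)
          |     from peft import LoraConfig, get_peft_model
          | 
          |     base = "mistralai/Mistral-7B-v0.1"
          |     tok = AutoTokenizer.from_pretrained(base)
          |     model = AutoModelForCausalLM.from_pretrained(
          |         base, device_map="auto")
          | 
          |     # Attach low-rank adapters to the attention projections;
          |     # only these adapter weights will be trained.
          |     lora = LoraConfig(
          |         r=16, lora_alpha=32, lora_dropout=0.05,
          |         target_modules=["q_proj", "k_proj",
          |                         "v_proj", "o_proj"],
          |         task_type="CAUSAL_LM",
          |     )
          |     model = get_peft_model(model, lora)
          |     model.print_trainable_parameters()
          |     # ...then train with transformers.Trainer or trl's
          |     # SFTTrainer on an instruction dataset.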
        
       | yieldcrv wrote:
        | Can someone explain why the AI or language model community
        | circles around arXiv?
        | 
        | I really hate the pseudo-academic gatekeeping in the AI/ML
        | community. Google said you have no moat, we all know you have no
        | moat, including that degree. We can all fine-tune with consumer
        | hardware we already have, or, even better, cheaply on readily
        | accessible clouds for this specific purpose. Why are they still
        | doing this fake academic junk?
        
         | SimplyUnknown wrote:
          | I mean, you can't just share the weights of the model and call
          | it a day, right? You have to share details on what you are
          | doing and why. You must communicate this somehow. In theory,
          | you might be able to do this in a GitHub README, but a
          | paper-style document on arXiv is nicely suited for this.
        
           | yieldcrv wrote:
           | > I mean, you can't just share the weights of the model and
           | call it a day, right?
           | 
           | you can't?
        
             | SimplyUnknown wrote:
              | Obviously you can, but in the grand scheme of things people
              | should share more details about their method so others can
              | improve on it in the future, no?
        
       | mark_l_watson wrote:
        | I look forward to more released Mistral 7B docs in the future. I
        | spent more time with a tuned Mistral 7B version yesterday and it
        | really is amazing. Subjectively, I find it better than any of the
        | 13B models I have used. I support Camenduru on Patreon and I used
        | one of his many Colab notebooks yesterday:
        | https://colab.research.google.com/drive/1-UK_PE8R3xktlwoXqCf...
        
       | benxh wrote:
        | It's missing a lot of crucial details. Nothing on the dataset
        | used, nothing on the data mix, nothing on their data cleaning
        | procedures, nothing on how many tokens it was trained on.
        
         | jmac01 wrote:
          | I could almost tell this would be the case when the title of
          | the paper was simply "Mistral 7B". A little more info would be
          | useful!
        
         | dazed_confused wrote:
          | That's what we get when it goes up on arXiv first, before
          | being peer reviewed.
        
       | sp332 wrote:
       | Still no mention of what data was used for training.
        
         | jagrsw wrote:
          | The HN comments section, most likely :)
        
         | thawab wrote:
          | That's how Facebook got sued. Their paper mentioned data
          | sources that included books crawled from pirate sites.
        
           | diggan wrote:
           | Maybe the correct way of addressing this problem is by using
           | data sources that won't make others sue you, rather than
           | hiding what data sources you're using.
        
             | jcuenod wrote:
              | You seem to be unfamiliar with the mantra of Silicon
              | Valley.
        
       ___________________________________________________________________
       (page generated 2023-10-11 16:00 UTC)