[HN Gopher] Mistral 7B
___________________________________________________________________
Mistral 7B
Author : fgfm
Score : 98 points
Date : 2023-10-11 09:56 UTC (6 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| BrunoJo wrote:
| I just started a simple service to use Mistral as a replacement
| for OpenAI. If anyone is interested, you can sign up at
| https://lemonfox.ai
| anon1253 wrote:
| It works really, really well for chatbot and roleplay
| applications (at least for me). The fine-tune on the instruct
| version is rather meh, however, and I recommend
| https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca/ if you plan
| on using it out of the box. Take note of the prompt template;
| otherwise you'll get really undesirable results (basically just
| garbage). I've been running it on my pet projects with llama.cpp,
| and inference is blazing fast even on my mediocre 2080 Super.
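|
| A minimal sketch of that setup with llama-cpp-python and a GGUF
| build of the OpenOrca fine-tune (the file name, sampling settings
| and the ChatML-style template below are illustrative; check the
| model card for the exact prompt format):
|
|   from llama_cpp import Llama
|
|   # The quantized GGUF file name is a placeholder; pick whichever
|   # quantization fits your GPU (a 2080 Super has 8 GB of VRAM).
|   llm = Llama(
|       model_path="mistral-7b-openorca.Q4_K_M.gguf",
|       n_ctx=4096,
|       n_gpu_layers=32,  # offload all of Mistral's layers if they fit
|   )
|
|   # OpenOrca expects a ChatML-style prompt; deviating from it is
|   # what produces the "garbage" output mentioned above.
|   prompt = (
|       "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
|       "<|im_start|>user\nSummarize the Mistral 7B paper in two "
|       "sentences.<|im_end|>\n"
|       "<|im_start|>assistant\n"
|   )
|
|   out = llm(prompt, max_tokens=256, temperature=0.7,
|             stop=["<|im_end|>"])
|   print(out["choices"][0]["text"])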
| visarga wrote:
| A thin paper for a thin and capable model; it is great to have
| it. It made my 2080 Ti smarter than ever. But why emulate
| OpenAI's style of white papers?
| joennlae wrote:
| Llama 1 --> 1.0T
| Llama 2 --> 2.0T
| Mistral --> ??
|
| They do not publish how many tokens it was pre-trained on, on
| top of sharing no information about the datasets used (except
| for fine-tuning).
|
| To my knowledge, no one has trained a larger LLM (>250M
| parameters) to its capacity limit, as discussed in the original
| GPT-3 paper
| (https://twitter.com/gneubig/status/1286731711150280705?s=20).
|
| TinyLlama is trying to do that for 1.1B:
| https://github.com/jzhang38/TinyLlama
|
| As long as we are not at the capacity limit, we will keep
| getting a few more of these "7B beats 13B" (or "7B beats 70B")
| moments.
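|
| As a back-of-the-envelope illustration of that point, using the
| Chinchilla-style rule of thumb of roughly 20 training tokens per
| parameter (all numbers approximate):
|
|   params = 7.3e9                   # Mistral 7B parameter count, approx.
|   chinchilla_tokens = 20 * params  # ~146B tokens for "compute-optimal"
|   llama2_tokens = 2.0e12           # what Llama 2 was trained on
|
|   print(f"compute-optimal: ~{chinchilla_tokens / 1e9:.0f}B tokens")
|   print(f"Llama 2 used ~{llama2_tokens / chinchilla_tokens:.0f}x that")
|
| Llama 2 already trained its 7B model far past the compute-optimal
| point and reported that the loss curves still showed no sign of
| saturation, which is why the capacity limit for small models does
| not appear to have been reached yet.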
| kiraaa wrote:
| the paper does not live up to the quality of the model lol
| ramesh31 wrote:
| Is it better than llama 2?
| TheRoque wrote:
| Yes
| tarruda wrote:
| It is better than Llama 2 7B and 13B. I tried the OpenOrca
| fine-tune and it is very good, even when 4-bit quantized.
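|
| For reference, a rough sketch of loading that fine-tune 4-bit
| quantized with transformers + bitsandbytes (the quantization
| settings here are illustrative):
|
|   import torch
|   from transformers import (AutoModelForCausalLM, AutoTokenizer,
|                             BitsAndBytesConfig)
|
|   model_id = "Open-Orca/Mistral-7B-OpenOrca"
|   model = AutoModelForCausalLM.from_pretrained(
|       model_id,
|       device_map="auto",
|       quantization_config=BitsAndBytesConfig(
|           load_in_4bit=True,            # NF4 4-bit weights
|           bnb_4bit_quant_type="nf4",
|           bnb_4bit_compute_dtype=torch.bfloat16,
|       ),
|   )
|   tokenizer = AutoTokenizer.from_pretrained(model_id)
|
|   # Raw completion just to sanity-check the load; use the model's
|   # chat template for real prompts.
|   inputs = tokenizer("The capital of France is",
|                      return_tensors="pt").to(model.device)
|   out = model.generate(**inputs, max_new_tokens=20)
|   print(tokenizer.decode(out[0]))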
| sebzim4500 wrote:
| For its size, yes. In absolute terms it is obviously less
| capable than llama-2-70B
| espadrine wrote:
| For now. Huggingface[0] mentioned a DPO-fine-tuned version,
| Zephyr 7B, which it claims is competitive with
| Llama2-70B[1].
|
| [0]: https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat
|
| [1]:
| https://twitter.com/huggingface/status/1711780979574976661
| andai wrote:
| I found llama-2-70B to be a bit worse than GPT-4. (So,
| pretty good!) But I did not compare with GPT-3.
|
| How do llama-2-70B and Mistral 7B compare with GPT-3?
| brucethemoose2 wrote:
| > To evaluate the generalization capabilities of Mistral 7B, we
| fine-tuned it on instruction datasets publicly available on the
| Hugging Face repository.
|
| Heh, they won't even say what datasets they used for chat
| finetuning.
|
| > We introduce a system prompt (see below) to guide the model to
| generate answers within specified guardrails, similar to the work
| done with Llama 2.
|
| This was totally undocumented in the initial model release.
|
| Other than that... not much is really new. We already know it
| uses SWA (sliding window attention), though it works without SWA
| in current llama implementations, and SWA isn't new either.
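|
| For anyone unfamiliar with SWA, a minimal sketch of the attention
| mask it implies (Mistral 7B uses a 4096-token window; the rolling
| KV cache that goes with it is omitted here):
|
|   import torch
|
|   def sliding_window_mask(seq_len: int, window: int = 4096):
|       # Causal attention restricted to the last `window` positions:
|       # query i may attend to key j only if 0 <= i - j < window.
|       i = torch.arange(seq_len).unsqueeze(1)  # query positions
|       j = torch.arange(seq_len).unsqueeze(0)  # key positions
|       return (i >= j) & (i - j < window)
|
|   # Tiny example: with window=3, position 5 sees only 3, 4 and 5.
|   print(sliding_window_mask(6, window=3).int())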
|
| If most upcoming base models are this mysterious on release, the
| field is going to be... weird.
| riedel wrote:
| Weird is the right term: do they want to demonstrate with this
| arXiv paper that they can manage to reformat a blog post into
| LaTeX and upload it to a preprint site after publication?
| fgfm wrote:
| This is the research paper by Mistral about their Mistral 7B
| v0.1 model.
| Nischalj10 wrote:
| What is the best way to fine-tune these models? Any good
| resources would be very helpful. TIA /\
|
| PS - I have a brief background in machine learning, more in
| development.
| code_biologist wrote:
| Jeremy Howard talks about it in his recent video "A Hackers'
| Guide to Language Models": https://youtu.be/jkrNMKz9pWU?t=4808
|
| That link goes directly to the timestamp where he discusses
| fine-tuning, but the whole talk is great. Punchline: check out
| Axolotl: https://github.com/OpenAccess-AI-Collective/axolotl
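|
| Axolotl is essentially a config layer over the Hugging Face stack
| (transformers + peft + bitsandbytes), so the QLoRA setup it drives
| looks roughly like the sketch below; the rank, target modules and
| other hyperparameters are illustrative:
|
|   import torch
|   from transformers import AutoModelForCausalLM, BitsAndBytesConfig
|   from peft import (LoraConfig, get_peft_model,
|                     prepare_model_for_kbit_training)
|
|   # Load the base model in 4-bit and attach low-rank adapters; only
|   # the adapters are trained, which keeps 7B fine-tuning within
|   # reach of a single consumer GPU.
|   model = AutoModelForCausalLM.from_pretrained(
|       "mistralai/Mistral-7B-v0.1",
|       device_map="auto",
|       quantization_config=BitsAndBytesConfig(
|           load_in_4bit=True,
|           bnb_4bit_quant_type="nf4",
|           bnb_4bit_compute_dtype=torch.bfloat16,
|       ),
|   )
|   model = prepare_model_for_kbit_training(model)
|   model = get_peft_model(model, LoraConfig(
|       r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
|       task_type="CAUSAL_LM",
|       target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
|   ))
|   model.print_trainable_parameters()  # typically well under 1%
|   # From here, train with your preferred trainer on an
|   # instruction-formatted dataset.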
| yieldcrv wrote:
| Can someone explain why the AI and language model community
| circles around arXiv?
|
| I really hate the pseudo-academic gatekeeping in the AI/ML
| community. Google said you have no moat, and we all know you
| have no moat, degree included. We can all fine-tune with
| consumer hardware we already have, or, even better, cheaply on
| readily accessible clouds for this specific purpose. Why are
| they still doing this fake academic junk?
| SimplyUnknown wrote:
| I mean, you can't just share the weights of the model and call
| it a day, right? You have to share details on what you are doing
| and why; you must communicate this somehow. In theory, you might
| be able to do this in a GitHub README, but a paper-style
| document on arXiv is nicely suited for it.
| yieldcrv wrote:
| > I mean, you can't just share the weights of the model and
| call it a day, right?
|
| you can't?
| SimplyUnknown wrote:
| Obviously you can, but in the grand scheme of things people
| should share more details about their method so people can
| improve on it in the future, no?
| mark_l_watson wrote:
| I look forward to more Mistral 7B docs being released in the
| future. I spent more time with a Mistral 7B fine-tuned version
| yesterday, and it really is amazing. Subjectively, I find it
| better than any of the 13B models I have used. I support
| Camenduru on Patreon, and I used one of his many Colab notebooks
| yesterday:
| https://colab.research.google.com/drive/1-UK_PE8R3xktlwoXqCf...
| benxh wrote:
| It's missing a lot of crucial details: nothing on the dataset
| used, nothing on the data mix, nothing on their data-cleaning
| procedures, nothing on the number of tokens trained on.
| jmac01 wrote:
| I could almost tell this would be the case when the title of
| the paper was simply "Mistral 7B". A little more info would be
| useful!
| dazed_confused wrote:
| That's what we get when it goes on arXiv first, before being
| peer reviewed.
| sp332 wrote:
| Still no mention of what data was used for training.
| jagrsw wrote:
| The HN comments section, most likely :)
| thawab wrote:
| That's how Facebook got sued: their paper mentioned a data
| source that included books crawled from pirated sites.
| diggan wrote:
| Maybe the correct way of addressing this problem is by using
| data sources that won't make others sue you, rather than
| hiding what data sources you're using.
| jcuenod wrote:
| You seem to be unfamiliar with the mantra of Silicon Valley.
___________________________________________________________________
(page generated 2023-10-11 16:00 UTC)