[HN Gopher] Mistral-8x7B-Chat
___________________________________________________________________
Mistral-8x7B-Chat
Author : birriel
Score : 76 points
Date : 2023-12-10 20:13 UTC (2 hours ago)
(HTM) web link (huggingface.co)
(TXT) w3m dump (huggingface.co)
| antirez wrote:
| llama.cpp support needed ASAP, not just quantized, but I hope it
| will be able to run in a way that keeps the same two experts for a
| minimum of N tokens generated and only then re-evaluates which
| experts to select, so that it will be possible to do the inference
| in low RAM by doing something similar to "paging".
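|
| Something like this (pseudo-code, all names hypothetical, this is
| not llama.cpp's API): run the router once, keep the chosen experts
| resident for N tokens, and only then re-route and page expert
| weights in or out of RAM.
|
|     # Hypothetical "sticky" expert selection for low-RAM MoE decoding.
|     N_STICKY = 32  # re-evaluate the routing only every N tokens
|
|     def generate(model, prompt_tokens, max_new_tokens):
|         tokens = list(prompt_tokens)
|         active = None  # experts currently paged into RAM
|         for step in range(max_new_tokens):
|             if active is None or step % N_STICKY == 0:
|                 wanted = model.route_top2(tokens)  # router picks two experts
|                 if wanted != active:
|                     if active is not None:
|                         model.page_out(active)     # drop the old expert weights
|                     model.page_in(wanted)          # mmap/load the new ones
|                     active = wanted
|             tokens.append(model.forward_with(active, tokens))
|         return tokens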
| kristianp wrote:
| There is an issue for it:
|
| https://github.com/ggerganov/llama.cpp/issues/4381
| rgbrgb wrote:
| what a sick project to be able to attract a billionaire
| programmer [0] _and_ C royalty.
|
| [0]: https://github.com/ggerganov/llama.cpp/issues/4216#issuecomm...
| jaipilot747 wrote:
| For those out of the loop, who are the billionaire
| programmer and C royalty people in this link?
| antirez wrote:
| Tobi is the founder of Shopify.
| a_wild_dandan wrote:
| PSA: This is broken for Apple silicon. It relies on the CUDA-only
| `flash_attn` package. Alas, I wanted to try Mistral's MoE on my
| MacBook. :( Understandable though! It's a new model with an
| incredibly bizarre release. My phalanges are crossed.
| lannisterstark wrote:
| Man this LLM stuff gets released faster than I can keep up.
|
| Is there a centralized list somewhere that tests "use this for x
| purpose, use that for y?"
| brucethemoose2 wrote:
| Even the OSS trainers/finetuners themselves don't really know
| the use case for their models. They just hot drop them and move
| on, lol.
| lxe wrote:
| This is a finetune of the torrent-dropped MoE model from a few
| days ago. Not a "release" per se, but rather a display of
| community excitement over this.
| behnamoh wrote:
| > Is there a centralized list somewhere that tests "use this
| for x purpose, use that for y?"
|
| Yeah, "don't use these models for production, use OpenAI for
| production, ignore Claude/Gemini/etc.".
| simonw wrote:
| I thought the new MoE Mistral model was called Mixtral.
| lab wrote:
| We just made it available on OpenRouter:
|
| https://openrouter.ai/models/fireworks/mixtral-8x7b-fw-chat
|
| Chat playground:
|
| https://openrouter.ai/playground?models=fireworks/mixtral-8x...
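|
| If you want to call it from code, OpenRouter exposes the usual
| OpenAI-compatible chat completions endpoint; roughly (the model
| slug is the one from the link above, the API key is yours):
|
|     import requests
|
|     resp = requests.post(
|         "https://openrouter.ai/api/v1/chat/completions",
|         headers={"Authorization": "Bearer YOUR_OPENROUTER_KEY"},
|         json={
|             "model": "fireworks/mixtral-8x7b-fw-chat",
|             "messages": [{"role": "user", "content": "Hello, Mixtral!"}],
|         },
|     )
|     print(resp.json()["choices"][0]["message"]["content"])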
| Jimbabwe wrote:
| There's probably a better place to ask this highly specific
| technical question, but I'm avoiding Reddit these days, so I'm
| just throwing it out here I guess. I've been trying to run these
| in a container, but it's very slow, I believe because of the lack
| of GPU help. All the instructions I find are for Nvidia GPUs, and
| my server is a QNAP TVS-473e with an embedded AMD CPU/GPU (I
| know, I know). The only good news is that I've upgraded the RAM
| to 32 GB, and I have a 1 TB SSD. Any idea how I can get my own
| self-hosted LLM/chat service on this funky hardware? The
| Nvidia/Docker option requires installing the Nvidia runtime
| alongside Docker, but I can't find an AMD equivalent. Thanks.
| Sorry for the wall of text nobody cares about.
| brucethemoose2 wrote:
| > QNAP TVS-473e
|
| Specs say this runs an AMD RX-421BD. This is a 2015 AMD CPU
| with 2 Bulldozer cores and a tiny IGP.
|
| ...To be blunt, you would be much better off running LLMs on
| your phone. Even an older phone. Or literally whatever device
| you are reading HN on. But if you _insist_, the runtime you
| want is MLC-LLM's Vulkan runtime.
| refulgentis wrote:
| This. The sibling llama.cpp comment is standard "I know
| llama.cpp, I assume that's 80% of the universe instead of
| 0.8%, and I assume that's all anyone needs. So I know just
| enough to be dangerous with people looking for advice".
|
| You'll see it over and over again when you're looking for
| help; be careful, it's 100% a blind alley in your case. It's
| very likely you'll be disappointed by MLC as well, but
| simultaneously it's your only real option. You definitely
| won't hit 1 tkn/sec, and honestly, I'd bet 0.1 tkn/sec.
| Jimbabwe wrote:
| Thanks, I'll look into it! Especially if the llama.cpp route
| is a dud, like the other response says it will be. My little
| QNAP clunker handles all the self-hosting stuff I throw at
| it, but I won't be surprised if it has simply met its match.
| curious_cat_163 wrote:
| You'll want to try llama.cpp [1]. The set of models that it can
| support is expanding [2]. Folks have also written services [3]
| that wrap around it.
|
| [1] https://github.com/ggerganov/llama.cpp
| [2] https://huggingface.co/TheBloke
| [3] https://github.com/abetlen/llama-cpp-python
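|
| Once a GGUF quantization of the model shows up at [2] and llama.cpp
| gains Mixtral support, a minimal llama-cpp-python session looks
| roughly like this (the file name is just a placeholder for whatever
| quant you download):
|
|     from llama_cpp import Llama
|
|     # Point model_path at the downloaded GGUF file; runs on CPU by default.
|     llm = Llama(model_path="./mixtral-8x7b-chat.Q4_K_M.gguf", n_ctx=2048)
|     out = llm("Q: What is a mixture-of-experts model? A:", max_tokens=128)
|     print(out["choices"][0]["text"])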
| Jimbabwe wrote:
| Thanks! I was just following the thread about their recent
| addition of OpenCL support and was on the verge of trying
| it out last weekend. I'll definitely continue once I'm home
| again!
| loxias wrote:
| > my server is a QNAP TVS-473e with an embedded AMD CPU/GPU
|
| That's your problem. I googled it, and it looks like one of those
| all-in-one appliances like a Drobo or whatever's popular these
| days. That's not a server. (At least, I wouldn't call it a
| server. It's an all-in-one appliance, or a toy, depending on
| perspective.) And ye gods, that price...
|
| Spend $500, get an actual computer, not some priced-up
| appliance, and you'll have a _much_ better time, regardless of
| whether you spend it on more CPU or more GPU. You can get a used
| computer off eBay for $100 and shove a $400 graphics card in
| it. Or maybe get a Ryzen 7 7700X; I'm looking at a mobo+CPU
| combo with that for $500 right now.
|
| Finally, to make sure this response does contain an answer to
| what you asked: ;-)
|
| If you can run this stuff in a container on your appliance
| already, even if it's very slow, congrats! I'd call that a win.
| I looked up the chip, the RX-421BD; it's of similar power to an
| Athlon circa 2017. I think my router might have more compute
| power. You _do_ have those 512 shader cores; given effort, you
| could try to get them to do something useful. But I wouldn't
| _assume_ it's possible (well, maybe you don't mind writing
| your own shaders ;-)). Just because the chip has "some GPU"
| doesn't mean it has "the right kind of GPU you'd need to hijack
| for lots of matrix multiplies, without writing the assembly
| yourself".
|
| Sorry this isn't more helpful, but it's the truth.
| vizzah wrote:
| This model is better than many other contenders, but still far
| from GPT-4.
|
| "what famous brands are there which change one letter from a
| common word to make a non-existent, but a catchy name, such as
| "musiq" instead of "music".. etc?"
|
| There are several brands that have played with words by changing
| a letter or adding a letter to create a new and memorable name.
| Here are a few examples:
|
| Qatar Airways - This airline's name was inspired by the country
| of Qatar, but with the letter "Q" replaced by "Q" to create a
| unique and catchy name.
|
| Lush Cosmetics - This cosmetics company's name is a play on the
| word "lush," which means "profuse" or "luxuriant."
|
| Pinterest - This social media site's name is a combination of the
| words "pin" and "interest," but with the letter "e" replaced by
| "i" to create a more unique and memorable name.
|
| Tumblr - This social media platform's name is a play on the words
| "tumble" and "blogger," but with the letter "l" replaced by "r"
| to create a more unique and catchy name.
|
| ---------
|
| That is compared to the hilarious Yi 34B Chat below: =)
|
| There are several famous brands that have used this strategy of
| changing one letter from a common word to create a unique and
| catchy name. Here are a few examples:
|
| eBay - Instead of "cBay," which would be a reference to "crab,"
| eBay was named after the first three letters of the word "baY"
| and the "e" from the word "electronic," as in "electronic
| marketplace."
|
| Google - Originally named "BackRub," the company changed its name
| to Google, a play on the word "googol," which refers to the
| number one followed by one hundred zeros, expressing the vastness
| of the internet.
|
| Kodak - The name Kodak is derived from the word "cod" with an "k"
| added. It was chosen because it is short, easy to remember, and
| not suggestive of the photographic nature of the business.
| brucethemoose2 wrote:
| Here is my response from Yi, which is actually pretty good:
|
| One well-known example is the brand Apple. The company's name
| comes from a play on the phrase "Apple of My Eye." Another
| example is Google, which is derived from the mathematical term
| "googolplex," which refers to the number 1 followed by 100
| zeroes. It was originally meant to be a play on the word
| "googol" (1 followed by 100 zeroes), but a spelling mistake led
| to the creation of the new word "google."
|
| Other examples include PepsiCo, which combines the words
| "pepsin" and "kola," and Amazon, which was inspired by the
| world's largest river, the Amazon River.
|
| In the case of "musiq," it seems like a made-up name without
| any specific reference or known origin. However, some possible
| variations could be created using existing words, such as
| "dramatize" instead of "dramatize" or "fashin" instead of
| "fashion."
| brucethemoose2 wrote:
| Actually I just realized I had the prompt template and
| sampling wrong. First shot once that's fixed:
|
| " Some examples of famous brands that use this strategy
| include:
|
| 1. KFC (Kentucky Fried Chicken) - The brand takes the first
| letter of each word in its full name (Kentucky Fried Chicken)
| and combines them to create a memorable acronym.
|
| 2. Pepsi - The brand's name is a play on the word "pepsin,"
| an enzyme used in digestion. The name was chosen because it
| has a positive connotation and is easy to remember.
|
| 3. Adidas - The brand's name is derived from the founder's
| name, Adolf Dassler, with the 'Das' replaced by an 'a'.
|
| 4. Asics - The brand's name is an acronym for "Animal, Speed,
| Intelligence, and Spirit."
|
| 5. H&M (Hennes & Mauritz) - The brand's name is a combination
| of two Swedish words: "Hennes" means "her" in English, and
| "Mauritz" refers to the men's clothing store "Mauritz
| Widforss."
|
| 6. Nike - The brand's name comes from the Greek goddess of
| victory, "Nike."
|
| 7. Amazon - The brand's name was inspired by the South
| American river Amazon, which is one of the world's largest
| rivers.
|
| 8. Apple - The brand's name was chosen due to co-founder
| Steve Jobs' belief that "Apple" was simple, memorable, and
| had a positive connotation.
|
| 9. Google - The brand's name is a play on the mathematical
| term "googol," which represents the number 1 followed by 100
| zeros.
|
| 10. Twitter - The brand's name is a shortened version of
| "tweet," which refers to short messages sent through the
| platform.
|
| These brands demonstrate how changing or combining letters
| from existing words can create unique and memorable names for
| businesses. "
|
| - This is not vanilla Yi, but a merge of several finetunes.
| mirekrusin wrote:
| LLMs work on tokens, where individual characters are hidden
| away. They'd have to be explicitly trained on spelling each
| token out into single-letter tokens and, since they are bad at
| information symmetry, on mapping single-letter tokens back onto
| tokens as well. I don't think anybody does this, so they're
| left with only what's in the training data. Otherwise they have
| no chance to reconstruct this information, since tokens could
| just as well map onto some equivalent visual language, i.e. one
| looking like hieroglyphs, where the questions you're asking
| would be nonsensical.
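|
| You can see the character-blindness by poking at a tokenizer, e.g.
| tiktoken's GPT-4 vocabulary as a stand-in (Mistral uses a different
| SentencePiece vocabulary, but the point is the same): the model
| receives a handful of opaque multi-character pieces, never letters.
|
|     import tiktoken
|
|     enc = tiktoken.get_encoding("cl100k_base")
|     for word in ["music", "musiq"]:
|         ids = enc.encode(word)
|         pieces = [enc.decode([i]) for i in ids]
|         print(word, "->", pieces)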
| jakderrida wrote:
| I thought so, too. But then I asked it to define fake words
| that were portmanteaus I made up. Believe me, my
| understanding of BERT and discriminative models aligned
| perfectly with what you're saying. But testing the theory
| showed it can break down and make meaning of fake words,
| accurately identifying which words I was combining, which
| proved me wrong. Generative models must work differently than
| you and I thought.
| nl wrote:
| But both are completely wrong! And technically the Google
| example is closer to correct than any of the others.
|
| The Yi 34B eBay and Kodak examples are both (wrong but) very
| interesting because it does seem to get the idea of changing
| one letter.
|
| Of the GPT-4 examples, the Qatar example (replacing "Q" with
| "Q"!?) is the only one that is internally consistent. The
| Pinterest and Tumblr examples are wrong in very odd ways, in
| that the explanation doesn't match the spelling.
| behnamoh wrote:
| Are you comparing an 8x7B model with GPT-4? Come on...
| behnamoh wrote:
| Somewhere between shiny Google releases and Mistral's magnet link
| tweet, there's gotta be a sweet spot where you release the model
| but also have enough decency to tell people how to use it
| optimally. Mistral, if you're reading this, I'm talking about
| you.
___________________________________________________________________
(page generated 2023-12-10 23:00 UTC)