[HN Gopher] Mistral AI Valued at $2B
___________________________________________________________________
Mistral AI Valued at $2B
Author : marban
Score : 265 points
Date : 2023-12-10 18:25 UTC (4 hours ago)
(HTM) web link (www.unite.ai)
(TXT) w3m dump (www.unite.ai)
| Racing0461 wrote:
| With the new AI regulations the EU is going to adopt, how long
| will Mistral stay Paris-based?
| rolisz wrote:
| Maybe the regulations will be Mistral shaped.
| Barrin92 wrote:
| there's nothing in the new AI regulations hindering Mistral's
| work. Open Source foundation models are in no way impacted.
|
| https://x.com/ylecun/status/1733481002234679685?s=20
| Racing0461 wrote:
| We both know that's not how regulations work. Mistral is
| going to have to get a legal team to understand the
| regulations, have a line item for each provision, verify each
| one doesn't apply to them, get it signed off, and continuously
| monitor for changes both to the laws and the code to make
| sure it stays compliant. This will just be a mandate from
| HR/Legal/Investors.
|
| A lot of work for a company with no commercial offering off
| the bat. And possibly an insurmountable amount of work for
| new players trying to enter.
| arlort wrote:
| > Alot of work for a company with no commercial offering
| off the bat
|
| If you have no commercial offering it doesn't apply to you
| at all in the first place
| bsaul wrote:
| If you never have any commercial offering, you have a 0
| valuation.
| andsoitis wrote:
| Regardless of where a company is headquartered, it has to
| comply with local regulations.
| Racing0461 wrote:
| Only if it wants to do business there. If a company is just
| headquartered there, they have to comply with regulations no
| matter what.
| kozikow wrote:
| Or another way to put it - if you are an enterprise based in
| Europe that needs to stay compliant, future regulation will
| make it very hard to not use Mistral :P.
| matmulbro wrote:
| The LLM space is so cringe: so much excitement from the supply
| side and none from the supposed demand side.
| huytersd wrote:
| I don't know what you're talking about. I use ChatGPT
| extensively. Probably more than 50 times a day. I am extremely
| excited for anything that can top the already amazing thing we
| have now. They have a massive paying customer base.
| 4death4 wrote:
| What do you use it for?
| dartos wrote:
| I usually go to it before Google now if I'm looking for an
| answer to a specific question.
|
| I know it can be wrong, but usually when it is, it's
| obviously wrong
| sjfjsjdjwvwvc wrote:
| Not OP but I used it very successfully (not OpenAI but some
| wrapper solution) for technical/developer support. Turns
| out a lot of people prefer talking to a bot that gives a
| direct answer rather than reading the docs.
|
| Support workload on our Slack was reduced by 50-75% and the
| output is steadily improving.
|
| I wouldn't want to go back tbh.
| kozikow wrote:
| Not OP, but for me:
|
| - Writing: emails, documentation, marketing. I write an
| unstructured skeleton of the information, add a prompt about
| the intended audience and purpose, and possibly ask it to add
| some detail.
|
| - Coding: Especially things like "Is there a method for this
| in this library?" - a lot quicker than browsing through
| documentation. For some errors, I copy-paste the error from
| the console, maybe with a little context, and quite often I
| get the solution.
|
| And API based:
|
| - Support bot
|
| - Prompt engineering of text models that would normally
| require weeks or months of labeling, training, and
| evaluation. A typical use case: unstructured text plus a
| prompt as input, JSON as output (sketch below).
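|
| A minimal sketch of that last pattern, assuming the openai
| Python client (v1) and its JSON mode; the fields in the
| system prompt are just an illustration:
|
|     from openai import OpenAI
|
|     client = OpenAI()  # reads OPENAI_API_KEY from the env
|
|     note = ("Meeting with ACME on Tuesday at 3pm, "
|             "follow up on the Q3 invoice.")
|     resp = client.chat.completions.create(
|         model="gpt-3.5-turbo-1106",
|         response_format={"type": "json_object"},
|         messages=[
|             {"role": "system",
|              "content": "Extract company, datetime and action "
|                         "from the user's note as JSON."},
|             {"role": "user", "content": note},
|         ],
|     )
|     print(resp.choices[0].message.content)
|     # e.g. {"company": "ACME", "datetime": "Tuesday 3pm", ...}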
| ipaddr wrote:
| Bash scripts
| s1artibartfast wrote:
| I used it to write my wedding vows
| huytersd wrote:
| Based
| huytersd wrote:
| A lot of very varied things so it's hard to remember.
| Yesterday I used it extensively to determine what I need to
| buy for a chicken coop. Calculating the volume of concrete
| and cinder blocks needed, the type and number of bags of
| concrete I would need, calculating how many rolls of
| chicken wire I would need, calculating the number of
| shingles I would need, questions on techniques, and drying
| times for using those things, calculating how much mortar I
| would need for the cinderblocks (it took into account that
| I would mortar only on the edges, the thickness of mortar
| required for each joint, it accounted for the cores in the
| cinderblocks, it correctly determined I wouldn't need
| mortar on the horizontal axis on the bottom row) etc. All
| of this, I could've done by hand, but I was able to sit and
| literally use my voice to determine all of this in under
| five minutes.
|
| I use DALL-E 3 extensively for my woodworking hobby, where I
| ask it to come up with ideas for different pieces of
| furniture, and have constructed several based on those
| suggestions.
|
| For work I use it to write emails, to come up with
| skeletons for performance reviews, look-back/look-ahead
| documents, ideas for what questions to bring up during
| sprint reviews based on data points I provide it etc.
| aantix wrote:
| It's replaced Google for me, for most queries.
|
| It's just so much more efficient in getting the answers I
| need. And it makes a great pair-programming partner.
| jay-barronville wrote:
| 100%. ChatGPT is used heavily in my household (my wife and I
| both have paid subscriptions) and it's absolutely worth it.
| One of the most interesting things for me has actually been
| watching my wife use it. She's an academic in the field of
| education and I've seen her come up with so many creative
| uses of the technology to help with her work. I'm a power
| user too, but my usage, as a software engineer, is likely
| more predictable and typical.
| rogerkirkness wrote:
| Microsoft Cloud AI revenue went from $90M to $900M to $2.7B
| over three quarters. How much more hard-dollar demand growth
| could there possibly be at this point?
| matmulbro wrote:
| it's shovels all the way down
| sjfjsjdjwvwvc wrote:
| Shovelling what, in your opinion? Or is it just a giant
| house of cards?
| cgearhart wrote:
| Right now they're shoveling "potential". LLMs demonstrate
| capabilities we haven't seen before, so there's high
| uncertainty about the eventual impact. The pace of
| progress makes it _seem_ like an LLM "killer app" could
| appear any day, creating a sense of FOMO.
| shrimpx wrote:
| There's also the race to "AGI" -- companies spending tens
| of billions on training, hoping they'll hit a major
| intelligence breakthrough. If they don't hit anything
| significant, that money will (mostly) have gone down the
| drain, but Nvidia will have made out like a bandit.
| quickthrower2 wrote:
| I think there are enough genuine use cases. People are
| saving time using AI tools. There are a lot of people in
| office jobs. It is a huge market. Not to say it won't
| overshoot. With high interest rates valuations should be
| less frothy anyway.
| echelon wrote:
| They're selling to startups, not consumers.
|
| The good startups are building, fine tuning, and running
| models locally.
| Xenoamorphous wrote:
| I can't think of any software/service that's grown more in
| terms of demand over a single year than ChatGPT (in all its
| incarnations, like the MS Azure one).
| itronitron wrote:
| Yeah, the demand side consists solely of those who think they
| will be the supply side.
| airspresso wrote:
| Too many superlatives and groundbreaking miracles reported.
| Probably written by AI.
| jay-barronville wrote:
| > In a significant development for the European artificial
| intelligence sector, Paris-based startup Mistral AI has
| achieved a noteworthy milestone. The company has successfully
| secured a substantial investment of EUR450 million, propelling
| its valuation to an impressive $2 billion.
|
| I'm cracking up. I don't need to be a rocket scientist to read
| this and immediately conclude it's AI-generated. I mean, they
| didn't even try to hide that. Haha.
| VirusNewbie wrote:
| A competitor to OpenAI in like, benchmarks?
| consumer451 wrote:
| At least a competitor to Llama, for now.
|
| https://medium.com/@datadrifters/mistral-7b-beats-llama-v2-1...
| z7 wrote:
| Mistral has a lot of potential, but there's the obvious risk that
| without proper monetization strategies it might not achieve
| sustainable profitability in the long term.
| nothrowaways wrote:
| Nothing stops them from launching a chat app.
| quickthrower2 wrote:
| The old open source, but we'll host it for you? I think Bezos
| is going to be in fits of evil laughter about that model in 5
| years, as all the open source compute moves to the clouds,
| with dollars flowing his way.
|
| But one thing Mistral could do is have a free foundational
| model, and have non-free (as in beer, as in speech) "pro"
| models. I think they will have to.
| dartos wrote:
| Release small, open, foundational models.
|
| Deploy larger, fine tuned variants and charge for them.
|
| There's a reason we don't have the data set or original
| training scripts for Mistral.
| behnamoh wrote:
| it's a "mistry" ;)
| teekert wrote:
| Here's to hoping such models run on dedicated chips
| locally, on phones and PCs, etc.
| emadm wrote:
| They already do; we just released a model equivalent to
| most 40-60B base models that runs on a MacBook Air no
| problem.
|
| It's like 1.6GB, and the ones coming are better and smaller:
| https://x.com/EMostaque/status/1732912442282312099?s=20
|
| I think the large language model paradigm is pretty much
| done as we move to satisficing tbh
| simonw wrote:
| There are huge economy of scale benefits from providing
| hosted models.
|
| I've been trying out all sorts of open models, and some of
| them are really impressive - but for my deployed web apps
| I'm currently sticking with OpenAI, because the performance
| and price I get from their API is generally much better
| than I can get for open models.
|
| If Mistral offered a hosted version which didn't have any
| spin-up time and was price competitive with OpenAI I would
| be much more likely to build against their models.
| quickthrower2 wrote:
| This is only defensible for closed models, though.
| echelon wrote:
| Zero moat. Everybody's doing it.
|
| I suppose they could be the Google to everyone else's Yahoo
| and Dogpile, but I expect that to be a hard game to play
| these days.
| digitcatphd wrote:
| I was wondering this. What is their business model exactly?
| Almost seems like Europe's attempt to say "hey, look, we are
| relevant too"
| lolive wrote:
| Being acquired.
| skue wrote:
| At this valuation and given the strength of the team, it's not
| hard to imagine a future acquisition yielding a significant
| ROI.
|
| Besides, we don't know what future opportunities will unfold
| for these technologies. Clearly there's no shortage of smart
| investors happy to place bets on that uncertainty.
| jsemrau wrote:
| Model-as-a-service should work just fine.
| stillwithit wrote:
| Wait what? If company don't make $ it don't survive?
|
| HN could really elevate the discourse if they flagged the
| submarine ads of VCs
| minimaxir wrote:
| It is a relevant question in the AI industry specifically due
| to new concerns about ROI given the intense compute costs.
| lolive wrote:
| Same concern I have regarding Spotify. [Which seems to have
| insane recurring costs. Plus some risky expansive strategic
| moves]
| polygamous_bat wrote:
| Coupled with the concern that once you're charging users money
| for a product, you are also liable for sketchy things they do
| with it. Not so much when you post a torrent link on twitter
| that happens to have model weights.
| niemandhier wrote:
| The French have an urge to be independent; the French
| government will hand them some juicy contract as soon as they
| can provide any product that justifies it.
| emadm wrote:
| Yeah they shouldn't worry, they'll get a big French
| government deal at worst
| lolive wrote:
| One of the French tycoons will eventually buy them.
| yodsanklai wrote:
| > The French have a urge to be independent
|
| They lost that fight a long time ago, though. It seems they
| don't even try to pretend anymore.
| dharma1 wrote:
| Their pitch deck said they will monetise serving their
| models.
|
| While it may feel like a low moat if anyone can spin up a cloud
| instance with the same model, it's still a reasonable starting
| point. I think they will also be getting a lot of EU clients
| who can't/don't want to use US providers.
| foolfoolz wrote:
| This is inevitable. At some point companies like this will be
| too big to fail, like Airbus. Maybe it's already there.
| jaspa99 wrote:
| Curious to see how this will impact Aleph Alpha
| emadm wrote:
| Aleph Alpha raised even more ^_^
|
| https://sifted.eu/articles/ai-startup-aleph-alpha-raises-500...
| quickthrower2 wrote:
| What is the business model?
| hnarayanan wrote:
| Sshh
| quickthrower2 wrote:
| Sorry I forgot, in AI $2Bn is preseed
| malermeister wrote:
| Get the French government to throw a ton of money at you for
| sovereignty reasons
| I_am_tiberius wrote:
| I really hope that a European startup can successfully compete
| with the major companies. I do not want to see privacy
| violations, such as OpenAI's default use of user prompts for
| training, become standard practice.
| quickthrower2 wrote:
| Does Anthropic count as European?
| htrp wrote:
| Dario is Italian-American?
| quickthrower2 wrote:
| That doesn't matter too much, the corporate structure is
| more interesting.
| pb7 wrote:
| Elon is South African but that doesn't make Tesla a South
| African company.
| uxp8u61q wrote:
| How on Earth would it count as European? It's a completely
| American company. Founded in the US, by Americans,
| headquartered in the US, funded by American VCs... I
| genuinely don't get how you arrived at the idea that it's
| European.
| quickthrower2 wrote:
| Big office and lots of jobs in UK. And with complex tax
| setups these days I wasn't sure.
| uxp8u61q wrote:
| By that measure I guess Apple is Irish...?!
| totolouis wrote:
| The UK is not in Europe anymore.
| baal80spam wrote:
| Interesting, TIL.
| quickthrower2 wrote:
| They cut through the continental shelf as part of Brexit.
| denlekke wrote:
| Maybe not the distinction you meant, but the UK is still
| in Europe (the continent), and to me "European" is a word
| based on location, not on membership of the European Union
| (which the UK left).
| mark_l_watson wrote:
| There is a lot of hype around LLMs, but (BUT!) Mistral well
| deserves the hype. I use their original 7B model, as well as some
| derived models, all the time. I can't wait to see what they
| release next (which I expect to be a commercial product, although
| the MoE model set they just released is free).
|
| Another company worthy of some hype is 01.AI which released their
| Yi-34B model. I have been running Yi locally on my Mac (use "
| ollama run yi:34b") and it is amazing.
|
| Hype away Mistral and 01.AI, hype away...
| jay-barronville wrote:
| You mind sharing what you find so amazing about Yi-34B? I
| haven't had a chance to try it.
| mark_l_watson wrote:
| I just installed it on my 32GB Mac yesterday. First
| impressions: it reasons very well, it answers general common-
| sense world-knowledge questions very well, and so far when it
| generates Python code, the code works and is well documented.
| I know this is just subjective, but I have been running a 30B
| model for a while on my Mac, and Yi-34B just feels much
| better. With 4-bit quantization, I can still
| run Emacs, terminal windows and a web browser with a few tabs
| without seeing much page faulting. Anyway, please try it and
| share a second opinion.
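|
| Back-of-envelope on why that fits (assuming roughly half a
| byte per weight at 4-bit, ignoring KV cache and runtime
| overhead):
|
|     params = 34e9            # Yi-34B
|     bytes_per_weight = 0.5   # 4-bit quantization
|     gb = params * bytes_per_weight / 1e9
|     print(f"~{gb:.0f} GB of weights")  # ~17 GB
|     # leaves ~15 GB on a 32 GB machine for the OS, Emacs,
|     # terminals and a browser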
| brucethemoose2 wrote:
| The 200K finetunes are also quite good at understanding their
| huge context.
| dmos62 wrote:
| How do you use these models? If you don't mind sharing. I use
| GPT-4 as an alternative to googling, haven't yet found a reason
| to switch to something else. I'll for example use it to learn
| about the history, architecture, cultural context, etc of a
| place when I'm visiting. I've found it very ergonomic for that.
| teaearlgraycold wrote:
| I've used LM Studio. It hasn't reached peak user-friendliness,
| but it's a nice enough GUI. You'll need to fiddle with
| resource allocation settings and select an optimally
| quantized model for best performance. But you can do all that
| in the UI.
| risho wrote:
| lm studio is an accessible simple way to use them. that said
| expecting them to be anywhere near as good as gpt-4 is going
| to lead to disappointment.
| davidkunz wrote:
| I use them in my editor with my plugin
| https://github.com/David-Kunz/gen.nvim
| 3abiton wrote:
| Interesting use case, but the issue is wasting all this
| compute energy for prediction?
| HorizonXP wrote:
| Can you explain what you mean by this question?
| loufe wrote:
| If you want to experiment, Kobold.cpp is a great interface
| and goes a long way toward guaranteeing backwards
| compatibility with outdated model formats.
| gdiamos wrote:
| I host them here: https://app.lamini.ai/playground
|
| You can play with them, tune them, and download the weights
|
| It isn't exactly the same as open source because weights !=
| source code, but it is close in the sense that it is editable
|
| IMO we just don't have great tools for editing LLMs like we
| do for code, but they are getting better
|
| Prompt engineering, RAG, and finetuning are effective ways
| of editing LLMs. They are getting easier, and better tooling
| is starting to emerge; a minimal sketch of the RAG flavor is
| below.
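|
| A minimal sketch of the RAG flavor (embed() here is a toy
| stand-in for a real embedding model, and the docs are made
| up):
|
|     import numpy as np
|
|     def embed(text: str) -> np.ndarray:
|         # toy stand-in: hashed bag-of-words, 256 dims
|         v = np.zeros(256)
|         for word in text.lower().split():
|             v[hash(word) % 256] += 1.0
|         return v
|
|     docs = ["Refunds are processed within 5 days.",
|             "Support hours are 9am-5pm CET."]
|     vecs = np.stack([embed(d) for d in docs])
|
|     def retrieve(query: str, k: int = 1) -> list[str]:
|         q = embed(query)
|         # cosine similarity against every stored doc
|         sims = vecs @ q / (np.linalg.norm(vecs, axis=1)
|                            * np.linalg.norm(q))
|         return [docs[i] for i in np.argsort(sims)[::-1][:k]]
|
|     question = "How long do refunds take?"
|     context = "\n".join(retrieve(question))
|     prompt = (f"Answer using only this context:\n{context}"
|               f"\n\nQ: {question}\nA:")
|     # feed `prompt` to any LLM: its behavior is now "edited"
|     # by your data without touching the weights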
| p1esk wrote:
| How do these small models compare to gpt4 for coding and
| technical questions?
|
| I noticed that gpt3.5 is practically useless to me (either
| wrong or too generic), while gpt4 provides a decent answer 80%
| of the time.
| modeless wrote:
| They are not close to GPT-4. Yet. But the rate of improvement
| is higher than I expected. I think there will be open source
| models at GPT-4 level that can run on consumer GPUs within a
| year or two. Possibly requiring some new techniques that
| haven't been invented yet. The rate of adoption of new
| techniques that work is incredibly fast.
|
| Of course, GPT-5 is expected soon, so there's a moving
| target. And I can't see myself using GPT-4 much after GPT-5
| is available, if it represents a significant improvement. We
| are quite far from "good enough".
| p1esk wrote:
| I'm both excited and scared to think about this
| "significant improvement" over GPT-4.
|
| It can make our jobs a lot easier or it can take our jobs.
| stavros wrote:
| Isn't that the same? At some point, your job becomes so
| easy that anyone can do it.
| Spivak wrote:
| It's weird for programmers to be worried about getting
| automated out of a job when my job as a programmer is
| basically to try as hard as I can to automate myself out
| of a job.
| rmbyrro wrote:
| I expect the demand for SWE to grow faster than
| productivity gains.
| __loam wrote:
| LLMs are going to spit out a lot of broken shit that
| needs fixing. They're great at small context work but
| full applications require more than they're capable of
| imo.
| OfSanguineFire wrote:
| Curious thought: at some point a competitor's AI might
| become so advanced, you can just ask it to tell you how to
| create your own, analogous system. Easier than trying to
| catch up on your own. Corporations will have to include
| their own trade secrets among the things that AIs aren't
| presently allowed to talk about like medical issues or sex.
| p1esk wrote:
| How to create my own LLM?
|
| Step 1: get a billion dollars.
|
| That's your main trade secret.
| chongli wrote:
| What is inherent about AIs that requires spending a
| billion dollars?
|
| Humans learn a lot of things from very little input.
| Seems to me there's no reason, in principle, that AIs
| could not do the same. We just haven't figured out how to
| build them yet.
|
| What we have right now, with LLMs, is a very crude brute-
| force method. That suggests to me that we really don't
| understand how cognition works, and much of this brute
| computation is actually unnecessary.
| nemothekid wrote:
| If we knew how to build humans for cheap, then it
| wouldn't require spending a billion dollars. Your
| reasoning is circular.
|
| It's precisely because we don't know how to build these
| LLMs cheaply that one must spend so much money to
| build them.
| chongli wrote:
| The point is that it's not inherently necessary to spend
| a billion dollars. We just haven't figured it out yet,
| and it's not due to trade secrets.
|
| Transistors used to cost a billion times more than they
| do now [1]. Do you have any reason to suspect AIs to be
| different?
|
| [1] https://spectrum.ieee.org/how-much-did-early-
| transistors-cos...
| jryle70 wrote:
| > Transistors used to cost a billion times more than they
| do now
|
| However you would still need billions of dollars if you
| want state of the art chips today, say 3nm.
|
| Similarly, LLMs may at some point not require a billion
| dollars; you may be able to get one on par with or
| surpassing GPT-4 cheaply. But state-of-the-art AI will
| still require substantial investment.
| pixl97 wrote:
| >Humans learn a lot of things from very little input
|
| And also takes 8 hours of sleep per day, and are mostly
| worthless for the first 18 years. Oh, also they may tell
| you to fuck off while they go on a 3000 mile nature walk
| for 2 years because they like the idea of free love
| better.
|
| Knowing how birds fly really doesn't make a useful
| aircraft that can carry 50 tons of supplies, or one that
| can go over the speed of sound.
|
| This is the power of machines and bacteria: throwing
| massive numbers at the problem. Being able to throw 1GW of
| power at cognition will absolutely solve the problem faster
| than figuring out how our brain does it with 20 watts.
| janalsncm wrote:
| Because that billion dollars gets you the R&D to know how
| to do it?
|
| The original point was that an "AI" might become so
| advanced that it would be able to describe how to create
| a brain on a chip. This is flawed for two main reasons.
|
| 1. The models we have today aren't able to do this. We
| are able to model existing patterns fairly well but
| making new discoveries is still out of reach.
|
| 2. Any company capable of creating a model which had
| singularity-like properties would discover them first,
| simply by virtue of the fact that they have first access.
| Then they would use their superior resources to write the
| algorithm and train the next-gen model before you even
| procured your first H100.
| michaelt wrote:
| Maybe not $1 billion, but you'd want quite a few million.
|
| According to [1] a 70B model needs $1.7 million of GPU
| time.
|
| And when you spend that - you don't know if your model
| will be a damp squib like Bard's original release. Or if
| you've scraped the wrong stuff from the internet, and
| you'll get shitty results because you didn't train on a
| million pirated ebooks. Or if your competitors have a
| multimodal model, and you really ought to be training on
| images too.
|
| So you'd want to be ready to spend $1.7 million more than
| once.
|
| You'll also probably want $$$$ to pay a bunch of humans
| to choose between responses for human feedback to fine-
| tune the results. And you can't use the cheapest workers
| for that, if you need great English-language skills and
| want them to evaluate long responses.
|
| And if you become successful, maybe you'll also want $$$$
| for lawyers after you trained on all those pirated
| ebooks.
|
| And of course you'll need employees - the kind of
| employees who are very much in demand right now.
|
| You might not need _billions_, but $10M would be a
| shoestring budget.
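|
| That figure is roughly consistent with the usual back-of-
| envelope (my assumptions for throughput, utilization and
| price, not numbers from the link):
|
|     params = 70e9
|     tokens = 2e12                  # Llama-2-scale data
|     flops = 6 * params * tokens    # ~8.4e23 training FLOPs
|     a100 = 312e12 * 0.4            # BF16 peak * ~40% util
|     hours = flops / a100 / 3600    # ~1.9M A100-hours
|     print(f"~${hours * 1.1 / 1e6:.1f}M at $1.1/A100-hour")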
|
| [1]
| https://twitter.com/moinnadeem/status/1681371166999707648
| rmbyrro wrote:
| It might work for fine-tuning an open model to a narrow
| use case.
|
| But creating a base model is out of reach. You probably
| need on the order of hundreds of millions of dollars (if
| not billions) to get close to GPT-4.
| Xenoamorphous wrote:
| As someone who doesn't know much about how these models
| work or are created I'd love to see some kind of
| breakdown that shows what % of the power of GPT4 is due
| to how it's modelled (layers or whatever) vs training
| data and the computing resources associated with it.
| janalsncm wrote:
| The limiting factor isn't knowledge of how to do it, it
| is GPU access and RLHF training data.
| 0xDEF wrote:
| >I think there will be open source models at GPT-4 level
| that can run on consumer GPUs within a year or two.
|
| There are indeed already open-source models rivaling
| GPT-3.5, but GPT-4 is an order of magnitude better.
|
| The sentiment that GPT-4 is going to be surpassed by open
| source models soon is something I only notice on HN. Makes
| me suspect people here haven't really tried the actual
| GPT-4 but instead the various scammy services like Bing
| that claim they are using GPT-4 under the hood when they
| are clearly not.
| rmbyrro wrote:
| Makes me suspect you don't follow the HN user base very
| closely.
| refulgentis wrote:
| You're 100% right and I apologize that you're getting
| downvoted, in solidarity I will eat downvotes with you.
|
| HN's funny right now because LLMs are all over the front
| page constantly, but there's a lot of HN "I am an expert
| because I read comments sections" type behavior. So many
| not even wrong comments that start from "I know LLaMa is
| local and C++ is a programming language and I know
| LLaMa.cpp is on GitHub and software improves and I've
| heard of Mistral."
| vitorgrs wrote:
| I believe one of the problems that OSS models need to
| solve is... the dataset. All of them lack a good, large
| dataset.
|
| And this is most noticeable if you ask anything that is not
| in English-American-ish.
| CSMastermind wrote:
| Mistral's latest, just-released model is well below GPT-3 out
| of the box. I've seen people speculate that with fine-tuning
| and RLHF you could get GPT-3 like performance out of it but
| it's still too early to tell.
|
| I'm in agreement with you, I've been following this field for
| a decade now and GPT-4 did seem to cross a magical threshold
| for me where it was finally good enough to not just be a
| curiosity but a real tool. I try to test every new model I
| can get my hands on and it remains the only one to cross that
| admittedly subjective threshold for me.
| rmbyrro wrote:
| Still, for a 7B model, this is quite impressive.
| espadrine wrote:
| > _Mistral's latest, just-released model is well below
| GPT-3 out of the box_
|
| The early information I see implies it is above. Mind you,
| that is mostly because GPT-3 was comparatively low: for
| instance its 5-shot MMLU score was 43.9%, while Llama2 70B
| 5-shot was 68.9%[0]. Early benchmarks[1] give Mixtral
| scores above Llama2 70B on MMLU (and other benchmarks),
| thus transitively, it seems likely to be above GPT-3.
|
| Of course, GPT-3.5 has a 5-shot score of 70, and it is
| unclear yet whether Mixtral is above or below, and clearly
| it is below GPT-4's 86.5. The dust needs to settle, and the
| official inference code needs to be released, before there
| is certainty on its exact strength.
|
| (It is also a base model, not a chat finetune; I see a lot
| of people saying it is worse, simply because they interact
| with it as if it was a chatbot.)
|
| [0]: https://paperswithcode.com/sota/multi-task-language-
| understa...
|
| [1]: https://github.com/open-compass/MixtralKit#comparison-
| with-o...
| brucethemoose2 wrote:
| Have you played with finetunes, like Cybertron? Augmented
| with wrappers and retrievers like GPT is?
|
| It's not there yet, but it's waaaay closer than the plain
| Mistral chat release.
| idonotknowwhy wrote:
| If you can run Yi-34B, you can run Phind-CodeLlama. It's much
| better than Yi and Mistral for code questions. I use it
| daily. More useful than GPT-3 for coding, not as good as
| GPT-4, except that I can copy and paste secrets into it
| without sending them to OpenAI.
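|
| If you're already on ollama, it's one command (tag from
| memory, check the model library):
|
|     ollama run phind-codellama:34b-v2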
| mark_l_watson wrote:
| Thanks, I will give codellama a try.
| sharemywin wrote:
| What types of things do you ask ChatGPT to do for you
| regarding coding?
| valval wrote:
| Open source models will probably catch up at the same rate as
| open source search engines have caught up to Google search.
| yodsanklai wrote:
| > I use their original 7B model, as well as some derived
| models, all the time.
|
| How does it compare to other models? And to ChatGPT in
| particular?
| valval wrote:
| No comparison to be made.
| brucethemoose2 wrote:
| I concur, Yi 34B and Mistral 7B are fantastic.
|
| But you need to run the top Yi finetunes instead of the vanilla
| chat model. They are far better. I would recommend
| Xaboros/Cybertron, or my own merge of several models on
| huggingface if you want the long context Yi.
| transformi wrote:
| Evaluation based on what? What is the business model?
| antirez wrote:
| I believe that the rationale is that if you can do an
| outstanding 7B model, it is likely that you are able to create,
| in the near future, something that may compete with OpenAI, and
| something that makes money, too.
| minimaxir wrote:
| Of course, the reason Mistral AI got a lot of press and publicity
| in the first place was because they _open-sourced_ Mistral-7B
| despite the not-making-money-in-the-short-term aspect of it.
|
| It's better for the AI ecosystem as a whole to incentivize AI
| startups to make a business through good and open software
| instead of building moats and lock-in ecosystems.
| jeron wrote:
| They ought to rename to "ReallyOpenAI"
| sillysaurusx wrote:
| I don't think that counts as open source. They didn't share any
| details about their training, making it basically impossible to
| replicate.
|
| It's more akin to a SaaS company releasing a compiled binary
| that usually runs on their server. Better than nothing, but not
| exactly in the spirit of open source.
|
| This doesn't seem like a pedantic distinction, but I suppose
| it's up to the community to agree or disagree.
| minimaxir wrote:
| It's IMO a pedantic distinction.
|
| A compiled binary is a bad metaphor because it gives the
| impression that Mistral-7B is an as-is WYSIWYG project
| that's not easily modifiable. In contrast, a bunch of
| powerful new models have been created by modifying or
| finetuning Mistral-7B, such as Zephyr-7B:
| https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
|
| The better analogy for Mistral-7B is something like modding
| Minecraft or Skyrim: although those games are closed source
| themselves, modding has enabled innovations that help the
| open-source community directly.
|
| It would be _nice_ to have fully open-source methodologies,
| but lacking them isn't an inherent disqualifier.
| hedgehog wrote:
| It's a big distinction, if I want to tinker with the model
| architecture I essentially can't because the training
| pipeline is not public.
| minimaxir wrote:
| If you want to tinker with the architecture Hugging Face
| has a FOSS implementation in transformers: https://github
| .com/huggingface/transformers/blob/main/src/tr...
|
| If you want to reproduce the _training pipeline_, you
| couldn't do that even if you wanted to because you don't
| have access to thousands of A100s.
| hedgehog wrote:
| I'm well aware of the many open source architectures, and
| the point stands. Models like GPT-J have open code and
| data, and that allows using them as a baseline for
| architecture experiments in a way that Mistral's models
| can't be. Mistral publishes weights and code, but not the
| training procedure or data. Not open.
| sillysaurusx wrote:
| We do, via TRC. Eleuther does too. I think it's a bad
| idea to have a fatalistic attitude towards model
| reproduction.
| hedgehog wrote:
| Exactly, nice work BTW. And no hate for Mistral, they're
| doing great work, but let's not confuse weights-available
| with fully open models.
| emadm wrote:
| With all the new national supercomputers, scale isn't
| really going to be an issue; they all want large language
| models on 10k GH200s or whatever, and the libraries are
| getting easier to use.
| mrob wrote:
| According to the Free Software Definition:
|
| "Source code is defined as the preferred form of the program
| for making changes in. Thus, whatever form a developer
| changes to develop the program is the source code of that
| developer's version."
|
| According to the Open Source Definition:
|
| "The source code must be the preferred form in which a
| programmer would modify the program. Deliberately obfuscated
| source code is not allowed. Intermediate forms such as the
| output of a preprocessor or translator are not allowed."
|
| LLMs are usually modified by changing the model weights
| directly, instead of retraining the model from scratch. LLM
| weights are poorly understood, but this is an unavoidable
| side effect of the development methodology, not deliberate
| obfuscation. "Intermediate" implies a form must undergo
| further processing before it can be used, but LLM weights are
| typically used directly. LLMs did not exist when these
| definitions were written, so they aren't a perfect fit for
| the terminology used, but there's a reasonable argument to be
| made that LLM weights can qualify as "source code".
| lmm wrote:
| > LLM models are usually modified by changing the model
| weights directly, instead of retraining the model from
| scratch. LLM weights are poorly understood, but this is an
| unavoidable side effect of the development methodology, not
| deliberate obfuscation.
|
| They're understood based on knowing the training process
| though, and a developer working on them would want to have
| the option of doing a partial or full retraining where
| warranted.
| seydor wrote:
| Also because their model is unconstrained/uncensored, and
| they are committed to that, according to what they say; they
| build it so others can build on it. GPTs are not finished
| business, and hopefully the open-source community will
| surpass the early successes.
| asim wrote:
| I have realised just how meaningless valuations now are. As much
| as we use them as a marker of success, you can find someone to
| write the higher-valuation ticket when it suits their agenda,
| e.g. the markup, the status signal, or just getting the deal
| done ahead of your more rational competitors in the investment
| landscape. Now that's not to say Mistral isn't a valuable
| company or that they aren't doing good work. It's just that
| valuation markers are meaningless, and most of the capital
| raising in the AI space is about offsetting the cloud/GPU
| spend. Might get downvoted to death, but watching valuation
| news feels like no news.
| seydor wrote:
| It's smoke. But where there is smoke, there is some level of
| fire.
| jack_riminton wrote:
| Not if it's a smoke machine
| mytailorisrich wrote:
| Perhaps someone can answer this: this is a one-year-old company.
| Does this mean that barriers to entry are low and replication
| relatively simple?
| emadm wrote:
| The main barrier right now is access to supercompute and
| knowing how to run it; everything in the space is
| standardising quickly.
| cavisne wrote:
| The part of Meta research that worked on LLaMa happened to be
| based in the Paris office. Then some of the leads left and
| started Mistral.
|
| Complex/simple is not really the right way to think about
| training these models; I'd say it's more arcane. Every mistake
| is expensive because it takes a ton of GPU time and/or human
| fine tuning time. Take a look at the logbooks of some of the
| open source/research training runs.
|
| So these engineers have some value as they've seen these
| mistakes (paid for by Meta's budget).
| JonChesterfield wrote:
| Anyone else think Nvidia giving companies money to spend on
| Nvidia hardware at very high profit margin is a dubious valuation
| scheme?
| raverbashing wrote:
| You'd be surprised how much more common this is than people
| realize.
| candiddevmike wrote:
| It's the heads I win, tails you lose investment model
| SeanAnderson wrote:
| Why would it be a dubious valuation scheme? I guess if an
| investor is looking at just revenue, or only looking at one
| area of their business finances, maybe? Otherwise it seems like
| the loss in funds would be weighed against the increase in
| revenue and wouldn't distort earnings.
| JonChesterfield wrote:
| Say big green gives a company $100M with the rider that it
| needs to spend all of that on Nvidia's hardware, in exchange
| for 10% of the company.
|
| Has Nvidia valued the company at $1B? Say their margin on
| the sales is 80%. So Nvidia has given up some cash flow and
| $20M net for that 10%. Has Nvidia valued the company at
| $200M?
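|
| Worked through with the same assumed numbers:
|
|     investment = 100e6   # cash in, spent back on GPUs
|     stake = 0.10         # equity received
|     margin = 0.80        # assumed gross margin on hardware
|
|     headline = investment / stake         # $1.0B
|     net_cost = investment * (1 - margin)  # $20M out of pocket
|     implied = net_cost / stake            # $200M
|     print(f"headline ${headline/1e9:.1f}B, "
|           f"implied ${implied/1e6:.0f}M")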
| SeanAnderson wrote:
| I see :) Thanks for clarifying. I would say that I don't
| have a strong enough grasp on biz finances to do more than
| speculate here, but:
|
| 1) Is all the money spent up front? Or does it trickle back
| in over a few years? Cash flow might be impacted more than
| implied, but I doubt this is much of an issue.
|
| 2) I wonder how the 10% ownership at 2B valuation would be
| interpreted by investors. If it's viewed as a fairly liquid
| investment with low risk of depreciation then yeah, I could
| see Nvidia's strategy being quite the way to pad numbers.
| OTOH, the valuation could be seen as pure marketing fluff
| and mostly written off by the markets until regulations and
| profitability are firmly in place.
| wongarsu wrote:
| If it was a good valuation scheme, then Nvidia giving them
| $100 million at a $2 billion valuation would mean that Nvidia
| thinks the company is worth $2 billion. But if Mistral uses
| that money to buy GPUs that Nvidia sells with 75% profit
| margin, the deal is profitable for Nvidia even if they
| believe the company is worth only $0.5 billion (since they
| effectively get 75% of the investment back). And if this deal
| fuels the wider LLM hype and leads other companies to spend
| just $50 million more at Nvidia, this investment is
| profitable for Nvidia even if Mistral had negative value.
| emadm wrote:
| With convertible debt, and in many of these rounds, investors
| get the first money out, so the first EUR450M would go to the
| investors.
| mcmcmc wrote:
| Kinda like MS giving OpenAI all those Azure credits?
| racoonista wrote:
| Unfortunately, the EU also just passed some AI regulations. Not
| sure how they impact Mistral's work, but just FWIW.
| malermeister wrote:
| Why is that an unfortunately? We need regulations to set the
| rules of the game.
| bsaul wrote:
| We don't even know what AI is truly going to look like in 2
| years, and 2 years ago nobody cared. Isn't it a bit too early
| to regulate a field that's barely starting?
| b2bsaas00 wrote:
| Does anyone have examples of products that made heavy use of
| an LLM API where it could make economic sense to use a self-
| hosted model (Mistral, Llama)?
| sroussey wrote:
| I'm working on an embeddings database of my personal
| information, and the ability to query it. Just for privacy
| reasons.
| Frummy wrote:
| That's fair given it's 50 times more difficult to use their model
| fidotron wrote:
| There is a lot of noise here suggesting it is too much, but
| relative to the supposed SV unicorns of two years ago this looks
| like an absolute steal.
| yreg wrote:
| The macroeconomic situation now is wildly different from 2
| years ago.
| hn_throwaway_99 wrote:
| Perhaps too much off-topic, but I hate how the press (and often
| the startups themselves) focuses on the valuation number when a
| company receives funding. As we've seen in very recent history,
| those valuation numbers are at best a finger in the wind, and of
| course a big capital intensive project like AI requires a
| valuation that is at least a couple multiples of the investment,
| even if it's all essentially based on hope.
|
| I think it would make much more sense to focus on the "reality
| side" of the transaction, e.g. "Mistral AI received a EUR450
| million investment from top tech VC firms."
| shrimpx wrote:
| The valuation is meaningful in the sense of "Mistral sells
| 22.5% of the company to VC firms."
| nojvek wrote:
| Valuation means jack shit for an early-stage startup. WeWork
| was valued at $50B at its peak.
|
| Until a company is consistently showing growth in revenue and a
| path to sustainable profitability, valuation is essentially wild
| speculation.
|
| OpenAI is wildly unprofitable right now. The revenue they make is
| through nice APIs.
|
| What is Mistral's plan for profitability?
|
| Right now Stability AI is in the dumps and looking for a buyer.
|
| The only companies I see making money in AI are those that
| live like cockroaches and are very capital efficient.
| Midjourney and Comma.ai come to mind.
|
| Very much applaud them for open release of models and weights.
| evantbyrne wrote:
| Valuation matters quite a bit for continued funding.
| hauget wrote:
| His point is about reaching and maintaining profitability,
| not revenue spending.
| evantbyrne wrote:
| It's too early for Mistral to focus on revenue. These AI
| companies are best thought of as moonshot projects.
| toss1 wrote:
| Yes, and it can matter in a very bad way if you need to
| subsequently have a "down round" (more funding at a lower
| valuation).
|
| Initial high valuations mean the founders get a lot of
| initial money giving up little stock. This can be awesome if
| they become strongly cash-flow positive before they run out
| of that much runway. But if not, they'll get crammed down
| hard in subsequent rounds.
|
| The more key question is: how much funding did they raise at
| that great valuation, and is it sufficient runway? Looks like
| EUR450 million plus an additional EUR120 million in
| convertible debt. Might be enough, depending on their
| expenses...
| evantbyrne wrote:
| I'm not saying that either of your concerns are invalid.
| The LLM space is just the wrong place to be for investors
| who are worried about cash-flow positivity this early in
| the game. These models are crazy expensive to develop
| _currently_, but they are getting cheaper to train all the
| time. Meaning Mistral spent a fraction of what OpenAI did
| on GPT-3 to train their debut model, and that companies
| started one year from now will be spending a fraction of
| what both are spending presently to train their debut
| models.
| emadm wrote:
| It's kinda weird thinking deep tech companies should be
| profitable a year in.
|
| Like it takes time to make lots of money and it's really hard
| to build state of the art models.
|
| The reality is this market is huge and growing massively, as
| it is so much more efficient to use these models for many
| (but not all) tasks.
|
| At Stability I told the team to focus on shipping models, as
| next year is the year for generative media, where we are the
| leader, as language models go to the edge.
| mpalmer wrote:
| They didn't say that companies should be profitable a year
| in.
|
| To my mind they just seemed to be responding to the slightly
| clickbait-y title, which focuses on the valuation, which has
| some significance but is still pretty abstract. Still,
| headlines love the word "billion".
|
| The straight-news version of the headline would probably
| focus more on a16z's new round.
| nojvek wrote:
| I acknowledge it's easy to be an armchair critic. You are the
| ones on the battlefield doing real work and pushing the edge.
|
| The thing is I don't want the pro-open-source players to
| fizzle out and implode because funding dried up and they have
| no path to self sustainability.
|
| AGI could be 6 months away or 6 decades away.
|
| E.g. Cruise has a high probability of imploding. They raised
| too much and didn't deliver. Now California has revoked their
| license for driverless cars.
|
| I'm 100% sure AGI, driverless cars and amazing robots will
| come. Fairly convinced the ones who get us there will be the
| cockroaches and not the dinosaurs.
| emadm wrote:
| I think it's also tough at this early stage of the diffusion
| (aha) of innovation curve: we are at the point of early
| adopters and high churn before mass adoption of these
| technologies over the coming years, once they are good
| enough, fast enough, and cheap enough.
|
| AGI is a bit of a canard imo; it's not really actionable in
| a business sense.
| vagrantJin wrote:
| comma.ai is a great example of a good business.
|
| But I might have a bias because I was following along as the
| company was built from whiteboard diagrams to what it became.
| stavros wrote:
| This is just tangential, but I wouldn't call their APIs "nice",
| I'd be far less charitable. I spent a few hours (because that's
| how long it took to figure out the API, due to almost zero
| documentation) and wrote a nicer Python layer:
|
| https://github.com/skorokithakis/ez-openai/
|
| With all that money, I would have thought they'd be able to
| design more user-friendly APIs. Maybe they could even ask an
| LLM for help.
| rmbyrro wrote:
| Generally agree.
|
| Instead of "path to profitability", I think path to ROI is more
| appropriate, though.
|
| WhatsApp never had a path to profitability, but it had a clear
| path to ROI by building a unique and massive user base that
| major social networks would fight for.
| wslh wrote:
| > OpenAI is wildly unprofitable right now.
|
| Do we know some of its numbers? How many paid subscribers do
| they have? I pay for two subscriptions.
| segmondy wrote:
| Profitability likewise means jack shit. You just need to have
| a successful acquisition by a lazy dinosaur or make enough
| income to go public. You can lose money for 10 years
| straight while transferring wealth from the public to the
| investors/owners. With that said, I'm short Mistral for them
| being French. I have absolutely zero faith in EU-based orgs.
|
| On profitability: for all the newcomers, I don't think anyone
| can wager that any of them is going to make money. Capital
| efficiency is overrated so long as they can survive for the
| next year+; they are all trying to corner the market, and OpenAI
| is the one that seems to have found a way to milk the cow for
| now. I truly believe that the true hitmakers are yet to enter
| the scene.
| wholien wrote:
| How does Mistral monetize or plan to monetize? Create a
| ChatGPT-like service and charge? License to other businesses?
| nojvek wrote:
| Gotta give it to Nvidia and TSMC. In the big AI race, they're
| the ones with a real moat and no serious competition.
|
| No matter who wins, they'll need those sweet GPUs and fabs.
| Yujf wrote:
| It's the good old "in a gold rush, sell shovels".
| ThalesX wrote:
| My 1st thought as a European: "YAY! EU startup to the moon". My
| 2nd thought was "n'aww, American VC". I guess that's the best we
| can do around here.
| paulddraper wrote:
| It may feel that there are few EU startups and that's true.
|
| But there are even fewer EU VCs.
| ThalesX wrote:
| I was CTO for some European startups. I'll always remember
| one: by the time the EU VC was midway through its due
| diligence for a 500k seed, we already had millions lined up
| from US VCs, no questions asked.
| jamesblonde wrote:
| The problem is that no European VC has that amount of capital.
| European VCs typically have a couple of hundred million under
| mgmt. SV VCs have a few billion under mgmt.
| bsaul wrote:
| There were European VCs investing in the very first round,
| French ones in particular. The founders are French. This
| qualifies as European in my book (let's not get too
| demanding).
| firebot wrote:
| Who comes up with these valuations? The Donald?
| eeasss wrote:
| Some folks on this forum seem to get irritated by the prospect of
| a successful AI company HQed in the EU. Why the hate?
| yodsanklai wrote:
| Noob questions (I don't know anything about LLM, I'm just a
| casual user of ChatGPT)
|
| - is what Mistral does better than Meta or OpenAI?
|
| - will LLMs eventually become open-source commodities with
| little room for innovation, or should we expect to see a
| company with a competitive advantage that makes it the new
| Google? In other words, how much better can we expect these
| LLMs to be in the future? Should we expect significant
| progress, or have we reached diminishing returns? (After all,
| this is only statistical prediction of the next word; maybe
| there's an intrinsic limitation to this method.)
|
| - are there some sort of benchmarks to compare all these new
| models?
| nbzso wrote:
| The old masters have a saying: never fall in love with your
| creation. The AI industry is falling into a trap of its own
| making (marketing). LLMs are nice toys, but implementation is
| resource/energy expensive and murky at best. There are a lot
| of real-life problems that could be solved through a rational
| approach. If someone is thirsty, the water is the most
| important part, not the type of glass :)
| TeMPOraL wrote:
| If you compared the efficiency of steam engines during the
| industrial revolution with the ones used today, or power
| generation from 100 years ago to that of now, or between just
| about any chemical process, manufacturing method or
| agricultural technique at its invention and now, you'd be
| amazed by the difference. In some cases, the same activity
| was _several orders of magnitude more wasteful_ just 100
| years ago.
|
| Or, I guess look at how size, energy use and speed of computer
| hardware evolved over the past 70 years. Point is,
| implementation being, right now, "resource/energy expensive and
| murky at best" is how many very powerful inventions look at the
| beginning.
|
| > _If someone is thirsty, the water is the most important part,
| not the type of glass:)_
|
| Sure, except here, we're talking about one group selling a
| glass imbued with breakthrough nanotech, allowing it to keep
| the water at desired temperature indefinitely, and continuously
| refill itself by sucking moisture out of the air. Sometimes,
| the type of glass may really matter, and then it's not surprising
| many groups strive to be able to produce it.
| nbzso wrote:
| "Don't fall in love with your creation" does not mean "stop
| creating."
|
| https://www.cell.com/joule/fulltext/S2542-4351(23)00365-3
| qeternity wrote:
| I see a lot of comments asking what or how people are using these
| models for.
|
| The promise of LLMs is _not_ in chatbots (imho). At scale, you
| will not even realize you are interacting with a language model.
|
| It just happens that the first, most boring, lowest-hanging-
| fruit products that OAI, Anthropic, et al. pump out are
| chatbots.
___________________________________________________________________
(page generated 2023-12-10 23:01 UTC)