[HN Gopher] Codestral: Mistral's Code Model
       ___________________________________________________________________
        
       Codestral: Mistral's Code Model
        
       Author : alexmolas
       Score  : 345 points
       Date   : 2024-05-29 14:16 UTC (8 hours ago)
        
 (HTM) web link (mistral.ai)
 (TXT) w3m dump (mistral.ai)
        
       | mousetree wrote:
        | How does this compare to GitHub Copilot? It's not shown in
        | their comparison.
        
         | nkozyra wrote:
         | Not sure how much current Copilot varies from the original
         | Codex, but another set of benchmarks here:
         | https://paperswithcode.com/sota/code-generation-on-humaneval
        
         | ramon156 wrote:
          | Knowing the training data GH has, I doubt it's comparable;
          | then again, I don't have the benchmarks.
        
           | ramon156 wrote:
           | After typing this I tried the live chat out and it honestly
           | seems a lot more promising than current GH Copilot, very
           | nice!
        
           | ssgodderidge wrote:
            | Are you saying GH has more training data than Codestral and
            | therefore GH has a better model? Or that Codestral would be
            | better because Codestral is not littered with "bad" code?
        
             | nkozyra wrote:
             | Bad code is obviously very subjective, but I would wager
             | that GH places a much higher value on feedback mechanisms
             | like stars, issues, PRs, velocity, etc. Their ubiquity
             | likely allows them to automatically cherry-pick less "bad
             | code."
        
               | nicce wrote:
                | Nothing prevents Mistral from doing the same if they
                | want to. Issues and PRs are public information, exposed
                | by APIs, and not that heavily rate-limited.
        
         | rohansood15 wrote:
         | Copilot primarily uses GPT-3.5, which is outclassed by
         | Llama3-70B. And this model claims to be slightly better than
         | Llama3-70B.
         | 
         | Edit: For those who don't believe me,
         | https://github.com/microsoft/vscode-copilot-
         | release/issues/6.... Gpt-4 for chat, 3.5 for code.
        
           | jasonjmcghee wrote:
           | GitHub Copilot uses GPT-3.5?
           | 
           | I was under the impression it was a custom codex model with a
           | surrogate local model as per
           | https://github.blog/2023-02-14-github-copilot-now-has-a-
           | bett...
           | 
           | When did this change?
        
             | Rastonbury wrote:
              | When it first launched, I too didn't know they had
              | changed the model from the original Codex, which came out
              | around the same time as GPT-3.5.
        
           | jasonjmcghee wrote:
           | > Gpt-4 for chat, 3.5 for code
           | 
           | That thread is comparing sidebar chat to inline chat. Doesn't
           | discuss code completions afaict.
        
         | localfirst wrote:
         | It's miles better.
         | 
          | In fact I stopped using expensive GPT-4.
          | 
          | Codestral just works: it's quick, the output is accurate.
          | It's kinda scary.
        
       | Zambyte wrote:
       | Link to the huggingface page:
       | https://huggingface.co/mistralai/Codestral-22B-v0.1
        
       | bloopernova wrote:
       | Does anyone know of a link to a codegen comparison page? In other
       | words, you write your request, and it's submitted to multiple
       | codegen engines, so you can compare the output.
        
         | rohansood15 wrote:
         | Not the same, but we evaluated how good LLMs are at fixing code
         | and just posted it on HN:
         | https://news.ycombinator.com/item?id=40511689
        
       | andruby wrote:
       | This is an open weights 22B model. The download on Huggingface is
       | 44GB.
       | 
       | Is there a rule-of-thumb estimate for how much RAM this would
       | need to be used locally?
       | 
       | Is the RAM requirement the same for a GPU and "unified" RAM like
       | Apple silicon?
        
         | fnbr wrote:
          | The rule of thumb is roughly 44GB: most models are trained
          | in bf16 and require 16 bits per parameter, so 2 bytes each.
          | You need a bit more for activations, so maybe 50GB?
          | 
          | You need enough RAM and HBM (GPU RAM), so it's a constraint
          | on both.
        
           | sharbloop wrote:
            | Which GPU card can I buy to run this model? Can it run on
            | a consumer RTX 3090 or does it need a custom GPU?
        
             | Havoc wrote:
             | 3090 or 4090 will be able to run quantized 22B models.
             | 
             | Though realistically for code completion smaller models
             | will be better due to speed
        
             | TechDebtDevin wrote:
             | Easy..
        
           | Novosell wrote:
           | Most GPUs still use GDDR I'm pretty sure, not HBM. Do you
           | mean VRAM?
        
         | mauricio wrote:
         | 22B params * 2 bytes (FP16) = 44GB just for the weights.
         | Doesn't include KV cache and other things.
         | 
         | When the model gets quantized to say 4bit ints, it'll be 22B
         | params * 0.5 bytes = 11GB for example.
        
         | tosh wrote:
         | B x Q / 8
         | 
         | B: number of parameters
         | 
         | Q: quantization (16 = no quantization)
         | 
         | via https://news.ycombinator.com/item?id=40090566
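          | 
          | As a quick sanity check of that formula in Python (a rough
          | sketch; with B in billions of parameters the result is in GB,
          | and it ignores the KV cache and activation overhead mentioned
          | upthread):
          | 
          |     def approx_weights_gb(params_billions: float, bits: int) -> float:
          |         # bytes per parameter = bits / 8, so GB = B * Q / 8
          |         return params_billions * bits / 8
          | 
          |     print(approx_weights_gb(22, 16))  # 44.0 -- bf16, as shipped
          |     print(approx_weights_gb(22, 4))   # 11.0 -- 4-bit quant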
        
         | TechDebtDevin wrote:
          | I'm honestly not sure how to measure the amount of VRAM
          | required for these models, but I suspect this would run
          | relatively fast, depending on your use case, on a mid to high
          | end 20 or 30 series card. No idea about Apple unified RAM. I
          | get a lot of performance out of even older cards such as a
          | 1080 Ti, but I haven't tested this model.
        
         | wing-_-nuts wrote:
         | Wait for a gguf release of this and it will fit neatly into a
         | 3090 with a decent quant. I'm excited for this model and I'll
         | be adding it to my collection.
        
       | sashank_1509 wrote:
        | Seems nice, but some preliminary testing against GPT-4o shows
        | it's lacking a bit. It does a pretty good job for easy
        | questions though.
        
         | jasonjmcghee wrote:
         | GPT-4o is really oddly hit or miss for code.
         | 
         | Sometimes it outperforms GPT-4 in quality by a fair amount, and
         | other times it starts repeating itself. Duplicating function
         | definitions, even misremembering what things are named.
         | 
         | It seems to have to do with length. If the output exceeds a few
         | thousand tokens, it seems to experience some pretty bad failure
         | modes.
        
           | afro88 wrote:
           | 4o can only output 4k tokens. So the training to complete an
           | answer within 4k tokens is probably kicking in and nerfing
           | the quality
        
         | localfirst wrote:
          | Personally this has performed consistently, and just as well
          | if not better than GPT-4.
          | 
          | What strikes me is the consistency and the lack of the
          | hallucination you get in GPT-4o, which makes it unusable for
          | any reliable code gen.
        
       | swyx wrote:
        | i've been noticing that there's a divergence in philosophy
        | between Llama style LLMs (Mistral are Meta alums so I'm counting
        | them in there) and OpenAI/GPT style LLMs when it comes to code.
       | 
        | GPT3.5+ prioritized code very heavily - there's no CodeGPT, it's
        | just GPT4, and every version is better than the last.
       | 
        | Whereas the Llama/Mistral models are now shipping the general
        | language model first, then adding CodeLlama/Codestral with
        | additional pretraining (it seems like we don't know how many
        | more tokens went into this one, but CodeLlama was 500B-1T extra
        | tokens of code).
       | 
        | Zuck has mentioned recently that he doesn't see coding ability
        | as important for his usecases, whereas obviously OpenAI is
        | betting heavily on code as a way to improve LLM reasoning for
        | AGI.
        
         | memothon wrote:
         | >Zuck has mentioned recently
         | 
         | That's a really surprising thing to hear, where did you see
         | that? The only quote I've seen is this one:
         | 
         | >"One hypothesis was that coding isn't that important because
         | it's not like a lot of people are going to ask coding questions
         | in WhatsApp," he says. "It turns out that coding is actually
         | really important structurally for having the LLMs be able to
         | understand the rigor and hierarchical structure of knowledge,
         | and just generally have more of an intuitive sense of logic."
         | 
         | https://www.theverge.com/2024/1/18/24042354/mark-zuckerberg-...
        
           | imachine1980_ wrote:
            | Makes sense. They want better interaction with users for
            | WhatsApp, Instagram and Facebook marketers, content
            | creation and moderation, and their glasses (AI/AR). I don't
            | see why, in that context, they should push more effort into
            | LLM coding. It's sad anyways.
        
           | whoami_nr wrote:
           | He mentioned it on the Dwarkesh podcast:
           | https://www.youtube.com/watch?v=bc6uFV9CJGg
        
         | tkellogg wrote:
         | The OpenAI philosophy is that adding modes improves everything.
         | Sure, it's astronomically expensive, but I tend to think
         | they're on to something.
        
         | guyomes wrote:
         | > OpenAI is betting heavily on code as a way to improve LLM
         | reasoning for AGI.
         | 
          | And researchers from Google DeepMind, University of
          | Wisconsin-Madison and the Laboratoire de l'Informatique du
          | Parallélisme, University of Lyon, actually published some of
          | their results in that direction [1,2].
         | 
         | [1]: https://deepmind.google/discover/blog/funsearch-making-
         | new-d...
         | 
         | [2]: https://www.nature.com/articles/s41586-023-06924-6
        
         | Rastonbury wrote:
          | I thought that was the idea: open source small, specific
          | models that most people can run vs. general purpose ones that
          | require a massive amount of GPUs.
        
         | behnamoh wrote:
         | > Zuck
         | 
         | No, if anything he said Meta realized coding abilities make the
         | model overall better, so they focused on those more than
         | before.
        
       | sebzim4500 wrote:
       | Very impressed with it based on a short live chat, feels insanely
       | fast considering its capability.
       | 
       | chat.mistral.ai
        
         | kergonath wrote:
         | We'll see how fast it is on consumer hardware once decent
         | quantisations are available.
        
       | colesantiago wrote:
        | I'm so happy now that LLMs are democratising access to
        | programming, especially open models like what Meta is doing
        | with Llama and Mistral is doing with Codestral.
       | 
       | The abundance of programming is going to allow almost everyone to
       | become a great programmer.
       | 
       | This is so exciting to see and each day programming is becoming a
       | solved problem so we can focus on other things.
        
         | skydhash wrote:
         | Shadow libraries did more to democratize anything than LLMs.
         | And following a book like Elixir in Action (Manning) will get
         | you there faster than chatting with LLMs or copilot generating
         | code for you.
        
         | smokel wrote:
         | In my experience these tools amplify the quality of a
         | programmer.
         | 
         | I have seen good programmers dramatically increase their
         | productivity, but I've also seen others copy-pasting for loops
         | inside other for loops where one loop would definitely suffice.
         | We're not quite there yet.
        
           | croes wrote:
            | I'm curious about the long-term effect.
            | 
            | I observe a certain laziness in myself when it comes to
            | certain problems. It's easier to ask an LLM and debug the
            | provided code, but I ask myself if I'm losing some problem
            | solving capabilities in the long run because of this.
           | 
           | Similar to the loss of speed in doing mental arithmetic
           | because of calculators on the smartphone.
        
           | bubbleRefuge wrote:
           | Absolutely it amplifies. Complex and esoteric configuration
           | of frameworks, for example, entails so much reading and
           | Googling and can be very time consuming without AI. AI can
           | help to bring custom software to the markets that could not
           | otherwise afford to pay for it.
        
         | icedchai wrote:
         | I'm skeptical. I've run into people who used LLMs to code, then
         | can't debug it without someone else's help. It may get you 80%
         | there though.
        
           | whiplash451 wrote:
           | It does not get you 80% there if it achieves what you
           | described. It rather gets you 100% into trouble.
        
             | croes wrote:
             | Programmer view vs management view.
             | 
             | 100% of nothing vs 80% of enough.
             | 
             | That's the risk of AI. Not that AI outperforms humans
             | already but that managers believe it does. That and that
             | code writing is the main work of programmers.
        
             | icedchai wrote:
             | I agree with you. I've had to debug some of that junk.
        
           | Cyphase wrote:
           | I've run into working programmers who were bad at debugging
           | before LLMs existed.
        
         | croes wrote:
         | >The abundance of programming is going to allow almost everyone
         | to become a great programmer.
         | 
         | How do you become a great programmer if you don't really
         | program?
        
         | maskil wrote:
         | I would argue the opposite is true.
         | 
          | My experience with coding with LLMs is that the only thing
          | it's really good at is generating boilerplate that it has
          | more-or-less seen before (essentially a library, even if it
          | is somewhat adapted). However, it is incapable of the
          | creative thinking that developers regularly need to engage in
          | when architecting a solution for their use case.
        
           | Kiro wrote:
            | My experience is the opposite. When I started using Copilot
            | I thought it would only be good at standard boilerplate,
            | but I'm constantly surprised how well it understands my
            | completely convoluted legacy architecture that I barely
            | understand myself, even though I'm the only contributor.
        
             | localfirst wrote:
             | I've been on both sides of the fence here.
             | 
              | The parent's problem I experienced -> it gets "stuck",
              | and its limitation is the learning loop (humans are
              | always asking why it gets stuck and how to get unstuck);
              | LLMs just power through without understanding what
              | "stuck" is.
              | 
              | For explaining an existing corpus or algorithm it does a
              | fantastic job.
              | 
              | So likely we will see significant wage suppression in
              | "agency/b2b enterprise" shops.
        
         | huygens6363 wrote:
          | This enables everyone to be a great programmer like how
          | easily available power tools enable everyone to be a great
          | carpenter and general craftsman.
         | 
         | You'll get a lot of shitty stuff and the profession will get
         | hollowed out losing attraction of the smart people. We'll be
         | left with low-quality, disposable bullshit while wondering
         | where all the programmers went.
        
       | jhonatan08 wrote:
       | Do we have a list of the 80+ languages it was trained on? I
       | couldn't find it
        
       | ddavis wrote:
       | My favorite thing to ask the models designed for programming is:
       | "Using Python write a pure ASGI middleware that intercepts the
       | request body, response headers, and response body, stores that
       | information in a dict, and then JSON encodes it to be sent to an
       | external program using a function called transmit." None of them
       | ever get it right :)
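        | 
        | For reference, the shape of answer I'm fishing for is roughly
        | the sketch below (my own, not a model's output; "transmit" is
        | the callback named in the prompt). The sharp edge is that a
        | pure ASGI middleware has to wrap both the receive and send
        | callables:
        | 
        |     import json
        | 
        |     class CaptureMiddleware:
        |         def __init__(self, app, transmit):
        |             self.app = app
        |             self.transmit = transmit
        | 
        |         async def __call__(self, scope, receive, send):
        |             if scope["type"] != "http":
        |                 await self.app(scope, receive, send)
        |                 return
        |             cap = {"request_body": b"", "response_headers": [],
        |                    "response_body": b""}
        | 
        |             async def recv():
        |                 # only sees the body if the app awaits receive()
        |                 msg = await receive()
        |                 if msg["type"] == "http.request":
        |                     cap["request_body"] += msg.get("body", b"")
        |                 return msg
        | 
        |             async def snd(msg):
        |                 if msg["type"] == "http.response.start":
        |                     cap["response_headers"] = [
        |                         (k.decode("latin-1"), v.decode("latin-1"))
        |                         for k, v in msg.get("headers", [])]
        |                 elif msg["type"] == "http.response.body":
        |                     cap["response_body"] += msg.get("body", b"")
        |                     if not msg.get("more_body", False):
        |                         # response done: JSON-encode and ship it
        |                         req = cap["request_body"]
        |                         body = cap["response_body"]
        |                         self.transmit(json.dumps({
        |                             "request_body": req.decode("utf-8", "replace"),
        |                             "response_headers": cap["response_headers"],
        |                             "response_body": body.decode("utf-8", "replace"),
        |                         }))
        |                 await send(msg)
        | 
        |             await self.app(scope, recv, snd)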
        
         | bongodongobob wrote:
         | Cool, you've identified that your prompt is inadequate for the
         | task.
         | 
         | 'Pray, Mr. Babbage, if you put into the machine wrong figures,
         | will the right answers come out?'
        
           | TechDebtDevin wrote:
            | Damn, show us your brilliant prompt then. LLMs cannot do
            | this, not even in Python, for which there are libraries
            | like BlackSheep that honestly make it a trivial task.
        
             | bongodongobob wrote:
             | My point is that you shouldn't expect to one shot
             | everything. Have it start by writing a spec, then outline
             | classes and methods, then write the code, and feed it debug
             | stuff.
        
               | bottom999mottob wrote:
               | Exactly, expecting one shot 100% working code with one
               | prompt is ridiculous at this point. It's why libraries
               | like Aider are so useful, because you can iteratively
               | diff generated code until it's useable.
        
               | TechDebtDevin wrote:
               | Sure it's impossible at this point, but the point of a
               | benchmark isn't to complete the task it's to test it's
               | efficacy overall and to see progress. None of them are
               | 100% at even the simplistic python benchmarks, doesn't
               | mean we shouldn't measure that capability. But sure, I
               | get it. That's not how they are intended to be used but
               | that's also not the point the commenter was laying out.
        
               | TechDebtDevin wrote:
                | I see your point, but hand-holding isn't really a good
                | way to benchmark a model's coding capabilities.
        
               | Closi wrote:
               | Depends if benchmarking is the aim, rather than
               | decreasing the time it takes to build things.
        
               | TechDebtDevin wrote:
               | Well sure, but that wasn't what we were discussing. The
               | original comment says they use that as their benchmark.
               | While their coding task is a bit complex compared to
               | other benchmarking prompts, it's not that crazy. Here is
               | an example of prompts used for benchmarking with Python
               | for reference:
               | 
               | https://huggingface.co/datasets/mbpp?row=98
               | 
               | At the end of the day LLMs in their current iteration
               | aren't intended to do even moderately difficult tasks on
               | their own but it's fun to query them to see progress when
               | new claims are made.
        
               | Closi wrote:
               | The original comment says nothing about benchmarking,
               | they just say that an AI can't one shot their complex
               | task?
        
               | amne wrote:
               | When I read
               | 
               |  _" My favorite thing to ask the models designed for
               | programming is ....... None of them ever get it right"_
               | 
               | I read "benchmark".
        
             | ben_w wrote:
             | Prompts like yours (I ask them for a fluid dynamics
             | simulator which also doesn't succeed) inform us of the
             | level they have reached. A useful benchmark, given how many
             | of the formal ones they breeze through.
             | 
             | I'm glad they can't quite manage this yet. Means I still
             | have a job.
        
             | Closi wrote:
             | Break your prompt up into smaller pieces and it can.
        
               | qeternity wrote:
               | Taken to the extreme, a sufficiently broken down prompt
               | is simply the code itself.
               | 
               | The whole point is to prompt less?
        
               | meiraleal wrote:
               | > Taken to the extreme, a sufficiently broken down prompt
               | is simply the code itself
               | 
               | it is not. But the artifacts generated through the steps
               | will be code. The last prompt will have most of the code
               | supplied to it as the context.
        
               | achierius wrote:
               | A prompt is just a specification for an output. Code is
               | just what we call a sufficiently detailed specification.
        
               | buddhistdude wrote:
               | No he is right, he is saying taken to the extreme. The
               | point is the more and more specific you have to prompt,
               | the more you are actually contributing to the result
               | yourself and the less the model is
        
               | Closi wrote:
               | More practically, the whole point is to prompt enough to
               | generate valid code.
        
           | ddavis wrote:
           | It's something I know how to do after figuring it out myself
           | and discovering the potential sharp edges, so I've made it
           | into a fun game to test the models. I'd argue that it's a
           | great prompt (to keep using consistently over time) to see
           | the evolution of this wildly accelerating field.
        
             | kergonath wrote:
             | Do you notice any progress over time?
        
           | AnimalMuppet wrote:
           | How is that "putting in wrong figures"? It's a perfectly
           | valid prompt, written in clear, proper English.
        
         | sanex wrote:
         | Can you get it right without an IDE?
        
           | ddavis wrote:
           | Nope, I don't know how to do it at all- that's why I have to
           | ask AI!
        
         | nicce wrote:
          | I usually throw some complex Rust code with lifetime
          | requirements at them and ask them to fix it. LLMs aren't
          | capable of providing much help with that in general, other
          | than in some very basic cases.
          | 
          | The best way to get your work done is still to look into Rust
          | forums.
        
           | meiraleal wrote:
            | It works amazingly well for those who have never coded in
            | Rust, at least in my experience. It took me a couple of
            | hours and 120 lines of code to set up a WebRTC signaling
            | server.
        
         | meiraleal wrote:
         | Interesting. My favorite thing to ask the models is to refactor
         | code I've not touched for too long and this works very well.
        
         | JimDabell wrote:
         | I normally ask about building a multi-tenant system using async
         | SQLAlchemy 2 ORM where some tables are shared between tenants
         | in a global PostgreSQL schema and some are in a per-tenant
         | schema.
         | 
         | Nothing gets it right first time, _but_ when ChatGPT 4 first
         | came out, I could talk to it more and it would eventually get
         | it right. Not long after that though, ChatGPT degraded. It
         | would get it wrong on the first try, but with every subsequent
         | follow up it would forget one of the constraints. Then when it
         | was prompted to fix that one, it forgot a different one. And
         | eventually it would cycle through all of the constraints,
         | getting at least one wrong each time.
         | 
         | Since then benchmarks came out showing that ChatGPT "didn't
         | really degrade", but all of the benchmarks seemed focused on
         | single question/answer pairs and not actual multi-turn chat.
         | For this kind of thing, ChatGPT 4 has never managed to recover
         | to as good as it was when it was first released in my
         | experience.
         | 
         | It's been months since I've had to deal with that kind of code,
         | so I might be forgetting something, but I just tried it with
         | Codestral and it spat out something that looked reasonable very
         | quickly on its first try.
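          | 
          | For what it's worth, the core trick I expect in an answer is
          | SQLAlchemy's schema_translate_map execution option. A rough
          | sketch of the shape (mine, not Codestral's output; the
          | connection string and schema names are placeholders):
          | 
          |     from sqlalchemy.ext.asyncio import (AsyncSession,
          |                                         create_async_engine)
          | 
          |     engine = create_async_engine(
          |         "postgresql+asyncpg://app@localhost/app")
          | 
          |     def session_for(tenant: str) -> AsyncSession:
          |         # Tables declared with schema="tenant" get rewritten
          |         # to this tenant's schema; tables declared without a
          |         # schema are routed to the shared "global" schema.
          |         tenant_engine = engine.execution_options(
          |             schema_translate_map={"tenant": f"tenant_{tenant}",
          |                                   None: "global"})
          |         return AsyncSession(tenant_engine)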
        
           | checkyoursudo wrote:
           | I had a similar experience. I was trying to get GPT 4 to
           | write some R/Stan code for a bit of bayesian modelling. It
           | would get the model wrong, and then I would walk it through
           | how to do it right, and by the end it would almost get it
           | right, but on the next step, it would be like, oh, this is
           | what you want, and the output was identical to the first
           | wrong attempt, which would start the loop over again.
        
             | happypumpkin wrote:
             | Similar experience using GPT4 for help with Apple's
             | Accessibility API. I wanted to do some non-happy-path
             | things and it kept looping between solutions that failed to
             | satisfy at least one of a handful of requirements that I
             | had, and in ways that I couldn't combine the different
             | "solutions" to meet all the requirements.
             | 
             | I was eventually able to figure it out with the help of
             | some early 2010s blog posts. Sadly I didn't test giving it
             | that context and having it attempt to find a solution again
             | (and this was before web browsing was integrated with the
             | web app).
             | 
             | More of an issue than it not knowing enough to fulfill my
             | request (it was pretty obscure so I didn't necessarily
             | expect that it would be able to) was that it didn't mind
             | emitting solutions that failed to meet the requirements. "I
             | don't know how to do that" would've been a much preferred
             | answer.
        
           | alephxyz wrote:
           | >It would get it wrong on the first try, but with every
           | subsequent follow up it would forget one of the constraints.
           | Then when it was prompted to fix that one, it forgot a
           | different one. And eventually it would cycle through all of
           | the constraints, getting at least one wrong each time.
           | 
           | That drives me nuts and makes me ragequit about half the
           | time. Although it's usually more effective to go and correct
           | your initial prompt rather than prompt it again
        
         | gyudin wrote:
         | I ask software developers to do the same thing and give them
         | the same amount of time. None of them ever write a single line
         | of code :)
        
           | dieortin wrote:
           | Give an LLM all the time you want, and they will still not
           | get it right. In fact, they most likely will give worse and
           | worse answers with time. That's a big difference with a
           | software developer.
        
             | mypalmike wrote:
             | My experience is very different. Often it (ChatGPT or
             | Copilot, depending on what I'm trying to accomplish) gets
             | things right the first time. When it doesn't, it's usually
             | close enough that a bit of manual modification is all
             | that's needed. Sometimes it's totally wrong, but I can
             | usually point it in the right direction.
        
         | shepardrtc wrote:
         | gpt-4o gets it right on the first try for me. Just ran it and
         | tested it.
        
         | spmurrayzzz wrote:
          | I love to ask it to "make me a Node.js library that pings an
          | ipv4 address, but you must use ZERO dependencies, you must
          | use only the native Node.js API modules"
         | 
         | The majority of models (both proprietary and open-weight) don't
         | understand:
         | 
         | - by inference, ping means we're talking about ICMP
         | 
         | - ICMP requires raw sockets
         | 
         | - Node.js has no native raw socket API
         | 
         | You can do some CoT trickery to help it reason about the
         | problem and maybe finally get it settled on a variety of
         | solutions (usually some flavor of building a native add-on
         | using C/C++/Rust/Go), or just guide it there step by step
         | yourself, but the back and forth to get there requires a ton of
         | pre-knowledge of the problem space which sorta defeats the
         | purpose. If you just feed it the errors you get verbatim trying
         | to run the code it generates, you end up in painful feedback
         | loops.
         | 
          | (Note: I never expect the models to get this right; it's just
          | a good microcosmic but concrete example of where knowledge &
          | reasoning meets actual programming acumen, so it's cool to
          | see how models evolve to get better, if at all, at the task.)
        
       | IMTDb wrote:
        | Is there a way to use this within VSCode like Copilot, meaning
        | having the "shadow code" appear while you code instead of
        | having to go back-and-forth between the editor and a chat-like
        | interface?
        | 
        | For me, a significant component of the quality of these tools
        | resides on the "client" side: being able to engineer a prompt
        | that will yield accurate code being generated by the model.
        | The prompt needs to find and embed the right chunks from the
        | user's current workspace, or even from their entire org's
        | repos. The model is "just" one piece of the puzzle.
        
         | meiraleal wrote:
          | I created a simple CLI app that does this in my workspace,
          | which is under source control, so after the LLM execution
          | all the changes are highlighted by diff, and the LLM also
          | creates a COMMIT_EDITMSG file describing what it changed. Now
          | I don't use ChatGPT anymore, only this CLI tool.
          | 
          | I never saw something like this integrated directly into
          | VSCode though (and it isn't my preferred workflow anyway; the
          | command line works better).
        
         | pyepye wrote:
         | Not using Codestral (yet) but check out Continue.dev[1] with
         | Ollama[2] running llama3:latest and starcoder2:3b. It gives you
         | a locally running chat and edit via llama3 and autocomplete via
         | starcoder2.
         | 
         | It's not perfect but it's getting better and better.
         | 
         | [1] https://www.continue.dev/ [2] https://ollama.com/
        
           | jmorgan wrote:
           | Codestral was just published here as well:
           | https://ollama.com/library/codestral
        
           | sa-code wrote:
           | This doesn't give the "shadow text" that the user
           | specifically mentioned
        
           | mijoharas wrote:
           | Wow... That site (continue.dev) managed to consistently crash
           | my mobile google chrome.
           | 
           | I've had the odd crash now and again, but I can't think of
           | many sites that will reliably make it hard crash. It's almost
           | impressive.
        
         | croes wrote:
         | You mean like in their example VS code integration shown here?:
         | 
         | https://m.youtube.com/watch?v=mjltGOJMJZA
        
         | jacekm wrote:
         | The article says that the model is available in Tabnine, a
         | direct competitor to Copilot.
        
         | jdoss wrote:
         | I have been using Ollama to run the Llama3 model and I chat
         | with it via Obsidian using
         | https://github.com/logancyang/obsidian-copilot and I hook
         | VSCode into it with https://github.com/ex3ndr/llama-coder
         | 
         | Having the chats in Obsidian lets me save them to reference
          | them later in my notes. When I first started using it in
          | VSCode when programming in Python, it felt like a lot of
          | noise at first; it kept generating a lot of useless
          | recommendations, but recently it has been super helpful.
         | 
         | I think my only gripe is I sometimes forget to turn off my
         | ollama systemd unit and I get some noticeable video lag when
         | playing games on my workstation. I think for my next video card
         | upgrade, I am going to build a new home server that can fit my
         | current NVIDIA RTX 3090 Ti and use that as a dedicated server
         | for running ollama.
        
       | analyte123 wrote:
       | The license for this [1] prohibits use of the model and its
       | outputs for any commercial activity, or even any "live" (whatever
       | that means) conditions, commercial or not.
       | 
       | There seems to be an exclusion for using the code outputs as part
       | of "development". But wait! It also prohibits "any internal usage
       | by employees in the context of the company's business
       | activities". However you interpret these clauses, this puts their
       | claims and comparisons on completely unequal ground. They only
       | compare to other open-weight models, not GPT-4 or Opus, but a
       | normal company or individual can do whatever they want with the
       | Llama weights and outputs. LangChain? "Your favourite coding and
       | building environment"? Who cares? It seems you're not allowed to
       | integrate this with anything else and show it to anyone, even as
       | an art project.
       | 
       | [1] https://mistral.ai/licenses/MNPL-0.1.md
        
         | croes wrote:
         | It's more like a demo version you can evaluate before you need
         | to buy a commercial license.
         | 
         | On whose code is Mistral trained?
        
           | rvnx wrote:
            | Your code, my code, etc. But there is a common pattern with
            | law: copyright does not apply when you have billions.
            | 
            | Examples: recurring infringement by Microsoft on open-
            | source projects, Google scraping content to build their own
            | database, etc...
        
         | foobiekr wrote:
         | There's some irony in the fact that people will ignore this
         | license in exactly the same way Mistral and all the other LLM
         | guys ignore the copyright and licensing on the works they
         | ingest.
        
           | nicce wrote:
            | In many countries you can't even claim copyright on the
            | output of an AI in order to use a license like this.
        
             | hannasanarion wrote:
             | Copyright on the software that produces something isn't the
             | same as copyright on the output.
             | 
             | The library's copyright is intact, as normal, and they can
             | control who uses it and how just like any other software.
             | 
             | The _output_ of AI systems is not copyrightable, but the
             | systems themselves are, and associated EULAs are valid.
        
               | nicce wrote:
               | Is that so certain? To be able to make claims for _what_
               | you can use the output, can you do it without making any
               | claims for about control and ownership of the output?
               | 
               | Of course, they can revoke your right to use the
               | software, but if it goes to court, that would be
               | interesting case.
        
           | belter wrote:
           | And nobody will sue anybody because suing
           | means...discovery....
        
           | hehdhdjehehegwv wrote:
           | So basically I, as an open source author, had my code eaten
           | up by Mistral without my consent, but if I want to use their
           | code model I'm subject to a bunch of restrictions that
           | benefit their bottom line?
           | 
           | The problem these AI companies have is they live in a glass
           | house and they can't throw IP rocks around without breaking
           | their own "your content is our training data" foundation.
           | 
            | The only reason I can think of that Google doesn't go after
            | OpenAI for scraping YouTube is that they'd put themselves
            | in the same crosshairs, and might set a precedent they'd
            | also be bound by.
           | 
           | Given the model is "on the web" I have the same rights as
           | Mistral to use anything online however I want without regard
           | for IP, right?
           | 
           | Utter absurdity.
        
             | htrp wrote:
             | call it an Enterprise poison pill.
        
               | hehdhdjehehegwv wrote:
               | But a pill they also have to swallow.
        
               | dodslaser wrote:
               | Enterprise suicide cult.
        
             | IshKebab wrote:
             | > So basically I, as an open source author, had my code
             | eaten up by Mistral without my consent
             | 
             | Not necessarily. You consented to people reading your code
             | and learning from it when you posted it on Github. Whether
             | or not there's an issue with AI doing the same remains to
             | be settled. It certainly isn't clear cut that separate
             | consent would be required.
        
               | hehdhdjehehegwv wrote:
               | I did not give consent to train on my software and the
               | license does not allow commercial use of it.
               | 
               | They have taken _my_ code and now are dictating how _I_
               | can use their derived work.
               | 
               | Personally I think these tools are useful, but if the
               | data comes from the commons the model should also belong
               | to the commons. This is just another attempt to gain
               | private benefit from public work.
               | 
               | There are legal issues to be resolved, and there is an
               | explosion of lawsuits already, but the fact pattern is
               | simple and applies to nearly all closed-source AI
               | companies.
        
               | portaouflop wrote:
               | Mistral is as open as they get, most others are far
               | worse. Here you can use the model without issues, as
               | others are saying it's doubtful they would sue you if you
               | were to use code generated by the model in a commercial
               | app
        
               | mananaysiempre wrote:
               | Replit's replit-code[1,2] is CC BY-SA 4.0 for the
               | weights, Apache 2.0 for the sources. Replit has its own
               | unpleasant history[3], but the model's terms are good.
               | (The model is not as good, but deciding whether that's a
               | worthwhile tradeoff is up to you. The tradeoff exists and
               | is meaningful, is my point.)
               | 
               | [1] https://huggingface.co/replit/replit-code-v1-3b
               | 
               | [2] https://huggingface.co/replit/replit-code-v1_5-3b
               | 
               | [3] https://news.ycombinator.com/item?id=27424195
        
               | Liquix wrote:
               | MIT/BSD code is fair game, but isn't the whole point of
               | GPL/AGPL "you can read and share and use this, but you
               | can't take it and roll it into your closed commercial
               | product for profit"? It seems like what Mistral and co
               | are doing is a fundamental violation of the one thing GPL
               | is striving to enforce.
        
           | nullc wrote:
           | Five years ago it would not have been at all controversial
           | that these weights would not be copyrightable in the US,
           | they're machine generated output on third party data. Yet
           | somehow we've entered a weird timeline where obvious
           | copyfraud is fine, by the same entities that are at best on
           | the line of engaging in commercial copyright _infringement_
            | at a hitherto unforeseen scale.
        
         | rohansood15 wrote:
         | That license is just hilarious.
        
           | das_keyboard wrote:
           | OT, but 7.2 reads like the description of some Yu-Gi-Oh card
           | or something:
           | 
           | > Mistral AI may terminate this Agreement at any time [...].
           | Sections 5, 6, 7 and 8 shall survive the termination of this
           | Agreement.
        
         | behnamoh wrote:
         | From the website:
         | 
         | > licensed under the new Mistral AI Non-Production License,
         | which means that you can use it for research and testing
         | purposes. ...
         | 
         | Which basically means "we give you this model. Go find its
         | weaknesses and report on r/locallama. Then we'll use that to
         | improve our commercial model which we won't open-source."
         | 
          | I'm sick of the abuse of the word "open-source" in this
          | field.
        
           | JimDabell wrote:
           | > I'm sick of abusing the word "open-source" in this field.
           | 
           | They don't call this open source anywhere, do they? As far as
           | I can see, they only say it's open weights and that it's
           | available under their Mistral AI Non-Production License for
           | research and testing. That doesn't scream "open source" to
           | me.
        
             | demosthanos wrote:
             | They do say "open-weight", which is I think still very
             | misleading in this context. Open-weight sounds like it
             | should be the same as open-source, just for weights instead
             | of the full source (for example, training data and the code
             | used to generate the weights may not be released). This
             | isn't really "open" in any meaningful sense.
        
               | Zambyte wrote:
                | The fact that I can download it and run it myself is a
                | pretty meaningful amount of openness to me. I can easily
                | ignore their bogus claims about what I'm allowed to do
                | with it due to their distribution model. I can't
                | necessarily do the same with a proprietary service, as
                | they can cut me off if the way I use the output makes
                | them sad :(
        
               | TeMPOraL wrote:
               | > _The fact that I can downloaded it and run it myself is
               | a pretty meaningful amount of openness to me_
               | 
               | That's typically called _freeware_ , though.
        
               | Zambyte wrote:
               | The inference engine that I use to run open weight
               | language models is fully free software. The model itself
               | isn't really software in the traditional sense. So
               | calling it ____ware seems inaccurate.
        
               | TeMPOraL wrote:
               | The interpreter is free software. The model is freeware
               | distributed as a binary blob. Code vs. Data is a matter
               | of perspective, but with large neural nets, more than
               | anywhere, it makes no sense to pretend they're plain
               | data. All the computational complexity is in the weights,
               | they're very much code compiled for an unusual
               | architecture (the inference engine).
        
               | demosthanos wrote:
               | > I can easily ignore their bogus claims about what I'm
               | allowed to do with it due to their distribution model.
               | 
                | If you're talking about exclusively personal use, sure.
               | If you're talking about a business setting in a
               | jurisdiction that Mistral can sue in, not so much.
               | 
               | Being able to use it in a business setting is a pretty
               | darn important part of what Open Source has always meant
               | (it's why it exists as a term at all).
        
               | boulos wrote:
               | This is why I prefer the term "weights available" just
               | like "source available". It makes it clear that you can
               | get your hands on the copy, you could run this exact
               | thing locally if they go out of business, etc. but it is
               | definitely not open in the OSS sense.
        
             | gyudin wrote:
              | All their other models are "open source" and it was the
              | selling point they built their brand on. I doubt they made
              | their new model completely different from previous ones,
              | so it's supposed to be open source too, unless they found
              | some juridical loophole lol
        
             | Rastonbury wrote:
              | No, but they do say "empowering developers" and
              | "democratising coding" in the subtitle. I guess only for
              | those who pay.
        
         | meiraleal wrote:
         | > Who cares? It seems you're not allowed to integrate this with
         | anything else and show it to anyone, even as an art project.
         | 
         | Now they just lack the means to enforce it.
        
           | localfirst wrote:
           | impossible to enforce
        
         | isoprophlex wrote:
         | So, it's almost entirely useless with that license, because the
         | average pack of corpo beancounters will never let you use it
         | over whatever Microsoft has already sold them.
        
         | batch12 wrote:
         | If they can make agreements with arbitrary terms, why can't we?
         | [0]
         | 
         | [0] https://o565.com/content-ownership-and-licensing-agreement/
        
       | croes wrote:
       | >Usage Limitation
       | 
       | - You shall only use the Mistral Models and Derivatives (whether
       | or not created by Mistral AI) for testing, research, Personal, or
       | evaluation purposes in Non-Production Environments;
       | 
       | - Subject to the foregoing, You shall not supply the Mistral
       | Models, Derivatives, or Outputs in the course of a commercial
       | activity, whether in return for payment or free of charge, in any
       | medium or form, including but not limited to through a hosted or
       | managed service (e.g. SaaS, cloud instances, etc.), or behind a
       | software layer
        
       | jstummbillig wrote:
       | How I interact with new model reports at this point: Open the
       | page, ctrl + f, "gpt-4" and skip the rest.
        
       | isoprophlex wrote:
       | Does it do SQL, and if so, which dialects? I am having a hard
       | time figuring out what it is actually trained on
        
         | sebzim4500 wrote:
         | They claim good results in a SQL benchmark but they don't
         | specify what dialects it knows.
        
       | esafak wrote:
       | Are there any IDE plugins that index your entire code base in
       | order to provide contextual responses AND let you pick between
       | the latest models?
       | 
       | If not, consider it a product idea ;)
        
         | pmmucsd wrote:
          | There are plugins for various IDEs that operate like Copilot
          | but let you select the model you want to use; just supply
          | your key. CodeGPT for JetBrains/Android Studio is pretty
          | good. I think you can even use a model running locally.
        
         | saturatedfat wrote:
         | Supermaven, but you don't get model choice.
        
         | elmariachi wrote:
         | Cody by Sourcegraph allows you to do this. It doesn't have
         | Codestral yet but probably will soon.
        
           | jdorfman wrote:
           | We are working on it.
        
       | asadm wrote:
        | How do people do infilling these days? In olden times models
        | used to provide a way to supply the suffix separately.
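        | 
        | That is, something like the fill-in-the-middle (FIM) call
        | below, if Codestral exposes the same separate prompt/suffix
        | interface (the endpoint path and field names here are my
        | assumption, not verified):
        | 
        |     import requests
        | 
        |     resp = requests.post(
        |         # assumed FIM endpoint and payload shape
        |         "https://api.mistral.ai/v1/fim/completions",
        |         headers={"Authorization": "Bearer <MISTRAL_API_KEY>"},
        |         json={
        |             "model": "codestral-latest",
        |             "prompt": "def fib(n):\n",     # text before cursor
        |             "suffix": "\nprint(fib(10))",  # text after cursor
        |         },
        |     )
        |     print(resp.json())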
        
       | artninja1988 wrote:
       | This is a business model I can get behind. The model is under a
       | non-commercial license, but it's open weights and they have their
       | official API for it
        
       | YetAnotherNick wrote:
        | What's the business model for semi open source models like
        | these? Is it just that they can't be fully closed, as they have
        | to compare with OpenAI? Who would pay for these models if
        | something better is available for cheaper from Anthropic or
        | Google?
        
       | mirekrusin wrote:
       | Fifty shades of "open".
        
       | isaacrolandson wrote:
       | Will this run on an M3 48GB?
        
         | piskov wrote:
          | You'll need 44GB just for the weights.
          | 
          | By default only 75% of unified memory is available to the GPU
          | if you have >36GB. So with 48GB total, only 36GB is available
          | to the GPU, which is lower than 44.
          | 
          | tl;dr: without quantization you will not be able to run it.
        
       | Sytten wrote:
        | Is there a VSCode extension that can plug in any model out
        | there and give a similar experience to Copilot? I always want
        | to try them but I can't be bothered to do a whole setup each
        | time.
        
       | gsuuon wrote:
       | How does the Mistral non-production license work for
       | small/hobby/indie projects? Has anyone tried to get approval for
       | that kind of use?
        
       | James_K wrote:
       | > Democratising code
       | 
       | Did yall see what happened when they democratised art? I don't
       | want to have a billion and one AI garbage libraries to sift
       | through before I can find something reliable and human-made. At
       | least the potential for creating horrific political software is
       | slightly lower than with simple images.
        
       | gavin_gee wrote:
       | what the heck is this for, if you can't use it for commercial
       | work?
        
       | ein0p wrote:
       | If I can't use the output of this in practical code completion
       | use cases, it's meaningless, because GH Copilot exists. Idk what
       | they're thinking or what business model they're envisioning -
       | Copilot is far and away the best model of this kind anyway
        
       ___________________________________________________________________
       (page generated 2024-05-29 23:00 UTC)