[HN Gopher] Yi-Coder: A Small but Mighty LLM for Code
       ___________________________________________________________________
        
       Yi-Coder: A Small but Mighty LLM for Code
        
       Author : crbelaus
       Score  : 237 points
       Date   : 2024-09-05 03:38 UTC (19 hours ago)
        
 (HTM) web link (01-ai.github.io)
 (TXT) w3m dump (01-ai.github.io)
        
       | smcleod wrote:
        | Weird that they're comparing it to really old DeepSeek v1 models,
        | when even v2 has been out a long time now.
        
         | butterfly42069 wrote:
         | My exact thoughts, especially because DeepseekV2 is meant to be
         | a massive improvement.
         | 
          | It seems to be an emerging trend people should look out for:
          | model release sheets often contain comparisons with out-of-date
          | models, and don't inform so much as just try to make the model
          | look "best."
         | 
         | It's an annoying trend. Untrustworthy metrics betray
         | untrustworthy morals.
        
         | bubblyworld wrote:
         | My barely-informed guess is that they don't have the resources
         | to run it (it's a 200b+ model).
        
           | regularfry wrote:
           | They could compare to DeepSeek-Coder-V2-Lite-Instruct. That's
           | a 16B model, and it comes out at 24.3 on LiveCodeBench. Given
           | the size delta they're respectably close - they're only just
           | behind at 23.4. The full V2 is way ahead.
        
           | smcleod wrote:
            | That's for the larger model; most people running it locally
            | use the -lite model (both of which have lots of benchmarks
            | published).
        
       | theshrike79 wrote:
       | > Continue pretrained on 2.4 Trillion high-quality tokens over 52
       | major programming languages.
       | 
       | I'm still waiting for a model that's highly specialised for a
       | single language only - and either a lot smaller than these jack
       | of all trades ones or VERY good at that specific language's
       | nuances + libraries.
        
         | wiz21c wrote:
         | If the LLM training makes the LLM generalize things _between_
         | languages, then it is better to leave it like it is...
        
         | richardw wrote:
         | I'd be interested to know if that trade off ends up better.
         | There's probably a lot of useful training that transfers well
         | between languages, so I wouldn't be that surprised if the extra
         | tokens helped across all languages. I would guess a top quality
         | single language model would need to be very well supported, eg
         | Python or JavaScript. Not, say, Clojure.
        
         | rfoo wrote:
          | An unfortunate fact is that, much like a human with infinite
          | time, LLMs usually have better performance on your specific
          | language when they are not limited to learning or over-sampling
          | one single language. Not unlike the common saying "learning to
          | code in Haskell makes you a better C++ programmer".
          | 
          | Of course, this is far from trivial: you don't just add more
          | data and expect it to automatically be better at everything.
          | The same goes for time management for us mere mortals.
        
           | deely3 wrote:
            | > usually have better performance on your specific language
            | when they are not limited to learning or over-sampling one
            | single language.
            | 
            | Source? I'm very curious how learning one language helps a
            | model generate code in languages with different paradigms.
            | Java, Markdown, JSON, HTML, Fortran?
        
         | imjonse wrote:
         | Unclear how much of their coding knowledge is in the space of
         | syntax/semantics of a given language and how much in the latent
         | space that generalizes across languages and logic in general.
         | If I were to guess I'd say 80% is in the latter for the larger
         | capable models. Even very small models (like in Karpathy's
         | famous RNN blog) will get syntax right but that is superficial
         | knowledge.
        
         | kamphey wrote:
         | I wonder what those 52 languages are.
        
           | richardw wrote:
           | According to the repo README: 'java', 'markdown', 'python',
           | 'php', 'javascript', 'c++', 'c#', 'c', 'typescript', 'html',
           | 'go', 'java_server_pages', 'dart', 'objective-c', 'kotlin',
           | 'tex', 'swift', 'ruby', 'sql', 'rust', 'css', 'yaml',
           | 'matlab', 'lua', 'json', 'shell', 'visual_basic', 'scala',
           | 'rmarkdown', 'pascal', 'fortran', 'haskell', 'assembly',
           | 'perl', 'julia', 'cmake', 'groovy', 'ocaml', 'powershell',
           | 'elixir', 'clojure', 'makefile', 'coffeescript', 'erlang',
           | 'lisp', 'toml', 'batchfile', 'cobol', 'dockerfile', 'r',
           | 'prolog', 'verilog'
           | 
           | https://github.com/01-ai/Yi-Coder
        
             | Y_Y wrote:
             | They're playing a dangerous game if they assume that a
             | single language or even family of similar languages is
             | referred to by e.g. "assembly", "shell", "lisp".
             | 
             | (I also note that several of these are markup or config
             | languages which are explicitly not for programming.)
        
         | karagenit wrote:
         | Yep, been waiting for the same thing. Maybe at some point it'll
         | be possible to use a large multilingual model to translate the
         | dataset into one programming language, then train a new smaller
         | model on just that language?
        
           | terminalcommand wrote:
           | Isn't microsoft phi specifically trained for Python? I recall
           | that Phi 1 was advertised as a Python coding helper.
           | 
            | It's a small model trained only on quality sources (i.e.
            | textbooks).
        
         | mark_l_watson wrote:
         | I get your point. Models that support many dozens of human
         | languages seem not what I personally need because I only speak
         | English.
         | 
         | However, I enjoy using various Lisp languages and I was pleased
         | last night when I set up Emacs + ellama + Ollama + Yi-Coder. I
         | experimented with Cursor last weekend, and it was nice for
         | Python, not so great for Common Lisp.
        
         | rty32 wrote:
         | I don't know if that will happen, but there are tools that at
         | least _try_ to improve performance for specific languages,
         | especially  "underrepresented" languages, e.g.
         | https://sourcegraph.com/blog/enhancing-code-completion-for-r...
        
         | sitkack wrote:
         | The models benefit immensely from being trained with more data
         | from other languages, even if you only ever use it in one.
         | 
         | You could finetune it on your codebases and specific docs for
         | added perf.
        
       | mythz wrote:
        | Claude 3.5 Sonnet still holds the LLM crown for code, and I'll
        | use it when I want to check the output of the best LLM. However,
        | my Continue Dev, Aider and Claude Dev plugins are currently
        | configured to use DeepSeek Coder V2 236B (and local Ollama
        | DeepSeek Coder V2 for tab completions), as it offers the best
        | value at $0.14/M input and $0.28/M output tokens, which sits just
        | below Claude 3.5 Sonnet on Aider's leaderboard [1] whilst being
        | 43x cheaper.
       | 
       | [1] https://aider.chat/docs/leaderboards/
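        | 
        | For anyone wondering how the plugins talk to it: DeepSeek's API
        | is OpenAI-compatible, so a minimal sketch with the openai Python
        | client looks roughly like this (the base URL and model name
        | follow their docs at the time of writing, so treat the details
        | as assumptions):
        | 
        |     from openai import OpenAI
        | 
        |     # OpenAI-compatible endpoint; key comes from your DeepSeek
        |     # account
        |     client = OpenAI(
        |         api_key="YOUR_DEEPSEEK_API_KEY",
        |         base_url="https://api.deepseek.com",
        |     )
        | 
        |     resp = client.chat.completions.create(
        |         model="deepseek-coder",
        |         messages=[{"role": "user",
        |                    "content": "Write a Python function that "
        |                               "reverses a linked list."}],
        |     )
        |     print(resp.choices[0].message.content)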
        
         | dsp_person wrote:
          | DeepSeek sounds really good, but the terms/privacy policy look
          | a bit sketchy (e.g. they grant themselves a full license to
          | use/reproduce inputs and outputs). Is there anywhere feasible
          | to spin up the 236B model privately for a similarly cheap
          | price?
          | 
          | The following quotes are from a reddit comment here:
          | https://www.reddit.com/r/LocalLLaMA/comments/1dkgjqg/comment...
         | 
         | > under International Data Transfers (in the Privacy Policy):
         | """ The personal information we collect from you may be stored
         | on a server located outside of the country where you live. We
         | store the information we collect in secure servers located in
         | the People's Republic of China . """
         | 
         | > under How We Share Your Information > Our Corporate Group (in
         | the Privacy Policy): """ The Services are supported by certain
         | entities within our corporate group. These entities process
         | Information You Provide, and Automatically Collected
         | Information for us, as necessary to provide certain functions,
         | such as storage, content delivery, security, research and
         | development, analytics, customer and technical support, and
         | content moderation. """
         | 
         | > under How We Use Your Information (in the Privacy Policy):
         | """ Carry out data analysis, research and investigations, and
         | test the Services to ensure its stability and security; """
         | 
         | > under 4.Intellectual Property (in the Terms): """ 4.3 By
         | using our Services, you hereby grant us an unconditional,
         | irrevocable, non-exclusive, royalty-free, sublicensable,
         | transferable, perpetual and worldwide licence, to the extent
         | permitted by local law, to reproduce, use, modify your Inputs
         | and Outputs in connection with the provision of the Services.
         | """
        
           | yumraj wrote:
           | There's no company info on DeepSeek's website. Looking at the
           | above, and considering that, it seems very sketchy indeed.
           | 
            | Maybe OK for trying out stuff, but a big no-no for real work.
        
             | dotancohen wrote:
             | Might be good for contributing to open source projects. But
             | not for clients' projects.
        
             | rfoo wrote:
             | > There's no company info on DeepSeek's website.
             | 
             | It's backed solely by a hedge fund who do not want to draw
             | attention to their business. So yeah, as sketchy as DESRES.
        
             | redeyedtreefrog wrote:
             | The names of their researchers are on this recent paper:
             | https://arxiv.org/pdf/2408.15664
             | 
             | Their terms of service say "The DeepSeek Open Platform is
             | jointly owned and operated by Hangzhou DeepSeek Artificial
             | Intelligence Co., Ltd., Beijing DeepSeek Artificial
             | Intelligence Co., Ltd. "
             | 
             | And they're funded by https://www.high-flyer.cn/en/fund/
             | which the FT did an article on: https://www.ft.com/content/
             | 357f3c68-b866-4c2e-b678-0d075051a...
             | 
             | In terms of the personal data you share when using their
             | models, I can't see why they would be any more or less
             | nefarious than big Western tech companies.
             | 
             | That said, if you're using a model based in China then by
             | providing them with data and feedback you are in a very
             | small way helping researchers in China catch up with/keep
             | up with/overtake researchers in the West. Maybe in the long
             | term that could end badly. And if you are concerned about
             | the environment, it's entirely possible their training and
             | inference is run using coal power stations.
        
           | mythz wrote:
            | It's a 236B MoE model with only 21B active parameters. Ollama
            | reports 258k downloads for it [1] (16B and 236B combined),
            | whilst Hugging Face says it was downloaded 37k times last
            | month [2], and it can run at 25 tok/s on a single M2 Ultra
            | [3].
            | 
            | At $0.14/M input and $0.28/M output tokens it's a no-brainer
            | to use their APIs. I understand some people would have
            | privacy concerns and would want to avoid their APIs, although
            | I personally spend all my time contributing to publicly
            | available OSS code bases, so I'm happy for any OSS LLM to use
            | any of our code bases to improve their LLM and hopefully also
            | improve the generated code for anyone using our libraries.
           | 
           | Since many LLM orgs are looking to build proprietary moats
           | around their LLMs to maintain their artificially high prices,
           | I'll personally make an effort to use the best OSS LLMs
           | available first (i.e. from DeepSeek, Meta, Qwen or Mistral
           | AI) since they're bringing down the cost of LLMs and aiming
           | to render the technology a commodity.
           | 
           | [1] https://ollama.com/library/deepseek-coder-v2
           | 
           | [2] https://huggingface.co/deepseek-ai/DeepSeek-
           | Coder-V2-Lite-In...
           | 
           | [3] https://x.com/awnihannun/status/1814045712512090281
        
           | bilekas wrote:
           | > """ The personal information we collect from you may be
           | stored on a server located outside of the country where you
           | live. We store the information we collect in secure servers
           | located in the People's Republic of China . """
           | 
            | Is that even legal with regard to EU users?
        
             | onli wrote:
             | Of course not.
        
               | yard2010 wrote:
               | Well, good luck prosecuting Winnie the Pooh
        
               | bilekas wrote:
               | Their services will just be blocked in the EU instead..
               | we've seen it in Italy early on with ChatGPT..
        
               | samstave wrote:
                | When will we have token-flow-aware networking gear?
                | Surely NVIDIA and others are already doing special
                | traffic shaping for token flows?
                | 
                | What's the current state of such
                | tech/thought/standards/vendors?
        
         | stavros wrote:
         | I'm making a small calendar renderer for e-ink screens
         | (https://github.com/skorokithakis/calumny) which Claude
         | basically wrote all of, so I figured I'd try DeepSeek. I had it
         | add a small circle to the left of the "current day" line, which
         | it added fine, but it couldn't solve the problem of the circle
         | not being shown over another element. It tried and tried, to no
         | avail, until I switched to Claude, which fixed the problem
         | immediately.
         | 
         | 43x cheaper is good, but my time is also worth money, and it
         | unfortunately doesn't bode well for me that it's stumped by the
         | first problem I throw at it.
        
         | jadbox wrote:
          | Which is the best plugin among Continue Dev, Aider and Claude
          | Dev?
        
           | phren0logy wrote:
           | You probably already know that Aider is not a plugin, but
           | just in case - it's a program that runs from the terminal. I
           | think the results are very impressive, and it readily handles
            | multiple source files for context.
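            | 
            | For example, you launch it in your git repo with the files
            | you want in context, e.g. `aider app.py tests.py` (file
            | names here are just placeholders), and it proposes edits and
            | commits them for you.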
        
       | ziofill wrote:
       | Are coding LLMs trained with the help of interpreters?
        
         | willvarfar wrote:
         | Google's Gemini does.
         | 
         | I can't find a post that I remember Google published just after
         | all the ChatGPT SQL generation hype happened, but it felt like
         | they were trying to counter that hype by explaining that most
         | complex LLM-generated code snippets won't actually run or work,
         | and that they were putting a code-evaluation step after the LLM
         | for Bard.
         | 
         | (A bit like why did they never put an old fashioned rules-based
         | grammar checker check stage in google translate results?)
         | 
         | Fast forward to today and it seems it's a normal step for
         | Gemini etc https://ai.google.dev/gemini-api/docs/code-
         | execution?lang=py...
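          | 
          | A rough sketch of what that looks like with Google's
          | generativeai Python client (the tool flag and model name are
          | taken from the linked docs, so treat the details as
          | assumptions):
          | 
          |     import google.generativeai as genai
          | 
          |     genai.configure(api_key="YOUR_API_KEY")
          | 
          |     # Enabling the built-in code execution tool lets the model
          |     # write *and run* Python while answering, rather than just
          |     # emitting untested code.
          |     model = genai.GenerativeModel(
          |         model_name="gemini-1.5-flash",
          |         tools="code_execution",
          |     )
          |     resp = model.generate_content(
          |         "Write and run code to sum the first 50 primes."
          |     )
          |     print(resp.text)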
        
           | redeyedtreefrog wrote:
            | That's interesting! Where it says that it will "learn
            | iteratively from the results until it arrives at a final
            | output" I assume it's therefore trying multiple LLM
            | generations until it finds one that works, which I didn't
            | know about before.
            | 
            | However, AFAIK that's only ever at inference time; an
            | interpreter isn't included during LLM training? I wonder if
            | it would be possible to fine-tune a model for coding with an
            | interpreter. Though if no one has done it yet, there is
            | presumably a good reason why not.
        
             | littlestymaar wrote:
              | > Though if no one has done it yet, there is presumably a
              | good reason why not.
              | 
              | The field is vast and moving quickly, and there are more
              | directions to explore than there are researchers at top AI
              | labs. There are lots of open doors that haven't been
              | explored yet, but that doesn't mean they're not worth it;
              | it's just not done yet.
        
       | Havoc wrote:
        | Beats DeepSeek 33B. That's impressive.
        
         | tuukkah wrote:
          | They used DeepSeek-Coder-33B-Instruct in comparisons, while
          | DeepSeek-Coder-v2-Instruct (236B) and -Lite-Instruct (16B) have
          | been available for a while:
          | https://github.com/deepseek-ai/DeepSeek-Coder-v2
          | 
          | EDIT: Granted, Yi-Coder 9B is still smaller than any of these.
        
       | mtrovo wrote:
       | I'm new to this whole area and feeling a bit lost. How are people
       | setting up these small LLMs like Yi-Coder locally for tab
       | completion? Does it work natively on VSCode?
       | 
       | Also for the cloud models apart from GitHub Copilot, what tools
       | or steps are you all using to get them working on your projects?
       | Any tips or resources would be super helpful!
        
         | cassianoleal wrote:
         | You can run this LLM on Ollama [0] and then use Continue [1] on
         | VS Code.
         | 
         | The setup is pretty simple:
         | 
         | * Install Ollama (instructions for your OS on their website -
         | for macOS, `brew install ollama`)
         | 
         | * Download the model: `ollama pull yi-coder`
         | 
         | * Install and configure Continue on VS Code
         | (https://docs.continue.dev/walkthroughs/llama3.1 <- this is for
         | Llama 3.1 but it should work by replacing the relevant bits)
         | 
         | [0] https://ollama.com/
         | 
         | [1] https://www.continue.dev/
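          | 
          | For reference, the Continue side boils down to an entry like
          | this in ~/.continue/config.json (the exact model tags are
          | assumptions; use whatever `ollama list` shows):
          | 
          |     {
          |       "models": [
          |         {
          |           "title": "Yi-Coder 9B",
          |           "provider": "ollama",
          |           "model": "yi-coder:9b"
          |         }
          |       ],
          |       "tabAutocompleteModel": {
          |         "title": "Yi-Coder 1.5B",
          |         "provider": "ollama",
          |         "model": "yi-coder:1.5b"
          |       }
          |     }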
        
         | suprjami wrote:
         | If you have a project which supports OpenAI API keys, you can
         | point it at a LocalAI instance:
         | 
         | https://localai.io/
         | 
         | This is easy to get "working" but difficult to configure for
         | specific tasks due to docs being lacking or contradictory.
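          | 
          | As a minimal sketch of "pointing at it" (assuming LocalAI's
          | default port 8080 and a model name you have configured there):
          | 
          |     from openai import OpenAI
          | 
          |     # LocalAI exposes an OpenAI-compatible API, so only the
          |     # base URL (and a dummy key) change; "yi-coder" is assumed
          |     # to be a model you have set up in LocalAI.
          |     client = OpenAI(base_url="http://localhost:8080/v1",
          |                     api_key="not-needed")
          | 
          |     resp = client.chat.completions.create(
          |         model="yi-coder",
          |         messages=[{"role": "user", "content": "Hello"}],
          |     )
          |     print(resp.choices[0].message.content)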
        
           | samstave wrote:
              | Can you post screens/configs showing how you got the setup
              | to succeed?
              | 
              | Or at least state what you configured it toward, and how?
        
             | suprjami wrote:
             | The documentation gives a quick start and many examples of
             | integration with OpenAI projects like a chatbot. That's all
             | I did.
        
       | NKosmatos wrote:
       | It would be good if LLMs were somehow packaged in an easy
       | way/format for us "novice" (ok I mean lazy) users to try them
       | out.
       | 
        | I'm not so much interested in the response time (anyone have a
        | couple of spare A100s?), but it would be good to be able to try
        | out different LLMs locally.
        
         | nusl wrote:
         | This is already possible. There are various tools online you
         | can find and use.
        
         | hosteur wrote:
         | You should try GPT4all. It seems to be exactly what you're
         | asking for.
        
         | suprjami wrote:
         | One Docker command if you don't mind waiting minutes for CPU-
         | bound replies:
         | 
         | https://localai.io/
         | 
         | You can also use several GPU options, but they are not as easy
         | to get working.
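          | 
          | (The one command is roughly `docker run -p 8080:8080 --name
          | local-ai -ti localai/localai:latest-aio-cpu` - image tag from
          | memory, so check their quickstart - after which it serves an
          | OpenAI-compatible API on port 8080.)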
        
         | PhilippGille wrote:
         | With Mozilla's llamafile you can run LLMs locally without
         | installing anything: https://github.com/Mozilla-Ocho/llamafile
        
         | senko wrote:
         | LM Studio is pretty good: https://lmstudio.ai/
        
         | dizhn wrote:
          | I understand your situation. It sounds super simple to me now,
          | but I remember having to spend at least a week getting the
          | concepts and figuring out what prerequisite knowledge I would
          | need, somewhere on the continuum between just using ChatGPT and
          | learning the relevant vector math etc. Fortunately it is much
          | closer to the ChatGPT side. I don't like Ollama per se (because
          | I can't reuse its models with other frontends, due to it
          | compressing them into its own format), but it's still a very
          | good place to start. Any interface that lets you download
          | models as GGUF from Hugging Face will do just fine. Don't be
          | turned off by the roleplaying/waifu-sounding frontend names.
          | They are all fine. This is what I mostly prefer:
          | https://github.com/oobabooga/text-generation-webui
        
       | cassianoleal wrote:
       | Is there an LLM that's useful for Terraform? Something that
       | understands HCL and has been trained on the providers, I imagine.
        
         | bloopernova wrote:
         | Copilot writes terraform just fine, including providers.
        
           | cassianoleal wrote:
           | Thanks. I should have specified, LLMs that can be run locally
           | is what interests me.
        
         | lasermike026 wrote:
         | Try this, https://ollama.com/jeffrymilan/aiac
        
       | Palmik wrote:
       | The difference between (A) software engineers reacting to AI
       | models and systems for programming and (B) artists (whether it's
       | painters, musicians or otherwise) reacting to AI models for
       | generating images, music, etc. is very interesting.
       | 
        | I wonder what the reason is.
        
         | suprjami wrote:
         | Because code either works or it doesn't. Nobody is replacing
         | our entire income stream with an LLM.
         | 
         | You also need a knowledge of code to instruct an LLM to
         | generate decent code, and even then it's not always perfect.
         | 
          | Meanwhile plenty of people are using free/cheap image
          | generation and calling it "good enough". Now they don't need to
          | pay a graphic artist or for a stock photo licence.
         | 
         | Any layperson can describe what they want a picture to look
         | like so the barrier to entry and successful exit is a lot lower
         | for LLM image generation than for LLM code generation.
        
         | rty32 wrote:
          | Coding assistants are not good enough (yet). Inline suggestions
          | and chats are incredibly helpful and boost productivity (though
          | only for those who know how to use them well), but that's as
          | far as they go today.
         | 
         | If they can take a Jira ticket, debug the code, create a patch
         | for a large codebase and understand and respect all the
         | workarounds in a legacy codebase, I would have a problem with
         | it.
        
           | xvector wrote:
           | Except they can't do the equivalent for art yet either, and I
           | am fairly familiar with the state of image diffusion today.
           | 
           | I've commissioned tens of thousands of dollars in art, and
           | spent many hundreds of hours working with Stable Diffusion,
           | Midjourney, and Flux. What all the generators are missing is
           | _intentionality_ in art.
           | 
           | They can generate something that looks great at surface
           | level, but doesn't make sense when you look at the details.
           | Why is a particular character wearing a certain bracelet? Why
           | do the windows on that cottage look a certain way? What does
           | a certain engraving mean? Which direction is a character
           | looking, and why?
           | 
           | The diffusers do not understand what they are generating, so
            | they just generate what "looks right." Often this results in
           | art that looks pretty but has no deeper logic, world
           | building, meaning, etc.
           | 
           | And of course, image generators cannot handle the client-
           | artist relationship as well (even LLMs cannot), because it
           | requires an understanding of what the customer wants and what
           | emotion they want to convey with the piece they're
           | commissioning.
           | 
           | So - I rely on artists for art I care about (art I will hang
           | on my walls), and image generators for throwaway work (such
           | as weekly D&D campaign images.)
        
             | rty32 wrote:
             | Of course the "art" art -- the part that is all about human
             | creativity -- will always be there.
             | 
             | But lots of people in the art business aren't doing that.
             | If you didn't have midjourney etc, what would you be doing
             | for the throwaway work? Learn to design the stuff yourself,
                | hire someone to do it on Upwork, or just not do it at all?
             | Some money likely will exchange hands there.
        
               | xvector wrote:
               | The throwaway work is worth pennies per piece to me _at
                | most._ So I probably wouldn't do it at all if it wasn't
               | for the generators.
               | 
               | And even when it comes to the generators, I typically
               | just use the free options like open-source diffusion
               | models, as opposed to something paid like Midjourney.
        
           | mrklol wrote:
            | But that's not that far off. Like sure, currently it's not
            | there. But "read a ticket with a description, find the
            | relevant code, understand the code (often better than a
            | human), test it, return the result" is totally doable with
            | some more iterations. It's already doable for smaller
            | projects, see GitHub workspaces etc.
        
           | viraptor wrote:
           | Have you seen https://www.swebench.com/ ?
           | 
           | Once you engage agentic behaviour, it can take you way
           | further than just the chats. We're already in the "resolving
           | JIRA tickets" area - it's just hard to setup, not very well
           | known, and may be expensive.
        
             | rty32 wrote:
             | Looks like the definition of "resolving a ticket" here is
             | "come up with a patch that ensures all tests pass", which
             | does not necessarily include "add a new test", "make sure
             | the patch is actually doing something meaningful",
             | "communicate how this is fixed". Based on my experience and
             | what I saw in the reports in the logs, a solution could be
             | just hallucinating completely useless code -- as long as it
             | doesn't fail a test.
             | 
             | Of course, it is still impressive, and definitely would
             | help with the small bugs that require small fixes,
             | especially for open source projects that have thousands of
             | open issues. But is it going to make a big difference?
             | Probably not yet.
             | 
                | Also, good luck doing that on our poorly written, poorly
                | documented and under-tested codebase. By any standard
                | Django is a much better codebase than the one I work on
                | every day.
        
               | viraptor wrote:
               | Some are happy with creating tests as well, but you
               | probably want to mostly write them yourself. I mean, only
               | you know the real world context - if the ticket didn't
               | explain it well enough, LLMs can't do magic.
               | 
                | Actually, the poorly documented and poorly written part
                | is not a huge issue in my experience. Being under-tested
                | is way more important if you want to automate that work.
        
             | IshKebab wrote:
             | > We're already in the "resolving JIRA tickets" area
             | 
             | For very simple tasks maybe, but not for the kinds of
             | things I get paid to do.
             | 
             | I don't think it will be able to get to the level of
             | reliably doing difficult programming tasks that require
             | understanding and inferring requirements without having
             | AGI, in which case society has other things to worry about
             | than programmers losing their jobs.
        
         | viraptor wrote:
          | Is it really? I know people who love using LLMs, people who are
          | allergic to the idea of even talking about AI usability, and
          | lots of others in between. Same with artists hating the idea,
          | artists who spend hours crafting very specific things with SD,
          | and many in between.
          | 
          | I'm not sure I can really point out a big difference here.
          | Maybe the artists are more skewed towards not liking AI since
          | they work with a medium that's not digital in the first place,
          | but the range of responses really feels close.
        
         | crimsoneer wrote:
          | I mean, it's supply and demand, right?
          | 
          | - There is a big demand for _really complex_ software
          | development, and an LLM can't do that alone. So software devs
          | have to do lots of busywork, and like the opportunity to be
          | augmented by AI.
          | 
          | - Conversely, there is a huge demand for _not very high level_
          | art - e.g. lots of people want a custom logo or a little
          | jingle, but not many people want to hire a concert pianist or
          | commission the next Salvador Dali.
         | 
         | So most artists spend a lot of time doing a lot of low level
         | work to pay the bills, while software devs spend a lot of time
         | doing low level code monkey work so they can get to the
         | creative part of their job.
        
         | aithrowaway1987 wrote:
         | Look at who the tools are marketed towards. Writing software
         | involves a lot of tedium, eye strain, and frustration, even for
         | experts who have put in a lot of hours practicing, so LLMs are
         | marketed to help developers make their jobs easier.
         | 
         | This is not the case for art or music generators: they are
          | marketed towards (and created by) laypeople who want
         | generic content and don't care about human artists. These
         | systems are a significant burden on productivity (and fatal
         | burden on creativity) if you are an honest illustrator or
         | musician.
         | 
         | Another perspective: a lot of the most useful LLM codegen is
         | not asking the LLM to solve a tricky problem, but rather to
         | translate and refine a somewhat loose English-language solution
         | into a more precise JavaScript solution (or whatever),
         | including a large bag of memorized tricks around sorting,
         | regexes, etc. It is more "science than art," and for a
         | sufficiently precise English prompt there is even a plausible
         | set of optimal solutions. The LLM does not have to "understand"
         | the prompt or rely on plagiarism to give a good answer.
         | (Although GPT-3.5 was a horrific F# plagiarist... I don't like
         | LLM codegen but it is far more defensible than music
         | generation)
         | 
         | This is not the case with art or music generators: it makes no
         | sense to describe them as "English to song" translators, and
         | the only "optimal" solutions are the plagiarized / interpolated
         | stuff the human raters most preferred. They clearly don't
         | understand what they are drawing, nor do they understand what
         | melodies are. Their output is either depressing content slop or
         | suspiciously familiar. And their creators have filled the tech
         | community with insultingly stupid propaganda like "they learn
         | art just like human artists do." No wonder artists are mad!
        
           | rahimnathwani wrote:
           | What you say may be true about the simplest workflow: enter a
           | prompt and get one or more finished images.
           | 
           | But many people use diffusion models in a much more
           | interactive way, doing much more of the editing by hand. The
           | simplest case is to erase part of a generated image, and
           | prompt to infill. But there are people who spend hours to get
           | a single image where they want it.
        
             | eropple wrote:
             | This is true, and there's some really cool stuff there, but
             | that's not who most of this is marketed at. Small wonder
             | there's backlash from artists and people who appreciate
             | artists when the stated value proposition is "render
             | artists unemployed".
        
         | tcdent wrote:
         | It's just gatekeeping.
         | 
         | Artists put a ton of time into education and refining their
         | vision inside the craft. Amateur efforts to produce compelling
         | work always look amateur. With augmentation, suddenly the
         | "real" artists aren't as differentiated.
         | 
         | The whole conversation is obviously extremely skewed toward
         | digital art, and the ones talking about it most visibly are the
         | digital artists. No abstract painter thinks AI is coming for
          | their occupation or cares whether it is easier to create anime
         | dreamscapes this year or the next.
        
       | JediPig wrote:
        | I tested this out on my workload (SRE/DevOps/C#/Golang/C++). It
        | started responding with nonsense on a simple "write me a boto
        | Python script that changes x, y, z values" prompt.
        | 
        | Then I tried other questions from my past to compare... However,
        | I believe the engineers who built the LLM just used the questions
        | from the benchmarks.
        | 
        | In one instance, after an hour of use (I stopped then), it
        | answered one question with 4 different programming languages, and
        | answers that were in no way related to the question.
        
         | tmikaeld wrote:
         | I have the same experience, hallucinates and rambles on and on
         | about "solutions" that are not related.
         | 
         | Unfortunately, this has always been my experience with all open
         | source code models that can be self-hosted.
        
           | Gracana wrote:
           | It sounds like you are trying to chat with the base model
           | when you should be using a chat model.
        
         | tarruda wrote:
          | Have you run the model in full FP16? It is possible that a lot
          | of performance is lost when running quantized versions.
        
       | Tepix wrote:
       | Sounds very promising!
       | 
       | I hope that Yi-Coder 9B FP16 and Q8 will be available soon for
        | Ollama; right now I only see the 4-bit quantized 9B model.
       | 
       | I'm assuming that these models will be quite a bit better than
        | the 4-bit model.
        
         | tmikaeld wrote:
         | Click on "View more" in the dropdown on their page, it has many
         | many quantized versions to choose from.
        
       | anotherpaulg wrote:
       | Yi-Coder scored below GPT-3.5 on aider's code editing benchmark.
       | GitHub user cheahjs recently submitted the results for the 9b
       | model and a q4_0 version.
       | 
        | Yi-Coder results, with Sonnet and GPT-3.5 for scale:
        | 
        |     77% Sonnet
        |     58% GPT-3.5
        |     54% Yi-Coder-9b-Chat
        |     45% Yi-Coder-9b-Chat-q4_0
       | 
       | Full leaderboard:
       | 
       | https://aider.chat/docs/leaderboards/
        
       | kleiba wrote:
       | What is the recommended hardware to run a model like that locally
       | on a desktop PC?
        
         | tadasv wrote:
          | You can easily run the 9B Yi-Coder on an RTX 4090. You could
          | probably do it on a smaller GPU (16GB). I have 24GB, and run it
          | through Ollama.
        
       | gloosx wrote:
        | Can someone explain these Aider benchmarks to me? They pass the
        | same 113 tests through the LLM every time. Why do they then
        | extrapolate the ability of an LLM to pass these 113 basic Python
        | challenges to a general ability to produce/edit code? To me it
        | sounds like this or that model is 70% accurate at solving the
        | same hundred Python training tasks, but why does that mean it's
        | good at other languages and arbitrary, private tasks as well?
        | Has anyone ever tried changing the test cases or wiggling the
        | conditions a bit to see if it still hits 70%?
        
         | tarruda wrote:
         | It seems this is the problem with most benchmarks, which is why
         | benchmark performance doesn't mean much these days.
        
       | lasermike026 wrote:
        | First look seems good. I'll keep hacking with it.
        
       | smokel wrote:
       | Does anyone know why the sizes of these models are typically
        | expressed in _number_ of weights (i.e. 1.5B and 9B in this case),
       | without mentioning the weight size in bytes?
       | 
       | For practical reasons, I often like to know how much GPU RAM is
       | required to run these models locally. The actual number of
       | weights seems to only express some kind of relative power, which
       | I doubt is relevant to most users.
       | 
       | Edit: reformulated to sound like a genuine question instead of a
       | complaint.
        
         | tarruda wrote:
         | Since most LLMs are released as FP16, just the number of
         | parameters is enough to know the total required GPU RAM.
        
         | magnat wrote:
         | Because you can quantize a model e.g. from original 16 bits
         | down to 5 bits per weight to fit your available memory
         | constraints.
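          | 
          | As a rough back-of-the-envelope for a 9B model (ignoring KV
          | cache and runtime overhead, which add a few more GB):
          | 
          |     # rough VRAM estimate: parameters x bytes per weight
          |     params = 9e9  # Yi-Coder 9B
          |     for name, bits in [("fp16", 16), ("q8", 8),
          |                        ("q5", 5), ("q4", 4)]:
          |         gb = params * bits / 8 / 1e9
          |         print(f"{name}: ~{gb:.1f} GB")
          |     # fp16 ~18 GB, q8 ~9 GB, q5 ~5.6 GB, q4 ~4.5 GB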
        
         | GaggiX wrote:
          | The weight size depends on the precision you are running the
          | model at; you usually do not run a model at FP16, as that would
          | be wasteful.
        
       | zeroq wrote:
        | Every time someone tells me how AI 10x'd their programming
        | capabilities, I'm like "tell me you're bad at coding without
        | telling me".
        
         | coolspot wrote:
         | It allows me to move much faster, because I can write a comment
         | describing something more high-level and get plausible code
         | from it to review & correct.
        
       | patrick-fitz wrote:
       | I'd be interested to see how it performs on
       | https://www.swebench.com/
       | 
       | Using SWE-agent + Yi-Coder-9B-Chat.
        
       ___________________________________________________________________
       (page generated 2024-09-05 23:00 UTC)