[HN Gopher] Google CodeGemma: Open Code Models Based on Gemma [pdf]
___________________________________________________________________
Google CodeGemma: Open Code Models Based on Gemma [pdf]
Author : tosh
Score : 145 points
Date : 2024-04-09 12:32 UTC (10 hours ago)
(HTM) web link (storage.googleapis.com)
(TXT) w3m dump (storage.googleapis.com)
| tosh wrote:
| It's really sad that Cursor does not support local models yet
| (afaiu they fetch the URL you provide from their server). Is
| there a VS Code plugin or other editor that does?
|
| With models like CodeGemma and Command-R+ it makes more and more
| sense to run them locally.
| sp332 wrote:
| According to https://forum.cursor.sh/t/support-local-
| llms/1099/7 the Cursor servers do a lot of work in between your
| local computer and the model. So porting all that to work on
| users' laptops is going to take a while.
| ericskiff wrote:
| I've been playing with Continue:
| https://github.com/continuedev/continue
| tosh wrote:
| ty for the pointer!
| tosh wrote:
| Already on ollama: https://ollama.com/library/codegemma
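  |
  | If you just want to poke at it from a script, here's a minimal
  | sketch hitting Ollama's local HTTP API from Python (assumes the
  | codegemma tag has already been pulled and the server is listening
  | on its default port 11434):
        import requests

        # Ask the locally served CodeGemma for a completion via Ollama's
        # /api/generate endpoint (stream=False returns one JSON object).
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "codegemma",
                "prompt": "Write a Python function that reverses a linked list.",
                "stream": False,
            },
            timeout=120,
        )
        resp.raise_for_status()
        print(resp.json()["response"])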
| ado__dev wrote:
| Cody supports local inference with Ollama for both Chat and
| Autocomplete. Here's how to set it up:
| https://sourcegraph.com/blog/local-chat-with-ollama-and-cody :)
| tosh wrote:
| ty for the pointer!
| sheepscreek wrote:
| Download the model weights here (PyTorch, GGUF):
|
| https://huggingface.co/collections/google/codegemma-release-...
|
| I am really liking the Gemma line of models. Thoroughly impressed
| with the 2B and 7B non-code optimized variants. The 2B especially
  | packs a lot of punch. I reckon its quality must be on par with
  | some older 7B models, and it runs blazing fast on Apple Silicon -
  | even at 8-bit quantization.
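  |
  | If you go the GGUF route, here's a minimal llama-cpp-python sketch
  | (the file name below is a placeholder for whichever quant you
  | download; n_gpu_layers=-1 offloads everything to Metal on Apple
  | Silicon):
        from llama_cpp import Llama

        # Load a local 8-bit GGUF quant of CodeGemma (path is hypothetical).
        llm = Llama(
            model_path="codegemma-7b-it.Q8_0.gguf",  # placeholder file name
            n_gpu_layers=-1,  # offload all layers to the GPU / Metal
            n_ctx=4096,
        )
        out = llm(
            "Write a Python function that checks whether a string is a palindrome.",
            max_tokens=256,
        )
        print(out["choices"][0]["text"])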
| tosh wrote:
| Gemma 2b instruct worked well for me for categorization. I
| would also say it felt 7b-ish. Very impressed. The initial
| release left me a bit underwhelmed but 1.1 is better and
| punches above its weight.
|
  | Also looking fwd to using 2b models on iOS and Android (even if
| they will be heavy on the battery).
| kolbe wrote:
| My issue so far with the various code assistants isn't the
  | quality necessarily, but their ability to draw in context
  | from the rest of the code base without breaking the bank or
  | providing so much info that the middle gets ignored. Are there any
| systems doing that well these days?
| grey8 wrote:
  | If I'm not mistaken, this is not on the models themselves, but
  | rather on the implementation of the addon.
|
  | I haven't found an open source VSCode or WebStorm addon yet
  | that lets me use a local model and implements code
  | completion and commands as well as GitHub Copilot does.
|
  | They either lack a chat feature, inline actions / code
  | completion, or fill-in-the-middle models. And even when they
  | have those, they don't provide the context as intelligently (an
  | assumption on my part!) as GH's Copilot does.
|
  | One alternative I liked was Supermaven: it's really, really fast
  | and has a huge context window, so it knows almost your whole
  | project. That was nice! But the reason I ultimately stopped
  | using it: it doesn't support chat or inline
  | commands (CTRL+I in VSCode's GH Copilot).
|
  | I feel like a really good Copilot alternative is definitely
  | still missing.
|
| But: Regarding your question, I think GitHub Copilot's VSCode
  | extension is the best - as of now. The WebStorm extension is
  | sadly not as good; it lacks the "inline command" function,
  | which IMHO is a must.
| skybrian wrote:
| Could you use one tool for code completion and another for
| chat?
| evilduck wrote:
| Continue.dev allows for this. You can even mix hosted Chat
| options like GPT-4 (via API) with local completion. I
| typically use a smaller model for faster text completion
| and a larger model (with a bigger context) for chat.
|
| https://github.com/continuedev/continue
| mediaman wrote:
| I think most of them allow for that. Works in vscode and
| vscode-derived (e.g., cursor) editors.
| wsxiaoys wrote:
  | Check out Tabby: https://github.com/TabbyML/tabby
|
| Blog post on repository context:
| https://tabby.tabbyml.com/blog/2023/10/16/repository-context...
|
| (Disclaimer: I started this project)
| Havoc wrote:
| Continue + deepseek local model has been working reasonably
| well for me
| snovv_crash wrote:
  | A RAG system is better than a pure LLM for this use case IMO.
| kolbe wrote:
  | Yeah. This is how I imagined these things should work. But it is
  | tricky. The system needs to pattern match on what types you've
  | been using, if possible. So you need to vector search the code to
  | do that. Then you need to vector search the actual dependency
  | source. It's not that simple, but it would be the ultimate
  | solution.
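  |
  | Rough sketch of the retrieval step (embedding model and snippets
  | are stand-ins, not any particular product's pipeline):
        import numpy as np
        from sentence_transformers import SentenceTransformer

        # Embed code chunks from the repo; a real system would chunk by
        # function/class (e.g. via tree-sitter) and keep a proper index.
        model = SentenceTransformer("all-MiniLM-L6-v2")
        repo_chunks = [
            "def parse_config(path): ...",      # stand-in snippets
            "class UserRepository: ...",
            "def fibonacci(n): ...",
        ]
        chunk_vecs = model.encode(repo_chunks, normalize_embeddings=True)

        # Query with whatever the user is editing, take the nearest chunks,
        # and prepend them to the LLM prompt as context.
        query = "helper that returns the nth fibonacci number"
        query_vec = model.encode([query], normalize_embeddings=True)[0]
        scores = chunk_vecs @ query_vec         # cosine similarity (normalized)
        top = [repo_chunks[i] for i in np.argsort(-scores)[:2]]
        print("\n\n".join(top))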
| viksit wrote:
  | A RAG system that uses tree-sitter rather than vector search alone
  | would lead to better results (intuitively speaking)? Have you
  | seen anything like that yet?
| mediaman wrote:
| Seconding Supermaven here, from the guy that made Tabnine.
|
| Supermaven has a 300k token context. It doesn't seem like it
| has a ton of intelligence -- maybe comparable to copilot, maybe
| a bit less -- but it's much better at picking up data
| structures and code patterns from your code, and usually what I
| want is help autocompleting that sort of thing rather than
| writing an algorithm for me (which LLMs often get wrong
| anyway).
|
| You can also pair it with a gpt4 / opus chat in Cursor, so you
| can get your slower but more intelligent chat along with the
| simpler but very fast, high context autocomplete.
| typpo wrote:
| If anyone wants to eval this locally versus codellama, it's
| pretty easy with Ollama[0] and Promptfoo[1]:
        prompts:
          - "Solve in Python: {{ask}}"
        providers:
          - ollama:chat:codellama:7b
          - ollama:chat:codegemma:instruct
        tests:
          - vars:
              ask: function to return the nth number in fibonacci sequence
          - vars:
              ask: convert roman numeral to number
          # ...
|
| YMMV based on your coding tasks, but I notice gemma is much less
| verbose by default.
|
| [0] https://github.com/ollama/ollama
|
| [1] https://github.com/promptfoo/promptfoo
| danielhanchen wrote:
  | Made Code Gemma 7b 2.4x faster to finetune, with 68% less VRAM,
  | using Unsloth - if anyone wants to finetune it! :) Have a Tesla T4
  | Colab notebook with ChatML:
| https://colab.research.google.com/drive/19lwcRk_ZQ_ZtX-qzFP3...
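  |
  | For context, the usual Unsloth setup looks roughly like this (a
  | sketch, not the notebook verbatim; the 4-bit repo id below is an
  | assumption - swap in whatever tag you use):
        from unsloth import FastLanguageModel

        # Load CodeGemma 7b in 4-bit and attach LoRA adapters for finetuning.
        model, tokenizer = FastLanguageModel.from_pretrained(
            model_name="unsloth/codegemma-7b-bnb-4bit",  # assumed repo id
            max_seq_length=2048,
            load_in_4bit=True,
        )
        model = FastLanguageModel.get_peft_model(
            model,
            r=16,  # LoRA rank
            target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                            "gate_proj", "up_proj", "down_proj"],
            lora_alpha=16,
        )
        # ...then train with your usual TRL / transformers trainer on a
        # ChatML-formatted dataset.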
| trisfromgoogle wrote:
| Love to see your work, Daniel -- thank you, as always! Playing
| with the Colab now =). Go Unsloth, and thanks from the Gemma
| team!
| danielhanchen wrote:
| Thanks! :) Appreciate it a lot! :)
___________________________________________________________________
(page generated 2024-04-09 23:01 UTC)