[HN Gopher] StarCoder and StarCoderBase: 15.5B parameter models ...
___________________________________________________________________
StarCoder and StarCoderBase: 15.5B parameter models with 8K context
length
Author : belter
Score : 71 points
Date : 2023-05-15 21:06 UTC (1 hour ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| fbodz wrote:
| Has anyone figured out a way to fine-tune this with 24GB of VRAM?
| I have tried with DeepSpeed etc. but no luck. It seems to be just
| out of reach, with fine-tuning requiring 26GB.
| csdvrx wrote:
| Have you tried quantization? It's often a cheap and simple way
| to reduce the VRAM requirements.
|
| What hardware are you using? (CPU, RAM, GPU, VRAM)
|
| Have you considered using llama.cpp for mixed CPU+GPU use (if you
| have enough RAM)?
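|
| For a first check of whether the weights even fit, here's a
| minimal sketch (untested; assumes the accelerate and bitsandbytes
| packages are installed) of loading the checkpoint in 8-bit, which
| roughly halves the VRAM needed for the weights versus fp16:
|
|   # rough sketch: 8-bit load of StarCoder via bitsandbytes
|   from transformers import AutoModelForCausalLM, AutoTokenizer
|
|   checkpoint = "bigcode/starcoder"
|   tokenizer = AutoTokenizer.from_pretrained(checkpoint)
|   model = AutoModelForCausalLM.from_pretrained(
|       checkpoint,
|       load_in_8bit=True,   # quantize weights to int8 at load time
|       device_map="auto",   # let accelerate place layers on GPU/CPU
|   )
|
|   prompt = "def fibonacci(n):"
|   inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
|   out = model.generate(**inputs, max_new_tokens=64)
|   print(tokenizer.decode(out[0]))
|
| For actually fine-tuning at that precision you'd likely need
| something like LoRA adapters on top (e.g. via the peft library),
| since int8 weights can't take gradients directly.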
| jimlongton wrote:
| (Possibly naive question) This is marketed as open source. Does
| that mean I can download the model and run it locally? If so,
| what kind of GPU would I need?
| pyrophane wrote:
| Here is a good reference:
|
| https://huggingface.co/docs/transformers/perf_train_gpu_one
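|
| As a rough rule of thumb, the weights alone need (parameter count)
| x (bytes per parameter), so for a 15.5B-parameter model:
|
|   # back-of-envelope VRAM estimate for the weights alone
|   # (activations, KV cache and optimizer state not included)
|   params = 15.5e9
|   sizes = [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]
|   for name, bytes_per_param in sizes:
|       print(f"{name}: ~{params * bytes_per_param / 2**30:.0f} GiB")
|
| That works out to roughly 58 GiB in fp32, 29 GiB in fp16, 14 GiB
| in int8 and 7 GiB in int4 - so a single 24GB card won't hold the
| fp16 weights, but an 8-bit or 4-bit quantized copy should fit for
| inference.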
| cs702 wrote:
| It's great to see this!
|
| A big THANK YOU to everyone who made it possible.
|
| I'm looking forward to playing with it -- and also, eventually,
| inevitably, running a quantized, super-efficient version on my
| laptop.
| simonw wrote:
| This is trained on The Stack, which is available here:
| https://huggingface.co/datasets/bigcode/the-stack/
|
| Interesting to note that The Stack is 6TB - the whole of the
| RedPajama LLM training set (a lot more than just code) is only
| 2.6TB.
|
| To get an idea of what that training data looks like, I grabbed the
| first 300MB SQL file from
| https://huggingface.co/datasets/bigcode/the-stack/tree/main/...
| and then dumped the first 1,000 rows from that into JSON and
| loaded it into Datasette Lite:
|
| https://lite.datasette.io/?json=https://gist.github.com/simo...
|
| Here's a query that shows a random row - hit the blue "Run SQL"
| button to see another one:
| https://lite.datasette.io/?json=https://gist.github.com/simo...
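|
| If you'd rather not download whole files, here's a sketch
| (untested) of streaming rows directly with the datasets library -
| the data_dir value and the "content" column name are guesses at
| the per-language layout described on the dataset page:
|
|   from datasets import load_dataset
|
|   ds = load_dataset(
|       "bigcode/the-stack",
|       data_dir="data/sql",   # assumed per-language subdirectory
|       streaming=True,        # iterate without fetching all 6TB
|       split="train",
|   )
|   for i, row in enumerate(ds):
|       print(row["content"][:200])   # column name assumed
|       if i >= 4:
|           break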
| vlovich123 wrote:
| I haven't trained on 6 TB of code, and yet I can meaningfully
| outperform any AI. That tells me there's still something
| structurally missing in the training efficiency. I wonder if this
| carries over to things like chess/go - for a computer trained on
| the same number of games as a human, would the computer still
| outperform the human?
| mysterydip wrote:
| I wonder how curated the input data is. Just on the surface
| of it, there's a lot of spaghetti code out there that people
| may have shared. I once saw a codebase that used three
| different implementations of a date/time structure and
| overloaded operators to convert between them. Or people
| rolling their own crypto, sort, or random functions,
| reimplementing data structures, etc.
| RangerScience wrote:
| Is this training just to understand code, or is it training to
| understand code _and_ language?
|
| (If we're comparing you to the model, is the model starting
| at "baby" or "teenager"?)
| bootloop wrote:
| What interests me most here is the ability to ask questions about
| large code-bases. I think being able to generate small functions
| or explain single code sections is nice, but being able to ask
| bigger architectural questions would be really helpful for all
| kinds of engineers (in particular at a large company).
|
| I have seen approaches that merge context across multiple levels,
| but that can only do so much. Is it viable to fine-tune a model on
| a specific code-base so it has knowledge across all files? Does
| anyone have more info on this kind of problem space?
| freeqaz wrote:
| Looks like the model is on HuggingFace here, for anybody that is
| curious to play with it. https://huggingface.co/bigcode/starcoder
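|
| A minimal sketch (untested) for poking at it with the transformers
| pipeline API - expect the full fp16 weights to need a large GPU,
| per the memory estimates upthread:
|
|   from transformers import pipeline
|
|   generator = pipeline(
|       "text-generation",
|       model="bigcode/starcoder",
|       device_map="auto",
|   )
|   out = generator("def quicksort(arr):", max_new_tokens=64)
|   print(out[0]["generated_text"])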
| ftxbro wrote:
| Do I need to make an account on Hugging Face to get the model? I
| would prefer not to, and just download a zip like you can on
| GitHub.
| meghan_rain wrote:
| tl;dr: how does it compare to Copilot/GPT-4?
| bavell wrote:
| From the summary:
|
| "We perform the most comprehensive evaluation of Code LLMs to
| date and show that StarCoderBase outperforms every open Code
| LLM that supports multiple programming languages and matches or
| outperforms the OpenAI code-cushman-001 model."
|
| So I'd assume it's not up to par with GPT-4 or Copilot. Can't wait
| to see it evolve from here!
| [deleted]
| nr2x wrote:
| Given that some of my own open-source code is no doubt in GPT and
| Bard, which feels wrong given their fees and limitations, I'm VERY
| VERY excited for this!
___________________________________________________________________
(page generated 2023-05-15 23:00 UTC)