[HN Gopher] StableCode
___________________________________________________________________
StableCode
Author : kpozin
Score : 155 points
Date : 2023-08-08 15:25 UTC (7 hours ago)
(HTM) web link (stability.ai)
(TXT) w3m dump (stability.ai)
| runako wrote:
| Is this a "product" that one could install and use or a model
| that one should expect an OEM to integrate into a product before
| programmers can use it? I'm asking because I don't see any links
| that would help me figure out how to try it out.
| nwoli wrote:
| Ctrl-F for "Code for using StableCode Instruct to generate a
| response to a given instruction." and you'll find a super
| straightforward piece of code you can copy to test out code
| generation.
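|
| For reference, it boils down to something like this (a minimal
| sketch using the Hugging Face `transformers` API; treat the
| exact repo name and prompt template as assumptions and check
| the model card):
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     # assumed repo name; verify on huggingface.co/stabilityai
|     repo = "stabilityai/stablecode-instruct-alpha-3b"
|     tok = AutoTokenizer.from_pretrained(repo)
|     model = AutoModelForCausalLM.from_pretrained(
|         repo, torch_dtype="auto", trust_remote_code=True)
|     model.cuda()
|
|     # instruction/response template as in the announcement
|     prompt = ("###Instruction\n"
|               "Generate a python function to find number of CPU cores"
|               "###Response\n")
|     inputs = tok(prompt, return_tensors="pt").to("cuda")
|     out = model.generate(**inputs, max_new_tokens=48,
|                          temperature=0.2, do_sample=True)
|     print(tok.decode(out[0], skip_special_tokens=True))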
| runako wrote:
| Thanks! The verbiage at the beginning of the announcement
| seems to go out of its way to not call StableCode a "model,"
| which was confusing. By contrast, the recent release of SDXL
| 1.0 is described as a "model" in its announcement.
| yohannparis wrote:
| To be honest, you'd better buy GitHub Copilot and enjoy the
| productivity boost at a low price. Downloading, installing,
| setting up, and using StableCode is worth it only if you want
| to learn all those steps as well. If what you care about is
| the final result, just buy an existing service.
| arcanemachiner wrote:
| I may put all my open source stuff on GitHub, but hell will
| freeze over before I willingly let Microsoft get a whiff of
| my private data, no matter how irrelevant it may be.
|
| GitHub Copilot sounds pretty neat though, I will admit that.
| hmottestad wrote:
| I have bought into Copilot, but I can't say it's that much
| of a productivity boost. More often than not it recommends
| something completely wrong. I guess it might be more useful
| if I did more Spring Boot or maybe Hibernate.
|
| I've found ChatGPT to be more helpful in general. I can
| paste some code in and have a discussion about what I want it
| to fix for me.
| carom wrote:
| Yes, the model is available. However, it was just released, so
| no one has wrapped it in a plugin yet. I would expect that
| within the month there will be a nicely runnable local version,
| similar to llama2's wrappers.
| cutler wrote:
| Yet another site whose data privacy policy amounts to nothing
| more than an Accept button. Refuse to use such sites.
| capableweb wrote:
| It's a model you download and run yourself, on your own
| hardware. No privacy policy needed.
| barrotes wrote:
| He's referring to the actual website, which doesn't give you
| the option of reject profilation cookies (mandatory in
| Europe). I commented about another website posted here few
| days ago. It gets me mad too
| monlockandkey wrote:
| Any performance metrics?
| nwoli wrote:
| I love stability AI
| sebzim4500 wrote:
| Hard to believe it can work that well when it only has 3B
| parameters, but I'd love to be proven wrong.
| csjh wrote:
| phi-1[0] is only 1.3 billion parameters and performs very well
| in coding tasks - small models have a massive amount of
| potential
|
| [0] - https://arxiv.org/abs/2306.11644
| nwoli wrote:
| Reminder that GPT-2 was considered "too dangerous" to be
| released at just 1.5B weights
| ben_w wrote:
| My memory may be imperfect, but I thought it was more "we
| aren't sure and we want to promote a culture of safety"
| rather than "this is definitely unsafe... oh wait never
| mind"?
| arugulum wrote:
| It's actually even less remarkable than that. It was an
| experiment in having a limited release, to shift the field
| toward a different release convention.
|
| > Nearly a year ago we wrote in the OpenAI Charter: "we
| expect that safety and security concerns will reduce our
| traditional publishing in the future, while increasing the
| importance of sharing safety, policy, and standards
| research," and we see this current work as potentially
| representing the early beginnings of such concerns, which
| we expect may grow over time.
|
| > This decision, as well as our discussion of it, is an
| experiment: while we are not sure that it is the right
| decision today, we believe that the AI community will
| eventually need to tackle the issue of publication norms in
| a thoughtful way in certain research areas.
|
| > We will further publicly discuss this strategy in six
| months.
|
| https://openai.com/research/better-language-models
| thewataccount wrote:
| I was impressed enough by replit's 2.7B model that I'm
| convinced it's doable. I have a 4090 and consider that the "max
| expected card for a consumer to own".
|
| Also, exllama doesn't support non-llama models, and the
| creator doesn't seem interested in adding support for
| wizardcoder/etc. Because of this, the alternatives are
| prohibitively slow for running a quantized 16B model on a 4090
| (if the exllama author reads this _please_ add support for
| other model types!).
|
| 3B models are pretty snappy with Refact, about as fast as
| GitHub Copilot. The other benefit is more context space, which
| will be a limiting factor for 16B models.
|
| tl;dr - I think we need ~3B models if we want any chance of
| consumer hardware reasonably running coding models akin to
| GitHub Copilot with decent context length. And I think it's
| doable.
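|
| As a rough illustration of why ~3B matters: in 8-bit, weights
| take about one byte per parameter, so a 3B model fits in well
| under 8GB of VRAM with room left over for context. A minimal
| sketch (assumes `transformers`, `accelerate`, and
| `bitsandbytes` are installed; the repo name is assumed from
| the HF completion-model link elsewhere in this thread):
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     repo = "stabilityai/stablecode-completion-alpha-3b"
|     tok = AutoTokenizer.from_pretrained(repo)
|     # load_in_8bit quantizes the weights on the fly via
|     # bitsandbytes, roughly halving fp16 memory again
|     model = AutoModelForCausalLM.from_pretrained(
|         repo, device_map="auto", load_in_8bit=True)
|
|     prompt = "import torch\nimport torch.nn as nn\n"
|     inputs = tok(prompt, return_tensors="pt").to("cuda")
|     out = model.generate(**inputs, max_new_tokens=48)
|     print(tok.decode(out[0]))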
| eyegor wrote:
| I'm fairly confident a coding-specific model could be a lot
| smaller - 3b should be plenty, if not 1b or less. As it
| stands, there are quite a few 7-13b models that can predict
| natural language quite well. Code seems, at its surface, a
| much simpler language - strict grammars, etc. - so I wouldn't
| think it needs to be anywhere near as large as the nlp models.
| Right now people are retraining nlp models to work with code,
| but I think the best code helper models in the future will be
| trained primarily on code and maybe fine-tuned on some natural
| language. I'm thinking less of a chat bot api and more of a
| giant leap in "intellisense" services.
| gsuuon wrote:
| I'd really like to see smaller models trained on only one
| specific language, with its own language-specific tokenizer.
| I imagine the reduction in vocab size would translate to
| handling more context more easily?
| thewataccount wrote:
| I think simply having the vocab more code friendly (e.g.
| codex) would make the biggest difference, whitespace is
| the biggest one (afaik every space is a token), but
| consider how many languages continue `for(int i=0;`, `)
| {\n`, `} else {`, 'import ', etc.
|
| My understanding is that a model properly trained on
| multiple languages will beat an expert based system. I
| feel like programming languages overlap, and interop with
| each other enough that I wouldn't want to specialize it
| in just one language.
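|
| You can see the whitespace issue directly by tokenizing an
| indented snippet with a text-first vocab (a quick sketch
| using GPT-2's tokenizer as the stand-in; assumes
| `transformers` is installed):
|
|     from transformers import AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("gpt2")
|     snippet = "def f():\n        return 1\n"  # 8-space indent
|     # a large share of the tokens is just the indentation;
|     # Codex-style vocabs added dedicated tokens for runs of
|     # spaces precisely to cut this overhead
|     print(tok.tokenize(snippet))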
| johndough wrote:
| > Code seems at its surface a much simpler language
|
| When using GitHub Copilot, I often write a brief comment
| first and most of the time, it is able to complete my code
| faster than if I had written it myself. For my workflow, a
| good code model must therefore also be able to understand
| natural text well.
|
| Although I am not sure to which degree the ability to
| _understand_ natural text and the ability to _generate_
| natural text are related. Perhaps a bit of text generation
| capabilities can be traded off against faster execution and
| fewer parameters.
| GeneralMayhem wrote:
| Understanding should be much easier, for the same reason
| humans (e.g. children, foreign-language learners) can
| always understand more than they can say: human language
| is fairly low-entropy, so if there's a word you don't
| understand, you can pick up most of the meaning from
| context. On the other hand, producing natural-sounding
| language requires knowing _every single_ word you're going to
| use.
| thorum wrote:
| replit's model is surprisingly good at generating code, even
| at following complex instructions that I was sure would
| confuse it. I have found it's a bit weak on code _analysis_,
| for open-ended questions like 'is there a bug anywhere in
| this code?' that GPT-4 can answer.
| politelemon wrote:
| But hopefully it does mean it's easier to run on small
| hardware, making it much more accessible.
| capableweb wrote:
| I had that thought at first too, but then the scope is really
| small (programming) compared to other models (everything), so
| it might not be that bad.
| rvz wrote:
| Either way, the race to zero has been further accelerated.
|
| Stability AI, Apple, Meta, etc. are clearly at the finish
| line, putting pressure on cloud-only AI models, which cannot
| raise prices or compete with free.
| empath-nirvana wrote:
| Open Source doesn't mean free. It costs a lot of money to run
| models and keep models up to date, and maybe a "good enough"
| model runs relatively cheaply, but there's always going to be a
| "state of the art" that people are willing to pay for.
| _pdp_ wrote:
| Lots of folks out there would rather skip the hassle of running
| their own models, and that's totally understandable. Similarly,
| you've got plenty of folks who'd rather pay for managed hosting
| services instead of dealing with the nitty-gritty of setting up
| everything themselves using free tools. This opens up exciting
| opportunities for successful companies to offer some real perks
| - think convenience, a smoother user experience, and lightning-
| fast speeds, just to name a few! All of these things save time
| and are worth paying for.
| thewataccount wrote:
| > Stability AI, Apple, Meta, etc are clearly at the finish line
|
| I'm very optimistic and expect them to catch up. I've used the
| open models a lot; to be clear, they are starting to compare
| to GPT-3.5 Turbo right now, but they can't compete with GPT4
| at all. GPT4 is almost a year old from when it finished
| training, I think?
|
| I expect open source models to stay ~1.5 years behind. That
| said they will eventually be "good enough".
|
| Keep in mind too though that using and scaling GPUs is not
| free. You have to run the models somewhere. Most businesses
| will still prefer a simple api to call instead of managing the
| infrastructure. On top of this, many businesses (medium and
| smaller) will likely find models like GPT4 to be sufficient
| for their workload, and will appreciate the built-in "rails"
| for their specific usecases.
|
| tl;dr - open models don't even compare to GPT4 yet (I use them
| all daily), they aren't free to run, and an API option is
| still preferable to many if not most companies.
| nwoli wrote:
| > Keep in mind too though that using and scaling GPUs is not
| free. You have to run the models somewhere.
|
| Long or medium term these will probably be dirt cheap to just
| run in the background, though. It might be within 3-5 years,
| since parallel compute is still growing and isn't as bounded
| by Moore's law stagnation.
| thewataccount wrote:
| I get decent performance with my 4090, enough that 30B
| quantized LLMs with exllama are very usable. But we're
| severely VRAM limited, especially on lower-end hardware, which
| rarely sees > 10GB of VRAM.
|
| I don't know how much slower it could be and still be useful,
| though. The big thing is we need more VRAM: 30B is context
| length limited with only 24GB of vram, and I've only barely
| made it above 3.2k tokens before running out (see the
| back-of-the-envelope math below).
|
| I hope you're right - that it becomes common for systems to
| have dedicated TPU-type hardware similar to smartphones, and
| that they absolutely load them up with VRAM (which I don't
| think is even that expensive?).
|
| Models will also get smaller, but I'm skeptical we'll get GPT4
| performance with any useful context length under 24GB VRAM any
| time soon.
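|
| Back-of-the-envelope math for the 30B case (hedged: assumes a
| LLaMA-30B-class config of 60 layers and hidden size 6656 with
| an fp16 KV cache; real memory use varies by implementation):
|
|     n_layers, d_model, fp16_bytes = 60, 6656, 2
|     kv_per_token = 2 * n_layers * d_model * fp16_bytes  # K and V
|     print(kv_per_token / 2**20)          # ~1.5 MiB per token
|     print(3200 * kv_per_token / 2**30)   # ~4.8 GiB at 3.2k ctx
|     # plus ~16GB of 4-bit weights for ~32.5B params, so a 24GB
|     # card running out just past 3.2k tokens checks out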
| RomanPushkin wrote:
| Is it good at algos?
|
| From interviews:
|
| Implement queue that supports three methods:
|
| * push
|
| * pop
|
| * peek(i)
|
| peek returns an element by its index. All three methods should
| have O(1) complexity [write code in Ruby].
|
| ChatGPT wasn't able to solve that last time I tried
| https://twitter.com/romanpushkin/status/1617037136364199938
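|
| For what it's worth, the textbook answer is an array plus a
| head pointer, so pop never shifts elements. A minimal sketch
| (in Python rather than the requested Ruby; push is amortized
| O(1) because of occasional array growth):
|
|     class Queue:
|         def __init__(self):
|             self._items = []   # storage grows at the tail
|             self._head = 0     # index of the current front
|
|         def push(self, x):     # amortized O(1)
|             self._items.append(x)
|
|         def pop(self):         # O(1): just advance the head
|             x = self._items[self._head]
|             self._head += 1
|             return x
|
|         def peek(self, i):     # O(1): pure index arithmetic
|             return self._items[self._head + i]
|
| (Popped slots are never reclaimed in this sketch; a real
| implementation would compact occasionally or use a ring
| buffer.)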
| thewataccount wrote:
| I can't seem to find a demo. If anyone has a chance to test
| it, how does it compare to replit and wizardcoder?
| james2doyle wrote:
| Looks like there is one on the Hugging Face page:
| https://huggingface.co/stabilityai/stablecode-instruct-alpha...
|
| Not very promising based on this lame test
| politelemon wrote:
| I ran it locally and it seemed to do better. I switched
| Python to Bash and it also gave a good answer (nproc).
| 3rd3 wrote:
| How does it compare to GitHub Copilot?
| karmasimida wrote:
| On HumanEval, Copilot is 40+ on pass@1, compared to 26 for
| StableCode 3B.
|
| HumanEval is abused, but this model is only good for its size;
| it is no match for Copilot ... yet.
| UncleOxidant wrote:
| > On HumanEval, Copilot is 40+ on pass@1, compared to 26 for
| StableCode 3B.
|
| Can you put those numbers into context for those who haven't
| done HumanEval? Are those percentages, so that 40+ means 40+%
| and 26 is 26%? If so, does that imply both would be failing
| scores?
| jstummbillig wrote:
| When they don't voluntarily answer the question, you know the
| answer.
| sebzim4500 wrote:
| It's not easy to compare them, to be fair.
|
| I guess you could come up with a thousand example prompts and
| pay some students to pick which output is better, but I can
| also see why you wouldn't bother. It probably depends on
| language, type of prompt, etc.
| maaaaattttt wrote:
| One could team up with Hackerrank/leetcode, let the model
| code in the interface (maybe there's an API for that
| already, no idea), execute their code verbatim, and see how
| many test cases they get right the first time around. Then,
| like for humans, give them a clue about one of the tests not
| passing (or code not working, too slow, etc.). Give points
| based on the difficulty of the question and the number of
| clues needed.
|
| I guess the obvious caveat is that these models are probably
| overfitted on these types of questions. But a specific
| benchmark could be made containing questions kept secret from
| models. Time to build "Botrank" I guess.
| erwald wrote:
| Sure it's easy -- you can use benchmarks like HumanEval,
| which Stability did. They just didn't compare to Codex or
| GPT-4. Of course such benchmarks don't capture all aspects
| of an LLM's capabilities, but they're a lot better than
| nothing!
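|
| For reference, HumanEval scores are percentages: pass@1 is the
| share of the benchmark's 164 problems for which a sampled
| completion passes all the unit tests, so 26 means roughly a
| quarter solved. A sketch of the unbiased pass@k estimator from
| the HumanEval paper (Chen et al., 2021; assumes numpy):
|
|     import numpy as np
|
|     def pass_at_k(n, c, k):
|         # n: samples drawn per problem, c: samples that passed
|         # the tests, k: evaluation budget. Returns P(at least
|         # one of k random samples passes); averaging this over
|         # problems gives pass@k.
|         if n - c < k:
|             return 1.0
|         return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))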
| miohtama wrote:
| The model, source, etc. are available under permissive terms
|
| https://huggingface.co/stabilityai/stablecode-instruct-alpha...
|
| You can "run it locally". Very handy if you do not trust
| automatically sending all your code to someone in the United
| States.
| UncleOxidant wrote:
| Hmmm... so on that hugging face page there's a text box where
| you enter input, then click the 'compute' button.
|
| So I asked it to "Write a python function that computes the
| square of the input number."
|
| And it responds with: `def square(x):`
|
| Which seems quite underwhelming.
| lolinder wrote:
| > to reproduce, distribute, and create derivative works of
| the Software Products solely for your non-commercial research
| purposes
|
| I wouldn't call these terms permissive. It's in line with the
| recent trend in released AI models, but fairly restrictive in
| what you're actually allowed to do with it.
| coder543 wrote:
| The Completion model appears to place the model weights
| under the Apache 2 license, which is a permissive license:
| https://huggingface.co/stabilityai/stablecode-completion-
| alp...
|
| The Instruct model has that non-commercial restriction, but
| I'm not sure why. They say it was trained with Alpaca-
| formatted questions and responses, but I'm not sure if that
| includes the original Alpaca dataset.
___________________________________________________________________
(page generated 2023-08-08 23:01 UTC)