[HN Gopher] Refact LLM: New 1.6B code model reaches 32% HumanEval and is SOTA for the size
___________________________________________________________________
Refact LLM: New 1.6B code model reaches 32% HumanEval and is SOTA
for the size
Author : kateklink
Score : 154 points
Date : 2023-09-04 16:13 UTC (6 hours ago)
(HTM) web link (refact.ai)
(TXT) w3m dump (refact.ai)
| iFire wrote:
| LICENSE
|
| bigscience-openrail-m
|
| https://huggingface.co/smallcloudai/Refact-1_6B-fim/blob/mai...
| [deleted]
| palmer_fox wrote:
| All these LLMs are pretty general if I understand correctly. Are
| there any efforts to create specialized models (other than for
| coding)? Or, what would be even better, "extract" certain areas
| from existing LLMs as a way to specialize them? The goal would be
| to drastically reduce model size so models can run on less
| powerful devices.
|
| E.g. a model specializing in chemistry doesn't need to include
| data on world history or be able to write poetry.
| hnhg wrote:
| I am not an expert, but it still has to learn human
| language/grammar/what have you, and that is where scale seems to
| matter. Fine-tuning on a subset of knowledge after that is
| typically how domain specialisation is achieved, by my
| understanding.
| charcircuit wrote:
| Domain specialization is done by continuing the full training
| process. Fine tuning is more for changing the style of the
| output than adding new knowledge.
| palmer_fox wrote:
| What if the initial training already contains all necessary
| data for a particular specialization? What would be the
| benefit of continuing the training process?
| viraptor wrote:
| Imagine someone tells you about how someone committed a
| crime and asks you to summarise. Now imagine the same
| question is asked to a lawyer. Even if you both knew the
| same facts, the response would be very different in
| style, highlighted points, mentioned references, etc. The
| domain specific fine tuning does exactly that. Sure,
| sometimes you can get very close by changing the prompt
| to include "respond like a lawyer in situation X with
| following extra rules", but not always and the fine-
| tuning gives better results and shorter prompt.
| palmer_fox wrote:
| I was wondering about that too. Would it be possible in the
| future to have a more modular approach to LLMs? Have a module
| that is responsible for basic knowledge/language/grammar and
| then other more specialized modules that are added
| selectively.
|
| I don't know enough about fine-tuning to say whether the process
| is capable of removing "unused" parts of the model (I guess it's
| not possible, similar to un-learning).
| lucubratory wrote:
| There are various methods for removing unused parts of the
| model, like distillation. The idea is generally that you always
| lose performance, but hopefully you lose proportionately more
| size/run cost than you do performance.
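|
| Roughly, distillation trains a small "student" to match the
| softened output distribution of a big "teacher". A minimal
| sketch of the idea, assuming PyTorch (illustrative, not any
| particular library's API):
|
|     import torch.nn.functional as F
|
|     def distillation_loss(student_logits, teacher_logits, t=2.0):
|         # Soften both distributions with temperature t, then pull
|         # the student toward the teacher (Hinton et al., 2015).
|         soft_teacher = F.softmax(teacher_logits / t, dim=-1)
|         log_student = F.log_softmax(student_logits / t, dim=-1)
|         return F.kl_div(log_student, soft_teacher,
|                         reduction="batchmean") * t * t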
| swyx wrote:
| so, so many. there are RAG-specific models (contextual ai),
| finance-specific models (bloomberg gpt, brightwave), contact
| center models (cresta), even telco models (anthropic).
| palmer_fox wrote:
| Very interesting. Thanks for replying!
| notsahil wrote:
| Model stats:
|
| - Architecture: LLaMA-like model with multi-query attention
| - Objectives: Fill-in-the-Middle, Chat
| - Context: 4096 tokens
| - Pretraining tokens: 1.2T
| - Finetuning tokens: 40B
| - Precision: bfloat16
| - GPUs: 64 NVIDIA A5000
| - Training time: 28 days
| [deleted]
| brucethemoose2 wrote:
| One misleading thing is the notion that you need a 1-2B model to
| run on commodity hardware.
|
| This is not really true. Llama 7B runs with Vulkan/llama.cpp on
| ~8GB smartphones and ~12GB laptops. That will only get easier
| over time, as lower-RAM hardware drops out of the market and
| Vulkan implementations become more widespread.
|
| For users trying to run LLMs on 8GB or less machines, the AI
| Horde approach of distributed models seems much more practical
| anyway.
| jmorgan wrote:
| This is true! Although I'm also really excited about the
| potential speed (both for loading the model and for token
| generation) of a 1B model for things like code completion.
| naillo wrote:
| 7B runs on my 4GB VRAM machine (8GB memory), i.e. quantization
| helps a lot too.
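|
| For example, loading a 7B model in 4-bit with transformers +
| bitsandbytes looks something like this (a sketch; the model id
| is illustrative and load_in_4bit needs a CUDA GPU):
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     model_id = "meta-llama/Llama-2-7b-hf"  # illustrative
|     tokenizer = AutoTokenizer.from_pretrained(model_id)
|     # 4-bit weights cut memory roughly 4x vs fp16, so a 7B model
|     # fits in ~4GB of VRAM.
|     model = AutoModelForCausalLM.from_pretrained(
|         model_id, load_in_4bit=True, device_map="auto")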
| smcleod wrote:
| Yeah, but every few years I remember thinking to myself that
| surely next year will be the year that base-model machines start
| at 32/64/... GB - but alas, it's nearly the end of 2023 and your
| average computer still seems stuck on a measly 16GB! I don't
| think average RAM size on consumer machines has increased at all
| in the last ~8 years or so.
| Retric wrote:
| It actually kind of makes sense.
|
| RAM is only about 6x the speed of SSDs for sequential access.
| Most people don't actually need truly random access to all that
| much data; they're streaming video or loading video game assets
| to their GPU. So they shift spending to other components, like
| the video card and monitors, that actually provide significant
| value.
|
| Which is how you get people with 16 GB of system RAM using
| graphics cards that also have 16GB of RAM.
| btown wrote:
| Ah, but have no fear - as lower RAM hardware starts dropping
| out of the market, the RAM usage of Microsoft Teams will
| increase to compensate!
|
| (Not even /s - while the developers of LLM applications may
| have 64GB RAM in their laptops or desktops, the less-technical
| early adopters of LLMs running locally are likely to be power
| users with lower-powered laptops, much more stringent RAM
| limits, and numerous line-of-business applications and browser
| tabs contending for that RAM. Causing those applications to be
| swapped onto disk will almost certainly result in a degraded
| overall experience that could easily be blamed on the LLM
| application itself.)
| nacs wrote:
| Yes, 7B is perfectly usable on low-end hardware if you're using
| it for instruction tuning/chat.
|
| But for code completion in an IDE, where it has to react as you
| type, every 100 milliseconds of delay in response time is
| noticeable.
|
| Even with a 24GB GPU, a 7B model doesn't feel snappy enough for
| code-completion in an IDE.
| brucethemoose2 wrote:
| This can be addressed with token streaming and input caching.
|
| Would that be enough? _shrug_
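|
| A rough sketch of token streaming with HuggingFace transformers
| (assumes `model` and `tokenizer` are already loaded; parameters
| are illustrative):
|
|     from threading import Thread
|     from transformers import TextIteratorStreamer
|
|     streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
|     inputs = tokenizer("def fib(n):", return_tensors="pt")
|     # Generate in the background; use_cache reuses past key/values
|     # so each new token costs only one forward step.
|     Thread(target=model.generate,
|            kwargs=dict(**inputs, streamer=streamer,
|                        max_new_tokens=64, use_cache=True)).start()
|     for piece in streamer:  # text appears as soon as it's decoded
|         print(piece, end="", flush=True)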
| swyx wrote:
| > the AI Horde approach of distributed models seems much more
| practical anyway.
|
| i wasn't aware this was a term of art. is there a definitive
| blogpost or product explaining this approach?
| ukuina wrote:
| This is a reference to Kobold Horde, a distributed volunteer
| network of GPUs that can be inferenced upon.
| brucethemoose2 wrote:
| ^
|
| I didn't mean to imply splitting llama up between machines
| (though that is a thing with llama.cpp), but a pool of
| clients and servers who make requests and process them:
|
| https://lite.koboldai.net/
|
| A few users with half decent PCs can serve a much larger
| group of people, and the "lesser" hosts can host smaller
| models to "earn" access to larger ones.
| palmer_fox wrote:
| Perhaps the wrong thread to ask this question... Is it not
| possible to load a model on something like an NVMe M.2 drive
| instead of RAM? It's slower of course, but only 5-10x if I
| understand correctly.
| kirill5pol wrote:
| Yes but they're slow enough on normal hardware for that 5-10x
| to be painful...
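|
| (This is essentially what llama.cpp's mmap loading does: the
| weights file is memory-mapped, so the OS pages data in from the
| drive on demand instead of holding it all in RAM. A toy numpy
| sketch; the path and shape are made up:)
|
|     import numpy as np
|
|     # Map the file without reading it; nothing is loaded yet.
|     weights = np.memmap("model-weights.bin", dtype=np.float16,
|                         mode="r", shape=(32000, 4096))
|     # Only the slices you touch are actually read from disk.
|     row = np.asarray(weights[123])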
| igammarays wrote:
| For the sake of not giving Microsoft and a few other tech giants
| immense power over the world, I really do hope the cost and
| efficiency of LLMs improve dramatically, until we can get
| GPT-4-equivalent models trained on a few graphics cards and
| running offline on an iPhone. Really rooting for these kinds of
| projects until someone makes the breakthrough.
| taywrobel wrote:
| You may be interested in what we're working on at Symbolica AI.
|
| We're using formal logic in the form of abstract rewrite
| systems over a causal graph to perform geometric deep learning.
| In theory it should be able to learn the same topological
| structure of data that neural networks do, but using entirely
| discrete operations and without the random walk inherent to
| stochastic gradient descent.
|
| Current experiments are really promising, and assuming the
| growth curve continues as we scale up you should be able to
| train a GPT-4 scale LLM in a few weeks on commodity hardware
| (we are currently using a desktop with four 4090s), and be able
| to do both inference and continual fine tuning/online learning
| on device.
| paulsutter wrote:
| Especially interested in learning directly on geometries,
| please keep us updated and share results
| taywrobel wrote:
| Would definitely recommend Bronstein et al.'s work on geometric
| deep learning! https://geometricdeeplearning.com
|
| That's effectively the right hand side of the bridge that
| we're building between formal logic and deep learning. So
| far their work has been viewed mainly as descriptive,
| helping to understand neural networks better, but as their
| abstract calls out: "it gives a constructive procedure to
| incorporate prior physical knowledge into neural
| architectures and provide principled way to build future
| architectures yet to be invented". That's us (we hope)!
| arthurcolle wrote:
| I would like to subscribe to your newsletter, we'd be super
| interested in this at Brainchain AI.
|
| Drop me a link at (my first name) @ brainchain dot AI if
| you'd like to chat, I'd love to hear more about what you're
| working on!
| dmarchand90 wrote:
| Really cool stuff! Do you have any recommendations of where
| we could learn more?
| krak12 wrote:
| [dead]
| pawelduda wrote:
| Sounds cool, but what are the drawbacks?
| k__ wrote:
| It doesn't exist at scale yet.
| taywrobel wrote:
| Biggest drawback is that since the structure is all
| discrete, it is inherently weak at modeling statistical
| distributions. For example, it'll likely never best a
| neural network at stock market prediction or medical data
| extrapolation.
|
| However, for things that are discrete and/or causal in
| nature, we expect it to outperform deep learning by a wide
| margin. We're focused on language to start, but want to
| eventually target planning and controls problems as well,
| such as self-driving and robotics.
|
| Another drawback is that the algorithm as it stands today
| is based on a subgraph isomorphism search, which is hard.
| Not hard as in tricky to get right, like Paxos or other
| complex algorithms; hard as in NP-hard, so very difficult
| to scale. We have some fantastic PhDs working with us who
| focus on optimizing subgraph isomorphism search, and
| category theorists working to formalize which constraints
| we can relax without affecting the learning mechanism of
| the rewrite system, so we're confident it's achievable,
| but the time horizon is currently unknown.
| KRAKRISMOTT wrote:
| > _We're using formal logic in the form of abstract rewrite
| systems over a causal graph to perform geometric deep
| learning. In theory it should be able to learn the same
| topological structure of data that neural networks do, but
| using entirely discrete operations and without the random
| walk inherent to stochastic gradient descent._
|
| Abstract rewriting like a computer algebra system's (e.g.
| Wolfram's) term-rewriting equation simplification method?
| taywrobel wrote:
| Heavily influenced by Wolfram's work on metamathematics and
| the physics project, in so far as using a rewrite system to
| uncover an emergent topology; we're just using it to
| uncover the topology of certain data (assuming that the
| manifold hypothesis is correct), rather than the topology
| of fundamental physics as he did.
| fnordpiglet wrote:
| I think with or without algorithmic advantages hardware will
| improve for local model running. There's an immense amount of
| capital being invested in hardware improvement and that will
| absolutely trickle down.
|
| My sincere belief is that local models are the way of the
| future, with flexible base models adapted via LoRA and context
| to specific use cases. I think open source models and
| techniques are inexorable at this point barring some sort of
| regulatory moat and will rival commercial models in all but
| extreme cases.
| adrenvi wrote:
| That could also help tech giants build even larger/more capable
| models cheaply. Ideally there would be a hard ceiling of LLM
| capability that even massive amounts of hardware couldn't
| exceed, allowing inexpensive hardware to catch up.
| a_wild_dandan wrote:
| I personally hope that LLMs have no such limits. The good
| these tools can do is immeasurable.
|
| I can already run Llama 2 @70b on my laptop, and that'll look
| like a quaint old AI artifact in 5-7 years. I think the
| consumer market will keep pace yet stay well below SotA, just
| as it always has. That still leaves plenty of room for
| incredible open-source stuff!
| axpy906 wrote:
| The key word in that is models. Per the leaked GPT-4 details,
| it's not a single model but a mixture of experts (MoE) with 16
| expert models. There's probably quite a lot of complexity on the
| backend in sourcing the right model for the right query. In
| short, it's probably better for the OSS community to focus on
| single models for specific tasks, as evidenced by Code Llama. A
| system like GPT-4 is still difficult to replicate. Getting
| something to run on consumer hardware for specific tasks, like
| code gen at almost GPT-4 level, is doable.
| og_kalu wrote:
| >There's probably quite a lot of complexity on the backend in
| sourcing the right model for the right query.
|
| This isn't how sparse MoE models work. There isn't really any
| complexity like that, and a different expert can be picked for
| each token.
|
| Sparse models aren't an ensemble of models.
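|
| A toy sketch of per-token top-k routing in a sparse MoE layer
| (illustrative PyTorch, not OpenAI's implementation):
|
|     import torch
|
|     def sparse_moe(x, router, experts, k=2):
|         # x: (tokens, d). A learned gate scores every expert for
|         # every token; each token is routed to its top-k experts.
|         weights, chosen = router(x).softmax(-1).topk(k, dim=-1)
|         out = torch.zeros_like(x)
|         for slot in range(k):
|             for e, expert in enumerate(experts):
|                 mask = chosen[:, slot] == e   # tokens picking e
|                 if mask.any():
|                     w = weights[mask, slot].unsqueeze(-1)
|                     out[mask] += w * expert(x[mask])
|         return out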
| [deleted]
| ttul wrote:
| There are many MoE architectures and I suppose we don't know
| for sure which OpenAI is using. The "selection" of the right
| mix of models is something that a network learns and it's not
| a complex process. Certainly no more complex than training an
| LLM.
| axpy906 wrote:
| When I wrote "backend" was a poor choice of a word. "Meta-
| model" is probably a better choice of wording.
|
| I hope it did not detract too much from the point of
| focusing on subtasks and modalities for FOSS as GPT 4 was
| built on a $163 million budget.
|
| Finally, good point. We've got no idea of what OpenAI's MoE
| approach is and how it works. I went back to Metas 2022
| NLLB-200 system paper and they didn't even publish the
| exact details of the router (gate).
| smoldesu wrote:
| > For the sake of not giving Microsoft and a few other tech
| giants immense power over the world
|
| I agree with and appreciate the sentiment, but it feels _way_
| too late for that. These people do have and exert direct
| control over pretty much all of our digital devices. It's funny
| (or sad) that we only seem to care about this when shiny doodads
| like AI come around every so often.
| stainablesteel wrote:
| to be fair, if that is achieved then the massive models that
| tech giants produce will probably be phenomenal
| [deleted]
| flangola7 wrote:
| I don't, how do you maintain control and prevent mass harm in
| that case? I don't see any way out other than gatekeeping
| similar to what we apply to ownership and use of high explosives
| and radiological weapon tooling.
|
| At all other times I support tech freedom. I use libre
| software, I use Tor, I donate to privacy and FOSS organizations
| constantly. I only write my software projects under an AGPL
| license. AI is qualitatively different. A world run amok with
| intelligent infinite Sybils is not good for anyone. I hope
| massive compute continues to be necessary, it may be the only
| hard chokepoint we have to keep a handle on the beast.
| Manjuuu wrote:
| Another model that we'll soon forget ever existed.
| holoduke wrote:
| What's the difference between 1% and 99% on HumanEval? What does
| it really tell you?
| kateklink wrote:
| for pass@1, HumanEval tells you how well the model solves a task
| from a set, given only one chance to solve it. It's not the
| perfect metric; there are others like DS-1000 and MBPP (we have
| included them on the HuggingFace model card). HumanEval is good
| for benchmarking against other models, as it gives a quick idea
| of how powerful the model is.
| swyx wrote:
| > given only one chance to solve it
|
| my understanding is that there are 2 usages of the
| pass@{number} syntax. the HumanEval/Codex paper interprets
| the {number} as number of attempts[0]. however language
| modelers seem to use it to denote the number of few shot
| example demonstrations given in the context. these are
| starkly different and i wish the syntax wasn't overloaded
|
| ---
|
| [0] https://arxiv.org/pdf/2107.03374.pdf
|
| > Kulal et al. (2019) evaluate functional correctness using
| the pass@k metric, where k code samples are generated per
| problem, a problem is considered solved if any sample passes
| the unit tests, and the total fraction of problems solved is
| reported.
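|
| for reference, the paper also gives an unbiased estimator for
| pass@k when you generate n >= k samples and c of them pass
| (numpy sketch, adapted from the paper):
|
|     import numpy as np
|
|     def pass_at_k(n, c, k):
|         # Probability that at least one of k samples drawn from
|         # the n generated ones passes, given c passing samples.
|         if n - c < k:
|             return 1.0
|         return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))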
| mholubowski wrote:
| Hey, I have a genuine question:
|
| What is the point of a new model that isn't better than the best
| possible model (example: OpenAI GPT-4)?
|
| What's the point in having a smaller model? Who cares?
|
| ---
|
| This is a real, genuine question that I don't have a clear answer
| to. Excuse my ignorance, plz enlighten your boi.
| notsylver wrote:
| IMO, the main reasons are (but are definitely not limited to):
|
| - You can fine tune these models for very specific tasks, which
| GPT-4 might not be as good at.
|
| - Open source models are free. You can use them as much as you
| want without worrying about a $xx,xxx bill at the end of the
| month which makes tinkering with them easier.
|
| - Smaller models like this can run on consumer hardware, even
| phones, and can run offline.
|
| - Privacy and not having to abide by a third party's terms. You
| don't have to deal with "As a large language model...",
| especially with uncensored models.
|
| - Tools like jsonformer https://github.com/1rgs/jsonformer are
| not possible with OpenAI's API (see the sketch below).
|
| - It's also just really cool, let's be honest.
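|
| For the jsonformer point above: it constrains generation so the
| model only fills in the values while the JSON scaffolding is
| fixed. Per its README, usage looks roughly like this (the model
| choice is illustrative):
|
|     from jsonformer import Jsonformer
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     model = AutoModelForCausalLM.from_pretrained("gpt2")  # demo
|     tokenizer = AutoTokenizer.from_pretrained("gpt2")
|     schema = {"type": "object", "properties": {
|         "name": {"type": "string"}, "age": {"type": "number"}}}
|     prompt = "Generate a person matching this schema:"
|     # Always returns data matching the schema, never free text.
|     person = Jsonformer(model, tokenizer, schema, prompt)()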
| tiborsaas wrote:
| Your question sounds like asking why we need Alpine Linux when
| we have Ubuntu, or why we have SQLite when we have Postgres.
|
| I think the point is to reach a baseline of something super
| lightweight yet still useful that could be production-ready for
| a number of use cases.
| seydor wrote:
| Imagine being on Mars and running on a small PV panel and
| needing to code a bugfix in your oxygen supply system through
| the wire with Microsoft Earth(tm) or something
| TuringNYC wrote:
| The other answers are great, but to add more
|
| - You can run it behind an air-gap, where your systems are
| disconnected from the world.
|
| - You can run it on the edge with low or no internet
| connectivity
|
| - You do not need to worry about breaching geographic data
| restrictions, e.g.: medical data from Country X cannot leave
| Country X
| [deleted]
| yieldcrv wrote:
| 1) people can run a 1.6B model for free on consumer hardware
|
| 2) any model run on computational resources you own or lease
| will have more privacy than an explicit cloud offering, and
| running completely on your own local hardware will be private.
| this means you don't have to think twice about asking the LLM
| about the proprietary code or information you are working on.
|
| 3) smaller models gain the performance improvements from all
| the other improvements in interpreters and quantizing, allowing
| for even more consumer-friendly offline use
|
| 4) oh yeah, offline use. could expand use cases to having LLMs
| baked into operating systems directly, including leading phones
|
| 5) showing what's possible, pushing towards the benchmarks of
| the best possible model while using fewer computational
| resources. this also makes the hosts of the best possible model
| realize that they could either A) use fewer computational
| resources, increasing the bandwidth for their users, or B)
| further improve their own model because of competition.
| basically, if ChatGPT 4 were using similar improvements in
| technology across all areas of reasoning/whatever, there never
| would have been a rate limit on ChatGPT 4.
|
| 6) more demand for other computational resources. Nvidia is
| backordered till maybe Q2 2024 right now. if people realize AMD
| or even ARM chips can offer the same performance with the right
| combination of hardware and software, it alleviates pressure for
| other ventures that want computation power.
| SparkyMcUnicorn wrote:
| You can use it 100% locally, and it doesn't cost anything.
| [deleted]
| yunwal wrote:
| GPT-4 is expensive to run, even more expensive to finetune, and
| for all practical purposes can't be run offline (because the
| model is too big to run outside of a huge data center).
| Evaluation latency is also an issue for many use cases, and you
| have to share your query with OpenAI, so you can't run sensitive
| queries. The output is also controlled/censored by OpenAI.
|
| Here are a few use cases I wouldn't want to use OpenAI/GPT for:
|
| - Advanced autocomplete for texting and private communications
|
| - Querying sensitive document databases like emails
|
| - Traveling in low connectivity areas
|
| - Politically incorrect use cases (generating erotic content,
| for example)
|
| List kinda goes on and on
| qeternity wrote:
| > GPT4 is expensive to run, even more expensive to finetune
|
| GPT4 can't even be finetuned at the moment (though I expect
| that to change).
| MichaelBurge wrote:
| It can be finetuned. Bing is a finetuned GPT-4.
| acheong08 wrote:
| Say I want to fine-tune a Golang-specific model. How much $ and
| effort would I have to put in? Would using this as a base help
| in any way compared to starting from LLaMA?
| OlegKlimov1337 wrote:
| Maybe it makes sense to start from Code Llama, not LLaMA :D I
| think a Golang-specific model will not be that different from a
| multi-language model, but it definitely will work better after
| fine-tuning on your code. Check out the Refact self-hosting
| docker in a couple of days; finetune will be there soon. It
| will take you 1 GPU and almost no money )
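|
| A rough single-GPU recipe with HuggingFace peft (a sketch; the
| LoRA target module names and the dataset are assumptions you'd
| adjust for the actual architecture):
|
|     from peft import LoraConfig, get_peft_model
|     from transformers import (AutoModelForCausalLM, Trainer,
|                               TrainingArguments)
|
|     model = AutoModelForCausalLM.from_pretrained(
|         "smallcloudai/Refact-1_6B-fim", trust_remote_code=True)
|     # Train small adapter matrices instead of all 1.6B weights.
|     lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
|                       target_modules=["q", "v"])  # assumed names
|     model = get_peft_model(model, lora)
|     model.print_trainable_parameters()  # typically well under 1%
|     trainer = Trainer(
|         model=model,
|         args=TrainingArguments("golang-lora", fp16=True,
|                                per_device_train_batch_size=4),
|         train_dataset=tokenized_go_corpus,  # your Go dataset
|     )
|     trainer.train()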
| howon92 wrote:
| Congrats on your achievement! I'm curious about your end goal. Do
| you aim to beat GitHub Copilot's performance and convince devs to
| use Refact for code completion instead of GitHub Copilot? I want
| to understand the motivation behind these different code-
| completion models that are not solely for academic research.
| kateklink wrote:
| we want to help developers who need either an on-premise or a
| permissively licensed code assistant; Copilot offers neither. We
| also wanted to lower the barriers for self-hosting, so that the
| model runs on most GPUs with just 3GB of RAM. Plus making the
| code completions fast and efficient (understanding the entire
| context, not just the previous tokens).
| OlegKlimov1337 wrote:
| You can use it in practice; that was the goal of this particular
| model! It's fast, and it runs on your own hardware if you want
| it to.
| glutamate wrote:
| License text:
| https://drive.google.com/file/d/16NqKiAkzyZ55NClubCIFup8pT2j...
| [PDF]
|
| See last page for restrictions
| Havoc wrote:
| Thanks. Those terms look pretty relaxed.
| lordofgibbons wrote:
| > In any way that violates any applicable national, federal,
| state, local or international law or regulation;
|
| Darn! Foiled again! I was planning on breaking some federal
| laws, but the license says that I can't ;( \s
|
| The Open-RAIL license has to be the worst license in existence
| claiming to be "open".
|
| > You shall undertake reasonable efforts to use the latest
| version of the Model.
|
| Plea to folks releasing models: Please stop using this user-
| hostile and deranged license
| Havoc wrote:
| That's an impressive result.
|
| The Open-RAIL license seems to reference some sort of limitations
| on safety and unethical use, but I can't see where in the repo
| it's spelled out precisely what the authors have in mind.
| [deleted]
| vikp wrote:
| This post is misleading, in a way that is hard to do
| accidentally.
|
| - They compare the performance of this model to the worst 7B
| Code Llama model. The base Code Llama 7B Python model scores
| 38.4% on HumanEval, versus the non-Python model, which only
| scores 33%.
|
| - They compare their instruct-tuned model to non-instruct-tuned
| models. Instruction tuning can add 20% or more to HumanEval
| performance. For example, WizardLM 7B scores 55% on HumanEval
| [1], and I've trained a 7B model that scores 62% [2].
|
| - For another example of instruction tuning, StableCode
| instruct-tuned benchmarks at 26%, not the 20% they cite for the
| base model [3].
|
| - StarCoder, when prompted properly, scores 40% on HumanEval
| [4].
|
| - They do not report their base model performance (as far as I
| can tell).
|
| This is interesting work, and a good contribution, but it's
| important to compare similar models.
|
| [1] https://github.com/nlpxucan/WizardLM
|
| [2] https://huggingface.co/vikp/llama_coder
|
| [3] https://stability.ai/blog/stablecode-llm-generative-ai-
| codin...
|
| [4] https://github.com/huggingface/blog/blob/main/starcoder.md
| umutisik wrote:
| The title is misleading. This model is not "SOTA for the size";
| there are smaller models that do 10-18% better in absolute
| score. The text says it's SOTA "among similar models", where
| they probably compare with other models with permissive
| licensing.
| mrob wrote:
| "Permissive" usually refers to Free Software or Open Source
| licenses without copyleft requirements. OpenRAIL is a
| proprietary license because it imposes usage restrictions,
| contrary to both the Free Software and Open Source definitions.
| OlegKlimov1337 wrote:
| AFAIK there is only one model that does better: phi-1. It's
| Python-only and does not support fill-in-the-middle, so you
| can't really use it.
| umutisik wrote:
| Phi-1-small also scores higher with 350M parameters. It helps
| to be specific about what the comparison is against when
| claiming SOTA.
| ldjkfkdsjnv wrote:
| I don't trust any benchmarks for any LLM that's not coming from
| FB, Google, OpenAI, Anthropic, or Microsoft. These models are so
| dynamic that simple benchmark numbers never tell the whole story
| of the quality of the model. Take, for instance, a recent post
| by Anyscale claiming their fine-tune of Llama 2 was competitive
| with OpenAI's model. The reality is that their fine-tuned model
| is basically worthless and was competitive along a single
| metric/very narrow commoditized task. It's a great way to get
| clicks by posting these metrics, though.
| breadsniffer01 wrote:
| They could have easily benchmarked with the Spider SQL test set
| but they didn't.
|
| I have a feeling that the more robust models might be the ones
| that don't perform best on benchmarks right away.
| SparkyMcUnicorn wrote:
| The community has fine-tuned some really good llama models
| (much better than llama-chat), but I get what you're saying.
|
| I've been testing the best performing models on the huggingface
| leaderboard lately. Some of them are really impressive, and
| others are so bad that I second-guess the prompt format or
| wonder whether the benchmarked model is actually the same one
| I'm testing.
| breadsniffer01 wrote:
| Which models were really bad?
| SparkyMcUnicorn wrote:
| I was keeping track of the good ones, and don't have many
| notes on the bad ones.
|
| I do remember testing "LoKuS" last week and it was quite
| terrible (sometimes gave completely off-topic answers). It
| scored as one of the highest 13B models on the leaderboard
| (~65 average), but appears to be removed now.
| nomel wrote:
| This is the goal of humaneval, correct?
| zcesur wrote:
| tangentially related: refact recently shared 4 bounties worth
| $9,000 to help improve their tech!
|
| https://algora.io/org/smallcloudai/bounties
|
| disclaimer: i'm a cofounder of algora, the platform enabling
| these bounties
| [deleted]
| kateklink wrote:
| We've finished training a new code model, Refact LLM, which took
| us about a month. The main use case is blazing-fast code
| completion with fill-in-the-middle; additionally, the model can
| reply to chat prompts.
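|
| Fill-in-the-middle means the model sees the code both before and
| after the cursor and generates what goes in between. A quick
| sketch of using it (the FIM special tokens follow the convention
| on our model card; check there for exact usage):
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     ckpt = "smallcloudai/Refact-1_6B-fim"
|     tok = AutoTokenizer.from_pretrained(ckpt)
|     model = AutoModelForCausalLM.from_pretrained(
|         ckpt, trust_remote_code=True)
|     # prefix = code before the cursor, suffix = code after it;
|     # the model fills in the middle.
|     prompt = ('<fim_prefix>def print_hello():\n    """<fim_suffix>\n'
|               '    print("Hello")<fim_middle>')
|     ids = tok(prompt, return_tensors="pt")
|     out = model.generate(**ids, max_new_tokens=32)
|     print(tok.decode(out[0]))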
|
| It performs much better than all code models of similar size,
| and almost reaches the same HumanEval score as StarCoder while
| being 10x smaller.
|
| Thanks to the small size, it works with most modern GPUs,
| requiring just 3GB of RAM.
|
| You can try self-hosting it with Refact
| https://github.com/smallcloudai/refact/ and get a fast local
| Copilot alternative with decent suggestions.
|
| Weights and model card
| https://huggingface.co/smallcloudai/Refact-1_6B-fim.
|
| We would love to hear your feedback!
| drcongo wrote:
| Is it possible to run it as an LSP so that it can be used in
| editors other than VSCode and JetBrains? (sorry if this
| question is completely mad, my understanding of how these
| things work is extremely limited)
| OlegKlimov1337 wrote:
| Yes, it's coming up in a couple of weeks.
| drcongo wrote:
| Great, thanks. I'll keep an eye out.
| [deleted]
| diminish wrote:
| Does ctransformers
| (https://github.com/marella/ctransformers#supported-models)
| support running Refact?
|
| I see that model type "gpt_refact" in
| https://huggingface.co/smallcloudai/Refact-1_6B-fim/blob/mai...
| ALittleLight wrote:
| How does it compare to Copilot? A metric I'd like to see is the
| % of proposed completions accepted by a human user. If you had
| an extension that 50% of the time proposed a Copilot completion
| and 50% of the time a Refact completion (blind to the user),
| then you could compute a metric like this.
| riku_iki wrote:
| > almost reaches the same HumanEval
|
| how can you tell that HumanEval hasn't leaked into your training
| data in some form?
| mityamitya wrote:
| Hi! We ran LSH filtering over the datasets to remove all code
| that could be similar to HumanEval samples.
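|
| Not our exact pipeline, but the general shape of MinHash-LSH
| decontamination looks like this (sketch using the datasketch
| library; the threshold, shingling, and dataset names are
| illustrative):
|
|     from datasketch import MinHash, MinHashLSH
|
|     def minhash(code, n=5, perms=128):
|         m = MinHash(num_perm=perms)
|         toks = code.split()
|         for i in range(max(len(toks) - n + 1, 1)):
|             m.update(" ".join(toks[i:i + n]).encode("utf8"))
|         return m
|
|     # Index HumanEval, then drop any training example whose
|     # estimated Jaccard similarity exceeds the threshold.
|     lsh = MinHashLSH(threshold=0.6, num_perm=128)
|     for i, sol in enumerate(humaneval_solutions):  # placeholder
|         lsh.insert(f"he-{i}", minhash(sol))
|     clean = [ex for ex in train_examples  # placeholder
|              if not lsh.query(minhash(ex))]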
| riku_iki wrote:
| so, we have to trust your procedure..
___________________________________________________________________
(page generated 2023-09-04 23:00 UTC)