[HN Gopher] OpenCoder: Open Cookbook for Top-Tier Code Large Language Models
___________________________________________________________________
OpenCoder: Open Cookbook for Top-Tier Code Large Language Models
Author : pil0u
Score : 209 points
Date : 2024-11-09 17:27 UTC (5 hours ago)
(HTM) web link (opencoder-llm.github.io)
(TXT) w3m dump (opencoder-llm.github.io)
| TZubiri wrote:
| What is that "this http URL" thing in the first sentence of the
| abstract?
|
 | Is this slop?
| HerrMonnezza wrote:
 | arXiv replaces any URL in the abstract text with a link whose
 | text reads "this http URL"; it seems the authors did not know
 | this and just embedded a bare URL in their abstract.
| vasco wrote:
 | I think it mistook a typo, a missing space after the end of a
 | sentence, for a URL.
| johndough wrote:
 | I think this is the relevant code:
 |
 |     TLDS = "[a-z][a-z]+"
 |
 | https://github.com/arXiv/arxiv-base/blob/develop/arxiv/base/...
|
| A more restrictive TLD list would have prevented this, but
| I certainly don't want to be the one to add new TLDs all
| the time, so I can see why the code looks like it does.
| Mathnerd314 wrote:
 | Mozilla has a list, https://publicsuffix.org/list/, that's
 | relatively easy to keep updated. I'm sure there is some Python
 | wrapper library they could use.
| Retr0id wrote:
| Bad auto-URL-extraction, presumably. The PDF reads:
|
| > Large language models (LLMs) for code have become
| indispensable in various domains, including code generation,
| reasoning tasks and agent systems. While open-access code LLMs
| are increasingly approaching the performance levels of
| proprietary models,
|
| "systems.while" is obviously not a valid domain.
| 4b11b4 wrote:
| while.systems
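
 (For illustration, a minimal sketch of the failure mode under
 discussion: a permissive TLD pattern in the spirit of the TLDS
 regex quoted above treats a missing space such as "systems.While"
 as a host name, while a check against a known suffix list rejects
 it. This is a toy reconstruction, not arXiv's actual linkifier,
 and the tiny TLD set below stands in for the full Public Suffix
 List.)

     import re

     # Permissive pattern in the spirit of TLDS = "[a-z][a-z]+":
     # any run of two or more letters counts as a TLD.
     URL_RE = re.compile(r"\b[a-z0-9-]+\.(?:[a-z][a-z]+)\b", re.IGNORECASE)

     text = ("including code generation, reasoning tasks and agent "
             "systems.While open-access code LLMs are increasingly")
     print(URL_RE.findall(text))   # ['systems.While'] -- false positive

     # A restrictive check against a real suffix list would reject it
     # (tiny illustrative subset here; the Public Suffix List is long).
     KNOWN_TLDS = {"com", "org", "net", "io", "ai", "edu", "gov"}

     def looks_like_domain(candidate: str) -> bool:
         return candidate.rsplit(".", 1)[-1].lower() in KNOWN_TLDS

     print([m for m in URL_RE.findall(text) if looks_like_domain(m)])  # []
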
| atilimcetin wrote:
| Home page of that arxiv paper: https://opencoder-llm.github.io/
| dang wrote:
| Thanks! We've changed to that from
| https://arxiv.org/abs/2411.04905, which is also linked there.
| mistrial9 wrote:
| making a wild guess on the nationality of every author of this
| paper (1), and observing the number of authors, and observing the
| velocity and volume of similar papers.. it seems a pattern of
| "English language as a service to automated programming
| environments" appears to be very useful and relevant for people
| (nations?) that are wholly and firmly not English speaking..
|
| (1) is M-A-P or INFtech dot ai a well-known institutional
| affiliation?
| jstanley wrote:
| What are you trying to say here?
|
| I gave it a few tries but couldn't figure it out.
| jannyfer wrote:
| It seems proofreading-as-a-service would be very useful for
| mistrial9.
| rnewme wrote:
| Keep trying, you might get it.
| bbor wrote:
| To be clear: INFTech is a for-profit (I think...?) firm out of
| Shanghai, and MAP is an international FOSS collective
| (https://m-a-p.ai/about).
|
 | Speaking generally, a _lot_ of software engineering worldwide
 | is done in English, so it makes sense that they're training
 | models in English even if some/most of the researchers also
 | speak a Chinese language. Plus, Hugging Face is English-native,
 | and working on FOSS models (FOSLMs?) without targeting that
 | community would be like making a command-line accounting tool
 | and not immediately posting it to Hacker News.
|
| Your comment seems to imply some sort of hidden motivation, but
| idk, seems pretty straightforwardly benign to me! Plus it's
| hard to say how many papers are published in other languages
| about LLMs, considering we wouldn't read them.
| swyx wrote:
 | Someone on Twitter once referred to these as "WeChat papers"
 | and I can't get it out of my head.
| tontoncyber wrote:
 | Interesting paper and work, but the model doesn't seem to be
 | better than Qwen2.5-Coder in some languages, including Ruby.
| deepsquirrelnet wrote:
 | I've tried a bunch of models that are essentially different
 | instruction tunings of the same base models, and that seems to
 | be generally true in my experience. I don't think you can
 | fine-tune your way into a significantly better code model. At
 | best you get one that follows instructions better, not one
 | that writes noticeably better code or solves harder problems.
| tontoncyber wrote:
| I'm waiting for the 32B!
| https://news.ycombinator.com/item?id=42096027
| johndough wrote:
 | I was wondering why Figure 1 shows a HumanEval score of 61.6
 | for Qwen2.5-Coder-7B while Table 1 shows a score of 88.4, i.e.
 | better than this new model's 66.5.
|
| The reason is that those are actually two different models
| (Qwen2.5-Coder-7B-Base with 61.6, Qwen2.5-Coder-7B-Instruct with
| 88.4).
| marmaduke wrote:
| > Unlike most prior efforts, we release not only model weights
| and inference code, but also the reproducible training data,
| complete data processing pipeline, rigorous experimental ablation
| results, and detailed training protocols for open scientific
| research.
|
 | Regardless of the specific performance of this model versus
 | another, I think it's good to keep in mind that everyone
 | benefits from this kind of work.
| 4b11b4 wrote:
| plumbing is important
| hasnain99 wrote:
| nice
| v3ss0n wrote:
 | Tested it: so much hallucination that it can't hold a candle
 | to Qwen 2.5, or even to the general-purpose model Mistral-Nemo.
| bt1a wrote:
| To be fair, nothing comes close to Qwen2.5 atm
| v3ss0n wrote:
 | I don't know how they come out on top of Qwen on the HumanEval
 | bench when the quality is this poor.
| littlestymaar wrote:
 | This is something that's obvious to anyone playing with local
 | LLMs but doesn't seem to be that well known, even among tech
 | enthusiasts.
|
| Qwen is really ahead of the pack right now when it comes to
| weight-available models.
| drawnwren wrote:
| How does it compare to Claude?
| tomr75 wrote:
 | Which size are you using?
 |
 | I don't see why you would use it over Claude and 4o-mini with
 | Cursor unless you are working on a top-secret repo.
| rnewme wrote:
 | Not even DeepSeek Coder 2.5?
| viraptor wrote:
| Not according to the scores here
| https://github.com/QwenLM/Qwen2.5-Coder
| IshKebab wrote:
| What kind of hardware do you need to run this?
| smilebot wrote:
| >Due to the prevalence of forking and copy-pasting within the
| codebase, nearly 75% of files are completely duplicated.
|
 | This is surprisingly high. Does this include imported libraries
| and packages? Since you are hashing at the file level, I am not
| fully convinced that this is due to people copying entire files
| over without modification.
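
 (For context, file-level exact deduplication is typically just
 "hash the raw bytes of every file and keep one file per digest",
 so identical copies collapse no matter where they were forked or
 vendored from, while near-duplicates with small edits do not. A
 minimal sketch of that idea, not the paper's actual pipeline:)

     import hashlib
     from pathlib import Path

     def dedup_by_file_hash(root):
         """Group files under root by the SHA-256 of their raw bytes."""
         kept = {}        # digest -> first path seen with that content
         duplicates = []  # later paths whose content was already seen
         for path in Path(root).rglob("*"):
             if not path.is_file():
                 continue
             digest = hashlib.sha256(path.read_bytes()).hexdigest()
             if digest in kept:
                 duplicates.append(path)
             else:
                 kept[digest] = path
         return kept, duplicates

     # "Nearly 75% of files are completely duplicated" would correspond
     # to len(duplicates) / (len(kept) + len(duplicates)) being ~0.75.
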
___________________________________________________________________
(page generated 2024-11-09 23:00 UTC)