[HN Gopher] Replit's new Code LLM: Open Source, 77% smaller than...
___________________________________________________________________
Replit's new Code LLM: Open Source, 77% smaller than Codex, trained
in 1 week
Author : swyx
Score : 538 points
Date : 2023-05-03 15:19 UTC (7 hours ago)
(HTM) web link (www.latent.space)
(TXT) w3m dump (www.latent.space)
| vrglvrglvrgl wrote:
| [dead]
| protonbob wrote:
| Darn it doesn't look like it has c sharp.
| m3kw9 wrote:
| It just gave me prototypes lol
|
| def sieve_eratosthenes(n):
|
| ##a function to sort 10 numbers
| def bubble_sort(a):
|
| ##a function to sort 10 numbers
| def insertion_sort(a):
|
| ##a function to sort 10 numbers
| def quick_sort(a):
| m3kw9 wrote:
| I left the settings as they were. All I added was "##a function
| to sort 10 numbers", assuming it would complete it like Copilot.
| ImprobableTruth wrote:
| Did you mess around with the settings? I'm getting a correct
| implementation and since it's deterministic (with default
| settings) it should be the same for you.
| LightMachine wrote:
| Any idea how much it cost to train it and how it was trained?
| heliophobicdude wrote:
| Important distinction that I'm learning today is that not all
| LLMs will be interoperable with each others'
| queries/prompts/inputs.
|
| Code LLM right now is not responding how a Chat LLM would
| respond.
|
| ~~~~~ Hats off to the team on the impressive work!
| eikenberry wrote:
| Is this a Co-pilot like assistant or something more? Co-pilot is
| neat but is basically not much more than an automated snippet
| system. The actual writing of the code is not the part that I
| want help with, I want an AI system that helps me design better
| software systems. Something more akin to program mind mapping
| than some fancy auto-completion system.
| qrio2 wrote:
| I wonder if an LLM paired with something like PlantUML would
| generate anything useful.
| circuit10 wrote:
| This probably makes a self-hosted and/or local Copilot a lot more
| feasible
| FanaHOVA wrote:
| Yes, something like FauxPilot[0] should be able to use it
| instead of CodeGen
|
| [0] https://github.com/fauxpilot/fauxpilot
| sashank_1509 wrote:
| Replit: I tried to write HMM Forward function code:
|
| ```
|
| # Write a function that performs the forward algorithm of an HMM. Ideally it uses numpy vectorization,
| # But if you cannot do that it is fine too. Just perform forward inference, given HMM parameters and Observations.
|
| # def HMM_forward() :
| # Write your code here
| # Hint:
| #   - Remember that HMMs are stochastic, so you will need to multiply transition matrices by observation probabilities
| #   - Remember that HMMs are stochastic, so you will need to multiply emission matrices by observation probabilities
| #   - Remember that HMMs are stochastic, so you will need to multiply emission matrices by transition matrices
| #   - Remember that HMMs are stochastic, so you will need to multiply emission matrices by transition matrices
| #   - Remember that HMMs are stochastic, so
|
| ```
|
| It gets stuck in an infinite loop. ChatGPT gets it right,
| first shot:
|
| ```
|
| import numpy as np
|
| def forward_algorithm(A, B, pi, obs):
|     """
|     Performs the forward algorithm for an HMM.
|
|     Parameters:
|         A: numpy.ndarray, shape (N, N)
|             Transition matrix of the HMM, where N is the number of states.
|         B: numpy.ndarray, shape (N, M)
|             Emission matrix of the HMM, where M is the number of possible observations.
|         pi: numpy.ndarray, shape (N,)
|             Initial probability distribution over states.
|         obs: numpy.ndarray, shape (T,)
|             Sequence of T observations.
|
|     Returns:
|         alpha: numpy.ndarray, shape (T, N)
|             Forward probabilities for each state at each time step.
|     """
|     T = obs.shape[0]
|     N = A.shape[0]
|     alpha = np.zeros((T, N))
|     alpha[0] = pi * B[:, obs[0]]
|     for t in range(1, T):
|         alpha[t] = np.dot(alpha[t-1], A) * B[:, obs[t]]
|     return alpha
|
| ```
|
| OpenAI managed to do something important but extremely hard: they
| moved out of the DL benchmark frame and made something that is
| general-purpose useful. Great effort and congrats to the Replit
| team though; hopefully they can keep iterating on this and reach
| ChatGPT capabilities someday.
| amasad wrote:
| The model is not RLHF'd or instructed. It's an inline
| autocomplete model, so it will get confused if you talk to it
| like you're talking to a person. Although it is possible to
| finetune it that way. To get better full-function completion,
| try giving it the function definition and a descriptive
| docstring as a prompt.
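|
| For example, a prompt shaped roughly like this (illustrative
| only) tends to work much better than a conversational request:
|
|     import numpy as np
|
|     def hmm_forward(A, B, pi, obs):
|         """Forward algorithm for an HMM.
|
|         A is the (N, N) transition matrix, B the (N, M) emission
|         matrix, pi the (N,) initial distribution, and obs a (T,)
|         array of observation indices. Returns the (T, N) matrix
|         of forward probabilities.
|         """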
| gowld wrote:
| Can I use repl.it with an external Code LLM, with or without
| paying repl.it for Ghostwriter ?
| amasad wrote:
| Yes we have a robust extension system and some are already
| building alternatives.
| varunkmohan wrote:
| Hi from the Codeium team. It's awesome to hear you are
| allowing other code LLMs to be used on the Replit platform
| (we're big fans)! We'd love to enable our free chrome
| extension on Replit.
| swyx wrote:
| would love to be able to compare codeium vs ghostwriter
| inside replit! (or toggle between them based on known
| strengths or preferences, perhaps by project or by
| filetype)
| amasad wrote:
| Some links:
|
| - Repo: https://github.com/replit/ReplitLM/tree/main/replit-
| code-v1-...
|
| - HuggingFace: https://huggingface.co/replit/replit-code-v1-3b
|
| - Demo: https://huggingface.co/spaces/replit/replit-
| code-v1-3b-demo
|
| - Early benchmark results:
| https://twitter.com/amasad/status/1651019556423598081
|
| A lot about this project was surprising. We knew it was going to
| be good, but didn't expect to be this good -- especially
| surprising was the finetuned performance boost, and the fact that
| the model is decent at language tasks and reasoning (in some
| cases much better than much larger general-purpose models).
|
| It feels like there is a lot more to do with this model, and I
| have a suspicion you can even make a half-decent chatbot (at
| least one focused on code) by finetuning it on conversation
| (and/or instruction) datasets.
|
| Will follow up with a more comprehensive technical report and the
| UL2R version (fill-in-the-middle support).
| letitgo12345 wrote:
| Doesn't the Stack contain HumanEval? So you're basically
| comparing numbers on the pretraining data.
| godelski wrote:
| My favorite line from the HumanEval paper[0]
|
| > It is important for these tasks to be hand-written, since
| our models are trained on a large fraction of GitHub, which
| already contains solutions to problems from a variety of
| sources.
|
| So to answer your question, yes, the evaluation dataset is
| spoiled. You can find such unique and never before seen
| docstrings like
|
| > For a given list of input numbers calculate the Mean
| Absolute Deviation around the mean of this dataset. Mean
| Absolute Deviation is the absolute difference between each
| element and a centerpoint (mean in this case)[1]
|
| And here's a repo I found that is 8 years old[2]. But how
| about a more recent one that is even closer?[3] There's
| plenty more examples[4] (does anyone know how to actually limit
| the date to prior to 2021? `pushed:<2021` doesn't work, nor
| does using the `created` keyword. Date searching doesn't seem
| to work well).
|
| In essence, we can still use this evaluation method to
| determine how good our model is at doing fuzzy searching.
| Which, mind you, is still a useful thing. But I would be
| careful in concluding that this means the model is good at
| generalizing arbitrary descriptions of code or novel pieces
| of code. That said, one may be able to argue that not many
| lines of code are actually that novel. Still, we need to be
| careful about our conclusions and understand the limitations
| of our metrics (something I am currently deeply troubled by)
|
| [0] https://arxiv.org/abs/2107.03374
|
| [1] https://github.com/openai/code-align-evals-
| data/blob/97446d9...
|
| [2] https://github.com/bertomartin/stat4701/blob/ec2b64f629cb
| bf6...
|
| [3] https://github.com/danielwatson6/hate-speech-
| project/blob/64...
|
| [4] https://github.com/search?q=abs%28x+-+mean%29+for+languag
| e%3...
| godelski wrote:
| (follow-up: Figured this should be a different comment)
|
| I wanted to demonstrate what I said above so I came up with
| some examples of things I think a human would have an easy
| time implementing but might be hard to implement. BUT a key
| part is that I expect these to be in the dataset! I just
| don't expect these to be in hundreds or thousands of
| githubs because they will be uncommon (but not rare). Also,
| we'll pretty much ask for few-liners to give the model the
| biggest advantage we can (errors will compound).
|
| Prompt:
|
|     from torch import nn
|
|     class LipSwish(nn.Module):
|         """
|         The Swish activation function is defined by a gated linear unit,
|         where the gate is defined by a sigmoid function and multiplies
|         the input with a learnable parameter, beta. Beta is initialized
|         as 0.5. The Lipswish function normalizes the output by the upper
|         bound of 1.1.
|         """
|         def __init__(self):
|             super().__init__()
|
| Result: Mostly correct but missing the division by 1.1. The
| forward is `return x * F.sigmoid(self.beta * x)`, which is
| Swish (it also assumes we had "import torch" and applied
| type hinting). It did properly set the parameter (this is
| just a 3 liner)
|
| Discussion: The Swish function should be in the dataset and
| is a well known activation function (though beta is not in
| the pytorch version). Despite LipSwish being in the dataset
| (introduced in 2019 from Residual Flows[0]) it is not
| common. I could get the code to generate the swish function
| (initializing beta, and performing the gate) but could not
| get the code to divide the output by 1.1. I would not
| expect a human to have difficulties with this.
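|
| For reference, the expected completion is only a few lines (my
| own sketch of LipSwish, not the model's output):
|
|     import torch
|     from torch import nn
|
|     class LipSwish(nn.Module):
|         def __init__(self):
|             super().__init__()
|             # learnable gate parameter, initialized at 0.5
|             self.beta = nn.Parameter(torch.tensor(0.5))
|
|         def forward(self, x):
|             # Swish gate, normalized by the 1.1 upper bound
|             return x * torch.sigmoid(self.beta * x) / 1.1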
|
| Okay, so let's try something else that might be a bit more
| common and older. The same paper uses a concatenated
| activation function, and those aren't "uncommon". CReLU was
| introduced in 2016[1] and there's plenty of concatenated
| activations around since then. The pytorch documentation
| even uses it as an example[2]. There's far more examples of
| CReLU (3k python results for "class CReLU" vs 58 for "class
| LipSwish". Use these numbers as weak hints because search
| sucks and isn't always accurate).
|
| Prompt:
|
|     from torch import nn
|     from torch.nn import functional as F
|
|     class CReLU(nn.Module):
|         """
|         Concatenated version of ReLU. The activation is applied to
|         both the positive and negative of our input and the result
|         is concatenated.
|         """
|         def __init__(self):
|             super().__init__()
|
|         def forward(self, x):
|
| Result: `return torch.cat([x.clamp(min=0),
| -x.clamp(min=0)], 1)`. This is correct but not the expected
| one-liner result.
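|
| (For comparison, the functional one-liner I was expecting would
| be something like `return torch.cat([F.relu(x), F.relu(-x)], dim=1)`.)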
|
| Discussion: This was a bit surprising, it didn't use
| functional as we might expect (or hinted). But
| interestingly it will if we change the class name to
| "ConcatenatedReLU". I found exact copies on GitHub with the
| full name (memorization) but the first page of instances for
| CReLU I found used functional (I did find one that was
| exactly the above code, when adding "clamp" to the search,
| but missing the minus sign. There were plenty of errors in
| CReLU implementations). Interesting side note: CReLU
| continues and defines a function CReLU6 which uses the same
| docstring but clamps with a max of 6 on the positive input,
| whereas ConcatenatedReLU starts to define a convolutional block
| (Conv + BatchNorm + ReLU) called Conv2d.
|
| So we have kinda mixed results, and in both cases these are
| rather odd and probably not what we wanted. We can clearly
| see that there are issues where a human would not have too
| much trouble. There's a big issue in these types of
| problems: we need to memorize a lot of information
| (otherwise we can't even write code or know library calls)
| but too much memorization prevents creativity. There is a
| lot of gray area between the _pure_ "Stochastic
| Parrot"/"Fancy copy machine" vs a generalized intelligence
| (with a broad and flexible definition of intelligence). I'd
| still call them stochastic parrots because to me the
| evidence suggests that we're closer to the memorization
| side than the creation side. But that doesn't mean these
| frameworks aren't useful. We all know a lot of code is
| boilerplate (otherwise we wouldn't have the joke "copy
| paste from SO") and these tools can be very useful for
| that. But I think the utility is highly going to depend on
| what you are coding for and how you code. If you're doing
| standard stuff, this probably has high utility to you and
| can save you a lot of time. The same way writing macros
| does, but this is FAR more powerful. It can also help
| novices a lot. Also, if your main errors are reading
| mistakes (e.g. you're dyslexic) -- this is my largest
| problem -- then this might make things difficult as you
| have a tendency to gloss over text and miss minor errors. I
| also don't think these tools would help if you're a
| researcher or writing optimized or specialized code. These
| differences are probably why we see such differences in
| people's reactions. But it may also be a hint into what
| people do and how they work when we see who raves and who
| rants about these.
|
| [0] https://arxiv.org/abs/1906.02735
|
| [1] https://arxiv.org/abs/1603.05201
|
| [2] https://pytorch.org/docs/stable/generated/torch.nn.ReLU
| .html
|
| Edit: We can also check if code is in the stack[3]. We see
| that [0] is indeed in the dataset so we know there is
| information leakage. Interestingly the exact copy I found
| in the previous comment[4] isn't! (The repo, though the
| user is)
|
| [3] https://huggingface.co/spaces/bigcode/in-the-stack
|
| [4] https://github.com/bertomartin/stat4701/blob/ec2b64f629
| cbbf6...
| amasad wrote:
| Can't find it now but pretty sure BigCode said somewhere they
| explicitly looked for it and removed it. Also, the subjective
| measure does match up with the benchmark. Our finetuned model
| performed +50% on HumanEval, and when using it, it felt at
| least that much improved.
| godelski wrote:
| You can view the prompts, solutions, and checks here[0].
| See my sibling comment (to yours) where I quote the Human
| Eval paper and do some more analysis. But I think if you
| look at [0] you'll see that these aren't really unique
| problems and are likely to have large repetitions in the
| dataset. I should add to that comment to include the
| dataset[1] (too late to edit) where they mention that they
| just scrape all of GitHub (Jan 1 2015 - Mar 31 2022). They
| do exact and near de-duplicate but near de-duplication is
| messy.
|
| > We implement near-deduplication in our pre-processing
| pipeline on top of exact deduplication. We first split the
| files into words/tokens based on non-alphanumeric
| characters and remove files with fewer than 10 tokens.
| Next, we compute the MinHash with 256 permutations of all
| documents, and use Locality Sensitive Hashing to find
| clusters of duplicates. We further reduce these clusters by
| ensuring that each file in the original cluster is similar
| to at least one other file in the reduced cluster. We
| consider two files similar when their Jaccard similarity
| exceeds 0.85.
|
| Near-duplicates are still difficult to measure. So we
| should expect duplication, and it should be proportional to
| the number of samples we have (even if the same variance,
| but I'd wager higher variance with larger duplications).
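|
| A minimal sketch of the tokenize-and-compare step they describe
| (illustrative; the real pipeline uses MinHash + LSH so it never
| has to compare every pair of files directly):
|
|     import re
|
|     def tokens(text):
|         # split on non-alphanumeric characters, drop empties
|         return {t for t in re.split(r"[^0-9A-Za-z]+", text) if t}
|
|     def jaccard(a, b):
|         ta, tb = tokens(a), tokens(b)
|         return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0
|
|     # two files count as near-duplicates when jaccard(a, b) > 0.85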
|
| [0] https://github.com/openai/code-align-evals-
| data/tree/97446d9...
|
| [1] https://arxiv.org/abs/2211.15533
| spenczar5 wrote:
| How is this code licensed? I didn't see a license in the repo.
| It looks interesting!
| dgacmu wrote:
| The README indicates:
|
| The base model checkpoint is licensed under the Creative
| Commons license (CC BY-SA-4.0). Under the license, you must
| give credit to Replit, provide a link to the license, and
| indicate if changes were made. You may do so in any
| reasonable manner, but not in any way that suggests that
| Replit endorses you or your use.
| sputknick wrote:
| What does "fine tuning" mean in this context? Does it mean you
| fine-tuned it on a specific code repository, or collection of
| code repositories and then had it do work in those
| repositories?
| amasad wrote:
| Broadly, finetuning is any post-pretraining training. Most of
| the time it is used in the context of fitting a more narrow
| task. In our case, it was the same training objective as the
| pretraining but meant to be more representative of what
| Replit users like to code. However, we were surprised by how
| well it boosted overall performance. Best guess: it's a)
| novel data and b) the model could take even more training!!
| sanderjd wrote:
| You seem to know your stuff some, so I'll ask you a
| question on this: Are there any good books on all the
| different approaches in this space, or is it all too new
| and fast moving for such a thing?
| spenczar5 wrote:
| How feasible and effective would it be to fine-tune a model
| against an organization's private source code, resulting in
| an "internal" model that knows how to work with that org's
| stuff?
|
| Could you, say, fine-tune the model every week with the
| latest merges? Every hour?
| naderkhalil wrote:
| Finetuning a smaller model leading to better performance
| seems like a significant finding that'll lead to a lot of
| companies fine-tuning their own internal "ChatGPT"s
| pyth0 wrote:
| Finetuning is a relatively quick process. Training the
| base model is the expensive part (can take weeks and huge
| amounts of compute), whereas finetuning usually is only
| on the last few layers and can be done with far fewer
| resources. You could definitely have a "nightly" finetune
| model that is retrained every day or so.
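|
| In PyTorch terms that usually looks something like this (a
| rough sketch; the layer-name patterns are illustrative, check
| model.named_parameters() for the real ones):
|
|     from transformers import AutoModelForCausalLM
|
|     model = AutoModelForCausalLM.from_pretrained(
|         "replit/replit-code-v1-3b", trust_remote_code=True)
|
|     # freeze the whole pretrained network...
|     for p in model.parameters():
|         p.requires_grad = False
|
|     # ...then unfreeze only the last couple of blocks
|     # (names below are a guess -- inspect the model to find them)
|     for name, p in model.named_parameters():
|         if any(f"blocks.{i}." in name for i in (30, 31)):
|             p.requires_grad = True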
| titaniczero wrote:
| When you fine-tune it, do you train just the head/last few
| layers or do you also unfreeze the model afterwards and
| retrain the whole model with a very small LR for a few
| epochs?
| WinLychee wrote:
| You can take a network and its weights that someone else
| trained, and use that pretrained network to train on your own
| data, which is likely to be a better starting point than
| random weights.
| newhouseb wrote:
| First - thank you for open sourcing this! It's a real gift to
| the community to have a model intended for "commercial use"
| that's actually licensed as such.
|
| I'd be very interested to hear about the choice/evaluation of
| the ALiBi approach for positional embedding (perhaps in the
| technical report).
|
| My intuition suggests that while this allows for better
| generalizability for longer sequence lengths, it penalizes
| scenarios where an LLM might need to check for things like a
| function signature far away from where the next token is
| generated. My initial testing of this model tracks with this
| intuition but that's by no means a rigorous evaluation.
| ofirpress wrote:
| (I wrote ALiBi) You can read the paper here
| https://arxiv.org/abs/2108.12409
|
| While intuitively it does seem like ALiBi would make it hard
| for the model to attend to things that are far away, in many
| scenarios we've tested with different models trained on
| different datasets, ALiBi _always_ performs better than
| sinusoidal, rotary, and other embedding types, even when we're
| not using it to extrapolate to longer sequence lengths.
|
| These findings have been confirmed by others, including by
| the BLOOM open source LM project.
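|
| (For the curious, a simplified sketch of the mechanism -- a fixed,
| head-specific linear penalty added to the attention scores before
| the softmax; the slopes follow the geometric series from the
| paper, assuming a power-of-two number of heads:)
|
|     import torch
|
|     def alibi_bias(n_heads, seq_len):
|         # head-specific slopes: 2^(-8/n), 2^(-16/n), ...
|         slopes = torch.tensor(
|             [2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)])
|         pos = torch.arange(seq_len)
|         # distance[i, j] = i - j: how far back key j is from query i
|         distance = pos.view(seq_len, 1) - pos.view(1, seq_len)
|         # linear penalty per head; future positions are zeroed here
|         # and handled by the causal mask anyway
|         return -slopes.view(n_heads, 1, 1) * distance.clamp(min=0)
|
|     # the result is added to the per-head attention scores before softmax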
| newhouseb wrote:
| Small world!
|
| Thanks for the link (which I've now skimmed beyond the
| abstract). What wasn't obvious to me from the abstract is
| that different attention heads have different penalty
| strengths, so if some prediction task requires long range
| dependencies you might expect one of the less-penalized
| heads to end up specializing. I wonder what would happen if
| the penalty for one head is zero? (The paper suggests this
| might've been tried and just made things worse, but
| unclear)
|
| I must admit that this is a wonderfully elegant (and
| interpretable) way to do this... much more intuitive (to me
| at least, a wannabe practitioner) than all of the trig-
| based embeddings.
| pera wrote:
| Hi there, I have two questions:
|
| 1 - Why did you choose Markdown? It seems an odd choice for
| training a model like this.
|
| 2 - Have you tried to train only one single PL and then
| benchmark it against this more general version?
| runnerup wrote:
| They trained on https://huggingface.co/datasets/bigcode/the-
| stack-dedup which is a massive curated dataset accumulated
| from GitHub. Details are here: https://www.bigcode-
| project.org/docs/about/the-stack/
|
| Many of the most-represented "languages" on GitHub are
| actually things like JSON, XML, HTML, CSV, text, markdown,
| YAML, and SVG.
|
| More details from them here: https://blog.replit.com/llm-
| training
| amasad wrote:
| 1- We trained on languages that are most popular on Replit.
| Markdown is important because you need some amount of natural
| language in the data, and it will act as a sort of "natural
| language label" for code.
|
| 2- I like how portable it is, being a single small model that
| does a lot of languages. Single-language code models are an
| approach that models like Salesforce/CodeGen took, but I believe
| we beat (or get very close to) their mono models on benchmarks.
| fuzzythinker wrote:
| Have you thought of finding or creating something like this
| [0]?
|
| I created this as the basis for my origami folding
| descriptive language. I tried to find something similar,
| requirements being both well structured and English-like
| but couldn't find any, so I created it.
|
| The origami folding app will hopefully be out in 2 weeks,
| so you can see how it's used.
|
| [0] https://github.com/fuzzthink/mation-spec
| kir-gadjello wrote:
| Impressive model, thank you for releasing it under a business-
| friendly license!
|
| Have you considered using Google's sparse "scaling transformer"
| architecture as the base? Even at 3B scale it can generate 3-4x
| more tokens per FLOP while being competitive at perplexity with
| a dense transformer. I think OpenAI uses a variant of it in
| their ChatGPT-3.5-Turbo product.
|
| Here is the paper https://arxiv.org/abs/2111.12763 and the
| implementation
| https://github.com/google/trax/blob/master/trax/models/resea...
| if you are interested.
|
| Hope you get to look into this!
| b33j0r wrote:
| Thank you for releasing the weights along with the announcement.
| Other posts made great headlines, but then it was "weights are
| on their way!"
|
| Like, why did we even get excited about those? This? Great work.
| chaxor wrote:
| I don't think it's a business friendly license?
| gbasin wrote:
| Very exciting, thanks for sharing all this
| swyx wrote:
| hi HN! back again with an exclusive deep dive with Replit's head
| of AI. I attended their developer day last week
| (https://twitter.com/swyx/status/1650989632413401089) just
| expecting a regular fundraise announcement and was totally
| shocked when they annoucned their own LLM and also said they
| would open source it. so immediately asked them for a podcast
| interview and this is the result.
|
| my favorite learning is how they are pushing the state of the art
| - openai's HumanEval is the industry standard benchmark for code
| LLMs, but Reza kindly went above and beyond to show how they use
| "AmjadEval" - using coder intuition to capture human preference
| on what output is more helpful to coders (see screenshots
| https://twitter.com/swyx/status/1653791019421569024?s=20)
|
| please AMA!
| FanaHOVA wrote:
| This was a lot of fun to record, and it's the second episode
| where I got an eval question wrong; I'm going to be demoted to
| bot soon lol
| swyx wrote:
| means you are human! like the rest of us
| marcodiego wrote:
| Sorry, I have to ask this: how does this compare to ChatGPT?
| swyx wrote:
| it doesn't. replit-code-v1-3b is a code LLM, ChatGPT is an
| app on top of LLMs. it compares to OpenAI Codex, a small
| version of which is behind GitHub Copilot.
| cubefox wrote:
| Free ChatGPT is based on code-davinci-002 (GPT-3.5), which
| is used in OpenAI Codex. See
|
| https://platform.openai.com/docs/model-index-for-
| researchers
|
| https://help.openai.com/en/articles/6195637-getting-
| started-...
| mritchie712 wrote:
| It (replit-code-v1-3b) is already quite good at explaining
| code:
|
| Input:
|
|     below is a SQL statement:
|
|     SELECT
|       CAST(DATE_TRUNC('week', "t1"."TIMESTAMP") AS DATE) AS "WEEK_START",
|       COUNT(*) AS "EVENT_COUNT"
|     FROM "ANALYTICS"."POSTHOG"."POSTHOG_EVENTS" AS "t1"
|     GROUP BY "WEEK_START"
|     ORDER BY "WEEK_START"
|     LIMIT 2000
|
|     Explain this SQL. Respond in JSON format with the following
|     keys: TITLE, DESCRIPTION, TABLES
|
|     JSON response:
|
| Output:
|
|     {
|       "title": "Weekly Events Count",
|       "description": "Count of weekly events",
|       "tables": [
|         {
|           "name": "POSTHOG_EVENTS",
|           "columns": [
|             "WEEK_START",
|             "EVENT_COUNT"
|           ]
|         }
|       ]
|     }
| runnerup wrote:
| It's not crucial that it beat ChatGPT this year. That's a
| pretty unattainable goal for a group like Replit. From the
| users POV, none of the current copilots compare favorably
| against ChatGPT, even Microsoft's OpenAI-powered GitHub
| Copilot.
|
| What's important is that they're preparing for the future by
| building all the tooling/UI/UX around coding copilots. This
| way, when costs and feasibility of building ChatGPT-quality
| LLM's drop and multiple open-source models are available,
| Replit has the ability to immediately drop them into their
| production environment. They'll also have the skills and
| systems to finetune any new models and wring extra
| performance out of them.
|
| This is more important to users than it seems at first
| because the current UX of things like GitHub Copilot doesn't allow
| me to use their AI against my codebase the way that I want to
| (the way I use ChatGPT). Right now GitHub Copilot is a
| glorified auto-complete, but I want it to do widespread
| scaffolding, refactoring, and analysis across my whole
| codebase. Microsoft has access to LLM's that can do this
| through their control of OpenAI -- but Microsoft lacks the
| tooling/UI/UX to bring the power of ChatGPT to me as a user
| of VSCode/IntelliJ/PyCharm/Visual Studio.
|
| So if Replit can find more innovative, boundary-pushing ways
| of integrating LLM's, they won't necessarily need the highest
| quality LLM's to produce a superior user experience. It's a
| strong signal that Replit is well-positioned for the future,
| when ChatGPT-like models are democratized.
|
| Hopefully JetBrains is paying attention. They definitely have
| time to wait a bit more (1-2 years?), but not a lot of time.
| JetBrains shouldn't solely rely on Github Copilot plug-in to
| provide their users with LLM's, because it's not clear that
| the user experience of that plug-in will stay competitive
| with the user experience that GitHub Copilot will offer
| directly in VSCode. The IntelliJ/PyCharm plugin may remain
| "just a fancy auto-complete" while VSCode gets more
| interactive workflows.
|
| Future IDE's with LLM integration require novel, smart,
| clever UX typically invented only by very creative people.
|
| It's also worth noting that Replit is not just trying to be
| an IDE -- they're also building a marketplace to buy/sell
| coding work, and establishing a small foothold as a niche
| cloud computing provider.
| ohjfjfk wrote:
| [dead]
| worldsayshi wrote:
| I'm a bit surprised that IP and infosec isn't a much bigger
| part of this discussion.
|
| ChatGPT ought to be a non starter for many use cases where
| data cannot be shared with OpenAI or where the copyright
| situation of the generated output could become too vague.
|
| Having the option of open source models that potentially
| could be self hosted could make those use cases viable.
| furyofantares wrote:
| Whether it beats ChatGPT right now is important to me,
| right now.
|
| I'm very excited about everyone doing work even when
| they're not beating ChatGPT right now, of course.
|
| But how it compares to ChatGPT right now is extremely
| relevant to lots of people.
|
| It's also become very common to vaguely reference OpenAI's
| offerings when announcing new models without saying how
| they actually compare, or only mentioning some small way in
| which it compares favorably.
|
| (Though it seems to often be that some comment from the
| article comparing to OpenAI gets promoted to the title when
| posted on HN, like here.)
| jeron wrote:
| I think this is somewhat of a naive way to look at this.
| Yes, ChatGPT is really good, but they're basically
| completely closed source. A lot of the power of LLMs can
| and will come from open-source models that anyone can dig
| into the weights of and tune for their use case, as well as
| train and run on their own platform.
| valedan wrote:
| What does this mean for the future of editors like emacs
| and (neo)vim? Right now the Copilot plugin for Neovim works
| pretty much the same as the one for VSCode, but as LLMs get
| integrated more into IDEs and new workflows are built
| around them, will the old-school editors be able to keep
| up? I'm a little worried because I just switched from
| VSCode to Neovim a few months ago!
| davidkunz wrote:
| It'd be great if they built language servers via the Language
| Server Protocol; that would be editor-agnostic.
| spenczar5 wrote:
| Github Copilot actually works through the language server
| protocol already. Document contents are sent to it and it
| responds with code completions.
| SheinhardtWigCo wrote:
| This could be the dawn of a new day for the old-school
| editors. Not to start any wars here, but I could never
| get the hang of Vim, and that's hardly an unusual
| complaint. But now, free high-quality personalized
| "tuition" just became economically viable.
| runnerup wrote:
| Side note, potentially check out vimtutor, or also
| https://vim-adventures.com/
| mzz80 wrote:
| Neovim already can't keep up by itself. The future of vim
| won't be as a standalone application, but as a plugin
| into other IDEs. The support for Neovim and VSCodeVim
| within VSCode greatly reduces the utility of a standalone
| app for anything other than edits to very small projects.
| soulofmischief wrote:
| vim is a text editor.
| earthboundkid wrote:
| I keep saying that it's obvious that local execution is the
| future of LLMs. Remote execution makes a ton of sense for
| databases, and most web apps are on some level just CRUD
| over a remote DB, so we've all gotten used to the idea that
| in the 21st century a software business should be running
| remote servers... But LLMs don't need to run remotely, and
| they don't especially benefit from running remotely either
| (okay, more training data, but you can batch that and send
| it back asynchronously). The future is local.
| blueboo wrote:
| The future is using the best possible tool to drive your
| work. Won't local models be systematically inferior to
| bigger commercial offerings for the next few years at
| least?
| pmoriarty wrote:
| _" The future is using the best possible tool to drive
| your work"_
|
| Not if that tool is censored, and you need an uncensored
| version to do your work. Or maybe you have privacy
| considerations, or your company policies forbid using
| something hosted remotely or owned by another company,
| etc...
| imoverclocked wrote:
| For shops that want to ensure their own codebase stays
| local, definitely no.
| theaiquestion wrote:
| I think that we'll reach "good enough" - and that the
| commercial offerings won't have much tangible benefit, at least
| for simply being "fancy autocomplete".
|
| Currently you don't really use LLMs for designing the
| structure, just completing the implementation, and I
| think that will be very doable locally.
| steve_adams_86 wrote:
| Maybe. I wonder if very narrow, multi-model systems might
| eventually deliver better performance and utility than
| monolithic models like GPT. Rather than pay for access to
| that, you might be better off investing in resources that
| can train and learn on exactly what you're doing, rather
| than something general that is good at a lot of things
| but not incredible at your specific task.
| heliophobicdude wrote:
| Hard to compare them actually. The thing about ChatGPT is the
| chat part. It was trained to interact and respond with human
| conversation. This is more like Copilot, with code completion
| based off of actual code.
| swyx wrote:
| we also did an interview with Varun Mohan of Codeium, which is
| another competing code model trained from complete scratch:
| https://lspace.swyx.io/p/varun-mohan#details
| robby_w_g wrote:
| I recognized the name Replit and couldn't remember why. A quick
| search reminded me: https://news.ycombinator.com/item?id=27424195
| ec109685 wrote:
| This founder has extreme views and full of hyperbole:
| https://twitter.com/amasad/status/1504092244168478728?s=20
| amasad wrote:
| Is this the best you can find? not even top 10 bangers.
| naillo wrote:
| [flagged]
| stephenjayakar wrote:
| this feels like an attempt to hive mind against anything cool
| from this company
| dimgl wrote:
| +1, this is unnecessary.
| aardshark wrote:
| Alternatively, it's called consequences of your actions.
| Don't be surprised if shitty behaviour comes back to bite
| you.
| ibrarmalik wrote:
| I think people are smart enough to receive extra information
| and do whatever they want with that.
| naillo wrote:
| Threatening a guy for making an open source version of replit
| sounds pretty crummy in my eyes.
| robby_w_g wrote:
| I think it's fair to evaluate a company's behavior before
| engaging in business with them. And I personally dislike
| persons in power abusing their position, which is why I
| remembered the company name almost two years later.
|
| I haven't heard of any similar behavior since then, which is
| a good sign. But a reputation can be a hard thing to shake.
| The CEO should have considered that before doing what he did.
| doodlesdev wrote:
| The model is way too small, comparing it to Codex feels
| disingenuous. Sure it's 77% smaller, it's also 77% worse.
| Although, it's a cool project nonetheless.
|
| For instance, even this simple snippet generates wrong inline
| completions:
|
|     // Only return even numbers bigger than 10 from the array
|     const arrayFilter = (array) =>
|
| Replit-code-v1:
|
|     // Only return even numbers bigger than 10 from the array
|     const arrayFilter = (array) => {
|       return array.filter((item) => item > 10);
|     };
|
| Gets it wrong, returns odd numbers.
|
| Codeium:
|
|     // Only return even numbers bigger than 10 from the array
|     const arrayFilter = (array) => {
|       return array.filter((num) => num > 10 && num % 2 === 0);
|     };
|
| ChatGPT (GPT-3.5 Turbo) - code only, without the rest of the
| completion since it's instruction-tuned:
|
|     const arrayFilter = (array) => {
|       return array.filter(num => num % 2 === 0 && num > 10);
|     }
|
| Not comparable at all. For reference if anyone wants to test I
| ran this through the HuggingFace space using the default
| parameters, ChatGPT through chat.openai.com, and Codeium through
| the VSCodium extension on an empty JavaScript file.
| amasad wrote:
| Interesting. This seems like a weakness of natural language
| understanding. If you rephrase your prompt slightly it would
| get it right. Try:
|
|     // return even numbers that are also more than 10
|     const arrayFilter = (array) =>
|
| It would do the right thing. The fine-tuned version gets your
| prompt right so maybe it benefited from natural language data.
| Will look more into it.
| doodlesdev wrote:
| That's really interesting, indeed I can reproduce this by
| changing the comment. I also managed to get correct output
| for this sample by renaming the function.
| eevilspock wrote:
| clearly your original comment was unfair.
| SCLeo wrote:
| I agree. Maybe it interpreted it as return the numbers that
| are more than 10 in the given array of even numbers.
|
| For example, if the instruction says "return person objects
| that are at least 20 years old", it might be more reasonable
| to generate:
|
| array.filter(item => item.age >= 20)
|
| as opposed to
|
| array.filter(item => (item instanceof Person) && (item.age >=
| 20))
| johnfn wrote:
| > Sure it's 77% smaller, it's also 77% worse.
|
| Hehe, yeah, imagine saying you made a new programming language
| with 77% less lines of code than Python.
| Zababa wrote:
| Finally, an opportunity to share this
| https://nsl.com/papers/denial.html
| barking_biscuit wrote:
| I didn't get the punchline of this, so I asked GPT-4 to
| explain the punchline. Actually quite amusing.
| [deleted]
| johnfn wrote:
| I'm curious about the downvotes because I thought I was
| just agreeing with OP. Obviously lines of code in a
| programming language repo is no correlate at all to
| quality. It's like the old adage about measuring aircraft
| quality by weight.
| moffkalast wrote:
| Yeah I tried the demo, it wrote some wrong code with comments
| in Chinese. I think I'll pass.
|
| It's a pretty well accepted fact now that bigger LLM = moar
| better without exceptions. I'm not sure why there's a race to
| the bottom of who'll make the most useless model that can run
| everywhere.
| SheinhardtWigCo wrote:
| It seems like every week someone comes out with some version of
| "we can get results similar to OpenAI's API with our model that
| you can run on a Commodore 64!"
|
| And then you dig in, and it's always far behind in some
| important way.
|
| Not hating here, I love the pace of iteration, just not the
| hyperbole.
| barking_biscuit wrote:
| >"we can get results similar to OpenAI's API with our model
| that you can run on a Commodore 64!"
|
| I have felt similar frustrations with statements that feel
| disingenuous too. Thanks for articulating this with such a
| beautifully hilarious metaphor.
| thewataccount wrote:
| I need more time to compare it, the short 128 tokens in the
| demo is a bit rough but -
|
| On first look this seems to blow the current llama based models
| out of the water including the 30B ones.
|
| Pasting what you want + url + example json with no other
| context and it "knows" what the url and the json is for,
| without even telling it.
|
| I'm not even saying it's as good as chatGPT, but this is a
| tenth the size of the best llama models I've seen.
| jeremypress wrote:
| Interesting how this guy has a finance background but knows how
| to code, especially for emerging technologies like LLMs
| ipsum2 wrote:
| Didn't MosaicML do the training for them?
| youssefabdelm wrote:
| title is missing: "trained in 1 week, and like most open source
| LLMs so far... it sucks compared to the closed source
| alternatives"
|
| Great effort of course bla bla bla...
|
| Open source really needs some benchmarking, and up their game
| quality-wise.
|
| And yes I know they're expensive as shit to train... let's not
| keep wasting our money and actually work together, pool our
| resources, to make a GOOD model.
|
| But oh no, everyone wants to put their stamp on it. "Replit did
| this! Look at us!"
| ImprobableTruth wrote:
| This is easy to say, but I think the issue is that getting an
| LLM right isn't easy, so it's not clear who should steward such
| a project. Something like BLOOM shows that even if you have the
| necessary compute, you can still get a model that isn't good.
|
| I think it will take some time for it to be clear who is a
| leader in training open source models (maybe it will be the red
| pajama folks?) and I think they'll get more support after that.
| youssefabdelm wrote:
| Fair point
| Imnimo wrote:
| Tried it out on the HuggingFace demo, with default settings.
|
| Prompt:
|
| >def nth_prime(n):
|
| Completion:
|
| > if n == 1:
|
| > return 2
|
| > if n == 2:
|
| > return 3
|
| > if n == 3:
|
| > return 5
|
| > if n == 4
| GreedClarifies wrote:
| This is amazing work and bravo on to the people working on
| redpajama.
|
| This is fantastic for the world, this means LLMs will not be
| controlled by a couple of companies with the associated rents.
|
| Yes, private LLMs will likely be a couple of years ahead of
| 'free' alternatives, but that's _OK_, we want to incentivize
| for-profit research so long as the services are low-priced in time
| (and in this case in short order).
|
| AMAZING WORK.
| m3kw9 wrote:
| Have you even tried it? It's pretty bad
| GreedClarifies wrote:
| But that's _fine_ it can be a year or two behind the state of
| the art. That 's not the point.
|
| The point is that there will be alternatives and that will
| reduce the price in time further increasing the impact of the
| technology.
|
| There was a possible future where only MSFT and maybe GOOG
| and maybe one or two other companies had this technology and
| extracted massive rents.
| laweijfmvo wrote:
| My first reaction was, "why is replit building LLMs," but I
| guess it fits their needs to have one optimized for their use.
| But I wonder, is this the beginning of another wave of "every
| company is an AI company?" Are we going to see a spike in tech
| hiring around AI/LLM, money starting to flow again, etc? And
| how many years until it all blows up and the layoffs start?
| dpflan wrote:
| Finetuning LLMs (and any model, really) is going to be a common
| practice. Each company is its own domain, with domain knowledge
| and data to specialize open-sourced models, or to use other
| models to distill/teach their own proprietary model (home-grown
| or a modified version of someone else's).
| swyx wrote:
| to be clear this work is not based on redpajama - though we did
| discuss that in the previous episode
| https://twitter.com/swyx/status/1648080532734087168?s=46&t=9...
| GreedClarifies wrote:
| Oh my bad!
|
| I thought I read that, is it based upon:
|
| https://arxiv.org/abs/2211.15533 (The Stack) ?
| swyx wrote:
| partially. Reza discussed their data pipeline in the
| blogpost that we reference in the show notes
| robertlucas wrote:
| Is there any way to connect these new code focused LLMs into VS
| Code in order to replace Github Copilot?
| hinkley wrote:
| I think that 20 years from now, we'll all be sitting around
| wondering 1) where the fuck are my flying cars, and 2) what were
| they thinking using computers to write code?
|
| And the reason I say this is because these tools are answering a
| question that we haven't asked yet: what common problems need to
| be solved in this programming language, and where do I get code
| to solve that problem?
|
| These LLMs are basically telling us how to duplicate code,
| and what we need is the opposite: how to stop reinventing the
| wheel for the 100th time.
|
| Instead of writing code for me, tell me if I already have it. If
| I'm writing it, tell me there's a library for that. If I'm a
| library writer, give me suggestions for what libraries are
| missing from the toolkit.
|
| All we've done so far is begun the process of automating the
| production of duplicate code. With absolutely no way to go back
| in time and correct bugs introduced in earlier iterations. We are
| likely, for instance, to see 0 day attacks that affect hundreds
| of applications, but with no simple way to describe which
| applications are affected. That's going to be a first rate
| trainwreck.
| moffkalast wrote:
| Well fwiw, working with GPT 4 it often suggests which libraries
| to use assuming the question allows for it, so it's not like
| everyone's writing everything from scratch.
|
| But libraries and especially frameworks as they are these days
| are also a giant liability more often than not. APIs change for
| no reason, they can be removed from the package manager at any
| moment without warning, people may slip malicious code into
| them past LGTM reviews, have recursive dependencies upon
| dependencies that bloat and slow down your build process, etc.
|
| Sometimes you don't need to install the entire damn car
| manufacturing plant and dealership it comes with just to get
| that one wheel you needed. And an LLM can just write you the
| code for a very nicely customized wheel in a few seconds
| anyway.
| webnrrd2k wrote:
| I agree -- maybe someday LLMs will give me the code for a set
| of simple abstractions that are well-matched for the problems I
| currently face. Something like a Pattern Language that was all
| the rage, but, um, better? More objective and pragmatically
| useful. Not galaxy-brain theory.
|
| That's what I really want. But that would also put me out of a
| job.
| seydor wrote:
| > how to stop reinventing the wheel for the 100th time.
|
| The idea of libraries may not have been a good one. It saved
| human time but no library is perfect because no abstraction is
| perfect and this causes unnecessary bloat. It seems that Nature
| does not use libraries, it uses replication instead, and we can
| now have that too.
| x-shadowban wrote:
| Ha I never wondered what the physical/life version of a
| shared library is until I read your post so thanks for that.
| webnrrd2k wrote:
| You have a point, but I think there are some big trade-
| offs...
|
| Nature uses replication, but it's also horrifically complex
| and we have no real idea about the specifics of how it all
| works, or what to do when many, many things go wrong.
|
| Also, I think nature uses cloning, which I kind of think
| would be called a 'library' in this case, for single-celled
| organisms (archaea and bacteria). In addition many eukaryotic
| organisms can reproduce via cloning under special situations.
|
| I don't know, I'm not really trying to argue one way or the
| other. I'm kinda' thinking out loud here... but I'd like to
| see LLMs used to create really great libraries, or some other
| abstractions, that are easy to use and also understandable.
| It might not happen soon, but I think that there is a lot of
| value in moving things that way.
| hinkley wrote:
| So instead everyone who has to solve a problem has to be an
| expert on that problem, rather than just an informed
| consumer.
| sicariusnoctis wrote:
| Replication does not help in managing complexity. That's why
| we use abstractions, even with the problems they have.
| tyingq wrote:
| More tools in the field is great! I tried a few things, and it's
| reasonable, but it does have some quirks that seem to repeat,
| like:
|
| I tried a prompt of:
|
|     # python function that returns a random integer between min and max
|
| And it produced:
|
|     def random_int(min, max):
|         return random.randint(min, max)
|
|     # define the size of the grid
|     n = 5
|
| It doesn't add the needed import statement, and I'm unclear why
| it's "defining the size of the grid".
| radq wrote:
| I've had the issue of generating random code after the
| completion with other models as well; it's due to how the
| models are trained. You need to stop generating when you
| encounter token(s) that indicate you're done - see
| https://huggingface.co/replit/replit-code-v1-3b#post-process...
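|
| A minimal post-processing sketch (the stop strings here are
| illustrative; the model card lists the exact ones to use):
|
|     def truncate_at_stop(text, stops=("\nclass ", "\ndef ", "\n#", "\nprint(")):
|         # cut the completion at the earliest stop sequence found
|         cut = len(text)
|         for s in stops:
|             i = text.find(s)
|             if i != -1:
|                 cut = min(cut, i)
|         return text[:cut]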
| agilob wrote:
| I get such unrelated statements from copilot too, not often,
| but a few I remember.
| tyingq wrote:
| Based on the replies, I tried a different prompt:
| # python script that prints out an integer between min and max
|
| And it did better. Included the import, didn't add unrelated
| code, but did still put the code inside a function.
| circuit10 wrote:
| That's because it's not following instructions like ChatGPT,
| it's just trying to guess what could plausibly come after what
| you put, like Copilot or the old GPT-3 models
| minkzilla wrote:
| and imports are (almost) always at the top of your file, not
| with this function
| vharuck wrote:
| I tried the same input, except wrapping it in triple-quotes
| instead of commenting it. So that it would match the
| standard practice for module doc strings. Here's the
| result:
|
|     """python function that returns a random integer between min and max"""
|     return random.randint(min, max)
|
|     def gen_random_float(min, max):
|         """python function that returns a random float between min and max"""
|         return random.uniform(
|
| So, it assumed the triple-quote was a function's doc
| string, despite it not being indented. It then assumes I'll
| want a similar function for floats (I assume it was cut off
| by a token limit).
| jeremyjh wrote:
| Isn't ChatGPT also just generating plausible text that could
| be a response to an instruction?
| circuit10 wrote:
| "that could be a response to an instruction" is the
| critical part here
| travisjungroth wrote:
| Yeah, at their core they're both trying to guess/generate
| what comes next. Differences: Being trained towards
| conversations versus code. Hyperparameters set to stop
| differently. "Wrappers" that form the prompt.
| whimsicalism wrote:
| It's not generating the most likely next word in the 'meta-
| corpora' of all possible discussions similar to the ones it
| has been trained on, it is trying to generate plausible
| text that would be scored well as a helpful assistant - and
| in the process has transferred knowledge acquired from its
| pre-training task.
| amasad wrote:
| LLMs generally but more so small models will keep going and
| generate seemingly unrelated things. On the frontend tools like
| Copilot and Ghostwriter do a lot of things like use stopwords
| or simply not show completions outside a single block.
|
| As for your prompt, it's following it a little too closely and
| generating just the function. You can, however, condition it on
| this being the start of the program, and then it will do the
| import, e.g.:
|
|     # python function that returns a random integer between min and max
|     import
|
| This is in fact a suggestion from OpenAI on best practices for
| prompting called "leading words"
| https://help.openai.com/en/articles/6654000-best-practices-f...
| fswd wrote:
| I can barely keep up with this stuff, but quick question. Is
| there a way to simply change the URL setting of copilot to point
| to this model? Obviously it needs an endpoint, I could hack
| something up, but asking if somebody has already done this? Would
| be nice to cancel my copilot.
| jacobrussell wrote:
| I don't think it's possible to point Copilot to other models. I
| don't think Microsoft would benefit much from that feature. You
| could use existing tools [0] to host your own model which in
| theory could be used by an extension your IDE uses. But I'm not
| sure if an extension like that exists.
|
| [0] https://github.com/oobabooga/text-generation-webui
| circuit10 wrote:
| Of course it's possible, just not officially
|
| See https://github.com/fauxpilot/fauxpilot/blob/main/document
| ati...
| circuit10 wrote:
| There's https://github.com/fauxpilot/fauxpilot but it doesn't
| use this model
| execveat wrote:
| It's nowhere close to Codex/Copilot. Try the demo:
| https://huggingface.co/spaces/replit/replit-code-v1-3b-demo
| tarruda wrote:
| So they are lying on this tweet?
| https://twitter.com/Replit/status/1651344186715803648
| naillo wrote:
| Yep
| tarruda wrote:
| 3 billion parameters. Does that mean I will be able to run on a
| 8gb consumer GPU?
| generalizations wrote:
| Means that once it's incorporated into llama.cpp, you can run
| it on your laptop.
| amasad wrote:
| Hopefully on phones too
| dontwearitout wrote:
| Probably not out of the box but if some of the local deep
| learning wizards get a quantized version working well and
| optimize it a bit, definitely.
| RHab wrote:
| No, I could only get 2.7B to run on 8GB VRAM, unfortunately.
| amasad wrote:
| it is 2.7B
| tarruda wrote:
| Loading seems to have worked on my laptop's RTX 3070,
| `nvidia-smi` shows `5188MiB / 8192MiB` in memory usage.
| pera wrote:
| their pytorch_model.bin is 10.4GB
| tarruda wrote:
| I just loaded this on my laptop's RTX 3070 GPU by following
| the instructions here: https://huggingface.co/replit/replit-
| code-v1-3b
|
| I don't know how I can test the model, but it seem loading
| worked. When I run `nvidia-smi` on another terminal, I see
| `5188MiB / 8192MiB` in the memory-usage column.
| swyx wrote:
| you can load it but you cant run inference? whats the
| issue?
| tarruda wrote:
| No issue, I'm simply unfamiliar with python machine
| learning APIs.
|
| I managed to run inference locally by installing the
| requirements and running app.py from the demo:
| https://huggingface.co/spaces/replit/replit-
| code-v1-3b-demo/...
|
| It is very fast on my RTX 3070, VRAM usage goes to ~=
| 6.3GB during inference.
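|
| For anyone who'd rather skip the demo app, the model card's
| snippet is roughly this (paraphrased from memory, so double-check
| it there; on an 8 GB card you could also try load_in_8bit via
| bitsandbytes):
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("replit/replit-code-v1-3b",
|                                         trust_remote_code=True)
|     model = AutoModelForCausalLM.from_pretrained("replit/replit-code-v1-3b",
|                                                  trust_remote_code=True)
|     model.to("cuda").eval()
|
|     x = tok.encode("def fibonacci(n): ", return_tensors="pt").to("cuda")
|     y = model.generate(x, max_length=100, do_sample=True, top_p=0.95,
|                        top_k=4, temperature=0.2,
|                        eos_token_id=tok.eos_token_id)
|     print(tok.decode(y[0], skip_special_tokens=True))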
| chaxor wrote:
| It's a bit hard to believe that the system is decent at
| producing code which captures complex ideas and higher-level
| structure when the tokens/param value is >30 (it's ~200 here?).
| The 'good' models (meaning ones with lots of 'knowledge' or
| 'memorization' of the dataset) typically tend to be around 2
| tokens/param, and models with decent language generation but
| less knowledge/memorization are around 30 tokens/param. Perhaps
| the domain allows for this, but given that the linguistic
| interface on the input is still needed... it's hard to believe.
| swyx wrote:
| this kind of critical thinking is exactly what replit is going
| to need for their stated goal of doing whole-app generation.
| right now they only test it on AmjadEval. you... might wanna
| consider joining them to work on it?
| EvgeniyZh wrote:
| Are you saying the less you train the model the better it is?
| I'm confused
| gnramires wrote:
| Tokens/param shouldn't matter more than the total training
| FLOPs, I believe. Clearly if we train a your claimed 'ideal' 2
| tokens/param a very small dataset (not many tokens in the first
| place), it wouldn't have enough data to properly learn the
| relevant languages. Once there is enough data, then it becomes
| a question of model capacity (does it have enough degrees of
| freedom to support the computational structures needed?).
|
| I believe the overparametrization largely helps with
| generalization and reducing overfitting; at 2 tokens/param there
| are far more degrees of freedom than structures that can be
| learned, from what I can tell (the extra capacity just
| provides good breathing room for internal structures). But if
| your model has enough capacity, and you can find a good enough
| training method (and you have enough data to learn the task),
| then you should be able to succeed at arbitrarily low
| tokens/param, which is good to keep in mind to make efficient
| models.
| waffletower wrote:
| No Clojure. No Julia. No Haskell. No Racket. No Scheme. No Common
| Lisp. No OCaml. And, as much as I despise Microsoft, No C#. No
| F#. No Swift. No Objective-C. No Perl. No Datalog. A glaringly
| lacking choice of languages.
| mclide wrote:
| Despite the lack of examples, it still completes trivial
| clojure like "(defn connect [" and other lisp syntax like
| "(define (hello" which is promising for further refinement
| training on Lisp languages.
| Dayshine wrote:
| C# was available in the dataset they link, and is the most
| glaring omission by global usage...
| ubertaco wrote:
| I fed it some OCaml and it worked, though the example was
| trivial:
|
|     type point = { x: int; y : int }
|     let manhattan_distance (a: point) (b: point) : int =
|
| which it completed to:
|
|     type point = { x: int; y : int }
|     let manhattan_distance (a: point) (b: point) : int =
|       abs (a.x - b.x) + abs (a.y - b.y)
|
| ...which is a valid and correct OCaml definition of this
| method:
|
| https://try.ocamlpro.com/#code/type'point'='$4'x:'int;'y':'i...
| ebiester wrote:
| I'm sure that has to do with the dataset available to them.
| runnerup wrote:
| Which is a deduplicated version of this: https://www.bigcode-
| project.org/docs/about/the-stack/
|
| And probably, yes. While it contains 358 programming
| languages, obviously there's a long tail after the 20 most-
| represented languages. Some people might not expect without
| thinking about it for a bit that many of the most-represented
| "languages" are actually things like JSON, XML, HTML, CSV,
| text, markdown, YAML, SVG.
|
| Also note that it won't be able to parse natural language
| nearly as well without additionally being trained on
| something like the LAION dataset, so this version will be
| more of an autocomplete like Copilot rather than something
| which can manifest high level business logic from whole cloth
| like ChatGPT.
| sitkack wrote:
| You could take it and finetune it on a bunch of Lisps, probably
| cost on the order of 50-500 to do that.
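|
| A rough sketch of what that finetune could look like with the
| plain Hugging Face Trainer (paths and hyperparameters here are
| made up; in practice you'd probably want LoRA or MosaicML's
| stack for memory reasons):
|
|     from datasets import load_dataset
|     from transformers import (AutoModelForCausalLM, AutoTokenizer,
|                               DataCollatorForLanguageModeling,
|                               Trainer, TrainingArguments)
|
|     model_id = "replit/replit-code-v1-3b"
|     tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
|     if tok.pad_token is None:
|         tok.pad_token = tok.eos_token  # the collator needs a pad token
|     model = AutoModelForCausalLM.from_pretrained(model_id,
|                                                  trust_remote_code=True)
|
|     # hypothetical corpus: a pile of Lisp source files you collected
|     ds = load_dataset("text", data_files={"train": "lisp_corpus/*.lisp"})
|     ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=2048),
|                 batched=True, remove_columns=["text"])
|
|     trainer = Trainer(
|         model=model,
|         args=TrainingArguments("replit-lisp-ft", num_train_epochs=1,
|                                per_device_train_batch_size=1,
|                                gradient_accumulation_steps=8, bf16=True),
|         train_dataset=ds["train"],
|         data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
|     )
|     trainer.train()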
| swyx wrote:
| if anyone from MosaicML is reading this, i'd love a guide on
| how to do exactly this!
| abxytg wrote:
| [flagged]
| love2read wrote:
| Can this be used with the copilot plugins for every ide?
| SXX wrote:
| Weak spot which I guess is similar to other LLMs: if you mention
| recursion somewhere in the comments, the model sometimes starts
| to recursively generate the same lines over and over again.
| user3939382 wrote:
| Unfortunately I'm someone who sometimes can't separate the art
| from the artist. Replit is the company where the founder sent
| these nasty pompous threats to their ex-employee for their
| innocent side project and then tried to double talk his way out
| of it with a bs non-apology when it got exposed in public. I
| won't support Replit or anything they make.
| davidy123 wrote:
| I keep thinking there should be a way to train a copilot against
| just one set of code libraries. I know LLMs require training
| against a lot of text to get their smarts, but is there a way to
| set this up so a model can be created for a specific library by
| anyone, so it could provide open source support via a transformer
| + model? Maybe this would be a better approach than a jack of all
| trades, master of none.
___________________________________________________________________
(page generated 2023-05-03 23:00 UTC)