[HN Gopher] Replit's new Code LLM: Open Source, 77% smaller than...
       ___________________________________________________________________
        
       Replit's new Code LLM: Open Source, 77% smaller than Codex, trained
       in 1 week
        
       Author : swyx
       Score  : 538 points
       Date   : 2023-05-03 15:19 UTC (7 hours ago)
        
 (HTM) web link (www.latent.space)
 (TXT) w3m dump (www.latent.space)
        
       | vrglvrglvrgl wrote:
       | [dead]
        
       | protonbob wrote:
       | Darn, it doesn't look like it has C#.
        
       | m3kw9 wrote:
       | It just gave me prototypes lol
       | 
       |     def sieve_eratosthenes(n):
       |     
       |     ##a function to sort 10 numbers
       |     def bubble_sort(a):
       |     
       |     ##a function to sort 10 numbers
       |     def insertion_sort(a):
       |     
       |     ##a function to sort 10 numbers
       |     def quick_sort(a):
        
         | m3kw9 wrote:
         | I left the settings alone. All I added was "##a function to
         | sort 10 numbers", assuming it would complete it like Copilot.
        
         | ImprobableTruth wrote:
         | Did you mess around with the settings? I'm getting a correct
         | implementation and since it's deterministic (with default
         | settings) it should be the same for you.
        
       | LightMachine wrote:
       | Any idea how much it cost to train it and how it was trained?
        
       | heliophobicdude wrote:
       | An important distinction I'm learning today is that not all
       | LLMs are interoperable with each other's
       | queries/prompts/inputs.
       | 
       | Right now, this code LLM does not respond the way a chat LLM
       | would.
       | 
       | ~~~~~ Hats off to the team on the impressive work!
        
       | eikenberry wrote:
       | Is this a Co-pilot like assistant or something more? Co-pilot is
       | neat but is basically not much more than an automated snippet
       | system. The actual writing of the code is not the part that I
       | want help with, I want an AI system that helps me design better
       | software systems. Something more akin to program mind mapping
       | than some fancy auto-completion system.
        
         | qrio2 wrote:
         | I wonder if an LLM with something like PlantUML would
         | generate anything useful.
        
       | circuit10 wrote:
       | This probably makes a self-hosted and/or local Copilot a lot more
       | feasible
        
         | FanaHOVA wrote:
         | Yes, something like FauxPilot[0] should be able to use it
         | instead of CodeGen
         | 
         | [0] https://github.com/fauxpilot/fauxpilot
        
       | sashank_1509 wrote:
       | Replit: I tried to write HMM forward function code:
       | 
       | ```
       | # Write a function that performs the forward algorithm of an HMM.
       | # Ideally it uses numpy vectorization,
       | # But if you cannot do that it is fine too. Just perform forward
       | # inference, given HMM parameters and Observations.
       | # def HMM_forward():
       | # Write your code here
       | # Hint:
       | # - Remember that HMMs are stochastic, so you will need to
       | #   multiply transition matrices by observation probabilities
       | # - Remember that HMMs are stochastic, so you will need to
       | #   multiply emission matrices by observation probabilities
       | # - Remember that HMMs are stochastic, so you will need to
       | #   multiply emission matrices by transition matrices
       | # - Remember that HMMs are stochastic, so you will need to
       | #   multiply emission matrices by transition matrices
       | # - Remember that HMMs are stochastic, so
       | ```
       | 
       | It gets stuck in an infinite loop. ChatGPT gets it right, first
       | shot:
       | 
       | ```
       | import numpy as np
       | 
       | def forward_algorithm(A, B, pi, obs):
       |     """
       |     Performs the forward algorithm for an HMM.
       | 
       |     Parameters:
       |         A: numpy.ndarray, shape (N, N)
       |             Transition matrix of the HMM, where N is the number
       |             of states.
       |         B: numpy.ndarray, shape (N, M)
       |             Emission matrix of the HMM, where M is the number
       |             of possible observations.
       |         pi: numpy.ndarray, shape (N,)
       |             Initial probability distribution over states.
       |         obs: numpy.ndarray, shape (T,)
       |             Sequence of T observations.
       | 
       |     Returns:
       |         alpha: numpy.ndarray, shape (T, N)
       |             Forward probabilities for each state at each time
       |             step.
       |     """
       |     T = obs.shape[0]
       |     N = A.shape[0]
       |     alpha = np.zeros((T, N))
       |     alpha[0] = pi * B[:, obs[0]]
       |     for t in range(1, T):
       |         alpha[t] = np.dot(alpha[t-1], A) * B[:, obs[t]]
       |     return alpha
       | ```
       | 
       | OpenAI managed to do something important but extremely hard:
       | they moved out of the DL benchmark frame and made something
       | that is general-purpose useful. Great effort and congrats to
       | the Replit team though; hopefully they can keep iterating on
       | this and reach ChatGPT capabilities someday.
        
         | amasad wrote:
         | The model is not RLHF'd or instructed. It's an inline
         | autocomplete model, so it will get confused if you talk to it
         | like you're talking to a person. Although it is possible to
         | finetune it this way. To get better full-function completion,
         | try giving it the function definition and a descriptive
         | docstring as a prompt.
        
       | gowld wrote:
       | Can I use repl.it with an external Code LLM, with or without
       | paying repl.it for Ghostwriter?
        
         | amasad wrote:
           | Yes, we have a robust extension system, and some are
           | already building alternatives.
        
           | varunkmohan wrote:
           | Hi from the Codeium team. It's awesome to hear you are
           | allowing other code LLMs to be used on the Replit platform
           | (we're big fans)! We'd love to enable our free chrome
           | extension on Replit.
        
             | swyx wrote:
             | would love to be able to compare codeium vs ghostwriter
             | inside replit! (or toggle between them based on known
             | strengths or preferences, perhaps by project or by
             | filetype)
        
       | amasad wrote:
       | Some links:
       | 
       | - Repo: https://github.com/replit/ReplitLM/tree/main/replit-
       | code-v1-...
       | 
       | - HuggingFace: https://huggingface.co/replit/replit-code-v1-3b
       | 
       | - Demo: https://huggingface.co/spaces/replit/replit-
       | code-v1-3b-demo
       | 
       | - Early benchmark results:
       | https://twitter.com/amasad/status/1651019556423598081
       | 
       | A lot about this project was surprising. We knew it was going
       | to be good, but didn't expect it to be this good -- especially
       | surprising was the finetuned performance boost, and the fact
       | that the model is decent at language tasks and reasoning (in
       | some cases much better than much larger general-purpose
       | models).
       | 
       | It feels like there is a lot more to do with this model, and I
       | have a suspicion you can even make a half-decent chatbot (at
       | least one focused on code) by finetuning it on conversation
       | (and/or instruction) datasets.
       | 
       | Will follow up with a more comprehensive technical report and the
       | UL2R version (fill-in-the-middle support).
        
         | letitgo12345 wrote:
         | Doesn't the Stack contain HumanEval? So you're basically
         | comparing numbers on the pretraining data.
        
           | godelski wrote:
           | My favorite line from the HumanEval paper[0]
           | 
           | > It is important for these tasks to be hand-written, since
           | our models are trained on a large fraction of GitHub, which
           | already contains solutions to problems from a variety of
           | sources.
           | 
           | So to answer your question, yes, the evaluation dataset is
           | spoiled. You can find such unique and never before seen
           | docstrings like
           | 
           | > For a given list of input numbers calculate the Mean
           | Absolute Deviation around the mean of this dataset. Mean
           | Absolute Deviation is the absolute difference between each
           | element and a centerpoint (mean in this case)[1]
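           | 
           | The matching solution is a trivial one-liner, something
           | along these lines (my own reconstruction, not the reference
           | solution):
           | 
           |     def mean_absolute_deviation(numbers):
           |         mean = sum(numbers) / len(numbers)
           |         return sum(abs(x - mean) for x in numbers) / len(numbers)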
           | 
           | And here's a repo I found that is 8 years old[2]. But how
           | about a more recent one that is even closer?[3] There are
           | plenty more examples[4] (does anyone know how to actually
           | limit the date to prior to 2021? `pushed:<2021` doesn't
           | work, nor does using the `created` keyword. Date searching
           | doesn't seem to work well).
           | 
           | In essence, we can still use this evaluation method to
           | determine how good our model is at doing fuzzy searching.
           | Which, mind you, is still a useful thing. But I would be
           | careful in concluding that this means the model is good at
           | generalizing arbitrary descriptions of code or novel pieces
           | of code. That said, one may be able to argue that not many
           | lines of code are actually that novel. Still, we need to be
           | careful about our conclusions and understand the limitations
           | of our metrics (something I am currently deeply troubled by)
           | 
           | [0] https://arxiv.org/abs/2107.03374
           | 
           | [1] https://github.com/openai/code-align-evals-
           | data/blob/97446d9...
           | 
           | [2] https://github.com/bertomartin/stat4701/blob/ec2b64f629cb
           | bf6...
           | 
           | [3] https://github.com/danielwatson6/hate-speech-
           | project/blob/64...
           | 
           | [4] https://github.com/search?q=abs%28x+-+mean%29+for+languag
           | e%3...
        
             | godelski wrote:
             | (follow-up: Figured this should be a different comment)
             | 
             | I wanted to demonstrate what I said above, so I came up
             | with some examples of things I think a human would have
             | an easy time implementing but a model might find hard.
             | BUT a key part is that I expect these to be in the
             | dataset! I just don't expect these to be in hundreds or
             | thousands of GitHub repos because they will be uncommon
             | (but not rare). Also, we'll pretty much ask for
             | few-liners to give the model the biggest advantage we can
             | (errors will compound).
             | 
             | Prompt:
             | 
             |     from torch import nn
             |     
             |     class LipSwish(nn.Module):
             |         """
             |         The Swish activation function is defined by a gated
             |         linear unit, where the gate is defined by a sigmoid
             |         function and multiplies the input with a learnable
             |         parameter, beta. Beta is initialized as 0.5.
             |         The LipSwish function normalizes the output by the
             |         upper bound of 1.1.
             |         """
             |         def __init__(self):
             |             super().__init__()
             | 
             | Result: Mostly correct, but missing the division by 1.1.
             | The forward is `return x * F.sigmoid(self.beta * x)`,
             | which is Swish (it also assumes we had "import torch" and
             | applied type hinting). It did properly set the parameter
             | (this is just a 3-liner).
             | 
             | Discussion: The Swish function should be in the dataset
             | and is a well-known activation function (though beta is
             | not in the pytorch version). Despite LipSwish being in
             | the dataset (introduced in 2019 by Residual Flows[0]), it
             | is not common. I could get the code to generate the Swish
             | function (initializing beta, and performing the gate) but
             | could not get the code to divide the output by 1.1. I
             | would not expect a human to have difficulties with this.
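             | 
             | For reference, the full module we were hoping for is only
             | a few lines (my sketch of the intended answer, not the
             | model's output):
             | 
             |     import torch
             |     from torch import nn
             |     
             |     class LipSwish(nn.Module):
             |         def __init__(self):
             |             super().__init__()
             |             # Learnable gate parameter, initialized to 0.5
             |             self.beta = nn.Parameter(torch.tensor(0.5))
             |     
             |         def forward(self, x):
             |             # Swish gate, normalized by the upper bound 1.1
             |             return x * torch.sigmoid(self.beta * x) / 1.1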
             | 
             | Okay, so let's try something else that might be a bit
             | more common and older. The same paper uses a concatenated
             | activation function, and those aren't uncommon. CReLU was
             | introduced in 2016[1], and there have been plenty of
             | concatenated activations around since then. The pytorch
             | documentation even uses it as an example[2]. There are
             | far more examples of CReLU (3k Python results for "class
             | CReLU" vs 58 for "class LipSwish"; use these numbers as
             | weak hints, because search sucks and isn't always
             | accurate).
             | 
             | Prompt:
             | 
             |     from torch import nn
             |     from torch.nn import functional as F
             |     
             |     class CReLU(nn.Module):
             |         """
             |         Concatenated version of ReLU. The activation is
             |         applied to both the positive and negative of our
             |         input and the result is concatenated.
             |         """
             |         def __init__(self):
             |             super().__init__()
             |     
             |         def forward(self, x):
             | 
             | Result: `return torch.cat([x.clamp(min=0),
             | -x.clamp(min=0)], 1)`. This is correct, but not the
             | expected one-liner result.
             | 
             | Discussion: This was a bit surprising; it didn't use
             | functional as we might expect (or hinted). But
             | interestingly, it will if we change the class name to
             | "ConcatenatedReLU". I found exact copies on GitHub with
             | the full name (memorization), but the first page of
             | instances for CReLU I found used functional (I did find
             | one that was exactly the above code, when adding "clamp"
             | to the search, but missing the minus sign. There were
             | plenty of errors in CReLU implementations). Interesting
             | side note: CReLU continues and defines a function CReLU6
             | which uses the same docstring but clamps with a max of 6
             | on the positive input, whereas ConcatenatedReLU starts to
             | define a convolutional block (Conv + BatchNorm + ReLU)
             | called Conv2d.
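             | 
             | For reference, the functional one-liner we were hinting
             | at would be something like (again, my own sketch):
             | 
             |     def forward(self, x):
             |         # ReLU on both x and -x, concatenated along the
             |         # channel dimension
             |         return torch.cat([F.relu(x), F.relu(-x)], dim=1)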
             | 
             | So we have kinda mixed results, and in both cases these
             | are rather odd and probably not what we wanted. We can
             | clearly see that there are issues where a human would not
             | have too much trouble. There's a big tension in these
             | types of problems: we need to memorize a lot of
             | information (otherwise we can't even write code or know
             | library calls), but too much memorization prevents
             | creativity.
             | 
             | There is a lot of gray area between the _pure_
             | "Stochastic Parrot"/"fancy copy machine" and a
             | generalized intelligence (with a broad and flexible
             | definition of intelligence). I'd still call them
             | stochastic parrots, because to me the evidence suggests
             | that we're closer to the memorization side than the
             | creation side. But that doesn't mean these frameworks
             | aren't useful. We all know a lot of code is boilerplate
             | (otherwise we wouldn't have the joke "copy paste from
             | SO"), and these tools can be very useful for that.
             | 
             | But I think the utility is going to depend highly on what
             | you are coding for and how you code. If you're doing
             | standard stuff, this probably has high utility for you
             | and can save you a lot of time, the same way writing
             | macros does, but this is FAR more powerful. It can also
             | help novices a lot. On the other hand, if your main
             | errors are reading mistakes (e.g. you're dyslexic) --
             | this is my largest problem -- then this might make things
             | difficult, as you have a tendency to gloss over text and
             | miss minor errors. I also don't think these tools would
             | help if you're a researcher or writing optimized or
             | specialized code. These differences are probably why we
             | see such differences in people's reactions. But it may
             | also be a hint into what people do and how they work when
             | we see who raves and who rants about these.
             | 
             | [0] https://arxiv.org/abs/1906.02735
             | 
             | [1] https://arxiv.org/abs/1603.05201
             | 
             | [2] https://pytorch.org/docs/stable/generated/torch.nn.ReLU
             | .html
             | 
             | Edit: We can also check whether code is in the Stack[3].
             | We see that [0] is indeed in the dataset, so we know
             | there is information leakage. Interestingly, the exact
             | copy I found in the previous comment[4] isn't! (The repo
             | isn't, though the user is.)
             | 
             | [3] https://huggingface.co/spaces/bigcode/in-the-stack
             | 
             | [4] https://github.com/bertomartin/stat4701/blob/ec2b64f629
             | cbbf6...
        
           | amasad wrote:
           | Can't find it now, but I'm pretty sure BigCode said
           | somewhere they explicitly looked for it and removed it.
           | Also, the subjective measure does match up with the
           | benchmark: our finetuned model performed +50% on HumanEval,
           | and when using it, it felt at least that much improved.
        
             | godelski wrote:
             | You can view the prompts, solutions, and checks here[0].
             | See my sibling comment (to yours) where I quote the
             | HumanEval paper and do some more analysis. But I think if
             | you look at [0] you'll see that these aren't really
             | unique problems and are likely to have large repetitions
             | in the dataset. I should add the dataset paper[1] to that
             | comment (too late to edit), where they mention that they
             | just scrape all of GitHub (Jan 1 2015 - Mar 31 2022).
             | They do exact and near de-duplication, but near
             | de-duplication is messy.
             | 
             | > We implement near-deduplication in our pre-processing
             | pipeline on top of exact deduplication. We first split the
             | files into words/tokens based on non-alphanumeric
             | characters and remove files with fewer than 10 tokens.
             | Next, we compute the MinHash with 256 permutations of all
             | documents, and use Locality Sensitive Hashing to find
             | clusters of duplicates. We further reduce these clusters by
             | ensuring that each file in the original cluster is similar
             | to at least one other file in the reduced cluster. We
             | consider two files similar when their Jaccard similarity
             | exceeds 0.85.
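             | 
             | Concretely, that pipeline looks roughly like this (an
             | illustrative sketch using the datasketch library, not
             | their actual code; `corpus` here is a hypothetical
             | path -> text mapping):
             | 
             |     import re
             |     from datasketch import MinHash, MinHashLSH
             |     
             |     def minhash_of(text, num_perm=256):
             |         # Split into tokens on non-alphanumeric characters
             |         tokens = [t for t in re.split(r"\W+", text) if t]
             |         m = MinHash(num_perm=num_perm)
             |         for t in tokens:
             |             m.update(t.encode("utf-8"))
             |         return m, len(tokens)
             |     
             |     lsh = MinHashLSH(threshold=0.85, num_perm=256)
             |     for path, text in corpus.items():
             |         m, n_tokens = minhash_of(text)
             |         if n_tokens < 10:
             |             continue  # drop files with < 10 tokens
             |         if lsh.query(m):
             |             continue  # near-duplicate of a kept file
             |         lsh.insert(path, m)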
             | 
             | Near-duplicates are still difficult to measure. So we
             | should expect duplication, and it should be proportional
             | to the number of samples we have (even at the same
             | variance, though I'd wager higher variance with larger
             | duplication).
             | 
             | [0] https://github.com/openai/code-align-evals-
             | data/tree/97446d9...
             | 
             | [1] https://arxiv.org/abs/2211.15533
        
         | spenczar5 wrote:
         | How is this code licensed? I didn't see a license in the repo.
         | It looks interesting!
        
           | dgacmu wrote:
           | The README indicates:
           | 
           | The base model checkpoint is licensed under the Creative
           | Commons license (CC BY-SA-4.0). Under the license, you must
           | give credit to Replit, provide a link to the license, and
           | indicate if changes were made. You may do so in any
           | reasonable manner, but not in any way that suggests that
           | Replit endorses you or your use.
        
         | sputknick wrote:
         | What does "fine tuning" mean in this context? Does it mean you
         | fine-tuned it on a specific code repository, or collection of
         | code repositories and then had it do work in those
         | repositories?
        
           | amasad wrote:
           | Broadly, finetuning is any training done after
           | pretraining. Most of the time it is used in the context of
           | fitting a more narrow task. In our case, it was the same
           | training objective as the pretraining, but meant to be more
           | representative of what Replit users like to code. However,
           | we were surprised by how well it boosted overall
           | performance. Best guess: it's a) novel data and b) the
           | model could take even more training!!
        
             | sanderjd wrote:
             | You seem to know your stuff some, so I'll ask you a
             | question on this: Are there any good books on all the
             | different approaches in this space, or is it all too new
             | and fast moving for such a thing?
        
             | spenczar5 wrote:
             | How feasible and effective would it be to fine-tune a model
             | against an organization's private source code, resulting in
             | an "internal" model that knows how to work with that org's
             | stuff?
             | 
             | Could you, say, fine-tune the model every week with the
             | latest merges? Every hour?
        
               | naderkhalil wrote:
               | Finetuning a smaller model leading to better performance
               | seems like a significant finding that'll lead to a lot of
               | companies fine-tuning their own internal "ChatGPT"s
        
               | pyth0 wrote:
               | Finetuning is a relatively quick process. Training the
               | base model is the expensive part (it can take weeks and
               | huge amounts of compute), whereas finetuning usually
               | touches only the last few layers and can be done with
               | far fewer resources. You could definitely have a
               | "nightly" finetuned model that is retrained every day
               | or so.
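               | 
               | In PyTorch that pattern is just a few lines (a sketch,
               | assuming a generic HuggingFace-style causal LM that
               | exposes a final `lm_head`):
               | 
               |     import torch
               |     
               |     # Freeze everything except the output head
               |     for param in model.parameters():
               |         param.requires_grad = False
               |     for param in model.lm_head.parameters():
               |         param.requires_grad = True
               |     
               |     # Optimize only the unfrozen parameters, with a
               |     # small learning rate
               |     optimizer = torch.optim.AdamW(
               |         (p for p in model.parameters()
               |          if p.requires_grad),
               |         lr=1e-5,
               |     )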
        
             | titaniczero wrote:
             | When you fine-tune it, do you train just the head/last few
             | layers or do you also unfreeze the model afterwards and
             | retrain the whole model with a very small LR for a few
             | epochs?
        
           | WinLychee wrote:
           | You can take a network and its weights that someone else
           | trained, and use that pretrained network to train on your own
           | data, which is likely to be a better starting point than
           | random weights.
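           | 
           | With this model, that looks something like the following
           | (based on the HuggingFace model card; the custom
           | architecture requires trust_remote_code):
           | 
           |     from transformers import AutoModelForCausalLM
           |     
           |     model = AutoModelForCausalLM.from_pretrained(
           |         "replit/replit-code-v1-3b", trust_remote_code=True
           |     )
           |     # ...then continue training on your own data from this
           |     # starting point instead of from random weights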
        
         | newhouseb wrote:
         | First - thank you for open sourcing this! It's a real gift to
         | the community to have a model intended for "commercial use"
         | that's actually licensed as such.
         | 
         | I'd be very interested to hear about the choice/evaluation of
         | the ALiBi approach for positional embedding (perhaps in the
         | technical report).
         | 
         | My intuition suggests that while this allows for better
         | generalizability for longer sequence lengths, it penalizes
         | scenarios where an LLM might need to check for things like a
         | function signature far away from where the next token is
         | generated. My initial testing of this model tracks with this
         | intuition but that's by no means a rigorous evaluation.
        
           | ofirpress wrote:
           | (I wrote ALiBi) You can read the paper here
           | https://arxiv.org/abs/2108.12409
           | 
           | While intuitively it does seem like ALiBi would make it hard
           | for the model to attend to things that are far away, in many
           | scenarios we've tested with different models trained on
           | different datasets, ALiBi _always_ performs better than
           | sinusoidal, rotary, and other embedding types, even when we
           | 're not using it to extrapolate to longer sequence lengths.
           | 
           | These findings have been confirmed by others, including by
           | the BLOOM open source LM project.
        
             | newhouseb wrote:
             | Small world!
             | 
             | Thanks for the link (which I've now skimmed beyond the
             | abstract). What wasn't obvious to me from the abstract is
             | that different attention heads have different penalty
             | strengths, so if some prediction task requires long range
             | dependencies you might expect one of the less-penalized
             | heads to end up specializing. I wonder what would happen if
             | the penalty for one head is zero? (The paper suggests this
             | might've been tried and just made things worse, but
             | unclear)
             | 
             | I must admit that this is a wonderfully elegant (and
             | interpretable) way to do this... much more intuitive (to me
             | at least, a wannabe practitioner) than all of the trig-
             | based embeddings.
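             | 
             | For anyone else following along, the core trick fits in
             | a few lines (my own sketch of the paper's idea, for
             | power-of-2 head counts, not code from the paper):
             | 
             |     import torch
             |     
             |     def alibi_bias(n_heads, seq_len):
             |         # Head h gets slope 2^(-8(h+1)/n_heads): a
             |         # geometric sequence, so the least-penalized
             |         # heads can still attend to distant tokens
             |         slopes = torch.tensor(
             |             [2.0 ** (-8.0 * (h + 1) / n_heads)
             |              for h in range(n_heads)]
             |         )
             |         # Penalty grows linearly with key-query distance
             |         pos = torch.arange(seq_len)
             |         dist = pos[None, :] - pos[:, None]  # j - i
             |         return slopes[:, None, None] * dist[None, :, :]
             |     
             |     # Added to the attention scores before the causal
             |     # mask and softmax:
             |     # scores = q @ k.transpose(-2, -1) / sqrt(d)
             |     # scores = scores + alibi_bias(n_heads, seq_len)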
        
         | pera wrote:
         | Hi there, I have two questions:
         | 
         | 1 - Why did you choose Markdown? It seems an odd choice for
         | training a model like this.
         | 
         | 2 - Have you tried training on only a single PL and then
         | benchmarking it against this more general version?
        
           | runnerup wrote:
           | They trained on https://huggingface.co/datasets/bigcode/the-
           | stack-dedup which is a massive curated dataset accumulated
           | from GitHub. Details are here: https://www.bigcode-
           | project.org/docs/about/the-stack/
           | 
           | Many of the most-represented "languages" on GitHub are
           | actually things like JSON, XML, HTML, CSV, text, markdown,
           | YAML, and SVG.
           | 
           | More details from them here: https://blog.replit.com/llm-
           | training
        
           | amasad wrote:
           | 1- We trained on languages that are most popular on
           | Replit. Markdown is important because you need some amount
           | of natural language in the data, and it acts as a sort of
           | "natural language label" for code.
           | 
           | 2- I like how portable it is: a single small model doing a
           | lot of languages. Single-language code models are an
           | approach that models like Salesforce/CodeGen took, but I
           | believe we beat (or get very close to) their mono models on
           | benchmarks.
        
             | fuzzythinker wrote:
             | Have you thought of finding or creating something like this
             | [0]?
             | 
             | I created this as the basis for my origami folding
             | descriptive language. I tried to find something similar,
             | requirements being both well structured and English-like
             | but couldn't find any, so I created it.
             | 
             | The origami folding app will hopefully be out in 2 weeks,
             | so you can see how it's used.
             | 
             | [0] https://github.com/fuzzthink/mation-spec
        
         | kir-gadjello wrote:
         | Impressive model, thank you for releasing it under a business-
         | friendly license!
         | 
         | Have you considered using Google's sparse "scaling transformer"
         | architecture as the base? Even at 3B scale it can generate 3-4x
         | more tokens per FLOP while being competitive at perplexity with
         | a dense transformer. I think OpenAI uses a variant of it in
         | their ChatGPT-3.5-Turbo product.
         | 
         | Here is the paper https://arxiv.org/abs/2111.12763 and the
         | implementation
         | https://github.com/google/trax/blob/master/trax/models/resea...
         | if you are interested.
         | 
         | Hope you get to look into this!
        
           | b33j0r wrote:
           | Thank you for releasing the weights along with the
           | announcement. There have been posts that made great
           | headlines, but then it's "weights are on their way!" Like,
           | why did we even get excited? This, though? Great work.
        
           | chaxor wrote:
           | I don't think it's a business friendly license?
        
         | gbasin wrote:
         | Very exciting, thanks for sharing all this
        
       | swyx wrote:
       | hi HN! back again with an exclusive deep dive with Replit's
       | head of AI. I attended their developer day last week
       | (https://twitter.com/swyx/status/1650989632413401089) just
       | expecting a regular fundraise announcement and was totally
       | shocked when they announced their own LLM and also said they
       | would open source it. so I immediately asked them for a podcast
       | interview and this is the result.
       | 
       | my favorite learning is how they are pushing the state of the art
       | - openai's HumanEval is the industry standard benchmark for code
       | LLMs, but Reza kindly went above and beyond to show how they use
       | "AmjadEval" - using coder intuition to capture human preference
       | on what output is more helpful to coders (see screenshots
       | https://twitter.com/swyx/status/1653791019421569024?s=20)
       | 
       | please AMA!
        
         | FanaHOVA wrote:
         | This was a lot of fun to record, and it's the second episode
         | where I got an eval question wrong. I'm going to be demoted
         | to a bot soon lol
        
           | swyx wrote:
           | means you are human! like the rest of us
        
         | marcodiego wrote:
         | Sorry, I have to ask this: how does this compare to ChatGPT?
        
           | swyx wrote:
           | it doesn't. replit-code-v1-3b is a code LLM, ChatGPT is an
           | app on top of LLMs. it compares to OpenAI Codex, a small
           | version of which is behind GitHub Copilot.
        
             | cubefox wrote:
             | Free ChatGPT is based on code-davinci-002 (GPT-3.5), which
             | is used in OpenAI Codex. See
             | 
             | https://platform.openai.com/docs/model-index-for-
             | researchers
             | 
             | https://help.openai.com/en/articles/6195637-getting-
             | started-...
        
             | mritchie712 wrote:
             | It (replit-code-v1-3b) is already quite good at
             | explaining code:
             | 
             | Input:
             | 
             |     below is a SQL statement:
             |     
             |     SELECT
             |       CAST(DATE_TRUNC('week', "t1"."TIMESTAMP") AS DATE)
             |         AS "WEEK_START",
             |       COUNT(*) AS "EVENT_COUNT"
             |     FROM "ANALYTICS"."POSTHOG"."POSTHOG_EVENTS" AS "t1"
             |     GROUP BY "WEEK_START"
             |     ORDER BY "WEEK_START"
             |     LIMIT 2000
             |     
             |     Explain this SQL. Respond in JSON format with the
             |     following keys: TITLE, DESCRIPTION, TABLES
             |     JSON response:
             | 
             | Output:
             | 
             |     {
             |         "title": "Weekly Events Count",
             |         "description": "Count of weekly events",
             |         "tables": [
             |             {
             |                 "name": "POSTHOG_EVENTS",
             |                 "columns": [
             |                     "WEEK_START",
             |                     "EVENT_COUNT"
             |                 ]
             |             }
             |         ]
             |     }
        
           | runnerup wrote:
           | It's not crucial that it beat ChatGPT this year. That's a
           | pretty unattainable goal for a group like Replit. From the
           | user's POV, none of the current copilots compare favorably
           | against ChatGPT, even Microsoft's OpenAI-powered GitHub
           | Copilot.
           | 
           | What's important is that they're preparing for the future by
           | building all the tooling/UI/UX around coding copilots. This
           | way, when costs and feasibility of building ChatGPT-quality
           | LLM's drop and multiple open-source models are available,
           | Replit has the ability to immediately drop them into their
           | production environment. They'll also have the skills and
           | systems to finetune any new models and wring extra
           | performance out of them.
           | 
           | This is more important to users than it seems at first,
           | because the current UX of things like GitHub Copilot
           | doesn't allow me to use their AI against my codebase the
           | way that I want to (the way I use ChatGPT). Right now
           | GitHub Copilot is a glorified auto-complete, but I want it
           | to do widespread scaffolding, refactoring, and analysis
           | across my whole codebase. Microsoft has access to LLM's
           | that can do this through their control of OpenAI -- but
           | Microsoft lacks the tooling/UI/UX to bring the power of
           | ChatGPT to me as a user of
           | VSCode/IntelliJ/PyCharm/Visual Studio.
           | 
           | So if Replit can find more innovative, boundary-pushing ways
           | of integrating LLM's, they won't necessarily need the highest
           | quality LLM's to produce a superior user experience. It's a
           | strong signal that Replit is well-positioned for the future,
           | when ChatGPT-like models are democratized.
           | 
           | Hopefully JetBrains is paying attention. They definitely
           | have time to wait a bit more (1-2 years?), but not a lot of
           | time. JetBrains shouldn't rely solely on the GitHub Copilot
           | plug-in to provide their users with LLM's, because it's not
           | clear that the user experience of that plug-in will stay
           | competitive with the user experience that GitHub Copilot
           | will offer directly in VSCode. The IntelliJ/PyCharm plugin
           | may remain "just a fancy auto-complete" while VSCode gets
           | more interactive workflows.
           | 
           | Future IDE's with LLM integration require novel, smart,
           | clever UX typically invented only by very creative people.
           | 
           | It's also worth noting that Replit is not just trying to be
           | an IDE -- they're also building a marketplace to buy/sell
           | coding work, and establishing a small foothold as a niche
           | cloud computing provider.
        
             | ohjfjfk wrote:
             | [dead]
        
             | worldsayshi wrote:
             | I'm a bit surprised that IP and infosec aren't a much
             | bigger part of this discussion.
             | 
             | ChatGPT ought to be a non-starter for many use cases
             | where data cannot be shared with OpenAI or where the
             | copyright situation of the generated output could become
             | too vague.
             | 
             | Having the option of open source models that could
             | potentially be self-hosted could make those use cases
             | viable.
        
             | furyofantares wrote:
             | Whether it beats ChatGPT right now is important to me,
             | right now.
             | 
             | I'm very excited about everyone doing work even when
             | they're not beating ChatGPT right now, of course.
             | 
             | But how it compares to ChatGPT right now is extremely
             | relevant to lots of people.
             | 
             | It's also become very common to vaguely reference OpenAI's
             | offerings when announcing new models without saying how
             | they actually compare, or only mentioning some small way in
             | which it compares favorably.
             | 
             | (Though it seems to often be that some comment from the
             | article comparing to OpenAI gets promoted to the title when
             | posted on HN, like here.)
        
               | jeron wrote:
               | I think this is somewhat of a naive way to look at this.
               | Yes, ChatGPT is really good, but they're basically
               | completely closed source. A lot of the power of LLMs can
               | and will come from open sourced models that anyone can
               | dig into the weights and tune it for their use case, as
               | well as train and run on their own platform.
        
             | valedan wrote:
             | What does this mean for the future of editors like emacs
             | and (neo)vim? Right now the Copilot plugin for Neovim works
             | pretty much the same as the one for VSCode, but as LLMs get
             | integrated more into IDEs and new workflows are built
             | around them, will the old-school editors be able to keep
             | up? I'm a little worried because I just switched from
             | VSCode to Neovim a few months ago!
        
               | davidkunz wrote:
               | It'd be great if they'd build language servers via the
               | Language Server Protocol; that would be
               | editor-agnostic.
        
               | spenczar5 wrote:
               | Github Copilot actually works through the language server
               | protocol already. Document contents are sent to it and it
               | responds with code completions.
        
               | SheinhardtWigCo wrote:
               | This could be the dawn of a new day for the old-school
               | editors. Not to start any wars here, but I could never
               | get the hang of Vim, and that's hardly an unusual
               | complaint. But now, free high-quality personalized
               | "tuition" just became economically viable.
        
               | runnerup wrote:
               | Side note, potentially check out vimtutor, or also
               | https://vim-adventures.com/
        
               | mzz80 wrote:
               | Neovim already can't keep up by itself. The future of vim
               | won't be as a standalone application, but as a plugin
               | into other IDEs. The support for Neovim and VSCodeVim
               | within VSCode greatly reduces the utility of a standalone
               | app for anything other than edits to very small projects.
        
               | soulofmischief wrote:
               | vim is a text editor.
        
             | earthboundkid wrote:
             | I keep saying that it's obvious that local execution is the
             | future of LLMs. Remote execution makes a ton of sense for
             | databases, and most web apps are on some level just CRUD
             | over a remote DB, so we've all gotten used to the idea that
             | in the 21st century a software business should be running
             | remote servers... But LLMs don't need to run remotely, and
             | they don't especially benefit from running remotely either
             | (okay, more training data, but you can batch that and send
             | it back asynchronously). The future is local.
        
               | blueboo wrote:
               | The future is using the best possible tool to drive your
               | work. Won't local models be systematically inferior to
               | bigger commercial offerings for the next few years at
               | least?
        
               | pmoriarty wrote:
               | _" The future is using the best possible tool to drive
               | your work"_
               | 
               | Not if that tool is censored, and you need an uncensored
               | version to do your work. Or maybe you have privacy
               | considerations, or your company policies forbid using
               | something hosted remotely or owned by another company,
               | etc...
        
               | imoverclocked wrote:
               | For shops that want to ensure their own codebase stays
               | local, definitely no.
        
               | theaiquestion wrote:
               | I think we'll reach "good enough", and the commercial
               | offerings won't have much tangible benefit, at least
               | for simple "fancy autocomplete".
               | 
               | Currently you don't really use LLMs for designing the
               | structure, just for completing the implementation, and
               | I think that will be very doable locally.
        
               | steve_adams_86 wrote:
               | Maybe. I wonder if very narrow, multi-model systems might
               | eventually deliver better performance and utility than
               | monolithic models like GPT. Rather than pay for access to
               | that, you might be better off investing in resources that
               | can train and learn on exactly what you're doing, rather
               | than something general that is good at a lot of things
               | but not incredible at your specific task.
        
           | heliophobicdude wrote:
           | Hard to compare them, actually. The thing about ChatGPT is
           | the chat part: it was trained to interact and respond with
           | human conversation. This is more like Copilot, with code
           | completion based on actual code.
        
         | swyx wrote:
         | we also did an interview with Varun Mohan of Codeium, which is
         | another competing code model trained from complete scratch:
         | https://lspace.swyx.io/p/varun-mohan#details
        
       | robby_w_g wrote:
       | I recognized the name Replit and couldn't remember why. A quick
       | search reminded me: https://news.ycombinator.com/item?id=27424195
        
         | ec109685 wrote:
         | This founder has extreme views and is full of hyperbole:
         | https://twitter.com/amasad/status/1504092244168478728?s=20
        
           | amasad wrote:
           | Is this the best you can find? not even top 10 bangers.
        
         | naillo wrote:
         | [flagged]
        
         | stephenjayakar wrote:
         | this feels like an attempt to hive mind against anything cool
         | from this company
        
           | dimgl wrote:
           | +1, this is unnecessary.
        
             | aardshark wrote:
             | Alternatively, it's called consequences of your actions.
             | Don't be surprised if shitty behaviour comes back to bite
             | you.
        
           | ibrarmalik wrote:
           | I think people are smart enough to receive extra information
           | and do whatever they want with that.
        
           | naillo wrote:
           | Threatening a guy for making an open source version of replit
           | sounds pretty crummy in my eyes.
        
           | robby_w_g wrote:
           | I think it's fair to evaluate a company's behavior before
           | engaging in business with them. And I personally dislike
           | persons in power abusing their position, which is why I
           | remembered the company name almost two years later.
           | 
           | I haven't heard of any similar behavior since then, which is
           | a good sign. But a reputation can be a hard thing to shake.
           | The CEO should have considered that before doing what he did.
        
       | doodlesdev wrote:
       | The model is way too small; comparing it to Codex feels
       | disingenuous. Sure, it's 77% smaller; it's also 77% worse.
       | Still, it's a cool project nonetheless.
       | 
       | For instance, even this simple snippet generates wrong inline
       | completions:
       | 
       |     // Only return even numbers bigger than 10 from the array
       |     const arrayFilter = (array) =>
       | 
       | Replit-code-v1:
       | 
       |     // Only return even numbers bigger than 10 from the array
       |     const arrayFilter = (array) => {
       |       return array.filter((item) => item > 10);
       |     };
       | 
       | Gets it wrong: it returns odd numbers too.
       | 
       | Codeium:
       | 
       |     // Only return even numbers bigger than 10 from the array
       |     const arrayFilter = (array) => {
       |       return array.filter((num) => num > 10 && num % 2 === 0);
       |     };
       | 
       | ChatGPT (GPT-3.5 Turbo), code only, without the rest of the
       | completion since it's instruction-tuned:
       | 
       |     const arrayFilter = (array) => {
       |       return array.filter(num => num % 2 === 0 && num > 10);
       |     }
       | 
       | Not comparable at all. For reference, if anyone wants to test:
       | I ran this through the HuggingFace space using the default
       | parameters, ChatGPT through chat.openai.com, and Codeium
       | through the VSCodium extension on an empty JavaScript file.
        
         | amasad wrote:
         | Interesting. This seems like a weakness of natural language
         | understanding. If you rephrase your prompt slightly, it gets
         | it right. Try:
         | 
         |     // return even numbers that are also more than 10
         |     const arrayFilter = (array) =>
         | 
         | It will do the right thing. The fine-tuned version gets your
         | prompt right, so maybe it benefited from natural language
         | data. Will look more into it.
        
           | doodlesdev wrote:
           | That's really interesting, indeed I can reproduce this by
           | changing the comment. I also managed to get correct output
           | for this sample by renaming the function.
        
             | eevilspock wrote:
             | clearly your original comment was unfair.
        
           | SCLeo wrote:
           | I agree. Maybe it interpreted it as "return the numbers
           | that are more than 10 in the given array of even numbers."
           | 
           | For example, if the instruction says "return person objects
           | that are at least 20 years old", it might be more
           | reasonable to generate:
           | 
           |     array.filter(item => item.age >= 20)
           | 
           | as opposed to:
           | 
           |     array.filter(item => (item instanceof Person) &&
           |                          (item.age >= 20))
        
         | johnfn wrote:
         | > Sure it's 77% smaller, it's also 77% worse.
         | 
         | Hehe, yeah, imagine saying you made a new programming
         | language with 77% fewer lines of code than Python.
        
           | Zababa wrote:
           | Finally, an opportunity to share this
           | https://nsl.com/papers/denial.html
        
             | barking_biscuit wrote:
             | I didn't get the punchline of this, so I asked GPT-4 to
             | explain the punchline. Actually quite amusing.
        
             | [deleted]
        
             | johnfn wrote:
             | I'm curious about the downvotes, because I thought I was
             | just agreeing with OP. Obviously lines of code in a
             | programming language repo are not at all correlated with
             | quality. It's like the old adage about measuring aircraft
             | quality by weight.
        
         | moffkalast wrote:
         | Yeah I tried the demo, it wrote some wrong code with comments
         | in Chinese. I think I'll pass.
         | 
         | It's a pretty well accepted fact now that bigger LLM = moar
         | better without exceptions. I'm not sure why there's a race to
         | the bottom of who'll make the most useless model that can run
         | everywhere.
        
         | SheinhardtWigCo wrote:
         | It seems like every week someone comes out with some version of
         | "we can get results similar to OpenAI's API with our model that
         | you can run on a Commodore 64!"
         | 
         | And then you dig in, and it's always far behind in some
         | important way.
         | 
         | Not hating here, I love the pace of iteration, just not the
         | hyperbole.
        
           | barking_biscuit wrote:
           | >"we can get results similar to OpenAI's API with our model
           | that you can run on a Commodore 64!"
           | 
           | I have felt similar frustrations with statements that feel
           | disingenuous too. Thanks for articulating this with such a
           | beautifully hilarious metaphor.
        
         | thewataccount wrote:
         | I need more time to compare it; the short 128 tokens in the
         | demo is a bit rough, but -
         | 
         | On first look this seems to blow the current llama-based
         | models out of the water, including the 30B ones.
         | 
         | Pasting what you want + url + example json with no other
         | context, and it "knows" what the url and the json are for,
         | without even telling it.
         | 
         | I'm not even saying it's as good as ChatGPT, but this is a
         | tenth the size of the best llama models I've seen.
        
       | jeremypress wrote:
       | Interesting how this guy has a finance background but knows how
       | to code, especially for emerging technologies like LLMs
        
         | ipsum2 wrote:
         | Didn't MosaicML do the training for them?
        
       | youssefabdelm wrote:
       | title is missing: "trained in 1 week, and like most open source
       | LLMs so far... it sucks compared to the closed source
       | alternatives"
       | 
       | Great effort of course bla bla bla...
       | 
       | Open source really needs some benchmarking, and to up its game
       | quality-wise.
       | 
       | And yes I know they're expensive as shit to train... let's not
       | keep wasting our money and actually work together, pool our
       | resources, to make a GOOD model.
       | 
       | But oh no, everyone wants to put their stamp on it. "Replit did
       | this! Look at us!"
        
         | ImprobableTruth wrote:
         | This is easy to say, but I think the issue is that getting an
         | LLM right isn't easy, so it's not clear who should steward such
         | a project. Something like BLOOM shows that even if you have the
         | necessary compute, you can still get a model that isn't good.
         | 
         | I think it will take some time for it to be clear who is a
         | leader in training open source models (maybe it will be the red
         | pajama folks?) and I think they'll get more support after that.
        
           | youssefabdelm wrote:
           | Fair point
        
       | Imnimo wrote:
       | Tried it out on the HuggingFace demo, with default settings.
       | 
       | Prompt:
       | 
       |     def nth_prime(n):
       | 
       | Completion:
       | 
       |         if n == 1:
       |             return 2
       |         if n == 2:
       |             return 3
       |         if n == 3:
       |             return 5
       |         if n == 4
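       | 
       | For contrast, the completion you'd hope for is something like
       | this (my own sketch, not model output):
       | 
       |     def nth_prime(n):
       |         """Return the n-th prime (1-indexed)."""
       |         count, candidate = 0, 1
       |         while count < n:
       |             candidate += 1
       |             if all(candidate % d != 0
       |                    for d in range(2, int(candidate ** 0.5) + 1)):
       |                 count += 1
       |         return candidate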
        
       | GreedClarifies wrote:
       | This is amazing work, and bravo to the people working on
       | redpajama.
       | 
       | This is fantastic for the world; it means LLMs will not be
       | controlled by a couple of companies with the associated rents.
       | 
       | Yes, private LLMs will likely be a couple of years ahead of
       | 'free' alternatives, but that's _OK_ ; we want to incentivize
       | for-profit research so long as the services become low-priced
       | in time (and in this case, in short order).
       | 
       | AMAZING WORK.
        
         | m3kw9 wrote:
         | Have you even tried it? It's pretty bad
        
           | GreedClarifies wrote:
           | But that's _fine_ ; it can be a year or two behind the
           | state of the art. That's not the point.
           | 
           | The point is that there will be alternatives, and that will
           | reduce the price over time, further increasing the impact
           | of the technology.
           | 
           | There was a possible future where only MSFT, and maybe
           | GOOG, and maybe one or two other companies had this
           | technology and extracted massive rents.
        
         | laweijfmvo wrote:
         | My first reaction was, "why is replit building LLMs," but I
         | guess it fits their needs to have one optimized for their use.
         | But I wonder, is this the beginning of another wave of "every
         | company is an AI company?" Are we going to see a spike in tech
         | hiring around AI/LLM, money starting to flow again, etc? And
         | how many years until it all blows up and the layoffs start?
        
           | dpflan wrote:
           | Finetuning LLMs (and any model) is going to be a common
           | practice. Each company is its own domain, with domain
           | knowledge and data to specialize open-sourced models, or it
           | can use other models to distill/teach its own proprietary
           | model (home-grown, or a modification of someone else's).
        
         | swyx wrote:
         | to be clear this work is not based on redpajama - though we did
         | discuss that in the previous episode
         | https://twitter.com/swyx/status/1648080532734087168?s=46&t=9...
        
           | GreedClarifies wrote:
           | Oh my bad!
           | 
           | I thought I read that, is it based upon:
           | 
           | https://arxiv.org/abs/2211.15533 (The Stack) ?
        
             | swyx wrote:
             | partially. Reza discussed their data pipeline in the
             | blogpost that we reference in the show notes
        
       | robertlucas wrote:
       | Is there any way to connect these new code focused LLMs into VS
       | Code in order to replace Github Copilot?
        
       | hinkley wrote:
       | I think that 20 years from now, we'll all be sitting around
       | wondering 1) where the fuck are my flying cars, and 2) what were
       | they thinking using computers to write code?
       | 
       | And the reason I say this is because these tools are answering a
       | question that we haven't asked yet: what common problems need to
       | be solved in this programming language, and where do I get code
       | to solve that problem?
       | 
       | These LLMs are basically telling us how to duplicate code, and
       | what we need is the opposite: how to stop reinventing the
       | wheel for the 100th time.
       | 
       | Instead of writing code for me, tell me if I already have it. If
       | I'm writing it, tell me there's a library for that. If I'm a
       | library writer, give me suggestions for what libraries are
       | missing from the toolkit.
       | 
       | All we've done so far is begun the process of automating the
       | production of duplicate code. With absolutely no way to go back
       | in time and correct bugs introduced in earlier iterations. We are
       | likely, for instance, to see 0 day attacks that affect hundreds
       | of applications, but with no simple way to describe which
       | applications are affected. That's going to be a first rate
       | trainwreck.
        
         | moffkalast wrote:
         | Well fwiw, working with GPT 4 it often suggests which libraries
         | to use assuming the question allows for it, so it's not like
         | everyone's writing everything from scratch.
         | 
         | But libraries and especially frameworks as they are these days
         | are also a giant liability more often than not. APIs change for
         | no reason, they can be removed from the package manager at any
         | moment without warning, people may slip malicious code into
         | them past LGTM reviews, have recursive dependencies upon
         | dependencies that bloat and slow down your build process, etc.
         | 
          | Sometimes you don't need to install the entire damn car
          | manufacturing plant and dealership it comes with just to get
          | that one wheel you needed. And an LLM can just write you the
         | code for a very nicely customized wheel in a few seconds
         | anyway.
        
         | webnrrd2k wrote:
          | I agree -- maybe someday LLMs will give me the code for a set
         | of simple abstractions that are well-matched for the problems I
         | currently face. Something like a Pattern Language that was all
         | the rage, but, um, better? More objective and pragmatically
         | useful. Not galaxy-brain theory.
         | 
         | That's what I really want. But that would also put me out of a
         | job.
        
         | seydor wrote:
         | > how to stop reinventing the wheel for the 100th time.
         | 
         | The idea of libraries may not have been a good one. It saved
         | human time but no library is perfect because no abstraction is
          | perfect and this causes unnecessary bloat. It seems that Nature
         | does not use libraries, it uses replication instead, and we can
         | now have that too.
        
           | x-shadowban wrote:
           | Ha I never wondered what the physical/life version of a
           | shared library is until I read your post so thanks for that.
        
           | webnrrd2k wrote:
           | You have a point, but I think there are some big trade-
           | offs...
           | 
           | Nature uses replication, but it's also horrifically complex
           | and we have no real idea about the specifics of how it all
           | works, or what to do when many, many things go wrong.
           | 
           | Also, I think nature uses cloning, which I kind of think
           | would be called a 'library' in this case, for single-celled
           | organisms (archaea and bacteria). In addition many eukaryotic
           | organisms can reproduce via cloning under special situations.
           | 
           | I don't know, I'm not really trying to argue one way or the
           | other. I'm kinda' thinking out loud here... but I'd like to
           | see LLMs used to create really great libraries, or some other
           | abstractions, that are easy to use and also understandable.
           | It might not happen soon, but I think that there is a lot of
           | value in moving things that way.
        
           | hinkley wrote:
           | So instead everyone who has to solve a problem has to be an
           | expert on that problem, rather than just an informed
           | consumer.
        
           | sicariusnoctis wrote:
            | Replication does not help in managing complexity. That's
            | why we use abstractions, even with the problems they have.
        
       | tyingq wrote:
       | More tools in the field is great! I tried a few things, and it's
       | reasonable, but it does have some quirks that seem to repeat,
       | like:
       | 
        | I tried a prompt of:
        | 
        |     # python function that returns a random integer between min and max
        | 
        | And it produced:
        | 
        |     def random_int(min, max):
        |         return random.randint(min, max)
        | 
        |     # define the size of the grid
        |     n = 5
       | 
       | It doesn't add the needed import statement, and I'm unclear why
       | it's "defining the size of the grid".
        
         | radq wrote:
         | I've had the issue of generating random code after the
         | completion with other models as well; it's due to how the
         | models are trained. You need to stop generating when you
         | encounter token(s) that indicate you're done - see
         | https://huggingface.co/replit/replit-code-v1-3b#post-process...
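          | 
          | The post-processing itself is just string truncation on the
          | generated text. A minimal sketch (the stop strings here are
          | illustrative; the model card linked above lists the real
          | ones):
          | 
          |     # cut a raw completion at the earliest stop sequence
          |     def post_process(completion, stops=("<|endoftext|>", "\n\n\n")):
          |         cut = len(completion)
          |         for stop in stops:
          |             idx = completion.find(stop)
          |             if idx != -1:
          |                 cut = min(cut, idx)
          |         return completion[:cut]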
        
         | agilob wrote:
         | I get such unrelated statements from copilot too, not often,
         | but a few I remember.
        
         | tyingq wrote:
          | Based on the replies, I tried a different prompt:
          | 
          |     # python script that prints out an integer between min and max
         | 
         | And it did better. Included the import, didn't add unrelated
         | code, but did still put the code inside a function.
        
         | circuit10 wrote:
          | That's because it's not following instructions like ChatGPT;
          | it's just trying to guess what could plausibly come after
          | what you put, like Copilot or the old GPT-3 models.
        
           | minkzilla wrote:
            | and imports are (almost) always at the top of your file,
            | not next to a function like this
        
             | vharuck wrote:
             | I tried the same input, except wrapping it in triple-quotes
             | instead of commenting it. So that it would match the
             | standard practice for module doc strings. Here's the
              | result:
              | 
              |     """python function that returns a random integer between min and max"""
              |     return random.randint(min, max)
              | 
              |     def gen_random_float(min, max):
              |         """python function that returns a random float between min and max"""
              |         return random.uniform(
             | 
             | So, it assumed the triple-quote was a function's doc
             | string, despite it not being indented. It then assumes I'll
             | want a similar function for floats (I assume it was cut off
             | by a token limit).
        
           | jeremyjh wrote:
           | Isn't ChatGPT also just generating plausible text that could
           | be a response to an instruction?
        
             | circuit10 wrote:
             | "that could be a response to an instruction" is the
             | critical part here
        
             | travisjungroth wrote:
             | Yeah, at their core they're both trying to guess/generate
             | what comes next. Differences: Being trained towards
             | conversations versus code. Hyperparameters set to stop
             | differently. "Wrappers" that form the prompt.
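              | 
              | For instance, a chat "wrapper" is little more than prompt
              | templating around the same next-token engine (the format
              | here is illustrative, not any specific model's):
              | 
              |     def wrap_chat(user_msg):
              |         # scaffold the plain LM so its most plausible
              |         # continuation is an assistant-style answer
              |         return ("System: You are a helpful assistant.\n"
              |                 f"User: {user_msg}\n"
              |                 "Assistant:")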
        
             | whimsicalism wrote:
             | It's not generating the most likely next word in the 'meta-
             | corpora' of all possible discussions similar to the ones it
          | has been trained on; it is trying to generate plausible
             | text that would be scored well as a helpful assistant - and
             | in the process has transferred knowledge acquired from its
             | pre-training task.
        
         | amasad wrote:
          | LLMs generally, but small models more so, will keep going
          | and generate seemingly unrelated things. On the frontend,
          | tools like Copilot and Ghostwriter do a lot of things like
          | use stop words or simply not show completions outside a
          | single block.
          | 
          | As for your prompt, the model is following it a little too
          | closely and generating just the function. You can, however,
          | condition it so that this is the start of the program and it
          | will do the import, e.g.
          | 
          |     # python function that returns a random integer between min and max
          |     import
          | 
         | This is in fact a suggestion from OpenAI on best practices for
         | prompting called "leading words"
         | https://help.openai.com/en/articles/6654000-best-practices-f...
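          | 
          | A minimal sketch of that with the HF weights (the model id
          | and trust_remote_code flag are from the model card;
          | generation settings are guesses):
          | 
          |     from transformers import AutoModelForCausalLM, AutoTokenizer
          | 
          |     tok = AutoTokenizer.from_pretrained(
          |         "replit/replit-code-v1-3b", trust_remote_code=True)
          |     model = AutoModelForCausalLM.from_pretrained(
          |         "replit/replit-code-v1-3b", trust_remote_code=True)
          | 
          |     # end the prompt with a "leading word" so the model
          |     # continues from the import instead of a bare function
          |     prompt = ("# python function that returns a random integer "
          |               "between min and max\nimport")
          |     ids = tok(prompt, return_tensors="pt")
          |     out = model.generate(**ids, max_new_tokens=64)
          |     print(tok.decode(out[0]))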
        
       | fswd wrote:
       | I can barely keep up with this stuff, but quick question. Is
       | there a way to simply change the URL setting of copilot to point
       | to this model? Obviously it needs an endpoint, I could hack
       | something up, but asking if somebody has already done this? Would
       | be nice to cancel my copilot.
        
         | jacobrussell wrote:
         | I don't think it's possible to point Copilot to other models. I
         | don't think Microsoft would benefit much from that feature. You
         | could use existing tools [0] to host your own model which in
         | theory could be used by an extension your IDE uses. But I'm not
         | sure if an extension like that exists.
         | 
         | [0] https://github.com/oobabooga/text-generation-webui
        
           | circuit10 wrote:
           | Of course it's possible, just not officially
           | 
            | See https://github.com/fauxpilot/fauxpilot/blob/main/documentati...
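            | 
            | From memory, the documented approach is roughly a VS Code
            | settings override like this (keys and port are as I recall
            | them from the FauxPilot docs; verify against the link
            | above):
            | 
            |     "github.copilot.advanced": {
            |         "debug.overrideEngine": "codegen",
            |         "debug.testOverrideProxyUrl": "http://localhost:5000",
            |         "debug.overrideProxyUrl": "http://localhost:5000"
            |     }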
        
         | circuit10 wrote:
         | There's https://github.com/fauxpilot/fauxpilot but it doesn't
         | use this model
        
         | execveat wrote:
         | It's nowhere close to Codex/Copilot. Try the demo:
         | https://huggingface.co/spaces/replit/replit-code-v1-3b-demo
        
           | tarruda wrote:
           | So they are lying on this tweet?
           | https://twitter.com/Replit/status/1651344186715803648
        
             | naillo wrote:
             | Yep
        
       | tarruda wrote:
        | 3 billion parameters. Does that mean I will be able to run it
        | on an 8GB consumer GPU?
        
         | generalizations wrote:
         | Means that once it's incorporated into llama.cpp, you can run
         | it on your laptop.
        
           | amasad wrote:
           | Hopefully on phones too
        
         | dontwearitout wrote:
         | Probably not out of the box but if some of the local deep
         | learning wizards get a quantized version working well and
         | optimize it a bit, definitely.
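          | 
          | For example, an 8-bit load is nearly a one-liner these days
          | (a sketch; assumes bitsandbytes is installed, and should cut
          | the ~10GB fp32 checkpoint to roughly a quarter of that in
          | VRAM):
          | 
          |     from transformers import AutoModelForCausalLM
          | 
          |     model = AutoModelForCausalLM.from_pretrained(
          |         "replit/replit-code-v1-3b",
          |         load_in_8bit=True,      # requires bitsandbytes + CUDA
          |         device_map="auto",
          |         trust_remote_code=True,
          |     )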
        
         | RHab wrote:
          | No, I could only get 2.7B to run on 8GB VRAM, unfortunately.
        
           | amasad wrote:
           | it is 2.7B
        
             | tarruda wrote:
             | Loading seems to have worked on my laptop's RTX 3070,
             | `nvidia-smi` shows `5188MiB / 8192MiB` in memory usage.
        
         | pera wrote:
         | their pytorch_model.bin is 10.4GB
        
           | tarruda wrote:
           | I just loaded this on my laptop's RTX 3070 GPU by following
            | the instructions here:
            | https://huggingface.co/replit/replit-code-v1-3b
           | 
            | I don't know how I can test the model, but it seems loading
            | worked. When I run `nvidia-smi` in another terminal, I see
           | `5188MiB / 8192MiB` in the memory-usage column.
        
             | swyx wrote:
              | you can load it but you can't run inference? what's the
              | issue?
        
               | tarruda wrote:
               | No issue, I'm simply unfamiliar with python machine
               | learning APIs.
               | 
               | I managed to run inference locally by installing the
               | requirements and running app.py from the demo:
                | https://huggingface.co/spaces/replit/replit-code-v1-3b-demo/...
               | 
                | It is very fast on my RTX 3070; VRAM usage goes to
                | ~6.3GB during inference.
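                | 
                | For anyone else unfamiliar, the inference call itself
                | is short. A sketch pieced together from the model card
                | (fp16 is my assumption for fitting into 8GB):
                | 
                |     import torch
                |     from transformers import (AutoModelForCausalLM,
                |                               AutoTokenizer)
                | 
                |     tok = AutoTokenizer.from_pretrained(
                |         "replit/replit-code-v1-3b", trust_remote_code=True)
                |     model = AutoModelForCausalLM.from_pretrained(
                |         "replit/replit-code-v1-3b", trust_remote_code=True,
                |         torch_dtype=torch.float16).to("cuda")
                | 
                |     ids = tok("def fib(n):", return_tensors="pt").to("cuda")
                |     out = model.generate(**ids, max_new_tokens=48)
                |     print(tok.decode(out[0]))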
        
       | chaxor wrote:
        | It's a bit hard to believe that the system is decent at
        | producing code which captures complex ideas and higher-level
        | structure when the tokens/param value is >30 (it's ~200 here?).
       | The 'good' models (meaning having lots of 'knowledge' or
       | 'memorization' about the dataset) typically tend to be around 2
       | tokens/param and models with decent generation of language with
       | less knowledge/memorization are around 30 tokens/param. Perhaps
       | the domain allows for this, but due to the fact that the
       | linguistic interface on the input is still needed... It's hard to
       | believe.
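        | 
        | For the arithmetic behind that ~200 (the 525B-token figure is
        | from Replit's blog post; both numbers are approximate):
        | 
        |     params = 2.7e9          # reported model size
        |     tokens = 525e9          # reported training tokens
        |     print(tokens / params)  # ~194 tokens/param, vs ~20 for
        |                             # "Chinchilla-optimal" training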
        
         | swyx wrote:
         | this kind of critical thinking is exactly what replit is going
         | to need for their stated goal of doing whole-app generation.
         | right now they only test it on AmjadEval. you... might wanna
         | consider joining them to work on it?
        
         | EvgeniyZh wrote:
         | Are you saying the less you train the model the better it is?
         | I'm confused
        
         | gnramires wrote:
         | Tokens/param shouldn't matter more than the total training
          | FLOPs, I believe. Clearly if we train at your claimed 'ideal'
          | 2 tokens/param on a very small dataset (not many tokens in the
          | first place), it wouldn't have enough data to properly learn the
         | relevant languages. Once there is enough data, then it becomes
         | a question of model capacity (does it have enough degrees of
         | freedom to support the computational structures needed?).
         | 
         | I believe the overparametrization largely helps with
          | generalization and reducing overfitting; at 2 tokens/param
          | there are many more degrees of freedom than structures that
          | can be learned, from what I can tell (the extra capacity just
         | provides good breathing room for internal structures). But if
         | your model has enough capacity, and you can find a good enough
         | training method (and you have enough data to learn the task),
          | then you should be able to succeed at arbitrarily low
         | tokens/param, which is good to keep in mind to make efficient
         | models.
        
       | waffletower wrote:
       | No Clojure. No Julia. No Haskell. No Racket. No Scheme. No Common
       | Lisp. No OCaml. And, as much as I despise Microsoft, No C#. No
       | F#. No Swift. No Objective-C. No Perl. No Datalog. A glaringly
       | lacking choice of languages.
        
         | mclide wrote:
         | Despite the lack of examples, it still completes trivial
         | clojure like "(defn connect [" and other lisp syntax like
         | "(define (hello" which is promising for further refinement
         | training on Lisp languages.
        
         | Dayshine wrote:
          | C# was available in the dataset they link, and is the most
          | glaring omission by global usage...
        
         | ubertaco wrote:
         | I fed it some OCaml and it worked, though the example was
          | trivial:
          | 
          |     type point = { x: int; y : int }
          |     let manhattan_distance (a: point) (b: point) : int =
          | 
          | which it completed to
          | 
          |     type point = { x: int; y : int }
          |     let manhattan_distance (a: point) (b: point) : int =
          |       abs (a.x - b.x) + abs (a.y - b.y)
         | 
         | ...which is a valid and correct OCaml definition of this
         | method:
         | 
         | https://try.ocamlpro.com/#code/type'point'='$4'x:'int;'y':'i...
        
         | ebiester wrote:
         | I'm sure that has to do with the dataset available to them.
        
           | runnerup wrote:
            | Which is a deduplicated version of this:
            | https://www.bigcode-project.org/docs/about/the-stack/
           | 
            | And probably, yes. While it contains 358 programming
            | languages, there's obviously a long tail after the 20 most-
            | represented ones. Some people might not expect, without
            | thinking about it for a bit, that many of the most-
            | represented "languages" are actually things like JSON, XML,
            | HTML, CSV, text, markdown, YAML, and SVG.
           | 
           | Also note that it won't be able to parse natural language
           | nearly as well without additionally being trained on
           | something like the LAION dataset, so this version will be
           | more of an autocomplete like Copilot rather than something
           | which can manifest high level business logic from whole cloth
           | like ChatGPT.
        
         | sitkack wrote:
            | You could take it and finetune it on a bunch of Lisps; it
            | would probably cost on the order of $50-500 to do that.
        
           | swyx wrote:
           | if anyone from MosaicML is reading this, i'd love a guide on
           | how to do exactly this!
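            | 
            | in the meantime, a rough sketch of the shape of it with
            | plain HF Trainer (the corpus file is hypothetical, and
            | Mosaic's own tooling would look different):
            | 
            |     from datasets import load_dataset
            |     from transformers import (AutoModelForCausalLM,
            |                               AutoTokenizer,
            |                               DataCollatorForLanguageModeling,
            |                               Trainer, TrainingArguments)
            | 
            |     tok = AutoTokenizer.from_pretrained(
            |         "replit/replit-code-v1-3b", trust_remote_code=True)
            |     tok.pad_token = tok.pad_token or tok.eos_token
            |     model = AutoModelForCausalLM.from_pretrained(
            |         "replit/replit-code-v1-3b", trust_remote_code=True)
            | 
            |     # hypothetical file of concatenated Lisp source
            |     ds = load_dataset("text",
            |                       data_files={"train": "lisp_corpus.txt"})
            |     train = ds["train"].map(
            |         lambda b: tok(b["text"], truncation=True, max_length=512),
            |         batched=True, remove_columns=["text"])
            | 
            |     Trainer(
            |         model=model,
            |         args=TrainingArguments("replit-lisp", fp16=True,
            |                                per_device_train_batch_size=1,
            |                                gradient_accumulation_steps=8),
            |         data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
            |         train_dataset=train,
            |     ).train()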
        
       | abxytg wrote:
       | [flagged]
        
       | love2read wrote:
       | Can this be used with the copilot plugins for every ide?
        
       | SXX wrote:
        | A weak spot which I guess is similar to other LLMs: if you
        | mention recursion somewhere in the comments, the model
        | sometimes starts to recursively generate the same lines over
        | and over again.
        
       | user3939382 wrote:
       | Unfortunately I'm someone who sometimes can't separate the art
       | from the artist. Replit is the company where the founder sent
       | these nasty pompous threats to their ex-employee for their
       | innocent side project and then tried to double talk his way out
       | of it with a bs non-apology when it got exposed in public. I
       | won't support Replit or anything they make.
        
       | davidy123 wrote:
       | I keep thinking there should be a way to train a copilot against
       | just one set of code libraries. I know LLMs require training
       | against a lot of text to get their smarts, but is there a way to
       | set this up so a model can be created for a specific library by
       | anyone, so it could provide open source support via a transformer
       | + model? Maybe this would be a better approach than a jack of all
       | trades, master of none.
        
       ___________________________________________________________________
       (page generated 2023-05-03 23:00 UTC)