[HN Gopher] Launch HN: Morph (YC S23) - Apply AI code edits at 4...
       ___________________________________________________________________
        
       Launch HN: Morph (YC S23) - Apply AI code edits at 4,500 tokens/sec
        
       Hey HN, I'm Tejas at Morph. We've built a blazing-fast model for
       applying AI-generated code edits directly into your files at
       4,500+ tokens/sec. No more slow full-file rewrites or brittle
       search-and-replace hacks. Here's a demo video:
       https://www.youtube.com/watch?v=LdT8epGHJPk

       Why? AI spits out code that can't reliably be inserted into
       existing code. Full-file rewrites and brittle search-and-replace
       hacks are too slow, expensive, or error-prone.

       Morph's approach:

       - Your agent outputs edits "lazily", referencing unmodified lines
         in the existing file (ex: // ...existing code...)

       - Morph instantly applies these edits to a file using our Fast
         Apply model + speculative decoding against the original file,
         making AI patches fast, reliable, and production-ready.

       This approach was pioneered by Cursor last year, but their models
       aren't available as APIs--so we built Morph for developers
       everywhere (with a large free tier!)

       Live demo (no signup): https://morphllm.com/dashboard and docs:
       https://docs.morphllm.com/quickstart

       We have 2 Fast Apply models: morph-v3-fast (4,500+ tok/sec) and
       morph-v3-large (2,500+ tok/sec). These models power Fast Apply at
       create.xyz, databutton, continue.dev, and more! We also provide
       retrieval models for embedding + reranking.

       Next up:

       - Inline Edit Model (Cmd-K): extremely fast inline edits - keep
         dev flow state.

       - Morph Tab API: our Next Edit Prediction model guesses your next
         code edit + action with sub-500ms latency. It's currently in
         private beta, but you can request early access here:
         https://morphllm.com/tab

       Hot takes:

       1) Raw inference speed matters more than incremental accuracy
       gains for dev UX--agree or disagree?

       2) Full-file rewrites by frontier models are legacy--Fast Apply
       edits win on speed, cost, and reliability.

       3) As benchmarks on narrow tasks saturate to 99%+, complexity is
       shifting from single frontier models to specialized,
       inference-optimized models. As frontier models move upmarket,
       they'll leave simple tasks behind and be used only for tasks that
       frontier models alone can do.

       We'd love to hear your ideas and experiences with coding agents!
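
       P.S. If you want to see the shape of a single apply call, here's
       a rough sketch against our OpenAI-compatible endpoint (Python;
       the file, edit, and variable names below are made up - see the
       quickstart for the exact prompt format):

           from openai import OpenAI

           client = OpenAI(
               api_key="YOUR_MORPH_API_KEY",   # from morphllm.com/api-keys
               base_url="https://api.morphllm.com/v1",
           )

           original = open("src/auth.ts").read()

           # The coding agent (Claude, Gemini, ...) emits only the changed
           # region and marks untouched code with "// ...existing code..."
           lazy_edit = """\
           // ...existing code...
           export function login(user: string, password: string) {
             if (!password) throw new Error("password required");
             // ...existing code...
           }
           // ...existing code...
           """

           resp = client.chat.completions.create(
               model="morph-v3-fast",
               messages=[{
                   "role": "user",
                   "content": f"<code>{original}</code>"
                              f"<update>{lazy_edit}</update>",
               }],
           )

           merged = resp.choices[0].message.content  # full file, edit applied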
        
       Author : bhaktatejas922
       Score  : 130 points
       Date   : 2025-07-07 14:40 UTC (8 hours ago)
        
       | handfuloflight wrote:
       | Is there any way to bring this into Claude Code?
        
         | bhaktatejas922 wrote:
         | There might be a way to do it using their new hooks, but out
         | of the box, not yet. Email us if you want to make it happen!
         | 
         | https://docs.anthropic.com/en/docs/claude-code/hooks
        
         | booli wrote:
         | If this proves the way forward, it will be in Claude Code soon
         | enough natively
        
           | koakuma-chan wrote:
           | There is already https://www.relace.ai/, albeit not as
           | blazing fast at a mere 4,300 tok/s
        
           | bhaktatejas922 wrote:
           | Perhaps. Boris from the Claude Code team shares a bit about
           | their view here https://www.youtube.com/watch?v=Yf_1w00qIKc
           | 
           | My read is that despite Claude moving upmarket in what it can
           | do, they are keen on clinging to all the (token heavy) tasks
           | they're leaving behind
        
         | halfjoking wrote:
         | Make an MCP server, and turn off the Write|Edit|MultiEdit
         | tools?
         | 
         | Actually - that's what this company should do. It should be an
         | MCP server so anyone could plug it into any agent with a url
         | and an API key.
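         | 
         | Something like this, roughly (a hypothetical sketch - assuming
         | the Python MCP SDK and the OpenAI-compatible Morph endpoint
         | mentioned elsewhere in the thread; the tool name and prompt
         | wrapping are guesses):
         | 
         |     # Hypothetical Morph "fast apply" MCP server sketch
         |     import os
         |     from mcp.server.fastmcp import FastMCP
         |     from openai import OpenAI
         | 
         |     mcp = FastMCP("morph-apply")
         |     morph = OpenAI(api_key=os.environ["MORPH_API_KEY"],
         |                    base_url="https://api.morphllm.com/v1")
         | 
         |     @mcp.tool()
         |     def edit_file(path: str, lazy_edit: str) -> str:
         |         """Merge a lazily-specified edit into the file at path."""
         |         original = open(path).read()
         |         resp = morph.chat.completions.create(
         |             model="morph-v3-fast",
         |             messages=[{"role": "user",
         |                        "content": f"<code>{original}</code>"
         |                                   f"<update>{lazy_edit}</update>"}],
         |         )
         |         merged = resp.choices[0].message.content
         |         open(path, "w").write(merged)
         |         return f"Applied edit to {path}"
         | 
         |     if __name__ == "__main__":
         |         mcp.run()  # stdio transport by default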
        
           | bhaktatejas922 wrote:
           | great idea! we'll have one up soon :)
        
       | amelius wrote:
       | Can't you ask these LLMs to simply output a patch file?
       | 
       | https://man7.org/linux/man-pages/man1/patch.1.html
        
         | bhaktatejas922 wrote:
         | you can - but they don't work reliably in practice. Common
         | issues include search-match failures, missing commas in
         | replaced items (the model doesn't have surrounding context
         | while replacing), and a few other error cases. These issues
         | are much worse for scattered edits across a file from
         | real-world queries (ex: make this page look nicer). Patches
         | tend to work fine for single-line or extremely focused edits
         | though - Cursor uses s&r/patches for single-line edits:
         | 
         | https://github.com/x1xhlol/system-prompts-and-models-of-ai-t...
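         | 
         | A toy example of the "search match fails" case (illustrative
         | only - the file indents with a tab, the model quoted the block
         | back with spaces, so an exact-match replace silently does
         | nothing):
         | 
         |     # why exact-match search/replace is brittle (toy example)
         |     original = "def add(a, b):\n\treturn a + b\n"   # tab indent
         | 
         |     # the model re-typed the block with 4 spaces instead of a tab
         |     search  = "def add(a, b):\n    return a + b\n"
         |     replace = "def add(a, b):\n    return a + b + bias\n"
         | 
         |     patched = original.replace(search, replace)
         |     assert patched == original  # no match: the edit silently no-ops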
        
           | treyd wrote:
           | I wonder if it'd be feasible to have a much smaller model
           | that could go in and correct these meshing issues that
           | require simpler reasoning?
        
             | bhaktatejas922 wrote:
             | hm, maybe, but correction/issue detection is a much
             | harder task for models. If you pipe the errors back in it
             | could work, but personally I still see Fast Apply as the
             | better approach.
        
       | deepdarkforest wrote:
       | > 1) Raw inference speed matters more than incremental accuracy
       | gains for dev UX--agree or disagree?
       | 
       | I know you are trying to generate some controversy/visibility,
       | but I think if we are being transparent here, you know this is
       | wrong. People prefer using larger (or reasoning) models, with a
       | much bigger difference in tok/sec, just for quality in coding;
       | quality comes first. Even if I have a big edit to apply, like 5k
       | tokens, 200-300ms of difference in edit time is nothing. Edit
       | speed is definitely not a bottleneck for dev UX, quality is. A
       | dev who wants to save 200ms on every code change over _quality_
       | is someone I, well, cannot relate to. If I'm using 1-2 agents in
       | parallel, most of the time the edits are already applied while
       | I'm reviewing code from the other agents. But again, maybe
       | that's just me.
       | 
       | Speaking of quality, how do you measure it? Do you have any
       | benchmarks? How big is the difference in error rate between the
       | fast and large model?
        
         | bigyabai wrote:
         | The marketing language seems to suggest they're insecure over
         | quality and want to promote quantity. But I'm in the same boat
         | as you - I would happily take 10 tok/sec of a correct answer
         | instead of wasting an hour curating 4500 tok/sec throwaway
         | answers. Benchmark performance matters 100x more than your
         | latency.
         | 
         | If these "hot takes" extend into Morph's own development
         | philosophy, then I can be glad to not be a user.
        
           | bhaktatejas922 wrote:
            | There's no amount of error rate that's acceptable to us -
            | edits should always be correct. We've just found,
            | anecdotally, that saving users time is also demonstrably
            | important for churn, retention, and keeping developer flow
            | state, right after accuracy.
        
             | bigyabai wrote:
             | Then why are you using a custom model instead of an
             | industry-leading option?
             | 
             | I don't mean to be rude, but I can't imagine you're selling
             | a product on-par with Claude 3.7. _Some_ level of
             | performance tradeoff has to be acceptable if you prioritize
             | latency this hard.
        
               | bhaktatejas922 wrote:
               | We're not - our model doesn't actually think up the code
               | changes. Claude-4 or Gemini still writes the code, we're
               | just the engine that merges it into the original file.
               | 
               | Our whole thesis is that Claude and Gemini are extremely
               | good at reasoning/coding - so you should let them do
               | that, and pass it to Morph Fast Apply to merge changes
               | in.
        
           | johnfn wrote:
           | Anyone can get 10 tok/sec - just tell the model to output the
           | entire file with changes, rather than just the delta.
           | 
           | Whatever LLM you're using will have a baseline error rate a
           | lot higher than 2%, so you're going to be reviewing all the
           | code it outputs regardless.
        
             | bhaktatejas922 wrote:
              | yeah, even Claude is well over an 11% error rate with
              | search and replace
        
           | IanCal wrote:
           | This is a code editing model. 10 tokens per second editing
           | may as well not exist for any interactive use case.
        
         | bhaktatejas922 wrote:
          | I think it depends - the actual thing to measure is keeping a
          | developer in flow state. Errors as well as latency break
          | this. To be brief: yes, accuracy comes first.
         | 
         | Quality is measured 2 main ways:
         | 
         | 1) End-to-end: User query -> to task resolution. These are
         | aider style benchmarks answering the question of actual task
         | completion
         | 
         | 2) Apply Quality: Syntax correctness, character diff, etc..
         | 
         | The error rate for large vs fast is around 2%. If you're doing
         | code edits that are extremely complex or on obscure languages -
         | large is the better option. There's also an auto option to
         | route to the model we think is best for a task
        
           | deepdarkforest wrote:
           | Glad to hear quality comes first! Then I assume you have some
           | public benchmarks like the ones you mention that are
           | reproducible? I could only find this graph
           | https://docs.morphllm.com/guides/apply but there is no
           | mention of what it refers to, what data it used etc.
        
           | candiddevmike wrote:
           | I don't believe anyone can be in some kind of "flow state"
           | while waiting on LLM responses. I think it's funny that we
           | complained for years about C and others being slow to compile
            | and now folks are fine waiting seconds++ every time they
            | want to change something.
        
             | bhaktatejas922 wrote:
              | How so? Is your view that flow state isn't a thing at
              | all, or just not with LLMs?
        
               | candiddevmike wrote:
               | Flow state is 100% a thing, it's just impossible with
               | LLMs (at least, for me). I can't be blocked waiting on
                | things during a flow state or my mind starts wandering
                | to other places.
        
               | ada1981 wrote:
               | I've had the opposite experience.
        
               | bhaktatejas922 wrote:
               | same
        
               | bhaktatejas922 wrote:
               | Fast Apply definitely helps with keeping flow state and
               | is a large part of Cursor's success
               | 
               | Personally I work on multiple repos at a time to solve
               | for this
        
               | 0x457 wrote:
               | I do it like simultaneous exhibition in chess:
               | 
                | - Multiple repos or independent changes in a monorepo
                | 
                | - First round of changes: idgaf about anything beyond
                |   the public interface and unit tests
                | 
                | - I review the public interface and make changes if
                |   needed
                | 
                | - I review the unit tests it wrote to see that, at
                |   least from the outside, it looks alright
                | 
                | - Here I either:
                |   - make more unit tests (features, edge cases) and
                |     make it write code for them
                |   - polish what it generated
        
               | bhaktatejas922 wrote:
               | sounds like flow state to me
        
               | 0x457 wrote:
                | oh, it for sure is. But I use Amazon Q almost
                | exclusively. One thing that gets me out of this state:
                | when I have to do the math on "should I just do it
                | myself" vs "keep refining the prompt/context until this
                | thing finally gets it right".
        
               | bhaktatejas922 wrote:
               | so frustrating how slow edits are in Q dev
        
               | klank wrote:
               | Time really is a flat circle. My software career started
               | with me archaically flipping characters in a file I
               | vaguely understood with long pauses waiting on magic
               | compilers to give me my actual output.
               | 
               | Now it's dying in the same place. Thankfully I got to
               | spend the brunt of my career working through the fun,
               | intermediate years.
        
               | bhaktatejas922 wrote:
               | I've never had so much fun coding in my life - you should
               | definitely give it a try again!
        
               | klank wrote:
               | Thanks, I appreciate the good vibes.
               | 
               | However, it's kind of a trope for me at this point that
               | people assume a negative opinion of using generative AI
               | in the development process is due to a lack of experience
               | using it.
        
               | simonw wrote:
               | Have you tried any of the ludicrously fast LLM demos yet?
               | 
               | https://inference.cerebras.ai/ and https://groq.com/ and
               | https://deepmind.google/models/gemini-diffusion/
               | (waitlisted) are all 10 to 100x faster than regular
               | models, which really does have a meaningful impact on how
               | I interact with them because I don't have to disengage
               | for 15+ seconds while I wait for a response.
               | 
               | I have video demos of a few of those:
               | https://simonwillison.net/2024/Oct/25/llm-cerebras/ and
               | https://simonwillison.net/2024/Oct/31/cerebras-coder/ and
               | https://simonwillison.net/2025/May/21/gemini-diffusion/
        
           | Aurornis wrote:
            | > the actual thing to measure is keeping a developer in
            | flow state.
           | 
           | Personally, I find flow state hard to achieve when I
           | constantly have to switch modes to debugging LLM output or an
           | edit error that I missed.
           | 
           | When the majority of time is spent waiting for the main LLM
           | to think, I will always wait a few extra seconds for a better
           | edit than risk having to spend multiple cycles playing find-
           | the-bug because something didn't get applied correctly
           | somewhere.
        
             | bhaktatejas922 wrote:
             | Like most things its a tradeoff. Developer tolerance for
             | errors is extremely low - but the error rate for Fast Apply
             | is even lower
        
         | ashwindharne wrote:
         | I do find that having inference happen ~50% faster is much more
         | valuable to my workflow than a single digit accuracy increase.
         | If I'm going to have to check that the changes are correct
         | anyways, getting more iterations in faster feels much better
         | than incremental accuracy.
         | 
         | There's definitely a tipping point though. If the accuracy
         | gains are so high that I can check its work less carefully or
         | less often, the benefits of inference speed are effectively
         | nil.
        
           | bhaktatejas922 wrote:
           | exactly. The point is that none of the users even realize a
           | model is doing the apply - it should be so accurate and fast
            | that it feels like it's not there
        
           | walthamstow wrote:
           | Agreed. Sonnet 4 is supposedly better than Sonnet 3.5, but in
           | Cursor 3.5 is much faster so that's what I use
        
         | smrtinsert wrote:
         | Slow is smooth and smooth is fast.
        
           | bhaktatejas922 wrote:
           | and speculative edits is faster
        
         | Cort3z wrote:
          | As far as I understand, this is not +-300ms. It is 300ms vs.
          | 10 sec or something. That is a huge difference. I personally
          | find the time to wait for these larger models a limiting
          | factor. It's also probably a waste of resources for a fairly
          | simple task like this (compared to the general function
          | approximation of the LLMs).
         | 
         | But I honestly feel like the task of smartly applying edits
         | falls somewhat within traditional coding tasks. What about it
         | is so difficult it could not be done with a smart diffing
         | algorithm?
        
           | deepdarkforest wrote:
            | You misunderstood. It's 300ms just for the apply model, the
            | model that takes your coding model's output (e.g. Sonnet)
            | and figures out where the code should be changed in the
            | file. Cursor has its own, and Claude uses a different
            | technique with strings as well. So it's 10 sec vs 10 sec +
            | 300 ms, using your analogy.
        
             | Cort3z wrote:
              | Their selling point is to be a more open version of what
              | Cursor has. So the alternative is to use a full LLM, so
              | it is 10s + 10s vs 10s + 300ms.
        
           | bhaktatejas922 wrote:
            | It's a bit unclear why, but a model works best here. In
            | short - smart diffing is edge-case hell and you'll never
            | capture all of them.
        
         | k__ wrote:
          | I have to admit that using slow models is unbearable once
          | you've used a fast one.
          | 
          | I don't know if quality and speed are linearly related,
          | though.
        
           | AirMax98 wrote:
           | Seriously agree -- try using something like Sonnet 3.7 and
           | then switching to Gemini 2.5 Pro. The code that both output
           | is fine enough -- especially given that I mostly use LLMs as
           | a fancy autocomplete. Generally a better prompt is going to
           | get me closer to what I want than a more robust model. The
           | speed hit with Gemini 2.5 Pro is just too substantial for me
           | to use it as a daily driver.
           | 
           | I imagine the speed difference might not matter so much if
           | you are performing seismic updates across a codebase though.
        
         | paulddraper wrote:
         | I do not use Opus for coding, I much prefer Sonnet.
         | 
         | Many tasks work better with iteration/supervision and Sonnet
         | makes that feasible.
        
           | bhaktatejas922 wrote:
            | yeah, same. I feel like Opus tends to lean slightly more
            | toward sycophancy on technical topics.
        
       | rs186 wrote:
       | Sounds interesting, but I imagine all the big players (Cursor,
       | Windsurf, and maybe even OpenAI/Anthropic) will achieve something
       | similar very quickly in their tools first-party, which will
       | decimate the company. And I don't get the API part of this -- at
       | the end of the day people use those IDEs, and I don't see
        | developers/companies wanting to send their code to yet another
       | endpoint.
        
         | bhaktatejas922 wrote:
         | Perhaps - Cursor does this in house. I see the coding agent
         | space being large as we shift into a market of on-demand
         | software.
         | 
          | Sending code externally is meh, especially for companies with
          | tight security rules. We do self-hosting for them in their
          | infra.
        
       | Qerbz wrote:
       | Heard some insane rumors of the efficacy increase of this in
       | action even though I don't know how you do it
        
         | bhaktatejas922 wrote:
         | the rumors are true! learn how we do it by joining the team :)
        
         | bigyabai wrote:
         | That's an interesting first comment to post from a 5-month old
         | account.
        
       | zackangelo wrote:
       | For anyone more curious about how this works, Fireworks wrote a
       | blog post about it last year (I think):
       | 
       | https://fireworks.ai/blog/cursor
        
       | simonw wrote:
       | This uses an OpenAI-compatible endpoint, so got this working with
       | my https://llm.datasette.io/ CLI tool.
       | 
        | First I added their models to my ~/Library/Application
        | Support/io.datasette.llm/extra-openai-models.yaml file:
        | 
        |     - model_id: morph-auto
        |       model_name: auto
        |       api_base: https://api.morphllm.com/v1
        |       api_key_name: morph
        | 
        | Then I added the API key like this:
        | 
        |     llm keys set morph
        |     # Paste in API key from https://morphllm.com/api-keys
        | 
        | Then I saved an LLM template with their prompting pattern:
        | 
        |     llm -m morph-auto '<code>$code</code><update>$update</update>' --save morph
        | 
        | Now I can run operations like this:
        | 
        |     llm -t morph -p code "$(cat orig.txt)" -p update "$(cat update.txt)"
       | 
       | The -t option is the template I named when I ran --save. The -p
       | name value options then set the content for the template $code
       | and $update variables.
       | 
       | Example transcript here:
       | https://gist.github.com/simonw/de67818603d448a3fee788ace2976...
       | 
       | One thing that worries me: since it's using XML-style tags <code>
       | and <update>, if my own source code contains those tags I expect
       | it may get confused.
        
         | bhaktatejas922 wrote:
          | Wow, that was fast - this is awesome. It shouldn't be a
          | problem unless your code has both <code> and <update>
          | internally. One or the other should be fine.
        
           | nailer wrote:
            | > It shouldn't be a problem unless your code has both
            | <code> and <update> internally. One or the other should be
            | fine
           | 
           | That is a horrifying answer.
        
       | seanw265 wrote:
       | Last time I looked into Morph, I noticed you weren't yet on
       | OpenRouter. I see that's changed, but it looks like only an older
       | model is listed. Any plans to be more active there?
       | 
       | Also, are there any benchmarks comparing your fast apply models
       | to others like Relace or even Llama via Cerebras? I'm
       | particularly interested in output accuracy.
        
         | bhaktatejas922 wrote:
         | the v2 model listed currently points to morph-v3-large. We're
         | working with them to get v3-large and v3-fast listed
        
         | bhaktatejas922 wrote:
         | the power of hacker news! New models are listed there now
        
       | bijection wrote:
       | How does this compare to relace, which I believe is also a YC
       | company? They seem to have very similar functionality [0]
       | 
       | [0] https://www.relace.ai/
        
       | Workaccount2 wrote:
       | Just for clarification here because I am a bit confused,
       | 
        | Morph is a tool for _integrating_ the output of other LLMs and
        | not an LLM itself? It doesn't generate 4500 tok/sec, it can
        | edit at 4500 tok/sec?
        
         | bhaktatejas922 wrote:
          | Correct, but Morph is an LLM as well. In practice it's
          | basically a big LLM using a small LLM as a tool call.
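          | 
          | Concretely, the wiring looks something like this (a
          | hypothetical sketch, not our exact implementation - model
          | IDs, tool name, and schema are illustrative):
          | 
          |     # Big LLM plans the edit; the tool handler hands it to
          |     # Morph, which merges it into the original file.
          |     import os
          |     import anthropic
          |     from openai import OpenAI
          | 
          |     claude = anthropic.Anthropic()
          |     morph = OpenAI(api_key=os.environ["MORPH_API_KEY"],
          |                    base_url="https://api.morphllm.com/v1")
          | 
          |     edit_tool = {
          |         "name": "edit_file",
          |         "description": "Edit a file. Emit only changed lines; "
          |                        "mark unchanged regions with "
          |                        "// ...existing code...",
          |         "input_schema": {
          |             "type": "object",
          |             "properties": {"path": {"type": "string"},
          |                            "lazy_edit": {"type": "string"}},
          |             "required": ["path", "lazy_edit"],
          |         },
          |     }
          | 
          |     def handle_edit(path: str, lazy_edit: str) -> None:
          |         original = open(path).read()
          |         resp = morph.chat.completions.create(
          |             model="morph-v3-large",
          |             messages=[{"role": "user",
          |                        "content": f"<code>{original}</code>"
          |                                   f"<update>{lazy_edit}</update>"}],
          |         )
          |         open(path, "w").write(resp.choices[0].message.content)
          | 
          |     msg = claude.messages.create(
          |         model="claude-sonnet-4-20250514",   # illustrative model ID
          |         max_tokens=4096,
          |         tools=[edit_tool],
          |         messages=[{"role": "user",
          |                    "content": "Add input validation to login()"}],
          |     )
          |     for block in msg.content:
          |         if block.type == "tool_use" and block.name == "edit_file":
          |             handle_edit(**block.input)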
        
           | Workaccount2 wrote:
           | I see. How is this not going to get run over immediately by
           | big players? Google's diffusion model is already in the
           | wings, and it's both wicked fast and ~flash-lite intelligent.
        
             | bhaktatejas922 wrote:
             | you could make the argument about any startup really. To me
              | it's the same reason they don't build the foundational
              | model for legal, for sales, etc. - everything comes at a
              | cost.
             | Allocating researcher time to this is attention not spent
             | on the general frontier model - losing 1-2% there is the
             | difference of billions of dollars for them
        
             | nailer wrote:
             | Google's a great tech organization but they generally don't
             | create dominant tech products like they used to back in the
             | Maps / Mail days (this is nearly two decades ago).
             | 
              | Google wrote "Attention Is All You Need"; OpenAI wrote
              | ChatGPT.
        
       | eabeezxjc wrote:
       | why not ruby?
       | 
        | because ruby needs no correcting. It works.
        
       | nico wrote:
       | Would be awesome to have a browser extension that could create a
       | bridge between ChatGPT and VSCode, applying Morph in between (or
       | Claude instead of ChatGPT). Essentially use the web interface,
       | instead of the APIs for agentic coding
        
         | bhaktatejas922 wrote:
         | I think an MCP would do the job. We're shipping one out as we
         | speak
        
           | sidgarimella wrote:
           | +1 hyped for an mcp that I might be able to plug zed into
        
       | elzbardico wrote:
       | 1) Raw inference speed matters more than incremental accuracy
       | gains for dev UX--agree or disagree?
       | 
       | Yeah, I love reviewing and debugging thousands of lines of buggy
       | and dirty AI generated code. Who cannot love it?
        
         | bhaktatejas922 wrote:
          | Key word: incremental. For Fast Apply to be useful, it should
          | be so fast and accurate that most people don't realize
          | there's a model there at all.
        
       | callamdelaney wrote:
       | Yeah sounds like exactly what we need
        
       | laborcontract wrote:
        | Really like this. I've been trying Microsoft's Copilot and it's
        | so clunky, particularly when applying edits. One would assume
        | they have the resources to train the model...
        | 
        | Request: please provide a system prompt in the docs to help the
        | LLM generate the diff format that performs best w/ your models.
        | LLMs frequently change the way they present diffs on upgrades
        | and I don't want to be guessing which format is best.
       | 
        | EDIT: Please clarify your privacy policy. If my interpretation
        | is correct, paying users will have their data retained and
        | trained on? Is there any way to pay to use the service (w/o
        | picking up the phone) and not have my data trained on?
        | 
        |     4.1 Use of Service Data
        | 
        |     Depending on your subscription tier:
        | 
        |     Free Tier: We may use your submitted code data to train our
        |     models, improve our Services, and develop new features.
        | 
        |     Engineer Tier: We may use your submitted code data to train
        |     our models, improve our Services, and develop new features,
        |     subject to the confidentiality provisions in your service
        |     agreement.
        | 
        |     Enterprise Tier: We do not use your submitted code data for
        |     any purpose other than processing your immediate request.
        |     Your code data is never used for model training or service
        |     improvement.
       | 
       | [0] https://morphllm.com/privacy
        
         | bhaktatejas922 wrote:
          | done! Yeah, we have ZDR options as well; just email us at
          | info@morphllm.com to enable it.
         | 
         | Morph via OpenRouter is always zero data retention
        
       | lastdong wrote:
       | Is this similar to Gemini Diffusion? Thanks
        
         | bhaktatejas922 wrote:
          | No, we use autoregressive LLMs. Diffusion models would be
          | super interesting here. Mercury is doing some interesting
          | work with diffusion in code gen, but it's still too early to
          | tell if it'll get good enough for production usage.
        
       | scottpersinger wrote:
       | I'd just like to put a pitch in here for someone to do "smart
       | rebase+merge" with AI. Now THAT would really speed up
       | development, if my AI was intelligently merging code from
       | different users in the background, based on understanding the
       | intent behind each conflicting change.
        
         | bhaktatejas922 wrote:
         | how often do you run into merge conflicts?
        
       | FridgeSeal wrote:
       | > Raw inference speed matters more than incremental accuracy
       | gains for dev UX
       | 
       | Now I can be wrong, faster!
        
       | z3ugma wrote:
       | How do I start using this on a codebase on my local computer? I'm
       | quite confused by the quickstart. Do I use a VSCode extension?
        | One of the Claude Code-like clones, but with this as a custom
        | model?
        
       | michaelneale wrote:
        | Have been using Morph for a while (I am one of the authors of
        | goose) and was surprised, when it was introduced, at the boost
        | it gave me (much less iteration with the main expensive LLM,
        | and I can even make the editing process simpler to take a load
        | off the agent). I've used it with Claude 3.5, 3.7, and 4, and
        | currently with an o3/OpenAI and Anthropic/Claude 4 + morphllm
        | combo today.
        
       | orge wrote:
       | It would be great to have an integration with Aider or OpenCode.
        
       ___________________________________________________________________
       (page generated 2025-07-07 23:00 UTC)