[HN Gopher] Show HN: Plandex - an AI coding engine for complex t...
       ___________________________________________________________________
        
       Show HN: Plandex - an AI coding engine for complex tasks
        
       Hey HN, I'm building Plandex (https://plandex.ai), an open source,
       terminal-based AI coding engine for complex tasks.  I built Plandex
       because I was tired of copying and pasting code back and forth
       between ChatGPT and my projects. It can complete tasks that span
       multiple files and require many steps. It uses the OpenAI API with
       your API key (support for other models, including Claude, Gemini,
       and open source models is on the roadmap). You can watch a 2 minute
       demo here: https://player.vimeo.com/video/926634577  Here's a
       prompt I used to build the AWS infrastructure for Plandex Cloud
       (Plandex can be self-hosted or cloud-hosted):
       https://github.com/plandex-ai/plandex/blob/main/test/test_pr...
       Something I think sets Plandex apart is a focus on working around
       bad outputs and iterating on tasks systematically. It's relatively
       easy to make a great looking demo for any tool, but the day-to-day
       of working with it has a lot more to do with how it handles edge
       cases and failures. Plandex tries to tighten the feedback loop
       between developer and LLM:  - Every aspect of a Plandex plan is
       version-controlled, from the context to the conversation itself to
       model settings. As soon as things start to go off the rails, you
       can use the `plandex rewind` command to back up and add more
       context or iterate on the prompt. Git-style branches allow you to
       test and compare multiple approaches.  - As a plan proceeds,
       tentative updates are accumulated in a protected sandbox (also
       version-controlled), preventing any wayward edits to your project
       files.  - The `plandex changes` command opens a diff review TUI
       that lets you review pending changes side-by-side like the GitHub
       PR review UI. Just hit the 'r' key to reject any change that
       doesn't look right. Once you're satisfied, either press ctrl+a from
       the changes TUI or run `plandex apply` to apply the changes.  - If
       you work on files you've loaded into context outside of Plandex,
       your changes are pulled in automatically so that the model always
       uses the latest state of your project.  Plandex makes it easy to
       load files and directories in the terminal. You can load multiple
       paths:                 plandex load components/some-component.ts
       lib/api.ts ../sibling-dir/another-file.ts       You can load entire
       directories recursively:                 plandex load src/lib -r
       You can use glob patterns:                 plandex load
       src/**/*.{ts,tsx}       You can load directory layouts (file names
       only):                 plandex load src --tree        Text content
       of urls:                 plandex load
       https://react.dev/reference/react/hooks       Or pipe data in:
       cargo test | plandex load       For sending prompts, you can pass
       in a file:                 plandex tell -f "prompts/stripe/add-
       webhooks.txt"       Or you can pop up vim and write your prompt
       there:                 plandex tell       For shorter prompts you
       can pass them inline:                 plandex tell "set the
       header's background to #222 and text to white"       You can run
       tasks in the background:                 plandex tell "write tests
       for all functions in lib/math/math.go. put them in lib/math_tests."
       --bg        You can list all running or recently finished tasks:
       plandex ps       And connect to any running task to start streaming
       it:                 plandex connect       For more details, here's
       a quick overview of commands and functionality:
       https://github.com/plandex-ai/plandex/blob/main/guides/USAGE...
       Plandex is written in Go and is statically compiled, so it runs
       from a single small binary with no dependencies on any package
       managers or language runtimes. There's a 1-line quick install:
       curl -sL https://plandex.ai/install.sh | bash       It's early
       days, but Plandex is working well and is legitimately the tool I
       reach for first when I want to do something that is too large or
       complex for ChatGPT or GH Copilot. I would love to get your
       feedback. Feel free to hop into the Discord
       (https://discord.gg/plandex-ai) and let me know how it goes. PRs
       are also welcome!
        
       Author : danenania
       Score  : 97 points
       Date   : 2024-04-03 15:10 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | lprubin wrote:
       | Looks interesting. Can you go into more detail about why you like
       | this better for large/complex tasks compared to GH Copilot?
        
         | Cieric wrote:
         | Not the author, but I'm in a discord with him, I believe the
         | main selling point here is that it allows you to manage your
         | updates and conversations in a branching pattern that's saved.
         | So if you can't get the AI to do something you can always
         | revert to a prior state and try a different method.
         | 
         | Also it doesn't work on a "small view of the world" like
         | Copilot from when I was using it could only insert code around
         | your cursor (I understand that copilot pulls in a lot of
         | context from all the files you have open, but the area it can
         | modify is really small). This can add/remove/update code in
         | multiple files at once. But it'll also just show you a diff
         | first before it applies and you can select some or all of the
         | changes made.
        
           | danenania wrote:
           | Yes, couldn't have said it better myself!
        
       | BirbSingularity wrote:
       | It's pretty annoying that every project like this lately is just
       | a wrapper for OpenAI API calls.
        
         | danenania wrote:
         | Supporting more models, including Claude, Gemini, and open
         | source models is definitely at the top of the roadmap. Would
         | that make it less annoying? :)
        
           | codeapprove wrote:
           | Not affiliated with the project but you could use something
           | like OpenRouter to give users a massive list of models to
           | choose from with fairly minimal effort
           | 
           | https://openrouter.ai/
        
             | danenania wrote:
             | Thanks, I need to spend some time digging into OpenRouter.
             | The main requirement would be reliable function calling and
             | JSON, since Plandex relies heavily on that. I'm also
             | expecting to need some model-specific prompts, considering
             | how much prompt iteration was needed to get things behaving
             | how I wanted on OpenAI.
             | 
             | I've also looked at Together (https://www.together.ai/) for
             | this purpose. Can anyone speak to the differences between
             | OpenRouter and Together?
        
               | kekebo wrote:
               | I can't speak to the differences of Openrouter to
               | Together but the Openrouter endpoint should work as a
               | drop-in replacement for OpenAI api calls after replacing
               | the endpoint url and the value of $OPENAI_API_KEY. The
               | model names may differ to other apis but everything else
               | should work the same.
        
               | danenania wrote:
               | Awesome, looking forward to trying it out.
        
             | j45 wrote:
             | Would love to hear any feedback from people who have gotten
             | to know OpenRouter, as well as any similar tools.
        
           | npace12 wrote:
           | I think Mistral-2-Pro would work really well for this,
           | judging by the great results I've had with it on another
           | heavy on tool calling project [1]
           | 
           | [1] https://github.com/radareorg/r2ai
        
             | danenania wrote:
             | Thanks, I'll give it a try. Plandex's model settings are
             | version-controlled like everything else and play well with
             | branches, so it will be fun to start comparing how all
             | different kinds of models do vs. each other on longer
             | coding tasks using a branch for each one.
        
               | p1esk wrote:
               | For challenging tasks, I typically get code outputs from
               | all three top models (gpt4, opus, and ultra), and pick
               | the best one. It would be nice if your tool could simply
               | this for me: run all three models and perhaps even
               | facilitate some type of model interaction to produce a
               | better outcome.
        
               | danenania wrote:
               | Definitely, I'm very interested in doing something along
               | these lines.
        
           | mritchie712 wrote:
           | https://github.com/ollama/ollama
        
         | CharlieDigital wrote:
         | OpenAI API is simply a utility. The question is given this
         | utility, how does one find the right use case, structure the
         | correct context, and build the right UX.
         | 
         | OP has certainly built something interesting here and added
         | significant value on top of the base utility of the OpenAI API.
        
         | _ink_ wrote:
         | But the open source models have Open AI compatible APIs, so as
         | long as you can set the API endpoint you can use whatever you
         | want.
        
         | aerhardt wrote:
         | I'm moving an inordinate amount of data between the ChatGPT
         | browser window and my IDE (a lot through copying and pasting)
         | and this demonstrates two things: 1) ChatGPT is incredibly
         | useful to me and 2) the worflow UX is still terrible. I think
         | there is room for building innovative UXs with OpenAI, and so
         | far what I've seen in Jetbrains and VSCode isn't it...
        
           | danenania wrote:
           | That was also my experience and thought process.
        
         | bottlepalm wrote:
         | Every program is a wrapper around a CPU, so annoying.
        
       | jayloofah wrote:
       | What is the cost of planning and working through, let's say, a
       | manageable issue in a repo? Does it make sense to use 3.5/Sonnet
       | or some lower cost endpoint for these tasks?
        
         | danenania wrote:
         | It's hard to put a precise number on it because it depends on
         | exactly how much context is loaded, how many model responses
         | the task needs to finish, and how much iteration you need to do
         | in order to get the results you're looking for.
         | 
         | That said, you can do quite a meaty task for well under $1. If
         | you're using it heavily it can start to add up over time, so
         | you'd just need to weigh that cost against how you value your
         | time I suppose. In the future I do hope to incorporate fine
         | tuned models that should bring the cost down, as well as other
         | model options like I mentioned in the post.
         | 
         | You can try different models and model settings with `plandex
         | set-model` and see how you go. But in my experience gpt-4 is
         | really the minimum bar for getting usable results.
        
       | asadalt wrote:
       | In demo it modified UI components, is there any model that can
       | look at the rendered page to see if it looks right? Right now all
       | these wrappers just blindly edit the code.
        
         | danenania wrote:
         | Plandex can't do this yet, but soon I want to add GPT4-vision
         | (and other multi-modal models) as model options, which will
         | enable this kind of workflow.
        
           | asadalt wrote:
           | Well I have built similar project that lives in github
           | action, communicates via issues and sends PR when done.
           | 
           | 4-vision isn't there yet. It can mostly OCR or pattern
           | recognize the image if it's popular or has some known object.
           | It cannot detect pixel differences or css/alignment issues.
        
         | razster wrote:
         | I paired mine with VSCode and used the live view addon for that
         | folder. So far so good.
        
       | aksyam wrote:
       | Love this. Super excited AI-SWEs, will give it a try.
        
       | rglover wrote:
       | This is something I've been thinking a lot about (a way to set
       | context for an LLM against my own code), thank you for putting
       | this out. Looks really polished.
        
         | danenania wrote:
         | Thanks! Please let me know how it goes for you if you try it :)
        
       | j45 wrote:
       | Congrats on the launch.
        
         | danenania wrote:
         | Thank you!
        
       | mbil wrote:
       | Looks really interesting. Is it wrapping git for the rollback and
       | diffing stuff? If I were a user I'd probably opt to use git
       | directly for that sort of thing.
        
         | danenania wrote:
         | Yes, it does use git underneath, with the idea of exposing a
         | very simple subset of git functionality to the user. There's
         | also some locking and transaction logic involved to ensure
         | integrity and thread safety, so it wouldn't really be
         | straightforward to expose the repo directly.
         | 
         | I tried to build the backend so that postgres, the file system,
         | and git would combine to form effectively a single
         | transactional database.
        
       | brap wrote:
       | This seems very interesting, but I think the interface choice is
       | not good. There would have been much less friction if this was
       | purely a GitHub/GitLab/etc bot.
        
         | vertis wrote:
         | I disagree, having used Sweep extensively, I've found the
         | GitHub Issue -> PR flow to be incredibly clunky with a lack of
         | ability to see what is happening and what has gone wrong.
        
         | danenania wrote:
         | I see where you're coming from and I do plan to add a web UI
         | and plugin/integration options in the future.
         | 
         | I personally wanted something with a tighter feedback loop that
         | felt more akin to git. I also thought that simplifying the UI
         | side would help me stay focused on getting the data structures
         | and basic mechanics right in the initial version. But now that
         | the core functionality is in place, I think it will work well
         | as a base for additional frontends.
        
           | ENGNR wrote:
           | I haven't tried it yet, but I think making it fast iteration
           | and simple initially is the right way to go. Nice one sharing
           | this as open source!
        
       | ldelossa wrote:
       | Show me one of these things do something more complex then a
       | front end intern project.
        
         | danenania wrote:
         | Here's a prompt I used to build the AWS infrastructure for
         | Plandex Cloud with Plandex: https://github.com/plandex-
         | ai/plandex/blob/main/test/test_pr...
        
           | chmod2 wrote:
           | It's not something I would consider a complex job. A simple
           | prompt to chatgpt could even produce a working CDK template.
        
             | danenania wrote:
             | Here's another one, for the backend of a Stripe billing
             | system: https://github.com/plandex-
             | ai/plandex/blob/main/test/test_pr...
             | 
             | It seems like more examples demonstrating relatively
             | complex tasks would be helpful, so I'll work on those.
             | 
             | I'm certainly not trying to claim that it can handle _any_
             | task. The underlying model 's intelligence and context size
             | do place limits on what it can do. And it can definitely
             | struggle with code that uses a lot of abstraction or
             | indirection. But I've also been amazed by what it _can_
             | accomplish on many occasions.
        
         | IshKebab wrote:
         | I agree, these things seem to do okish on trivial web projects.
         | I've never seen them do anything more than that.
         | 
         | I still use ChatGPT for some coding tasks, e.g. I asked it to
         | write C code to do some annoying fork/execve stuff (can't
         | remember the details) and it did a decentish job, but it's like
         | 90% right. Great for figuring out a rough shape and what
         | functions to search for, but you definitely can't just take the
         | code and expect it to work.
         | 
         | Same when I asked it to write a device driver for some simple
         | peripheral. It had the shape of an answer but with random
         | hallucinated numbers.
         | 
         | I've also noticed that because there is a ton of noob-level
         | code on the internet it will tend to do noob-level things too,
         | like for the device driver it inserted fixed delays to wait for
         | the device to perform an operation rather than monitoring for
         | when it had actually finished.
         | 
         | I wonder if coding AIs would benefit from fine tuning on
         | programming best practices so they don't copy beginner
         | mistakes.
        
           | danenania wrote:
           | I used a web project in the demo because I figured it would
           | be familiar to a wide range of developers, but actually many
           | nontrivial pieces of Plandex have been built with the help of
           | Plandex itself.
           | 
           | That's not to say it's perfect or will never make "noob-
           | level" mistakes. That can definitely happen and is ultimately
           | a function of the underlying model's intelligence. But I can
           | at least assure you that it's quite capable of going far
           | beyond a trivial web project.
           | 
           | It's also on me to show more indepth examples, so thanks for
           | calling it out. I'd love it if you would try some of the
           | projects you mention and let me know how it goes.
        
       | visarga wrote:
       | This approach works. I just built a SPA in 3 days with GPT-4 of
       | which about 50% was generated. My only tooling was a bash script
       | to list all the files in the repo (with some exceptions),
       | including a README.md planning the project, a file list, and at
       | the end I type my task.
       | 
       | I run about 10-15 rounds with it. At the beginning I was using
       | GPT more heavily, but in the middle I found it easier to just fix
       | the code myself. The context got as big as 10k tokens, but was
       | not a problem. At some point I might need to filter the files
       | more aggressively.
       | 
       | But surprisingly all that is needed for a bare-bone repo-level
       | coding assistant is a script to list all the files so I could
       | easily copy paste the whole thing into the chatGPT window.
        
         | danenania wrote:
         | Yes, well said. Doing exactly this kind of thing for months
         | with ChatGPT is what convinced me the idea could work in the
         | first place. I knew the underlying intelligence was there--the
         | challenge is giving it the right prompts and supporting infra.
        
         | nico wrote:
         | > a script to list all the files so I could easily copy paste
         | the whole thing
         | 
         | Just in case you are using a Mac, you can pipe the output of
         | your script to pbcopy so that it goes directly into your
         | clipboard
         | 
         | script.sh | pbcopy
        
         | ugh123 wrote:
         | Do you have any boilerplate part of your prompt you can share?
        
       | timfsu wrote:
       | Congrats on the launch, I'm excited to give it a try. I'm curious
       | how you're having it edit files in place - having built a similar
       | project last summer, I had trouble with reliably getting it to
       | patch files with correct line numbers. It was especially a
       | problem in React files with nested div's.
        
         | danenania wrote:
         | Thanks! I tried _many_ different ways of doing it before
         | settling on the current approach. It 's still not perfect and
         | can make mistakes (which is why the `plandex changes` diff
         | review TUI is essential), but it's pretty good now overall.
         | 
         | I was able to improve reliability of line numbers by using a
         | chain-of-thought approach where, for each change, the model
         | first summarizes what's changing, then outputs code that starts
         | and ends the section in the original file, and then finally
         | identifies the line numbers from there.
         | 
         | The relevant prompts are here: https://github.com/plandex-
         | ai/plandex/blob/main/app/server/m...
        
           | nico wrote:
           | Amazing work. Loved the video and looking forward to trying
           | it
           | 
           | Can a user ask plandex to modify a commit? Maybe the commit
           | just needs a small change, but doesn't need to be entirely
           | re-written. Can the scope be reduced on the spot to focus
           | only on a commit?
        
             | danenania wrote:
             | Thanks! There isn't anything built-in to specifically
             | modify a commit, but you could make the modification to the
             | file with Plandex and then `git commit --amend` for
             | basically the same effect.
        
       | ahstilde wrote:
       | this looks neat i can't wait to try it out.
        
       | splatzone wrote:
       | This is really cool. I tried it and ran into a few syntax errors
       | - it kept missing closing braces in PHP for some reason.
       | 
       | It seems it might be useful if it could actually try to execute
       | the code, or somehow check for syntax errors/unimplemented
       | functions before accepting the response from the LLM.
        
         | danenania wrote:
         | Thanks! Was this on cloud or self-hosted? If cloud and you
         | created an account, feel free to ping me on Discord
         | (https://discord.gg/plandex-ai) or by email (dane@plandex.ai)
         | and let me know your account email so I can investigate. If you
         | have an anonymous trial account on cloud, please still ping me
         | --I can track it down based on file names. There is definitely
         | some work to do in ironing out these kinds of edge cases.
         | 
         | "It seems it might be useful if it could actually try to
         | execute the code, or somehow check for syntax
         | errors/unimplemented functions before accepting the response
         | from the LLM."
         | 
         | Indeed, I do have some ideas on how to add this.
        
       ___________________________________________________________________
       (page generated 2024-04-03 23:00 UTC)