[HN Gopher] Show HN: Ellipsis - Automated PR reviews and bug fixes
___________________________________________________________________
Show HN: Ellipsis - Automated PR reviews and bug fixes
Hi HN, hunterbrooks and nbrad here from Ellipsis
(https://www.ellipsis.dev). Ellipsis automatically reviews your
PRs when opened and on each new commit. If you tag @ellipsis-dev
in a comment, it can make changes to the PR (via direct commit or
side PR) and answer questions, just like a human. Demo video:
https://www.youtube.com/watch?v=X61NGZpaNQA

So far, we have dozens of open source projects and companies
using Ellipsis. We seem to have landed in a sweet spot where the
current capabilities of AI tools match the actual needs of
software engineers: this doesn't replace human review, but it
saves you time by catching and fixing lots of small, silly stuff.

Here's an example in the wild:
https://github.com/relari-ai/continuous-eval/pull/38. Ellipsis
(1) adds a PR summary; (2) finds a bug and adds a review comment;
(3) after a (human) user comments, generates a side PR with the
fix; and (4) after a (human) user merges the side PR and adds
another commit, re-reviews the PR and approves it.

Here's another example:
https://github.com/SciPhi-AI/R2R/pull/350#pullrequestreview-...,
where Ellipsis adds several comments with inline suggestions that
were directly merged by the developer.

You can configure Ellipsis in natural language to enforce custom
rules, style guides, or conventions. For example, here's how the
`jxnl/instructor` repo uses natural language rules to make sure
that docs are kept in sync:
https://github.com/jxnl/instructor/blob/main/ellipsis.yaml#L...,
and here's an example PR that Ellipsis came up with based on
those rules: https://github.com/jxnl/instructor/pull/346.
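For reference, such a rules file is just a small YAML document
along these lines (a rough sketch - the field names here are
approximate; the linked ellipsis.yaml in jxnl/instructor is the
authoritative example):

    version: 1.3
    pr_review:
      auto_review_enabled: true
      rules:
        # Rules are plain English; the reviewer flags violations.
        - "If you change a function signature, update all call sites"
        - "If you add a new feature, update the documentation in docs/"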
Installing into your repo takes 2 clicks at
https://www.ellipsis.dev. You do have to sign up to try it out
because we need you to authorize our GitHub app to read your
code. Don't worry, your code is never stored or used to train
models (https://docs.ellipsis.dev/security).

We'd really appreciate your feedback, thoughts, and ideas!
Author : hunterbrooks
Score : 52 points
Date : 2024-05-09 16:14 UTC (6 hours ago)
| bilekas wrote:
| Interesting project for sure, but I'm trying to find the reasons
| for shifting the AI to the PR stage. Wouldn't this be more
| efficient at development time, i.e. the Copilot/OpenAI tool
| chain?
| hunterbrooks wrote:
| Teams should use both Copilot (synchronous code generation) and
| Ellipsis (async code gen).
|
| Sure, Copilot speeds up human dev productivity, but our take is
| that humans should only be spending their time on the highest
| value code changes and use products like Ellipsis to handle the
| rest.
|
| The downside of async code gen is that Ellipsis workflows take
| a few minutes to run because Ellipsis is actually building the
| project, running the tests, fixing its mistakes, etc. The
| upside is that a developer can have multiple workflows running
| at once, and each workflow delivers higher quality code because
| it's guaranteed to be working + tested.
|
| I'm super bullish on async code gen. I think there's a whole
| category of tedious development tasks with unambiguous
| solutions that can be automated to the point where a human just
| needs to give it an LGTM.
| mirsadm wrote:
| Do you use your own product in that way?
| hunterbrooks wrote:
| Yeah, I use it for lots of boilerplate work like adding new
| API endpoints, new Celery jobs, and building react
| components.
| danenania wrote:
| I think we're going to see different AI tools optimized for
| different stages of the development workflow, with developers
| assembling and using a collection of them, just like they
| currently assemble a collection of tools for different tasks
| within the stack (backend language, frontend language,
| database, cache, infrastructure, etc.). It's unlikely that
| there's ever going to be one AI coding tool to rule them all.
|
| As someone building (and frequently using) an AI coding tool
| that is very much focused on development time[1], I still also
| use GH Copilot and ChatGPT plus heavily as well. In a team
| setting, I could definitely see using my tool in conjunction
| with Ellipsis too. A feature built partly (or entirely) with AI
| still needs PR review. And an agent focused specifically on
| that job is likely going to be better at it than an agent
| designed for building new features from scratch.
|
| 1 - https://github.com/plandex-ai/plandex
| hunterbrooks wrote:
| Definitely agree with this.
|
| One analogy I see today is the typical testing pipeline. Unit
| tests get run locally, then maybe a CI job runs unit tests +
| integration tests, then maybe there's a deployment job which
| does a blue/green release. At every stage the system is being
| "tested", but because the tests validate different
| capabilities, it's like concentric circles that grow
| confidence for the change.
|
| A software dev lifecycle that uses AI dev tools is similar.
| Agents will review/contribute at the various stages of the
| SDLC, sometimes with overlap, but mostly additive and
| building on the output of one another.
| biggoodwolf wrote:
| What is the value add vs using an LLM agent?
| hunterbrooks wrote:
| Ellipsis uses a BUNCH of LLM agents internally. If you built
| your own code generation LLM agent, you'd also need to build a
| way to execute the code that the agent writes, which is a bit
| of an engineering headache.
|
| We handle this; the result is that if you set up a Dockerfile,
| we promise to return working, tested code:
| https://docs.ellipsis.dev/code#from-a-pr
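To make "execute the code the agent writes" concrete, here is a
minimal sketch of the kind of harness this implies. It is not
Ellipsis' actual implementation; the docker/pytest commands and
the fix_with_llm helper are assumptions for illustration:

    import subprocess

    def run(cmd: list[str]) -> subprocess.CompletedProcess:
        return subprocess.run(cmd, capture_output=True, text=True)

    def fix_with_llm(repo_dir: str, logs: str) -> None:
        """Hypothetical: prompt a model with the failing build/test
        output and apply the patch it proposes to repo_dir."""
        raise NotImplementedError

    def build_and_test(repo_dir: str, tag: str = "candidate"):
        # Build the user's own Dockerfile, then run their tests
        # inside the resulting image.
        build = run(["docker", "build", "-t", tag, repo_dir])
        if build.returncode != 0:
            return build
        return run(["docker", "run", "--rm", tag, "pytest"])

    def agent_loop(repo_dir: str, max_attempts: int = 3) -> bool:
        # Iterate until the build and tests pass, or give up.
        for _ in range(max_attempts):
            result = build_and_test(repo_dir)
            if result.returncode == 0:
                return True  # working, tested code
            fix_with_llm(repo_dir, result.stdout + result.stderr)
        return False

The design point is that the agent's output is judged by the
user's own build and tests, not by the model's confidence in its
patch.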
| yesbut wrote:
| Will this enable me to slowly, over time, add a back door without
| anyone detecting it?
| hunterbrooks wrote:
| No - but if your widely adopted, poorly supported open source
| project uses Ellipsis for code reviews, we may be able to catch
| that type of hack ;)
| yesbut wrote:
| I have my doubts.
| Arch-TK wrote:
| "This PR appears to add some kind of autotools gibberish to
| the codebase. Since autotools needs to be regularly fed
| gibberish in order to continue to live, this is normal and
| expected. However please note that some gibberish may be
| malicious.
|
| As an AI code review model, I am unable to advise on
| whether this autotools gibberish is malicious or not. Human
| review will be required."
| hunterbrooks wrote:
| Totally fair - there's a saturation right now of magic AI dev
| tools. We try to differentiate by not over-promising and
| under-delivering, and by solving a problem that's closely
| matched to what today's state-of-the-art LLMs can handle: code
| review.
|
| But the only real way to figure out if it's useful for your
| team is to try it. That's why we added a free trial.
| internetter wrote:
| How could an open source project afford the $20/user/month
| license fee?
| hunterbrooks wrote:
| We offer Ellipsis to large open source projects for free -
| email us at team@ellipsis.dev.
|
| I was referencing the recent xz backdoor hack.
| kyawzazaw wrote:
| Anything similar for hobbyist or student projects?
| hunterbrooks wrote:
| Hmm... probably, send me an email.
| theamk wrote:
| That is pretty horrible - on the level of a "junior engineer"
| who has no idea of good industry practices and needs careful
| code review. I would hate to see the system as presented on any
| of my projects.
|
| Summary: The point of a summary is to tell "why" the change was
| made and to highlight unusual/non-trivial parts, and the
| examples absolutely fail there. Looking at the first one:
|
| - Why was the "generate" result type updated? Was it a customer
| request, general cleanup, or prep for some ongoing work?
|
| - The other 3 points - are they logical fallout of the output
| type update, or are those separate changes? In the latter case,
| you really want to list the changes ("Updated examples to use
| the more recent gpt-4 model", for example)
|
| - What's the point of just saying "updating X in Y" if you
| don't say how? This is just visual noise duplicating the "file
| changes" tab in the PR.
|
| Suggested changes: those are even worse - like
| https://github.com/relari-ai/continuous-eval/pull/38#discuss...
|
| - This is an example file and you know where the "dataset"
| comes from. Why would you have non-serializable records to
| begin with?
|
| - This changes the semantics from "let the programmer know in
| case of error" to "produce a corrupted/truncated data file in
| case of error", which generally makes debugging harder and
| gives people nasty surprises when their file is somehow missing
| records. Sure, sometimes this is needed, but in that particular
| file it's all downsides. This should not have been proposed at
| all.
|
| - Even if you do want the check somehow, it's pretty terrible
| as written - the message does not include the original error
| text, the bad object, or even a line number. What is one
| supposed to do if they see it?
|
| ----
|
| And I know some people say "you are supposed to review AI-
| generated content before submitting it", but I am also sure
| many new users will think the kind of crap advice that AI
| generates is OK.
|
| Ellipsis authors: please stop making open source worse. Buggy
| patches are worse than no patches, and a 1-line hand-written
| summary is better than an AI-generated useless one.
|
| Maintainers: don't install this in your repo if you don't want
| crap PRs.
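The semantics objection in the comment above is easiest to see in
code. A minimal illustration (hypothetical records, not the code
from the linked PR): swallowing the serialization error silently
truncates the output, while letting it propagate tells the
programmer exactly what failed.

    import json

    records = [{"q": "ok"}, {"q": object()}, {"q": "lost"}]

    # Suggested variant: skip bad records. out.jsonl is silently
    # missing the bad record, and the message carries no error
    # text, offending object, or line number.
    with open("out.jsonl", "w") as f:
        for rec in records:
            try:
                f.write(json.dumps(rec) + "\n")
            except TypeError:
                print("Skipping non-serializable record")

    # Original semantics: fail loudly. The TypeError traceback
    # names the offending type, so the problem gets fixed at the
    # source instead of discovered in the data file later.
    with open("out.jsonl", "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")  # raises on record 2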
| hnz101 wrote:
| How is this different from coderabbit, codegen and codium's PR
| agent?
| nbrad wrote:
| Hi there! I recommend trying them out to see which one you like
| best :)
| hnz101 wrote:
| Could you provide more information or elaborate on how
| Ellipsis is better? I'd appreciate a more detailed
| explanation.
| hunterbrooks wrote:
| All those tools do code review, so you'll have to try them
| out for yourself to see which is the most helpful.
|
| But when it comes to writing code for you, not all those
| tools actually run your unit tests/linter/compile/etc.
| Ellipsis will, and it'll use the stdout/stderr to fix its
| own mistakes, meaning the commit delivered to you actually
| compiles/passes CI.
| diwank wrote:
| We are using both Ellipsis and Sweep for our open source
| project, and they are quite helpful in their own ways. I think
| selling them as automated engineers is a little over the top at
| the moment, but once you get the hang of it they can spot
| common problems in PRs or do small documentation-related stuff
| quite accurately.
|
| Take a look at this PR for example: https://github.com/julep-
| ai/julep/pull/311
|
| Ellipsis caught a bunch of things that would have come up only in
| code review later. It also got a few things wrong but they are
| easy to ignore. I like it overall - helpful once you get the
| hang of it, although far from a "junior dev".
| hartator wrote:
| > Take a look at this PR for example: https://github.com/julep-
| ai/julep/pull/311
|
| I am still confused whether the vector size should be 1024 or
| 728 lol.
| runlevel1 wrote:
| > I think selling them as an automated engineer is a little
| over the top at the moment
|
| Indeed. Amazon originally advertised CodeGuru as being "like
| having a distinguished engineer on call, 24x7".[^1] That became
| a punchline at work for a good while.
|
| I can definitely see the value of a tool that helps identify
| issues and suggest fixes for stuff beyond your typical linter,
| though. In theory, getting that stuff out of the way could make
| for more meaningful human reviews. (Just don't overpromise what
| it can reasonably do.)
|
| [^1]:
| https://web.archive.org/web/20191203185853/https://aws.amazo...
| skyfallsin wrote:
| I've been using Ellipsis for a few months now. I have zero
| regrets about paying for it now and likely will pay them more in
| the future as their new features ship.
|
| For a solo engineer like me who's working in multiple codebases
| across multiple languages, it's excellent as another set of eyes
| to catch big and small things in a pull request workflow I'm used
| to (and it has caught more than a few). I'd argue that, even as
| a backstop for catching edge cases/screwups that would
| otherwise waste my time, it has already more than paid for
| itself.
| benzible wrote:
| Is there a list of supported languages?
| hunterbrooks wrote:
| Nearly all languages are supported, but some perform better
| than others.
|
| JS/TS, Python, Java, C++, and Ruby are especially well
| supported.
| GrinningFool wrote:
| A sampling of PRs looks pretty good code-wise, but the commit
| messages/descriptions don't. They just summarize the changes done
| (something that can be gleaned from the diff) but don't give
| context or rationale around why the changes were necessary.
| hunterbrooks wrote:
| It's most helpful when a GitHub/Linear issue is linked, because
| the "why" can be extracted from the issue; it also helps most
| on larger PRs.
| tyrw wrote:
| We're required to have code review as part of our SOC2 process,
| and I assume automated agents wouldn't count.
|
| The other end of the spectrum is linting and tests, which catch
| errors before review.
|
| Does Ellipsis have a role between these two? If so, what is the
| role?
| hunterbrooks wrote:
| Ellipsis increases the quality of code a developer brings to
| their teammates, meaning fewer PR comments, meaning the team
| ships faster. It's not a replacement for human review. It's
| most often used by the PR author after a self review but before
| asking for reviews from human teammates.
|
| Ellipsis will use your existing lint/test commands when making
| code changes. For example, you can leave a comment on a PR that
| says "@ellipsis-dev fix the assertion in the failing unit test"
| and Ellipsis will run the tests, observe the failure, make the
| code change, confirm the tests pass, lint-fix the code, and
| push a commit or a new PR.
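As an illustration of that comment-driven flow - an editor's
sketch, not Ellipsis' actual implementation; the mention regex,
the pytest command, and the propose_patch helper are assumptions:

    import re
    import subprocess

    MENTION = re.compile(r"@ellipsis-dev\s+(.+)", re.DOTALL)

    def propose_patch(task: str, logs: str) -> str:
        """Hypothetical: prompt an LLM with the task and the
        failing test output, returning a unified diff."""
        raise NotImplementedError

    def on_pr_comment(body: str, repo_dir: str) -> None:
        m = MENTION.search(body)
        if not m:
            return  # comment wasn't addressed to the bot
        # Run the tests to observe the failure the user mentioned.
        tests = subprocess.run(["pytest"], cwd=repo_dir,
                               capture_output=True, text=True)
        if tests.returncode != 0:
            patch = propose_patch(m.group(1),
                                  tests.stdout + tests.stderr)
            subprocess.run(["git", "apply", "-"], cwd=repo_dir,
                           input=patch, text=True)
            # Then: re-run the tests to confirm they pass,
            # lint-fix, and push a commit or open a side PR.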
| luskira wrote:
| This is a very interesting use case.
| thih9 wrote:
| > It's most often used by the PR author after a self review
|
| Why run it as part of a PR then? I'd prefer to run a tool
| like this before a PR is even open, and ideally on my local
| machine.
| hartator wrote:
| Do you have real life examples on GitHub to see?
|
| [Edit] You can see a bunch of them here:
| https://github.com/search?q=%22ellipsis.dev%22&type=issues
| Nothing breathtaking unfortunately.
| hunterbrooks wrote:
| Hmm, that searches issues, which isn't the best way to see
| Ellipsis' work.
|
| Example of PR review: https://github.com/getzep/zep-
| js/pull/67#discussion_r1594781...
|
| Example of issue-to-PR:
| https://github.com/getzep/zep/issues/316
|
| Example of bug fix on a PR:
| https://github.com/jxnl/instructor/pull/546#discussion_r1544...
___________________________________________________________________