[HN Gopher] Launch HN: Codeparrot (YC W23) - Automated API testing using production traffic
       ___________________________________________________________________
        
       Launch HN: Codeparrot (YC W23) - Automated API testing using
       production traffic
        
        Hi HN, we're Royal and Vedant, co-founders of CodeParrot
        (https://www.codeparrot.ai/). CodeParrot automates API testing so
        developers can speed up release cycles and increase test coverage.
        It captures production traffic and database state to generate test
        cases that update with every release. Here's a short video that
        shows how it works:
        https://www.loom.com/share/dd6c12e23ceb43f587814a2fbc165c1f

        As managers of engineering teams (I was CTO of an ed-tech startup;
        Vedant was the founding engineer of a unicorn company), we both
        faced challenges in enforcing high test coverage. We ended up
        relying heavily on manual testing, but that became hard to scale
        and led to reduced velocity and more production bugs. This
        motivated us to build CodeParrot.

        How it works: we auto-instrument backend services to capture
        production traffic. Requests and responses reaching your backend
        service are stored, along with the downstream calls it makes (DB
        calls, for example). As part of your CI pipeline, we replay the
        captured requests whenever your service is updated. The responses
        are compared with the responses recorded in production, and
        regressions are highlighted to the developers. To ensure that the
        same codebase gives the same response in the CI environment as in
        production, we mock all downstream calls with the values captured
        in production.
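
        To make the replay-and-compare step concrete, here is a minimal
        TypeScript sketch of the idea (illustrative only, with
        hypothetical helpers, not our actual implementation):

          // Hypothetical helpers, stubbed for the sketch:
          declare function installMocks(calls: Map<string, unknown>): void;
          declare function deepDiff(a: unknown, b: unknown): string[];

          // One captured production interaction: the inbound request, the
          // response we served, and every downstream result observed
          // while serving it (DB queries, third-party APIs, etc.).
          interface CapturedInteraction {
            request: { method: string; path: string; body?: unknown };
            productionResponse: { status: number; body: unknown };
            downstreamCalls: Map<string, unknown>; // call -> recorded result
          }

          async function replayAndDiff(captured: CapturedInteraction) {
            // Mock downstream calls with the recorded values so the CI
            // service sees the same world production did.
            installMocks(captured.downstreamCalls);

            const ciResponse = await fetch(
              `http://localhost:8080${captured.request.path}`,
              {
                method: captured.request.method,
                body: captured.request.body
                  ? JSON.stringify(captured.request.body)
                  : undefined,
              },
            );

            // Any difference from the production response is a candidate
            // regression, surfaced to the developer in CI.
            return deepDiff(captured.productionResponse.body,
                            await ciResponse.json());
          }
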
        Most tools that record and replay production traffic for testing
        capture it at the network layer (as a sidecar or through the load
        balancer). CodeParrot instead relies on an instrumentation agent
        (built on top of OpenTelemetry) to capture traffic. This lets us
        capture downstream requests and responses, such as database
        responses, that are otherwise encrypted at the network layer,
        which in turn lets us mock downstream calls and compare responses
        between the CI and production environments. It also lets us
        sample requests based on code flow and downstream responses,
        which provides better test coverage than relying on API headers
        and parameters alone.

        Our self-serve product will be out in a few weeks. Meanwhile, we
        can help you integrate CodeParrot: please reach out at
        royal@codeparrot.ai, or choose a slot here -
        https://tidycal.com/royal1/schedule-demo. We'll be selling
        CodeParrot via a subscription model, but the details are TBD. In
        addition, we will be open sourcing the project soon.

        If you've already tried or are thinking of using tools in this
        space, we'd love to hear about your experience and what you care
        about most. We look forward to everyone's comments!
        
       Author : royal0203
       Score  : 44 points
       Date   : 2023-03-17 18:35 UTC (4 hours ago)
        
       | poobags wrote:
        | Are the requests only being replayed, or is there some amount
        | of mutation going on too, to potentially reach buggy states?
       | 
       | If the latter, I'm wondering how it might compare to tooling with
       | a similar intent, e.g. https://www.microsoft.com/en-
       | us/research/publication/restler...?
        
       | shireboy wrote:
        | Just this week I did something similar for a line-of-business
        | app API I'm migrating. I added code to the API to record all
        | unique requests to a .http/.rest format file (natively supported
        | in the latest Visual Studio 2022 and via extension in VS Code).
        | I can then play those back manually or via automated integration
        | tests that read the .http files. Yes, I'm hand-waving over
        | authentication and downstream database and 3rd-party APIs, but
        | overall it's working well to quickly test against tons of
        | production-like API calls.
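        | 
        | A simplified sketch of the recording side (an Express-style
        | middleware in TypeScript; the real code also has to handle the
        | auth I'm hand-waving over):
        | 
        |   import { appendFileSync } from 'fs';
        |   import type { Request, Response, NextFunction } from 'express';
        | 
        |   const seen = new Set<string>();
        | 
        |   function recordHttp(req: Request, _res: Response, next: NextFunction) {
        |     const key = `${req.method} ${req.originalUrl}`;
        |     if (!seen.has(key)) { // only record unique requests
        |       seen.add(key);
        |       // .http format: request line, headers, blank line, body,
        |       // then a "###" separator between requests.
        |       appendFileSync('requests.http',
        |         `${req.method} https://example.com${req.originalUrl}\n` +
        |         `Content-Type: ${req.get('content-type') ?? 'application/json'}\n` +
        |         `\n${JSON.stringify(req.body ?? {})}\n###\n`);
        |     }
        |     next();
        |   }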
        
         | vedant_ag wrote:
         | Wow very interesting!
         | 
         | Did you manage to play it back manually and/or via tests? I ran
         | into unexpected challenges while doing this.
        
       | eganist wrote:
        | Neat concept. In regulated environments, how would you propose
        | implementing this to minimize the spread of live, regulated data
        | into non-prod environments? Think full PANs (PCI), etc.
       | 
       | I don't know that there's necessarily a wrong answer here (well,
       | there probably are, but wrong only in the sense that a given
       | solution might be prohibited by the regulation), just want to see
       | how y'all have thought through the prompt.
        
         | lttlrck wrote:
         | This seems like the burning question, maybe that's the .ai...
         | 
          | HAR can already be recorded in middleware (e.g. loadmill/har-
          | recorder) and replayed in multiple CI-compatible ways.
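          | 
          | For reference, a HAR file is just JSON, so the recording
          | middleware only has to build entries like this (a sketch
          | trimmed to the essentials; the HAR 1.2 spec has the full
          | shape):
          | 
          |   const harEntry = {
          |     startedDateTime: new Date().toISOString(),
          |     time: 42, // total elapsed ms
          |     request: { method: 'GET', url: '/api/users',
          |                headers: [], queryString: [] },
          |     response: { status: 200, headers: [],
          |                 content: { mimeType: 'application/json',
          |                            text: '{"users":[]}' } },
          |   };
          |   const har = {
          |     log: { version: '1.2',
          |            creator: { name: 'recorder', version: '1.0' },
          |            entries: [harEntry] },
          |   };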
        
       | vaishnavsm wrote:
       | Oh wow, we're building something very similar to this. We were
       | literally just filling up the YC form for S23 too haha. I may be
       | a bit biased, but I think this is an excellent idea, and it can
       | change the way testing works for backend services. We were
       | inspired by Meticulous, too! Hope we can learn from y'all as well
       | :)
        
         | royal0203 wrote:
         | Nice! Happy to share our experience if it helps :)
        
       | keeptrying wrote:
        | Nothing gives you more confidence in your system than testing
        | on a high-variance sample of the last 4 months of production
        | traffic.
       | 
       | Especially for rewrites or code refactorings.
       | 
       | I've built this system at many companies myself. Never thought of
       | doing this as a service for others.
        
         | roflyear wrote:
         | Do you issue new sessions and modify the requests?
        
       | nfw2 wrote:
       | I worked on something like this briefly years ago and caching and
       | mocking the downstream calls was a beast of a technical problem.
       | 
        | For example, suppose a startup has an Express app that makes
        | downstream requests to Postgres, Redis, Kafka, other REST APIs,
       | etc. How do the outgoing requests, which may follow different
       | protocols, with varying serialization formats and handshakes, all
       | get intercepted, recorded, and matched to the same outgoing calls
       | when the session is replayed? How are the outgoing requests
       | indexed and then evaluated for sameness when replayed?
       | 
       | It definitely seems possible to implement something like this as
       | a series of high-level one-off integrations, for example, a
       | middleware package for Django apps. But if the goal is a
       | universal and automatic downstream service mocking system, it's
       | not clear to me where this sort of middleware would even sit in
       | the application stack. It seems like it would need to be pretty
       | low-level, but not too low-level because the data needs to be
       | decrypted first.
       | 
       | Anyway, if you guys have figured out a good way to do this, I'm
       | definitely interested in hearing how you managed it. I'm in no
       | way a great BE engineer, so there could be something elegant or
       | obvious that I missed.
        
         | Aeolun wrote:
          | I think the fact they use OpenTelemetry figures into this.
         | They've already done the hard work of intercepting calls to
         | many libraries.
        
           | royal0203 wrote:
            | Yes, we rely a lot on OpenTelemetry for this. It has really
            | good support for most libraries in Java and Node, and is
            | progressing quickly in other languages. We are also
            | contributing to it by extending support for other languages,
            | which we'll be open sourcing soon.
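            | 
            | For the curious, tapping into those instrumentations is
            | roughly a custom span processor in Node (a simplified
            | sketch, not our exact setup; exporters, sampling, and
            | redaction are omitted):
            | 
            |   import { NodeSDK } from '@opentelemetry/sdk-node';
            |   import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
            |   import type { ReadableSpan } from '@opentelemetry/sdk-trace-base';
            | 
            |   // Hypothetical sink that persists spans for later replay:
            |   declare function recordDownstreamCall(span: ReadableSpan): void;
            | 
            |   const capture = {
            |     onStart() {},
            |     // Every completed span -- inbound HTTP, DB query, cache
            |     // call -- flows through here with its attributes.
            |     onEnd(span: ReadableSpan) {
            |       if (span.attributes['db.statement']) {
            |         recordDownstreamCall(span);
            |       }
            |     },
            |     shutdown: async () => {},
            |     forceFlush: async () => {},
            |   };
            | 
            |   new NodeSDK({
            |     instrumentations: [getNodeAutoInstrumentations()],
            |     spanProcessor: capture,
            |   }).start();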
        
       | rkhacker wrote:
        | A few missing details that are crucial to usage within an
       | organization:
       | 
       | 1. what is the type of service instrumentation needed to capture
       | the data? Wonder why this is needed when typically the data is
       | already captured in an APM log? The instrumentation might add
       | performance and security concerns.
       | 
       | 2. what is the sampling logic to capture the traffic? It might
       | compromise the fidelity of the test data and give a false sense
       | of test accuracy.
       | 
       | 3. what is the duration of data capture? Is it a week's or
       | month's or quarterly data? Meeting 90% coverage on a week's
       | production sample data will provide a false metric.
       | 
       | 4. can it faithfully handle data privacy and customer
       | anonymization? This is critical for API's dealing with PCI and
       | other sensitive data.
        
         | vedant_ag wrote:
         | > 1. what is the type of service instrumentation needed to
         | capture the data? Wonder why this is needed when typically the
         | data is already captured in an APM log? The instrumentation
         | might add performance and security concerns.
         | 
          | The implementation is very similar to an APM agent, so the
          | same performance and security concerns apply. We are working
          | on providing both at the same time (automated tests and APM)
          | to reduce the overhead.
         | 
         | > 2. what is the sampling logic to capture the traffic? It
         | might compromise the fidelity of the test data and give a false
         | sense of test accuracy.
         | 
          | It is random sampling. I feel 1M or 10M randomly sampled
          | requests should cover all cases (see the sketch at the end of
          | this comment).
         | 
         | > 3. what is the duration of data capture? Is it a week's or
         | month's or quarterly data? Meeting 90% coverage on a week's
         | production sample data will provide a false metric.
         | 
          | I was thinking 1 week should be enough. Maybe we will have to
          | add some custom sampling logic for lower-frequency calls (like
          | monthly crons).
         | 
         | > 4. can it faithfully handle data privacy and customer
         | anonymization? This is critical for API's dealing with PCI and
         | other sensitive data.
         | 
          | Yes. Additionally, for compliance, we offer a self-hosted
          | solution: our code runs on your servers and no data ever
          | leaves your cloud / on-prem.
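          | 
          | For context on (2), the simplest version is head-based random
          | sampling, e.g. OpenTelemetry's built-in ratio sampler (a
          | sketch; as the post mentions, we also sample based on code
          | flow and downstream responses):
          | 
          |   import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';
          | 
          |   // Keep roughly 1% of traffic, decided once per trace ID so
          |   // a request and all its downstream calls stay together.
          |   const sampler = new TraceIdRatioBasedSampler(0.01);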
        
       | debarshri wrote:
        | If you are looking at this, you might also be interested in
        | Speedscale [1]. They have been around for a while. Interestingly,
        | it is a YC company too.
       | 
       | [1] https://speedscale.com/
        
         | royal0203 wrote:
          | Yes, and Ken from Speedscale is a very helpful person too.
        
       | pixelatedindex wrote:
       | Neat idea! Two nitpicks from reading the website:
       | 
       | > How does CodeParrot work
       | 
       | This part without a question mark is making me feel like I'm left
       | hanging. If you don't want a question mark, I would much prefer
       | this to be "How CodeParrot works".
       | 
       | > CodeParrot ... What's more? ...
       | 
       | There's no question mark where there should be one, and where I
       | don't expect one I do see one. Here it can be a simple comma
       | instead of a question mark. This flow of language works if you're
       | speaking live to an audience but on paper it feels awkward.
       | 
       | It bothers me more than it should, mostly because of my reading
       | cadence. Hopefully others can chime in and let me know if I'm off
       | base here or not.
        
         | royal0203 wrote:
          | Nice catch, thanks for taking the time to review the website!
          | I've updated it; the change should show up shortly.
        
       | YPCrumble wrote:
       | How is this different from Meticulous, another YC company?
        
         | royal0203 wrote:
          | It's similar in the sense that both rely on production traffic
         | and user sessions to generate tests. However, we are focusing
         | on API testing and I think Meticulous is building for UI
         | testing.
        
       | kyriakosel wrote:
       | Nice one guys! Congrats on the launch!
        
       | bluelightning2k wrote:
        | I built something like this years ago. It's just on the edge of
        | being a roll-your-own solution.
       | 
       | I definitely agree with your approach of auto mocking the
       | database and/or third party services too. That's what I did with
       | my home rolled solution.
       | 
       | Happy to see more projects in the space. Generating tests by
       | snapshot including the DB and service calls always seemed like
       | the obvious way to go for me.
       | 
       | As a dev I'm not keen on using services which I could easily
       | replicate so this would have to be both substantial and
       | cheap/free for smaller teams.
        
         | royal0203 wrote:
          | I can relate to this perspective; however, here are some
          | complexities we have come across in building this so far:
          | 
          | - Support for a large number of languages and downstream
          | dependencies
          | 
          | - Intelligent sampling to choose requests with high coverage,
          | and auto-updating them over time
          | 
          | - Performance, safety, and data compliance guarantees
        
       | rishsriv wrote:
        | Looks promising! Would love to know how this is different from
        | creating Postman collections and running tests on those.
        
         | vedant_ag wrote:
         | - Postman collections have to be created manually by a
         | developer (usually).
         | 
          | - Downstream calls (think DB, 3rd-party API calls, Kafka) are
          | not handled.
        
         | [deleted]
        
       | Jarobq18 wrote:
        | Isn't your tech fragile? It feels like it'd break or produce
        | false positives easily. Integration is also not easy; the matrix
        | of possible tech stacks is big. It doesn't test new edge cases,
        | so it's useful only for regression, which is important, but if
        | your testing hygiene is good, meaning devs are writing tests,
        | you wouldn't need your product. Am I wrong? Good luck either
        | way. Seems like VCs are gambling on products that save costs.
        
         | royal0203 wrote:
          | Good observation. These are challenging problems; here's how
          | we are going about them:
          | 
          | To reduce false positives, we run the same request twice to
          | eliminate flaky response fields like the current timestamp
          | (sketched below), mock the downstream dependencies to behave
          | as they did in the production environment, and provide options
          | to ignore or modify the sampled requests.
          | 
          | To make integration easier, we are building on top of
          | OpenTelemetry, which has seen a remarkable increase in support
          | across languages and frameworks; this makes it easier for us
          | to support different tech stacks.
          | 
          | Regression testing: our primary goal is to provide regression
          | tests. We have come across two types of teams where this makes
          | sense - companies with low test coverage, and companies with a
          | high number of micro-services that find it hard to cover every
          | production scenario in tests.
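          | 
          | Concretely, the double-run trick looks something like this (a
          | simplified TypeScript sketch with hypothetical helpers):
          | 
          |   // Hypothetical pieces, stubbed for the sketch:
          |   declare function replay(c: unknown): Promise<{ body: unknown }>;
          |   declare function diffPaths(a: unknown, b: unknown): string[];
          |   declare const captured: unknown;
          |   declare const baseline: { body: unknown };
          |   declare const ciRun: { body: unknown };
          | 
          |   // Replay the same captured request twice; any field that
          |   // differs between the two runs (timestamps, UUIDs, etc.)
          |   // is inherently flaky and must not fail the build.
          |   const runA = await replay(captured);
          |   const runB = await replay(captured);
          |   const flakyFields = diffPaths(runA.body, runB.body);
          | 
          |   // Compare CI output against the production baseline,
          |   // ignoring only the fields proven to be flaky.
          |   const regressions = diffPaths(baseline.body, ciRun.body)
          |     .filter((path) => !flakyFields.includes(path));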
        
       | a_c wrote:
       | Congratulations on the launch!
       | 
        | Quick question: how do you (plan to) deal with schema/API
        | changes? Or is the tool more intended for regression testing?
        
         | vedant_ag wrote:
          | New API tests can be generated by enabling the agent locally
          | and/or on a staging env.
          | 
          | It's simply a matter of choosing a baseline: `production`
          | (usually for regression tests), `staging`, or `local`.
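          | 
          | e.g. a (purely hypothetical) config for the regression case
          | might look like:
          | 
          |   const config = {
          |     captureEnv: 'staging',     // where new test cases get recorded
          |     baselineEnv: 'production', // responses treated as ground truth
          |     replayEnv: 'ci',           // where captured requests run
          |   };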
        
       ___________________________________________________________________
       (page generated 2023-03-17 23:00 UTC)