[HN Gopher] Launch HN: Corgea (YC S23) - Auto fix vulnerable code
       ___________________________________________________________________
        
       Launch HN: Corgea (YC S23) - Auto fix vulnerable code
        
       Hi HN, I'm the founder of Corgea (https://corgea.com). We help
       companies fix their vulnerable source code using AI.  Originally,
       we started with a data security product that would detect data
       leaks at companies. Despite initial successes and customer
       acquisitions, we frequently heard that highlighting issues wasn't
       enough; customers wanted proactive fixes. They had hundreds (yes
       hundreds!) of security tools alerting them about vulnerabilities,
       but couldn't afford a dedicated team to go through them all and fix
       them. One prospect we spoke to had tens of thousands of reported
       vulnerabilities in their SAST tool. With the rise of AI code
       generation, we saw an opportunity to give customers what they
       really wanted.  Having Corgea is like having a security engineer on
       staff focused on making your code more secure. We want security to
       be an enabler of engineering rather than a blocker to it, and for
       engineering to enable security in turn. To accomplish this, we
       built Corgea on top of existing LLMs to issue code fixes.  To show
       Corgea's capabilities, we took some popular vulnerable-by-design
       applications like Juice Shop
       (https://github.com/juice-shop/juice-shop), scanned them, and
       issued fixes for their vulnerabilities. You can see some of them
       here: https://demo.corgea.com.
       Some examples of vulnerabilities it fixes are SQL injection, path
       traversal, and XSS.  What makes this tough is that LLMs currently
       struggle at generalist coding tasks because they have to understand
       your whole code base, the domain you're in, and the user's request.
       This can lead to a lot of unintended behavior where they code
       things incorrectly because they're giving a best guess at what you
       want. Adam, one of the founding engineers on the team, put it well:
       LLMs don't reason, they fuzz.
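
       To make the SQL injection case concrete, here's a minimal sketch
       of the kind of one-to-two-line change such a fix usually amounts
       to (illustrative Python with a made-up users table, not a fix
       taken from the demo):

           import sqlite3

           conn = sqlite3.connect("app.db")

           def get_user_vulnerable(username: str):
               # Vulnerable: attacker-controlled input is interpolated
               # directly into the SQL string.
               query = f"SELECT * FROM users WHERE name = '{username}'"
               return conn.execute(query).fetchone()

           def get_user_fixed(username: str):
               # Fixed: the input is passed as a bound parameter, so the
               # driver escapes it and it can't change the query shape.
               query = "SELECT * FROM users WHERE name = ?"
               return conn.execute(query, (username,)).fetchone()
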
       We made several decisions that helped the LLM become more
       deterministic. First, what we're doing is extremely domain
       specific: vulnerable code fixes in a limited number of programming
       languages. There are roughly 900 classes of security vulnerability
       in code, called CWEs (https://cwe.mitre.org/), that we've built
       into Corgea. An SQL injection vulnerability in a Javascript app is
       the same regardless of whether you're a payments company or a
       travel booking website. Second, we have no user-generated input
       going into the LLM, because the SAST scanners give us everything
       needed to issue a fix. This makes it much more predictable and
       reproducible for us and customers. We can also create robust QA
       processes and checks.
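
       As a rough illustration of what the scanner gives us, a finding
       typically carries a rule or CWE identifier, a file path, and a
       line range. The shape below is a simplified, hypothetical record
       rather than any particular scanner's schema; the point is that
       the fix request is built from scanner output plus the affected
       code, not from free-form user input:

           from dataclasses import dataclass

           @dataclass
           class Finding:
               # Hypothetical, simplified finding record.
               scanner: str     # e.g. "semgrep" or "snyk-code"
               cwe_id: str      # e.g. "CWE-89" (SQL injection)
               rule_id: str     # scanner-specific rule name
               file_path: str   # file the issue was reported in
               start_line: int  # first affected line
               end_line: int    # last affected line

           def build_fix_prompt(finding: Finding, snippet: str) -> str:
               # Only scanner metadata and the affected code go into
               # the prompt; there is no free-form user input.
               return (
                   f"Fix {finding.cwe_id} ({finding.rule_id}) in "
                   f"{finding.file_path}, lines "
                   f"{finding.start_line}-{finding.end_line}:\n{snippet}"
               )
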
       To illustrate the point, let's put some of this to the test with
       some napkin math. Assume you're serving 5,000 enterprises that
       each ship on average 300 domain-specific features a year in 5
       different programming languages, with each feature requiring 30
       lines of code changes across multiple files. You'll have about
       300m permutations the product needs to support. What a nightmare!
       Using the same napkin math, Corgea needs to support the ~900
       vulnerability classes (CWEs). Most of them require 1-2 line
       changes. It doesn't need to understand the whole codebase since
       the problem is usually isolated to a few lines. We want to support
       the 5 most popular programming languages. So even with 5,000
       customers, we have to support ~4,500 permutations (900 issues x 5
       languages). This leads to a massive difference in accuracy.
       Obviously, this is an oversimplification, but it illustrates the
       point.
       What makes this different from Copilot and other code-gen tools is
       that they do not specialize in security, and we've seen them
       inadvertently introduce security issues unbeknownst to the
       engineer. Additionally, they do not integrate with the existing
       scanning tools that companies are using to resolve those issues.
       So unless a developer is working on every part of the product,
       they're unable to clear security backlogs, which can run to
       thousands of tickets.  As for security scanners, the current
       market is flooded with tools that report findings and overwhelm
       security teams without being effective at fixing what they report.
       Most vulnerability scanners do not remediate issues, and if they
       do, they're mostly limited to upgrading packages from one version
       to another to reduce a CVSS score. If they do offer CWE
       remediation capabilities, their success rates are very low because
       they're often based on traditional AI methodologies. Additionally,
       they do not integrate with each other because each wants to serve
       only its own findings. Enterprises use multiple tools like Snyk,
       Semgrep and Checkmarx, but also have a penetration testing program
       and a bug bounty program. They need a solution that consolidates
       across their existing tools. They also use GitHub, GitLab and
       Bitbucket for their code repositories.  We're offering a free tier
       for smaller teams and paid tiers above that. We believe we can
       reduce the engineering effort for security fixes by 80%, which
       would equate to at least $10m a year for enterprises.  We're
       really excited to share this with you all and we'd love any
       thoughts, feedback, and comments!
        
       Author : asadeddin
       Score  : 24 points
       Date   : 2024-01-09 16:32 UTC (6 hours ago)
        
       | blakesterz wrote:
       | I help with some small open source things, would this be a thing
       | I can use to scan a public GitHub repository and see what it
       | found?
        
         | asadeddin wrote:
         | You can use one of the existing scanners we support like
         | Semgrep and Snyk to scan, and use Corgea to issue pull-requests
         | for the fixes. We will support scanning in the future with some
         | advanced capabilities.
        
       | tikkun wrote:
       | Very smart, seems super useful. Congrats on the launch
        
         | asadeddin wrote:
         | Thank you!
        
       | sidcool wrote:
       | Congrats on launching!
        
         | asadeddin wrote:
         | Thank you!
        
       | waihtis wrote:
       | > We help companies fix their vulnerable source code using AI.
       | 
       | I think like 95% of the general "vulnerability market" exists
       | because companies have assets they don't own the codebase of, and
       | have to wait for and test patches when they finally arrive.
       | 
       | > It doesn't need to understand the whole codebase since the
       | problem is usually isolated to a few lines.
       | 
       | I'm not a terrific coder but isn't this a pretty risky
        | simplification? It's a very common occurrence that a minor, one
       | line change breaks something in a whole different part of the
       | codebase.
        
         | asadeddin wrote:
         | Thanks for the comment.
         | 
         | - I would agree that a big chunk of vulnerability market helps
         | with things that companies don't own, but I'm not sure it's
         | that high. A lot of companies use SaaS tools and deploy tools
         | they don't own on their private cloud that they have to wait on
         | patches for. Our perspective is that a lot of the tools that
         | emerged in the last few years center around detection, and very
         | little on remediation. We didn't want to be another tool that
         | contributes to alert fatigue. With budgets tightening, tougher
         | scrutiny of security and increasing threats, companies need
         | automation in remediation.
         | 
          | - We've designed Corgea to always have a human in the loop.
          | Corgea doesn't push code automatically to prod. It creates a PR
          | when someone clicks the button to do so, and an engineer
          | reviews the PR to ensure nothing breaks. Almost every company
          | has those controls in place. Additionally, in the vast
          | majority of cases the fixes are safe and don't lead to
          | dependency issues further downstream, and for the others we
          | will be building logic to account for that. For example, this
          | SQL injection fix requires you to parameterize the inputs
          | correctly, which is a one-line code change with no downstream
          | dependencies: https://demo.corgea.com/501.
        
           | waihtis wrote:
           | > A lot of companies use SaaS tools and deploy tools they
           | don't own on their private cloud that they have to wait on
           | patches for
           | 
            | Don't forget ~85% of global compute is still on-premise
            | stuff.
           | 
           | > With budgets tightening, tougher scrutiny of security and
           | increasing threats, companies need automation in remediation.
           | 
           | Yes, but the main bulk of vulns for enterprises and SMB+ (as
           | in, the orgs that actually do security spending) are in
           | products they don't own the codebase of. Windows, Redhat,
           | Cisco, Confluence, Jenkins, and more recently also solutions
           | like Okta, Forgerock and others are getting exploited via
           | vulns for the benefit of attackers.
           | 
           | I'm not trying to be a dick btw, but I think you're confusing
           | the market you're trying to play in. You're selling something
           | closer to a dev tool than a security tool, and talking about
           | detection and such doesn't really concern this area.
        
             | asadeddin wrote:
             | No worries. I think we might be talking past each other,
             | and that's ok :). I believe we're defining the market
             | differently. It sounds like you're talking about vulnerable
             | code in other apps, and I'm talking about vulnerability
             | detection in general in security. Is that correct?
             | 
             | For code you own, control or have over-sight over, we can
             | help. Otherwise, we can't.
             | 
              | When I mention vulnerability detection, I mean it
              | generally as "findings". For example, our original product
              | detected data leakage issues in SaaS tools like Slack,
              | Snowflake, JIRA, etc. We could detect someone pasting
              | credentials to a colleague in Slack by mistake because
              | they wanted to share logs. This is a human-caused problem
              | facilitated by not properly sanitizing logs. I include
              | these when I'm talking about the market.
             | 
             | The comment about the dev tool vs security tool is
             | interesting. How would you categorize Snyk?
        
               | waihtis wrote:
               | Gotcha, alert fatigue is usually associated with threat
               | detection hence the qualys
               | 
               | > How would you categorize Snyk?
               | 
                | I'm not a heavy user, but I believe they position
                | themselves as "developer security" (I think it's even in
                | their slogan)
        
               | asadeddin wrote:
               | I believe we're in a similar category. A telling sign of
               | what category a company belongs to is based on what
               | conferences and events they're sponsoring and attending.
               | Snyk and the likes sell to the security teams with dev
               | friendliness in mind. We're aiming to do the same here.
        
       | wbl wrote:
       | Your sample fix for the ssrf bug is wrong: it ignores IPv6 and
       | DNS returning localhost or other interesting things on the
       | network. Really there isn't a great answer without knowing
       | something about the network or not having the feature.
        
         | griffinmb wrote:
         | Yeah, it doesn't fix the issue at all. Rough to have a security
         | product demo be fundamentally insecure.
        
           | asadeddin wrote:
           | Thanks for commenting. We're always trying to learn more and
            | iterate to make Corgea better. How should the fix have
            | looked?
        
             | dmd wrote:
             | If you don't know that - or rather, if nobody on your team
             | recognized this issue and brought it up - you should not be
             | selling and shipping this product.
        
           | smt88 wrote:
           | LLMs writing code are fundamentally insecure. This product is
           | completely batshit insane and I'd fire any vendor I knew used
           | it.
        
             | griffinmb wrote:
             | Agreed that there's no way to do this meaningfully and
             | securely.
             | 
             | Looking forward to the archeological audits of LLM-
             | developed apps x years from now that are a total mystery to
             | the product owners...
        
           | asadeddin wrote:
           | I'll respond to this comment to provide a general response
           | for all of the sub-comments here.
           | 
            | As I highlighted in my post, LLMs generally are still not in
            | a position to replace a developer for more complex tasks and
            | refactoring. We're in the early days of the technology, but
            | we have seen extremely strong improvements in it over the
            | last year. The team has QA'd thousands of results for public
            | and private repositories. The private ones are particularly
            | interesting because the LLMs do not have that code in their
            | corpus, and we have seen very strong fix results there.
           | 
           | Most people just assume we're wrapping around an LLM, but
           | there's a lot that goes underneath the hood that needs to
           | happen to ensure that fixes are going to be secure and
           | correct. Here are the standards we're setting for fix
           | quality:
           | 
            | - The fix needs to be best-practice and complete. A partial
            | security fix isn't a security fix. This is something we're
            | constantly working on.
            | 
            | - Supporting the widest coverage of CWEs.
            | 
            | - Not introducing any breaking changes in the rest of the
            | code.
            | 
            | - Understanding the language, the framework being used, and
            | any specific packages. For example, fixing a CSRF issue in
            | Django is different than in Flask. Both are Python
            | frameworks but approach it differently (see the sketch
            | below).
            | 
            | - Reusing existing packages correctly to improve security,
            | and if it does need to add a package, doing so in a standard
            | way.
            | 
            | - Placing imports in the correct part of the file.
            | 
            | - Not using deprecated or risky packages.
            | 
            | - Avoiding LLM hallucinations.
            | 
            | - Ensuring syntax and formatting are correct.
            | 
            | - Following the coding and naming conventions in the file
            | being fixed.
            | 
            | - Making sure fixes are consistent within the same issue
            | type.
            | 
            | - Explaining the fix properly and clearly so that someone
            | can understand it.
            | 
            | - Avoiding assumptions that could cause problems.
            | 
            | - Not removing any code that is not part of the issue.
           | 
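            | To make the Django-vs-Flask point concrete, here's a rough
            | sketch (illustrative only, not an actual Corgea fix): Django
            | ships CSRF protection via its CsrfViewMiddleware, so the fix
            | is often just removing an unsafe @csrf_exempt decorator,
            | while a Flask app typically needs an extension such as
            | Flask-WTF's CSRFProtect wired in explicitly.
            | 
            |     # Django: the unsafe exemption is the bug; removing it
            |     # lets the default CsrfViewMiddleware protect the view.
            |     from django.views.decorators.csrf import csrf_exempt
            | 
            |     @csrf_exempt          # <- remove this line to fix
            |     def transfer_funds(request):
            |         ...
            | 
            |     # Flask: there is no built-in CSRF protection, so the
            |     # fix adds one, e.g. via the Flask-WTF extension.
            |     from flask import Flask
            |     from flask_wtf import CSRFProtect
            | 
            |     app = Flask(__name__)
            |     # enables CSRF checks for the whole app
            |     csrf = CSRFProtect(app)
            | 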
           | Our goal is to get to 90% - 95% accuracy in fixes this year,
           | and we're on a trajectory to do that. I will be the first to
           | say 100% accuracy is impossible, and our goal is to get it
           | right more times than engineers would.
           | 
           | We take fix quality and transparency extremely seriously.
           | We'll be publishing a whitepaper showing the accuracy in
           | results because it's the right thing to do. I hope this
           | helps.
        
         | asadeddin wrote:
         | Thanks for highlighting this.
         | 
          | This fix is there to demonstrate more sophisticated fixes, and
          | it does require human input to determine the correct domain
          | and IP config. We are introducing the ability for humans to
          | add additional context before fix generation, to provide
          | feedback and regenerate a fix after it's been generated, and
          | to edit the proposed fix. Users have asked for these tools
          | because of scenarios that require more insight.
        
         | 8organicbits wrote:
         | I worry that this tool will rewrite code such that the security
         | scanning tool can no longer detect the problem, but won't
         | actually fix it (as above). This ends up being an adversarial
         | system that makes it even harder to detect the vulnerabilities
         | left behind. If the generated patches are reviewed by non-
         | experts, these details will be missed.
         | 
         | Edit: To highlight a specific problem here: a classic target
         | for SSRF is the instance metadata IP address[1]. This IP
          | address is not on the generated blacklist. Worse, you've made
          | it harder to detect this problem in the future.
         | 
         | I don't want to recommend a fix here; you're selling the fix.
         | You should consider hiring a security expert to determine if
         | LLM is really up for this task.
         | 
         | [1]
         | https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance...
        
           | asadeddin wrote:
            | Thanks for commenting, and it is a great point. Keep in mind,
            | we're not touting Corgea as a fix-all for every
            | vulnerability. CWEs typically have simpler and more
            | standardized fixes. You can see examples of that here:
            | https://cwe.mitre.org/
            | 
            | A few things we're doing to combat this:
            | 
            | 1) We've given the entire corpus of CWEs to Corgea, along
            | with how to fix things safely. From our testing and users,
            | we've found this does a really good job. I personally QA a
            | lot of results (in the thousands) and we've not seen that to
            | be a common problem.
            | 
            | 2) Corgea is designed to require two sets of eyeballs: the
            | first being the security engineer, and the second being the
            | developer that reviews the PR. We hope issues will be caught
            | there. Additionally, we believe our fixes will be better
            | than what a developer would write. There are over 900 CWEs,
            | and it's really hard for engineers to know how to fix every
            | issue. Googling answers and asking ChatGPT can lead to them
            | introducing issues.
            | 
            | 3) We provide in the product AI-generated explanations of
            | the fix and why it was appropriate. This is to educate
            | non-experts on the topic.
            | 
            | 4) We already have checks in place to make sure things
            | aren't misbehaving, and we're rolling out a more advanced
            | fix checker soon to make sure we didn't introduce any new
            | vulnerabilities (a rough sketch of that kind of re-check is
            | below).
            | 
            | 5) Finally, we QA a lot every week and run reports on the
            | areas we're good at or not so good at, to help us iterate.
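            | 
            | As a rough illustration of that kind of re-check (a sketch
            | using Semgrep's CLI, not our actual checker): after a fix is
            | generated, the patched file can be re-scanned, and the fix
            | rejected if the original finding, or a new one, still shows
            | up.
            | 
            |     import json
            |     import subprocess
            | 
            |     def remaining_findings(path: str) -> list[str]:
            |         # Re-run Semgrep on the patched file and collect
            |         # the rule IDs of any findings still present.
            |         out = subprocess.run(
            |             ["semgrep", "scan", "--config", "auto",
            |              "--json", path],
            |             capture_output=True, text=True, check=False,
            |         )
            |         results = json.loads(out.stdout).get("results", [])
            |         return [r["check_id"] for r in results]
            | 
            |     # A fix is only accepted if the patched file comes back
            |     # clean; otherwise it is regenerated or flagged for a
            |     # human.
            |     if remaining_findings("app/views.py"):
            |         raise RuntimeError("fix rejected: findings remain")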
        
             | 8organicbits wrote:
             | I guess I'm not convinced. This is a demo that should have
             | been chosen to show the product in the best light, but it
             | doesn't fix the problem. Was the demo reviewed by QA?
             | 
             | I really like the idea and I think that you're right about
             | the goal being to fix bugs "better than the average
             | engineer". I don't think you've reached that bar.
        
         | reflexe wrote:
          | Looks like this is not the only problematic example. For
          | example, https://demo.corgea.com/338 makes sure you don't try
          | to get ctf.key (but not .env, for example). Another issue:
          | https://demo.corgea.com/531# The LLM makes up a usage of
          | shell=True despite the original "vulnerable" code not using it.
         | 
         | Well, at least they are showing a real demo and not some made
         | up results.
         | 
         | I think that overall the idea has some potential, but not sure
         | we are there yet.
        
           | asadeddin wrote:
           | Thanks for the feedback!
           | 
            | For the first one: the SAST scanner reports issues to us
            | based on lines and issue type, so we generate a fix isolated
            | to that issue. We do not generate fixes for other
            | vulnerabilities in the same file under the same finding,
            | because we want one fix per finding. There might be a
            | separate finding reported for the other issue, and we plan
            | on allowing people to group fixes in the same file together.
           | 
           | Not sure if I'm missing something on the shell=True. It's in
           | the vulnerable code, which is why it changed it. You have to
           | scroll to the right in the code viewer. https://github.com/Rh
           | inoSecurityLabs/cloudgoat/blob/8ed1cf0e...
           | 
           | Is there something I'm missing?
        
             | reflexe wrote:
             | For the first issue: I understand. Thanks.
             | 
             | As for the second, There is no shell=True for me in the
             | demo but it is present in the code you sent. So maybe it is
             | just a bug in the presentation somewhere.
        
               | rplnt wrote:
               | Same here, must be a bug in the view, for me it's missing
               | the closing parenthesis as well.
        
               | asadeddin wrote:
               | Scrolling to the right should work, but you'll need to do
               | so on each code editor section. We should combine
               | scrolling of these two windows to be in sync.
               | 
               | We'll also take a look at what's causing this. It might
               | be a browser issue.
        
               | throwanem wrote:
               | They scroll in sync for me, but long lines seem truncated
               | in iOS 16.2 Safari. No visible code on that second linked
               | page includes the string in question.
        
               | asadeddin wrote:
               | Thanks for sharing! Will look into it :)
        
       | debarshri wrote:
        | What's vulnerable for one company may not be for another. It
        | needs a lot of context. The low-hanging fruit is to update
        | dependencies, but code fixes are very tricky. Generic best
        | practices are not very valuable, as they might conflict with the
        | code context, and generated code might create confusion.
        | 
        | Historically, generated code and human-written code are often
        | segregated. With Copilot, it is still assisting. But here it is
        | generating code and replacing existing code, which impacts code
        | ownership, and the value is not so clear. I think the PRs it
        | generates would end up not being merged.
        | 
        | Having said that, it is a great hook to get attention, but I
        | think you might fail in delivering meaningful value.
        
         | asadeddin wrote:
          | For your first point, we're not responsible for detecting what
          | is vulnerable (or even exploitable). Today there are 4
          | categories of vulnerabilities in software:
          | 
          | 1 - CVEs, which you've mentioned around updating dependencies.
          | These require the least amount of context to detect, and are
          | the easiest to change, but require a lot of context to fix
          | downstream dependent code.
          | 
          | 2 - CWEs, which are common weaknesses in software. These
          | require a medium amount of context for detection and a small
          | to medium amount of context to fix.
          | 
          | 3 - Business and code logic flaws. This is currently unserved
          | by most tools, and it's where the wide variance between code
          | bases is. These require a lot of context to both detect and
          | fix, and they're what most people think of when it comes to
          | your first point.
          | 
          | 4 - Misconfiguration of environments.
         | 
          | Currently, we've focused on CWEs (#2) because of how focused
          | and isolated some of the fixes are compared to items #1, #3
          | and #4. We've run thousands of tests and see very high
          | accuracy in the results. We do have plans to support #1 after
          | we feel accomplished with #2; that requires more sophisticated
          | tools and logic to handle upgrade changes safely.
          | 
          | At the moment, the responsibility is on the SAST tools to
          | perform these detections. We've heard a lot of complaints
          | about false positives, and it's probably one of the biggest
          | problems in the industry. We have future plans to tackle both
          | detection and prioritization, but that's a separate thing.
         | 
          | To comment on your second point: with Dependabot and other
          | code-gen products like ours, code ownership will be impacted.
          | I believe our understanding of code ownership will change
          | fundamentally as more of these tools come out and get adopted.
          | One clarification: Corgea doesn't auto-issue PRs like
          | Dependabot does. Someone needs to look at the fix before
          | issuing a PR.
          | 
          | Thanks for commenting, and it's definitely a different
          | perspective on meaningful value. For other code-gen tools like
          | Copilot, where do you think the value is, then?
        
       | WhackyIdeas wrote:
       | I like the idea of this, but in a way it seems like going on to a
       | website to enter your password to see if it was involved in any
       | leaks. And that makes me uneasy.
       | 
       | A system like this would be so much better if all the scanning
       | was done locally, keeping the source private from leaking at all.
        
         | asadeddin wrote:
         | Thanks for commenting, and totally get your perspective on it.
         | 
         | Scanning today can be done locally with many tools like Semgrep
          | before you use Corgea. We do send vulnerability information
          | over to Corgea to make sure we can issue fixes for them
          | reliably and at scale. Keep in mind repos can have
          | vulnerabilities in the thousands or even tens of thousands, so
          | it's not as simple as Copilot running in your IDE reading your
          | current lines of code. We have to be able to do this at scale.
          | 
          | Finally, we've put a lot of effort into locking things down,
          | and you can read some of those details here:
         | https://docs.corgea.app/security
        
       | tylerekahn wrote:
       | This looks awesome. Congrats on the launch
        
         | asadeddin wrote:
         | Thank you!
        
       | autonomousErwin wrote:
        | I get why people can be a bit apprehensive about using AI tools
        | for pull requests because of hallucination, but this is such a
        | great application. I'll give it a spin on some of my Django
        | boilerplates to see what it comes up with. Congratulations to
        | the team!
        | 
        | My question would be: are you using it on your own codebase or
        | an open-source tool you're fond of? I'd love to see this
        | operating in the wild (examples are great, but real-life PRs hit
        | different).
        
         | asadeddin wrote:
         | Thank you! Please give it a spin. We'd love any feedback or
         | thoughts. :)
         | 
         | We are using it on our codebases, and it's helped us secure our
         | own product. Users have also been trying it out with their
         | private codebases, and we even used our own personal projects
         | to test it.
         | 
          | If you'd like to try Corgea with some open-source tools, there
          | are a ton of applications that are vulnerable by design. Some
          | popular ones:
         | 
         | https://github.com/bkimminich/juice-shop
         | https://github.com/we45/Vulnerable-Flask-App
         | https://github.com/adeyosemanputra/pygoat
         | 
          | Edit: Forgot to mention, we've put in some controls to avoid
          | hallucinations, like comparing diff sizes between the original
          | and fixed code. Sometimes LLMs like to truncate code when
          | generating a fix, or generate too much. We actually stop the
          | result from being returned and retry.
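          | 
          | A minimal sketch of that kind of guard, with made-up
          | thresholds for illustration: compare the size of the LLM's
          | output to the original file and retry if the change is
          | suspiciously large in either direction.
          | 
          |     def suspicious_diff(original: str, fixed: str) -> bool:
          |         # Flag results that dropped or added far more lines
          |         # than a small, targeted security fix should.
          |         before = len(original.splitlines())
          |         after = len(fixed.splitlines())
          |         # 1.5x is a made-up threshold for illustration.
          |         return after < before / 1.5 or after > before * 1.5
          | 
          |     # If the guard trips, the fix is discarded and
          |     # regenerated instead of being shown to the user.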
        
       ___________________________________________________________________
       (page generated 2024-01-09 23:00 UTC)