[HN Gopher] Ignore 98% of dependency alerts: introducing Semgrep...
___________________________________________________________________
Ignore 98% of dependency alerts: introducing Semgrep Supply Chain
Author : ievans
Score : 113 points
Date : 2022-10-04 15:45 UTC (7 hours ago)
(HTM) web link (r2c.dev)
(TXT) w3m dump (r2c.dev)
| snowstormsun wrote:
| Really nice idea to only show warnings if they are relevant. It's
| indeed annoying when you need to upgrade lodash just to stop your
| audit tool from showing critical warnings about some function that
| is not used at all.
|
| This is not open source, though? It does make a big difference
| for some whether you're able to run the check offline or you're
| forced to upload your code to some service.
|
| One feature I'd love in such a tool would be the ability to get
| the relevant parts of the changelog of the package that needs to
| be upgraded. It's not responsible to just run the upgrade command
| without checking the changelog for breaking or otherwise relevant
| changes. That's exactly why upgrades tend to be done very late:
| there is a real risk of breaking something, even with just a minor
| version bump.
| mattkopecki wrote:
| There are definitely other approaches that don't require code
| to be uploaded anywhere. For example, we (https://rezilion.com)
| work with your package managers to understand what dependencies
| your program has, and then analyze that metadata on the back
| end. The net result is still being able to see which
| vulnerabilities are truly exploitable and which are not.
| ievans wrote:
| All the engine functionality is FOSS
| https://semgrep.dev/docs/experiments/r2c-internal-project-de...
| (code at https://github.com/returntocorp/semgrep); but the
| rules are currently private (may change in the future).
|
| As with all other Semgrep scanning, the analysis is done
| locally and offline -- which is a major contrast to most other
| vendors. See #12 on our development philosophy for more
| details: https://semgrep.dev/docs/contributing/semgrep-philosophy/
|
| Surfacing the relevant parts of the changelog is a good idea --
| others have also come out with statistical approaches based on
| upgrades other projects have made (e.g. Dependabot has a
| compatibility score based on "when we made PRs for this on other
| repos, what % of the time did tests pass vs. fail").
| freeqaz wrote:
| Here is some code on GitHub that does call site checking using
| Semgrep: https://github.com/lunasec-io/lunasec/blob/master/lunatrace/...
|
| (Note: I helped write that. We're building a similar service to
| the r2c one.)
|
| You're right that patching is hard because of opaque package
| diffs. I've seen some tools coming out like Socket.dev which
| show a diff between versions.
| https://socket.dev/npm/package/react/versions
|
| But, that said, this is still a hard problem to solve and it's
| happened before that malware[0][1] has been silently shipped
| because of how opaque packages are.
|
| 0:
| https://web.archive.org/web/20201221173112/https://github.co...
|
| 1: https://www.coindesk.com/markets/2018/11/27/fake-developer-s...
| feross wrote:
| Thanks for mentioning Socket.dev :)
|
| Looking at package diffs is super important because of the
| rise of "protestware". For example, a maintainer of the
| event-source-polyfill package recently added code which
| redirects website visitors located in Eastern European
| timezones to a change.org petition page. This means that real
| users are being navigated to this random URL in production.
|
| See the attack code here:
| https://socket.dev/npm/package/event-source-polyfill/diff/1....
|
| It's very unlikely that users of event-source-polyfill are
| aware that this hidden behavior has been added to the
| package. And yet, the package remains available on npm many
| months after it was initially published. We think that supply
| chain security tools like Socket have an important role to
| play in warning npm users when unwanted 'gray area' code is
| added to packages they use.
| stevebmark wrote:
| I've always thought that Dependabot was busy-work, a waste of
| time. This article makes a good point that drives it home: alarms
| that aren't real make all alarms useless. Dependabot is especially
| painful in non-typed languages (Python, Ruby, and especially
| JavaScript), where "upgrading" a library can break things in ways
| you won't discover until production.
|
| Maybe the constant work, extra build time (and cash for all of
| that), and risk of breaking production are worth it for the 0.01%
| of the time there's a real vulnerability? It seems like a high
| price to pay, though. When there are major software
| vulnerabilities (like log4j), the whole industry usually swarms
| around them, and the alarm has high value.
|
| I just realized how much CircleCI probably loves Dependabot. I
| wonder what percentage hit their margins would take if we
| collectively moved off it as an industry.
| bawolff wrote:
| I kind of feel like Dependabot alerts should be treated like a
| coding convention error: that extra whitespace isn't actually
| causing a problem, but we fix it right away.
|
| Otherwise you have to start analyzing the alerts, and good luck
| with that. The low severity ones are marked critical and the
| scary ones are marked low. Suddenly you have 200 unfixed alerts,
| and it's impossible to know whether somewhere in that haystack
| there's an important one.
| mfer wrote:
| > When there are major software vulnerabilities (like log4j),
| the whole industry usually swarms around it, and the alarm has
| high value.
|
| You're leaving me with the impression that you think we should
| only patch major software vulnerabilities. This I would
| disagree with. Minor vulnerabilities can be used, especially in
| groups, to do things we don't anticipate. It's not just about a
| single vulnerability but about how an attacker can leverage
| multiple different vulnerabilities together.
| danenania wrote:
| If you use vendoring, it's also worth considering that there's
| always some inherent security risk in upgrading dependencies.
| If an attacker takes control of a package somewhere in your
| dependency tree, you don't get compromised until you actually
| install a new version of that package. This risk can often
| outweigh the risk of very minor/dev-facing CVEs.
| feross wrote:
| Shameless plug: This is what I'm building Socket.dev to
| solve.
|
| Socket watches for changes to "package manifest" files such
| as package.json, package-lock.json, and yarn.lock. Whenever a
| new dependency is added in a pull request, Socket analyzes
| the package's behavior and leaves a comment if it is a
| security risk.
|
| You can see some real-world examples here:
| https://socket.dev/blog/socket-for-github-1.0
| e1g wrote:
| We use Socket, and my favorite feature is how it highlights new
| dependencies that have a post-install hook. It's not always a
| problem, but it's almost always a smell.
|
| One feature request: please allow me to "suppress" warnings
| for a specific package+version combo. This is useful for
| activist libs that take a political stance - I know it
| happens, but often cannot remove them, and don't want to
| continue flagging the same problem at every sec review.
| smcleod wrote:
| IMO Dependabot is really dreadful at its job. Try Renovate -
| it's really brilliant, fast, flexible, supports properly
| binding PRs/MRs.
| scinerio wrote:
| Will this ever be integrated with Gitlab Ultimate?
| mattkopecki wrote:
| Gitlab Ultimate uses Rezilion to accomplish a similar aim.
| Rather than using the principle of "reachability", Rezilion
| analyzes at runtime what functions and classes are loaded into
| memory. Much more deterministic, and less of a guess about what
| code will be called.
|
| https://about.gitlab.com/blog/2022/03/23/gitlab-rezilion-int...
| masklinn wrote:
| How does it do that in the face of lazy loading, or for
| languages in which "what functions and classes are loaded into
| memory" is not really a thing (e.g. C)?
| tsimionescu wrote:
| Shouldn't this be very easy in C? With static linking,
| you're vulnerable if you're linking the package. With
| dynamic linking, you're vulnerable if you're importing the
| specific functions. Otherwise, you're not vulnerable -
| there's no other legal way to call a function in C.
|
| Now, if you're memory mapping some file and jumping into it
| to call that function, good luck. You're already well into
| undefined behavior territory.
|
| Now, for lazy loading, I'm assuming the answer is the same
| as any other runtime path analysis tool: it's up to you to
| make sure all relevant code paths are actually running
| during the analysis. Presumably your tests should be
| written in such a way as to trigger the loading of all
| dependencies.
|
| I think there's really no other reasonable way to handle
| this, though I can't say I've worked with either GitLab
| Ultimate or Rezilion, so maybe I'm missing something.
| underyx wrote:
| Hey, I work on OP's product, and just wanted to mention
| that reachability is not always about a function being
| called. Sometimes insecure behavior is triggered by
| setting options to a certain value[0]. Other times it's
| feasible to mark usages of an insecure function as safe
| when we know that the passed argument comes from a
| trusted source[1]. The Semgrep rules we write understand
| these nuances instead of just flagging function calls.
|
| [0]: e.g. https://nvd.nist.gov/vuln/detail/CVE-2021-28957
|
| [1]: e.g. https://nvd.nist.gov/vuln/detail/CVE-2014-0081
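|
| A self-contained Python sketch of those two cases (the functions
| here are made-up stand-ins for illustration, not the actual code
| behind the CVEs above):
|
|     def xml_parse(text: str, resolve_entities: bool = False) -> str:
|         # Stand-in for a parser whose risky behavior is an option.
|         return text
|
|     def render_currency(amount: str) -> str:
|         # Stand-in for a sink that is only dangerous with
|         # untrusted input.
|         return "$" + amount
|
|     untrusted = input("amount: ")
|
|     xml_parse(untrusted, resolve_entities=True)  # flagged: risky option
|     xml_parse(untrusted)                         # not flagged: option off
|     render_currency("1,000.00")                  # not flagged: constant
|     render_currency(untrusted)                   # flagged: untrusted input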
| mattkopecki wrote:
| Rezilion works at runtime when the Gitlab runner spins up a
| container for testing the app. Rezilion observes the
| contents of memory and can reverse-engineer back to the
| filesystem to see where everything was loaded from.
|
| In the CI pipeline this depends on your tests exercising
| the app, but when you deploy Rezilion into a longer-lived
| environment like staging or production you may pick up some
| additional code paths in use, although most find that the
| results don't differ much between environments.
| scinerio wrote:
| Ah, thank you. It's not entirely clear whether this is
| something baked into Gitlab Ultimate's SAST CI/CD
| feature/template, or if it's a third party that I would have
| to license first. Do you happen to know?
| jollyllama wrote:
| Sounds nice. I've never worked with a tool like this that doesn't
| turn up a ridiculous number of false positives.
| henvic wrote:
| How the hell do you end up with 1644 vulnerable packages anyways?
|
| * rhetorical question, JS...
|
| It was actually one of the main drivers for me to start using Go
| instead of JavaScript for server-side applications and CLIs about
| 8 years ago.
| nightpool wrote:
| Roughly: NPM, Github, and others funded open bug bounties for
| all popular NPM packages. These bug bounties led to a rash of
| security "vulnerabilities" being reported against open source
| projects, to satisfy the terms of the bounty conditions. Public
| bug bounty "intermediary" companies are a major culprit here--
| they have an incentive to push maintainers to accept even
| trivial "vulnerabilities", since their success is tied to
| "number of vulnerabilities reported" and "amount of bounties
| paid out". This leads to classes of vulnerabilities like ReDoS
| or prototype pollution that would never have been noticed or
| worth any money otherwise.
| thenerdhead wrote:
| The problem really comes down to data quality in disclosing
| vulnerabilities.
|
| With higher quality data, better CVSS scores can be calculated.
| With higher quality data, affected code paths can be better
| disclosed. With higher quality data, unknown vulnerabilities may
| be found in parallel to the known ones.
|
| I don't think any tool or automation can solve the problem of
| high-quality data; humans have to exercise judgment to provide
| it. No amount of code analysis can solve that, but it sure can
| help.
| light24bulbs wrote:
| You're right. Nobody bothers to make scanners because there's
| no data, and nobody has come up with a good format to convey
| the data between producers (like NVD) and consumers (like
| dependabot).
|
| I wrote a blog post talking about some of this stuff:
| https://www.lunasec.io/docs/blog/the-issue-with-vuln-scanner...
|
| It truly is a chicken-and-egg problem. There are next to no
| automated scanners that make use of data like that; Semgrep is
| the furthest along, and my company is close behind them in
| taking a stab at it, as far as I can tell. Heck, there are
| hardly any that do anything with the existing "Environmental"
| part of CVSS, and that has been pretty well populated by NVD, I
| believe.
|
| The existing interchange formats for vulnerability data, such
| as OSV, are underdesigned to the point that it feels like
| GitHub Copilot designed them. It's real work to even get to the
| point where you can consume them, given all the weird choices
| in there. Sorry if I'm salty.
|
| There is an attempt to create a standard for situational
| vulnerability exposure called "VEX", or the Vulnerability
| Exploitability eXchange, but it's almost entirely focused on conveying
| information about what vulnerabilities have been manually
| eliminated, so that software "vendors" can satisfy their
| customers, especially in government contracts. It's not
| modeling the full picture of what can happen in a dependency
| tree and all the useful false-positive information in there.
| thenerdhead wrote:
| Yeah agreed. When I see these problem statements, I see us
| addressing problems that are by-products of vulnerability
| fatigue.
|
| I.e., "be lazy and ignore those vulnerabilities by using our
| tools!"
|
| It hardly solves the true, industry-wide issue: the lack of
| useful information, or even of transparency about that
| information, from the responsible parties. I believe this
| laziness is what got us here in the first place.
| CSDude wrote:
| Joke's on you, I already ignore 100% of them /s
|
| I like the promise, but how can I completely trust that the
| ignored part is not actually reachable? Most languages (except
| a few) do some magic that might not be detected. At a previous
| job we were bombarded with dependency upgrades; I can still
| feel the pain in my bones.
| thefrozenone wrote:
| How does this tool go from a vuln in a library to a set of
| affected functions/control paths? My understanding was that the
| CVE format is unstructured, which makes an analysis like this
| difficult.
| theptip wrote:
| My question too. All I see is this citation:
|
| > [1] We'll be sharing more details about this work later in
| October. Stay tuned!
| ievans wrote:
| We added support to the Semgrep engine for combining package
| metadata restrictions (from the CVE format) with code search
| patterns that indicate you're using the vulnerable library
| (we're writing those mostly manually, but Semgrep makes it
| pretty easy):
|
|     - id: vulnerable-awscli-apr-2017
|       pattern-either:
|         - pattern: boto3.resource('s3', ...)
|         - pattern: boto3.client('s3', ...)
|       r2c-internal-project-depends-on:
|         namespace: pypi
|         package: awscli
|         version: "<= 1.11.82"
|       message: this version of awscli is subject to a directory
|         traversal vulnerability in the s3 module
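|
| For illustration (a hypothetical snippet, not from the rule docs),
| the patterns above would fire on code like:
|
|     import boto3
|
|     # Matches `boto3.client('s3', ...)`; combined with a dependency
|     # on the affected awscli version, the advisory is surfaced.
|     s3 = boto3.client('s3', region_name='us-east-1')
|
|     # Does not match either pattern: a project that only uses EC2
|     # never touches the vulnerable s3 code path, so no alert.
|     ec2 = boto3.client('ec2', region_name='us-east-1')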
|
| This is still experimental and internal
| (https://semgrep.dev/docs/experiments/r2c-internal-project-de...)
| but eventually we'd like to promote it, and maybe also open up
| our CVE rules more!
| mattkopecki wrote:
| Here is a good writeup of some of the pros and cons of using
| a "reachability" approach.
|
| https://blog.sonatype.com/prioritizing-open-source-vulnerabi...
|
| >Unfortunately, no technology currently exists that can tell
| you whether a method is definitively not called, and even if
| it is not called currently, it's just one code change away
| from being called. This means that reachability should never
| be used as an excuse to completely ignore a vulnerability,
| but rather reachability of a vulnerability should be just one
| component of a more holistic approach to assessing risk that
| also takes into account the application context and severity
| of the vulnerability.
| DannyBee wrote:
| Err, "no technology currently exists" is wrong; rather, no
| technology can possibly exist to say whether something is
| definitively called.
|
| It's an undecidable problem in any of the top programming
| languages, and some of the subproblems (like aliasing) are
| themselves similarly statically undecidable in any meaningful
| programming language.
|
| You can choose between over-approximation and
| under-approximation.
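|
| A tiny hypothetical Python sketch of why: whether the vulnerable
| function runs depends on runtime data, so a static tool must
| either flag it anyway (over-approximate) or miss it
| (under-approximate).
|
|     import os
|
|     def vulnerable() -> None:
|         print("reached the vulnerable code path")
|
|     # Data-dependent call: unknowable before runtime.
|     if os.environ.get("FEATURE_FLAG") == "on":
|         vulnerable()
|
|     # Indirect call through a computed name: the call site never
|     # mentions `vulnerable` literally.
|     name = "vulner" + "able"
|     globals()[name]()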
| sverhagen wrote:
| I saw that Java support was still in beta. But it makes me
| wonder if it's going to come with a "don't use reflection"
| disclaimer, then...?
| jrockway wrote:
| This is a mechanism similar to govulncheck
| (https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck), which has
| been quite nice to use in practice. Because it only cares about
| vulnerable code that is actually possible to call, it's quiet
| enough to use as a presubmit check without annoying people. Nice
| to see this for other languages.
| Hooray_Darakian wrote:
| How does it deal with vulnerability alerts which don't say
| anything about what code is affected?
| jrockway wrote:
| From https://go.dev/security/vuln/: "A vulnerability database
| is populated with reports using information from the data
| pipeline. All reports in the database are reviewed and
| curated by the Go Security team."
|
| I would imagine that's what Semgrep is doing as well. You're
| paying for the analysis; the code is the easy part.
| ievans wrote:
| Both Semgrep Supply Chain and govulncheck (AFAIK) are doing
| this work manually, for now. It would indeed be nice if the
| vulnerability reporting process had a way to provide
| metadata, but there's no real consensus on what format that
| data would take. We take advantage of the fact that Semgrep
| makes it much easier than other commercial tools (or even
| most linters) to write a rule quickly.
|
| The good news is there's a natural statistical power-law
| distribution: most alerts come from a few vulnerabilities in
| the most popular (and often large) libraries, so you get
| significant lift just by writing rules for the most popular
| libraries first.
| Hooray_Darakian wrote:
| > Both Semgrep Supply Chain and govulncheck (AFAIK) are
| doing this work manually, for now.
|
| Ya I get that, but surely you don't have 100% coverage.
| What does your code do for the advisories which you don't
| have coverage for? Alert? Ignore?
| nightpool wrote:
| Since security vulnerability alerts are already created
| and processed manually (e.g., every Dependabot alert is
| triggered by some Github employee who imported the right
| data into their system and clicked "send" on it), adding
| an extra step to create the right rules doesn't seem
| impossibly resource intensive. Certainly much more time
| is spent "manually" processing even easier-to-automate
| things in other parts of the economy, like payments
| reconciliation (https://keshikomisimulator.com/)
___________________________________________________________________
(page generated 2022-10-04 23:00 UTC)