[HN Gopher] How we applied fuzzing techniques to cURL
___________________________________________________________________
How we applied fuzzing techniques to cURL
Author : ingve
Score : 174 points
Date : 2024-03-01 15:09 UTC (7 hours ago)
(HTM) web link (blog.trailofbits.com)
(TXT) w3m dump (blog.trailofbits.com)
| SamuelAdams wrote:
| I am curious how much effort goes into creating and maintaining
| unit tests and fuzzing tests. Sometimes it takes longer / more
| lines of code to write thorough tests than it does to implement
| the core feature.
|
| At that point, is it worth the time invested? Every new feature
| can take 2-3 times longer to deliver due to adding tests.
| epistasis wrote:
| There's a tradeoff for sure, but to fully evaluate that
| tradeoff you must also take into account the future time spent
| on the code base, including the amount of time needed to debug
| future features, adopt a new build environment, allow new team
| members to muck around and see what does and does not work...
|
 | If it's a small project that won't be extended much, then
 | perhaps re-runnable unit tests won't clear the bar as a net
 | positive tradeoff. But some time must still be spent testing
 | the code, even if those tests aren't written in code.
| avgcorrection wrote:
| The best-case IMO is a test suite where writing the test is
| almost as easy (or maybe easier?) than doing the equivalent
| manual test. The test code is maybe 2-4x longer than the
| change.
|
| The worst case is... arbitrarily bad, to the point of being
| impossible. Test setup is hell because there are too many
| dependencies. You have to mock the "real world" since (again)
| things depend on other things too much and you can't really do
| a simple string-manipulating change without setting up a whole
| environment. Also you introduce bugs in your tests that take
| longer to debug than the real change you are making.
|
 | The trap we (my current project) have fallen into is that
 | there wasn't a test suite from the start -- just manual
 | testing. By the time you feel you need one, the code base
 | isn't structured to make automated testing simple.
| bmcniel wrote:
 | Don't tests protect against future unintended regressions?
 |
 | If all code were write-only, then testing would probably be a
 | waste of time, but code changes constantly.
| yjftsjthsd-h wrote:
| In the case of curl, the cost/benefit analysis is probably
| skewed by it being deployed on a _massive_ scale, such that any
 | bugs in curl have an unusually large impact. If your company's
 | in-house CRM has an exploitable bug, that's _bad_ but the
 | impact is just your company. If libcurl has an exploitable
 | bug, that's millions of devices affected.
| Attummm wrote:
 | For libraries, tools, and frameworks, testing is crucial: code
 | can only be as reliable as what it's built on. So, to answer
 | your question, a lot of time.
 |
 | In a business-oriented project (as at most jobs), code may
 | undergo frequent changes due to requests from the business, so
 | too much focus on testing could slow down development if
 | extensive testing is required for each change. However,
 | regression tests can still provide valuable insights and allow
 | for faster development later in the life of the project.
|
 | Many projects focus only on happy-path testing, but the value
 | of such tests is limited. Coupling them with negative testing,
 | and better yet boundary testing, compels developers to
 | consider both valid and invalid inputs, helping to identify
 | and address edge cases before they become bugs or security
 | issues in production.
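A rough sketch of what coupling happy-path, negative, and boundary tests looks like; `parse_port` is a hypothetical example function, not taken from any codebase mentioned here:

```python
# Hypothetical example: a parser for TCP port numbers, tested on the
# happy path, on invalid input (negative testing), and at the edges
# of the valid range (boundary testing).

def parse_port(s: str) -> int:
    """Parse a decimal TCP port, raising ValueError outside 1-65535."""
    port = int(s)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port

def test_parse_port():
    # Happy path
    assert parse_port("8080") == 8080
    # Boundary testing: smallest and largest valid ports
    assert parse_port("1") == 1
    assert parse_port("65535") == 65535
    # Negative testing: invalid inputs must be rejected, not misparsed
    for bad in ["0", "65536", "-1", "http", ""]:
        try:
            parse_port(bad)
        except ValueError:
            pass
        else:
            raise AssertionError(f"accepted invalid port: {bad!r}")

test_parse_port()
```

The boundary cases ("1", "65535", and their just-invalid neighbors "0" and "65536") are where off-by-one bugs typically hide.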
|
| For instance, this [0] codebase has more tests than actual
| code, including fuzzing tests.
|
 | [0] https://github.com/Attumm/redis-dict/blob/main/tests.py
| hgs3 wrote:
 | Code that isn't tested isn't done. Tests not only verify
 | expectations but also prevent future regressions. Fuzzing is
 | essential for code that accepts external inputs. Heartbleed
 | was discoverable with a fuzzer.
| 1over137 wrote:
| >At that point, is it worth the time invested? Every new
| feature can take 2-3 times longer to deliver due to adding
| tests.
|
| Depends if you are writing software to control a pacemaker, or
| writing software for some silly smartphone game.
| pests wrote:
| Everyone always praises SQLite
|
| > As of version 3.42.0 (2023-05-16), the SQLite library
| consists of approximately 155.8 KSLOC of C code [...] the
| project has 590 times as much test code and test scripts -
| 92053.1 KSLOC.
|
| https://www.sqlite.org/testing.html
| guerrilla wrote:
| What the fuck. If they're investing that much then why don't
| they just go straight to formal verification. This is what
| things like frama-c (or whatever's popular now) are for.
| gavinhoward wrote:
 | The problem is that the effort required for formal
 | verification grows exponentially beyond a certain point.
 |
 | seL4 is around 10-12 KLoC, and its verification took a decade
 | of effort from multiple people.
|
| At the size of SQLite, especially where they have to
| operate on platforms with different behavior (as an OS,
| seL4 _is_ the platform), formal verification is just too
| much effort.
|
| All that said, your reaction is _totally_ understandable.
| tonyarkles wrote:
| Link to how SQLite is tested, for anyone who's curious:
| https://www.sqlite.org/testing.html
|
| There's also an interesting thing where formal
| verification requires a formal specification, which afaik
| there isn't one for SQLite. One of the toughest problems
| that someone would run into trying to put together a
| formal specification for code as widely deployed as
| SQLite boils down to Hyrum's Law[1]: on a long enough
| time scale, all observable behaviours of your system
| become interfaces that someone, somewhere depends on.
|
| That massive suite of test cases isn't a formal
| specification but given that it achieves 100% branch
| coverage that implies to me that it:
|
| - pretty tightly bounds the interface without formally
| specifying it
|
| - also pretty tightly constrains the implementation to
| match the current implementation
|
| Which, when you have as many users as you do with SQLite,
| is probably a fair way of providing guarantees to your
| users that upgrading from 3.44.0 to 3.45.1 isn't going to
| break anything you're using unless you were relying on
| explicitly-identified buggy behaviour (you'd be able to
| see the delta in the test cases if you looked at the
| Fossil diffs).
|
| [1] https://www.hyrumslaw.com
| summerlight wrote:
 | Because formal verification doesn't scale at the moment.
 | There are relatively few experts on formal methods around the
 | globe, and some of them would need to work on SQLite
 | indefinitely unless you're okay with a one-off verification.
 | Frama-C is a different case because it comes from Inria, an
 | institution with many formal-methods experts.
| datadeft wrote:
| You can do many things other than testing:
|
| - use a memory safe language
|
| - formal verification (multiple implementations even)
|
| - build a simulator like FoundationDB did
| burnished wrote:
 | Generally speaking, yes. It's not as if most code goes
 | unchanged as a consequence of writing those tests, so the
 | practice has immediate benefit, and future changes can be made
 | swiftly and securely thanks to the confidence those tests
 | should give you.
 |
 | Also, your time estimate does sound wonky; 2-3x sounds
 | extreme. Maybe you need to improve your test-writing process?
| dogcomplex wrote:
| I would generally double whatever your expectations are for the
| initial feature development. Tests are essentially a second
| implementation from a different angle running in parallel,
| hoping the results match. Every feature change means changing 2
| systems now. You save a bit of subsequent time with easier
| debugging when other features break tests, but that's somewhat
| eaten up by maintaining a system twice the size.
|
| There are reasons many MVP developers and small teams whose
| focus is more on rapid feature implementation than large team
| coordination or code stability forego writing tests. It doesn't
| make sense in all circumstances. Generally, more complex, less
| grokable, more large-team-oriented or public library code is
| when you need testing.
| summerlight wrote:
 | It's an economic decision. If you're developing a new indie
 | game with $10k of expected lifetime revenue, then it's
 | probably not worth your time. But if it's the core
 | infrastructure of a multi-billion-dollar business, then yes,
 | it's worth your time, since any non-trivial security incident
 | may cost more than your annual salary.
| tylerhou wrote:
| It depends on how risk tolerant you are. If you are at a
| startup, the growth of your startup is often largely dependent
| on how quickly you can add features / onboard customers. In
| that context, writing tests not only slows you down from adding
| new features, it might make it harder to modify existing
| features. In addition, early customers also tend to be
| accepting of small bugs -- they themselves already have "taken
| a risk" in trusting an early startup. So testing is not really
| valuable -- you want to remain agile, and you won't lose much
| money because of a bug.
|
| On the other hand, if you are Google, you already have found a
| money-printing firehose, and you /don't/ want to take on any
 | additional unnecessary risk. Any new code needs to /not/ break
 | existing functionality -- if it does, you might lose out on
 | millions in revenue. In addition, your product becomes so
 | large that it is impossible to manually test every feature. In
 | this case, tests actually help you move /faster/ because they
 | automatically ensure, at scale, that a change does not break
 | anything.
|
| While cURL does not make any money, it is solidly on the
| mature/Google end of the testing spectrum. It has found a
| footing in the open source tooling "market" and people rely on
| it to maintain its existing functionality. In addition, it has
| accumulated a fairly large surface area, so manual testing is
| not really feasible for every feature. So testing similarly
| helps cURL developers move faster (in the long run), not
| slower.
| PoignardAzur wrote:
| I don't get the part about custom mutators:
|
| > _If the data can't be parsed into a valid TLV, instead of
| throwing it away, return a syntactically correct dummy TLV. This
| can be anything, as long as it can be successfully unpacked._
|
| If you're creating a dummy value, how is that better than
| failing? How does that give your fuzzer better coverage?
| pstrateman wrote:
 | The file format they chose makes it difficult for a fuzzer to
 | produce valid examples by random chance.
 |
 | The file format isn't what's being fuzzed, so accepting as
 | many inputs as possible as valid is useful.
|
| It's a trick to make the fuzzer faster.
| gavinhoward wrote:
| Not an expert, but I am a power user of fuzzers.
|
| The problem is that the space of invalid inputs is far larger
| than the space of valid inputs. Sometimes orders of magnitude
| larger, say billions or more invalid inputs to one valid input.
|
| Naive fuzzing will hit so many error cases that it will hardly
| produce a valid input. For the ratio that I mentioned, you
| might run a fuzzer for a billion runs and only get one valid
| input in the bunch.
|
| Using a custom mutator and returning a dummy value will give
| the fuzzer a starting point from a valid input and makes
| generating other valid inputs more likely.
|
| For my part, I prefer to use custom mutators to generate valid
| test cases most of the time, but I want some invalid inputs
| because error handling is where most bugs are.
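The "return a dummy TLV" repair step being described can be sketched roughly like this; the TLV layout is illustrative (not cURL's actual format), and `unpack_tlv`/`repair` are hypothetical helper names:

```python
import struct

# Illustrative TLV layout (not cURL's actual format): 1-byte type,
# 2-byte big-endian length, then exactly `length` bytes of value.
DUMMY_TLV = struct.pack(">BH", 1, 0)  # type 1, zero-length value

def unpack_tlv(data: bytes):
    """Return (type, value) if data is a well-formed TLV, else None."""
    if len(data) < 3:
        return None
    tlv_type, length = struct.unpack(">BH", data[:3])
    if len(data) - 3 != length:
        return None
    return tlv_type, data[3:]

def repair(mutated: bytes) -> bytes:
    """After mutation, if the bytes no longer parse as a TLV,
    substitute a syntactically correct dummy instead of letting the
    fuzzer burn the iteration on a trivial parse failure."""
    return mutated if unpack_tlv(mutated) is not None else DUMMY_TLV

# A random bit flip usually breaks the framing; repair() keeps the
# corpus entry valid so deeper code paths still get exercised.
assert unpack_tlv(repair(b"\xff\xff")) is not None
```

In a real libFuzzer harness this logic would live inside a custom mutator callback; the point is only that every mutation round yields something the parser accepts.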
| stefan_ wrote:
 | Remember, TLV stands for "Type-Length-Value". It's better to
 | take your fuzzer output for the value (or possibly type and
 | value) but generate the length and final TLV yourself than to
 | have tons of fuzzer-generated sequences fail at the very basic
 | type/length check, which is unlikely to be vulnerable in
 | software like cURL (but can be in many others).
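A minimal sketch of that structure-aware approach; the framing and the helper names (`make_tlv`, `harness`) are hypothetical, not Trail of Bits' actual harness:

```python
import struct

# Illustrative structure-aware harness: the fuzzer controls only the
# type byte and the value bytes; the harness computes the length
# field itself, so every generated input passes the basic framing
# check and the fuzzer's effort goes into the deeper parsing logic.
def make_tlv(tlv_type: int, value: bytes) -> bytes:
    # 1-byte type, 2-byte big-endian length, then the payload.
    return struct.pack(">BH", tlv_type, len(value)) + value

def harness(fuzz_data: bytes) -> bytes:
    tlv_type = fuzz_data[0] if fuzz_data else 0
    value = fuzz_data[1:65535 + 1]  # cap so the length fits in 2 bytes
    return make_tlv(tlv_type, value)

# Whatever bytes the fuzzer produces, the framing is always valid.
assert harness(b"\x07hello") == struct.pack(">BH", 7, 5) + b"hello"
```

Because the length field is derived rather than fuzzed, the trivial framing check can never reject an input, which is exactly the waste the parent comment describes avoiding.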
___________________________________________________________________
(page generated 2024-03-01 23:00 UTC)