[HN Gopher] How we applied fuzzing techniques to cURL
       ___________________________________________________________________
        
       How we applied fuzzing techniques to cURL
        
       Author : ingve
       Score  : 174 points
       Date   : 2024-03-01 15:09 UTC (7 hours ago)
        
 (HTM) web link (blog.trailofbits.com)
 (TXT) w3m dump (blog.trailofbits.com)
        
       | SamuelAdams wrote:
       | I am curious how much effort goes into creating and maintaining
       | unit tests and fuzzing tests. Sometimes it takes longer / more
       | lines of code to write thorough tests than it does to implement
       | the core feature.
       | 
       | At that point, is it worth the time invested? Every new feature
       | can take 2-3 times longer to deliver due to adding tests.
        
         | epistasis wrote:
         | There's a tradeoff for sure, but to fully evaluate that
         | tradeoff you must also take into account the future time spent
         | on the code base, including the amount of time needed to debug
         | future features, adopt a new build environment, allow new team
         | members to muck around and see what does and does not work...
         | 
          | If it's a small project that won't be extended much, then
          | perhaps re-runnable unit tests may not make the bar as a net
          | positive tradeoff. But some time must still be spent testing
          | the code, even if it's not tests that are written in code.
        
         | avgcorrection wrote:
          | The best case IMO is a test suite where writing the test is
          | almost as easy as (or maybe easier than?) doing the
          | equivalent manual test. The test code is maybe 2-4x longer
          | than the change.
         | 
         | The worst case is... arbitrarily bad, to the point of being
         | impossible. Test setup is hell because there are too many
         | dependencies. You have to mock the "real world" since (again)
         | things depend on other things too much and you can't really do
         | a simple string-manipulating change without setting up a whole
         | environment. Also you introduce bugs in your tests that take
         | longer to debug than the real change you are making.
         | 
          | The trap I feel we (current project) have fallen into is
          | that there wasn't a test suite from the start. Just manual
          | testing. Then, when you get to a stage where you feel you
          | need one, the code base is not structured to make automated
          | testing simple.
        
         | bmcniel wrote:
          | Don't tests protect against future unintended regressions?
          | 
          | If all code were write-only then testing would probably be a
          | waste of time, but code changes constantly.
        
         | yjftsjthsd-h wrote:
         | In the case of curl, the cost/benefit analysis is probably
         | skewed by it being deployed on a _massive_ scale, such that any
         | bugs in curl have an unusually large impact. If your company 's
         | in-house CRM has an exploitable bug, that's _bad_ but the
         | impact is just your company. If libcurl has an exploitable bug,
         | that 's millions of devices affected.
        
         | Attummm wrote:
         | For libraries, tools, and frameworks, testing is crucial as it
         | ensures that the code relying on them can address the issue at
         | hand. Code can only be as reliable as what it's leaning on. So,
         | to answer your question, a lot of time.
         | 
         | In a business-oriented project(at most jobs), code may undergo
         | frequent changes due to requests from business, thus too much
         | focus on testing could potentially slowing down development
         | speed if extensive testing is implemented for each change.
         | However, regression tests can still provide valuable insights
         | and allow for faster development later in the life of the
         | project.
         | 
         | While many projects only focus on happy path testing, the use
         | of such tests might not be as high. Coupling them with Negative
         | Testing, and even better, implementing boundary testing,
         | compels developers to consider both valid and invalid inputs,
         | helping to identify and address potential edge cases before
         | they become bugs or security issues in production.
         | 
         | For instance, this [0] codebase has more tests than actual
         | code, including fuzzing tests.
         | 
         | [0]https://github.com/Attumm/redis-dict/blob/main/tests.py
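          | 
          | As a toy sketch of boundary testing (parse_port and its
          | expected results are assumptions for illustration, not from
          | the linked code):
          | 
          |     #include <cassert>
          | 
          |     // Hypothetical: returns the port on success, -1 on
          |     // invalid input.
          |     int parse_port(const char *s);
          | 
          |     int main() {
          |       assert(parse_port("1") == 1);          // lowest valid
          |       assert(parse_port("65535") == 65535);  // highest valid
          |       assert(parse_port("0") == -1);         // just below
          |       assert(parse_port("65536") == -1);     // just above
          |       assert(parse_port("") == -1);          // degenerate
          |       return 0;
          |     }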
        
         | hgs3 wrote:
          | Code that isn't tested isn't done. Tests not only verify
          | expectations, but also prevent future regressions. Fuzzing
          | is essential for code that accepts external inputs.
          | Heartbleed was discoverable with a fuzzer.
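          | 
          | As a rough illustration (not curl's actual harness), a
          | libFuzzer-style fuzz target is just an entry point that
          | feeds attacker-controlled bytes to the code under test;
          | parse_header here is a hypothetical parser:
          | 
          |     #include <cstdint>
          |     #include <cstddef>
          | 
          |     // Hypothetical function under test; must tolerate
          |     // arbitrary bytes without crashing.
          |     int parse_header(const uint8_t *buf, size_t len);
          | 
          |     // libFuzzer calls this once per generated input;
          |     // crashes and sanitizer reports surface as findings.
          |     extern "C" int LLVMFuzzerTestOneInput(
          |         const uint8_t *data, size_t size) {
          |       parse_header(data, size);
          |       return 0;
          |     }
          | 
          | Built with something like `clang++ -fsanitize=fuzzer,address`,
          | this runs the target in a loop on mutated inputs.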
        
         | 1over137 wrote:
         | >At that point, is it worth the time invested? Every new
         | feature can take 2-3 times longer to deliver due to adding
         | tests.
         | 
         | Depends if you are writing software to control a pacemaker, or
         | writing software for some silly smartphone game.
        
         | pests wrote:
          | Everyone always praises SQLite:
         | 
         | > As of version 3.42.0 (2023-05-16), the SQLite library
         | consists of approximately 155.8 KSLOC of C code [...] the
         | project has 590 times as much test code and test scripts -
         | 92053.1 KSLOC.
         | 
         | https://www.sqlite.org/testing.html
        
           | guerrilla wrote:
            | What the fuck. If they're investing that much, then why
            | don't they just go straight to formal verification? This
            | is what things like Frama-C (or whatever's popular now)
            | are for.
        
             | gavinhoward wrote:
              | The problem is that the effort needed for formal
              | verification grows exponentially beyond a certain point.
             | 
             | seL4 is around 10-12 KLoC, and it took a decade of effort
             | from multiple people to make it happen.
             | 
             | At the size of SQLite, especially where they have to
             | operate on platforms with different behavior (as an OS,
             | seL4 _is_ the platform), formal verification is just too
             | much effort.
             | 
             | All that said, your reaction is _totally_ understandable.
        
               | tonyarkles wrote:
               | Link to how SQLite is tested, for anyone who's curious:
               | https://www.sqlite.org/testing.html
               | 
                | There's also an interesting wrinkle: formal
                | verification requires a formal specification, of
                | which, afaik, there isn't one for SQLite. One of the
                | toughest problems someone would run into trying to put
                | together a formal specification for code as widely
                | deployed as SQLite boils down to Hyrum's Law[1]: on a
                | long enough time scale, all observable behaviours of
                | your system become interfaces that someone, somewhere
                | depends on.
               | 
                | That massive suite of test cases isn't a formal
                | specification, but given that it achieves 100% branch
                | coverage, it seems to me that it:
               | 
               | - pretty tightly bounds the interface without formally
               | specifying it
               | 
               | - also pretty tightly constrains the implementation to
               | match the current implementation
               | 
               | Which, when you have as many users as you do with SQLite,
               | is probably a fair way of providing guarantees to your
               | users that upgrading from 3.44.0 to 3.45.1 isn't going to
               | break anything you're using unless you were relying on
               | explicitly-identified buggy behaviour (you'd be able to
               | see the delta in the test cases if you looked at the
               | Fossil diffs).
               | 
               | [1] https://www.hyrumslaw.com
        
             | summerlight wrote:
              | Because formal verification doesn't scale at the moment.
              | There are relatively few experts on formal methods
              | around the globe, and some of them would need to work on
              | SQLite indefinitely unless you're okay with just a
              | one-off verification. Frama-C has a different background
              | because it's from Inria, an institution with many
              | formal-methods experts.
        
             | datadeft wrote:
             | You can do many things other than testing:
             | 
             | - use a memory safe language
             | 
             | - formal verification (multiple implementations even)
             | 
             | - build a simulator like FoundationDB did
        
         | burnished wrote:
          | Generally speaking, yes. It's not like most code isn't
          | changed as a consequence of writing those tests, so the
          | practice has immediate benefit; and future changes can be
          | made swiftly and securely thanks to the confidence those
          | tests should be giving you.
          | 
          | But also, your time estimate does sound wonky; 2-3x sounds
          | extreme. Maybe you need to improve your test-writing process?
        
         | dogcomplex wrote:
         | I would generally double whatever your expectations are for the
         | initial feature development. Tests are essentially a second
         | implementation from a different angle running in parallel,
         | hoping the results match. Every feature change means changing 2
         | systems now. You save a bit of subsequent time with easier
         | debugging when other features break tests, but that's somewhat
         | eaten up by maintaining a system twice the size.
         | 
          | There are reasons many MVP developers and small teams,
          | whose focus is more on rapid feature implementation than on
          | large-team coordination or code stability, forgo writing
          | tests. It doesn't make sense in all circumstances.
          | Generally, testing is needed for more complex, less
          | grokkable, more large-team-oriented, or public-library code.
        
         | summerlight wrote:
          | It's an economic decision. If you're developing a new indie
          | game with $10k of expected lifetime revenue, then it's
          | probably not worth your time. But if it's the core
          | infrastructure of a multi-billion-dollar business, then yes,
          | it's worth your time, since any non-trivial security
          | incident may cost more than your annual salary.
        
         | tylerhou wrote:
         | It depends on how risk tolerant you are. If you are at a
         | startup, the growth of your startup is often largely dependent
         | on how quickly you can add features / onboard customers. In
         | that context, writing tests not only slows you down from adding
         | new features, it might make it harder to modify existing
         | features. In addition, early customers also tend to be
         | accepting of small bugs -- they themselves already have "taken
         | a risk" in trusting an early startup. So testing is not really
         | valuable -- you want to remain agile, and you won't lose much
         | money because of a bug.
         | 
          | On the other hand, if you are Google, you have already found
          | a money-printing firehose, and you /don't/ want to take on
          | any additional unnecessary risk. Any new code needs to /not/
          | break existing functionality -- if it does, you might lose
          | out on millions in revenue. In addition, your product
          | becomes so large that it is impossible to manually test
          | every feature. In this case, tests actually help you move
          | /faster/, because they automatically ensure, at scale, that
          | a change does not break anything.
         | 
         | While cURL does not make any money, it is solidly on the
         | mature/Google end of the testing spectrum. It has found a
         | footing in the open source tooling "market" and people rely on
         | it to maintain its existing functionality. In addition, it has
         | accumulated a fairly large surface area, so manual testing is
         | not really feasible for every feature. So testing similarly
         | helps cURL developers move faster (in the long run), not
         | slower.
        
       | PoignardAzur wrote:
       | I don't get the part about custom mutators:
       | 
       | > _If the data can't be parsed into a valid TLV, instead of
       | throwing it away, return a syntactically correct dummy TLV. This
       | can be anything, as long as it can be successfully unpacked._
       | 
       | If you're creating a dummy value, how is that better than
       | failing? How does that give your fuzzer better coverage?
        
         | pstrateman wrote:
          | The file format they chose makes it difficult for a fuzzer
          | to produce valid examples by random chance.
          | 
          | The file format isn't what's being fuzzed, so accepting as
          | many inputs as possible as valid is useful.
          | 
          | It's a trick to make the fuzzer faster.
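          | 
          | As a sketch of that trick (the TLV layout here is an
          | assumption, not necessarily the one from the post): when the
          | bytes don't parse, hand back a fixed, well-formed dummy TLV
          | instead of failing, so no execution is wasted in the
          | unpacker:
          | 
          |     #include <cstdint>
          |     #include <cstddef>
          | 
          |     // Assumed layout: 1-byte type, 2-byte big-endian
          |     // length, then `len` bytes of value.
          |     struct Tlv {
          |       uint8_t type;
          |       uint16_t len;
          |       const uint8_t *val;
          |     };
          | 
          |     static Tlv unpack_or_dummy(const uint8_t *buf,
          |                                size_t size) {
          |       if (size >= 3) {
          |         uint16_t len = (uint16_t)((buf[1] << 8) | buf[2]);
          |         if ((size_t)len + 3 <= size)
          |           return Tlv{buf[0], len, buf + 3};  // parsed fine
          |       }
          |       // Malformed: return a syntactically valid, empty TLV
          |       // rather than rejecting the input.
          |       return Tlv{0x01, 0, nullptr};
          |     }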
        
         | gavinhoward wrote:
         | Not an expert, but I am a power user of fuzzers.
         | 
         | The problem is that the space of invalid inputs is far larger
         | than the space of valid inputs. Sometimes orders of magnitude
         | larger, say billions or more invalid inputs to one valid input.
         | 
         | Naive fuzzing will hit so many error cases that it will hardly
         | produce a valid input. For the ratio that I mentioned, you
         | might run a fuzzer for a billion runs and only get one valid
         | input in the bunch.
         | 
          | Using a custom mutator and returning a dummy value gives the
          | fuzzer a starting point from a valid input and makes
          | generating other valid inputs more likely.
         | 
         | For my part, I prefer to use custom mutators to generate valid
         | test cases most of the time, but I want some invalid inputs
         | because error handling is where most bugs are.
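          | 
          | Not from the post, but a sketch of how that looks with
          | libFuzzer's custom-mutator hook (the hook signatures are
          | real; fixup_tlv is a hypothetical helper that rewrites the
          | buffer into a well-formed TLV and returns its new size):
          | 
          |     #include <cstdint>
          |     #include <cstddef>
          | 
          |     // Provided by libFuzzer when a custom mutator is
          |     // defined.
          |     extern "C" size_t LLVMFuzzerMutate(uint8_t *data,
          |                                        size_t size,
          |                                        size_t max_size);
          | 
          |     // Hypothetical: repair framing in place, return new
          |     // size.
          |     size_t fixup_tlv(uint8_t *buf, size_t size, size_t max);
          | 
          |     extern "C" size_t LLVMFuzzerCustomMutator(
          |         uint8_t *data, size_t size, size_t max_size,
          |         unsigned int seed) {
          |       size_t n = LLVMFuzzerMutate(data, size, max_size);
          |       // Leave ~1 in 16 inputs unrepaired so error-handling
          |       // paths still get exercised.
          |       if (seed % 16 == 0)
          |         return n;
          |       return fixup_tlv(data, n, max_size);
          |     }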
        
         | stefan_ wrote:
          | Remember, TLV stands for "Type Length Value". It's better to
          | take your fuzzer output as the value (or possibly the type
          | and value) but generate the length and final TLV yourself,
          | than to have tons of fuzzer-generated sequences fail at the
          | very basic type/length check, which is unlikely to be
          | vulnerable in software like cURL (but can be in many
          | others...).
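          | 
          | A sketch of that approach, assuming a 1-byte type and a
          | 2-byte big-endian length (not necessarily the post's exact
          | layout): treat the fuzzer's raw bytes as the value and
          | derive the framing yourself.
          | 
          |     #include <cstdint>
          |     #include <cstddef>
          |     #include <vector>
          | 
          |     std::vector<uint8_t> make_tlv(uint8_t type,
          |                                   const uint8_t *val,
          |                                   size_t n) {
          |       // Clamp to what the length field can encode.
          |       if (n > 0xFFFF) n = 0xFFFF;
          |       std::vector<uint8_t> out;
          |       out.reserve(n + 3);
          |       out.push_back(type);
          |       out.push_back((uint8_t)(n >> 8));    // length computed
          |       out.push_back((uint8_t)(n & 0xFF));  // never fuzzed
          |       out.insert(out.end(), val, val + n);
          |       return out;
          |     }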
        
       ___________________________________________________________________
       (page generated 2024-03-01 23:00 UTC)