[HN Gopher] New differential fuzzing tool reveals novel HTTP req...
       ___________________________________________________________________
        
       New differential fuzzing tool reveals novel HTTP request smuggling
       techniques
        
       Author : feross
       Score  : 96 points
       Date   : 2021-11-25 17:02 UTC (5 hours ago)
        
 (HTM) web link (portswigger.net)
 (TXT) w3m dump (portswigger.net)
        
       | kdbg wrote:
        | On the whole, what I found most interesting here were the
        | techniques they came across through fuzzing that had some
        | impact. Yes, it's interesting to see the specific combinations
        | that were affected, but in the real world there are so many
        | other potential combinations.
       | 
        | The dominant method for request smuggling over the last few
        | years has been abusing `Content-Length` and `Transfer-Encoding`.
        | As someone who has worked doing web-app assessments, the biggest
        | takeaway for me is the set of attacks that they found to work
        | and cause problems.
       | 
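        | (To make that concrete, here's a rough Python sketch of the
        | classic CL.TE desync -- all names are illustrative, not from
        | the paper. Two framing rules applied to the same bytes yield
        | different message boundaries, so the extra bytes get "smuggled"
        | in as the start of the next request.)

```python
# Hypothetical sketch of a CL.TE request smuggling desync.
smuggled = b"GET /admin HTTP/1.1\r\nHost: victim.example\r\n\r\n"

# Body: a zero-length terminating chunk, then the smuggled bytes.
body = b"0\r\n\r\n" + smuggled

headers = (
    b"POST / HTTP/1.1\r\n"
    b"Host: victim.example\r\n"
    b"Content-Length: " + str(len(body)).encode() + b"\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
)
request = headers + body  # the single request the client sends

def frame_by_content_length(body, length):
    """A front end honouring Content-Length consumes `length` bytes."""
    return body[:length], body[length:]

def frame_by_chunked(body):
    """A back end honouring Transfer-Encoding stops at the 0-size chunk."""
    consumed, rest = b"", body
    while True:
        size_line, _, rest = rest.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)  # ignore chunk-extensions
        if size == 0:
            return consumed, rest[2:] if rest[:2] == b"\r\n" else rest
        consumed += rest[:size]
        rest = rest[size + 2:]  # skip the chunk's trailing CRLF

_, leftover_cl = frame_by_content_length(body, len(body))
_, leftover_te = frame_by_chunked(body)
print(leftover_cl)  # b'' -- the front end sees one complete request
print(leftover_te)  # the smuggled GET, read as the *next* request
```

        | (The front end forwards everything as one request; the back
        | end stops at the terminating chunk and treats the remainder as
        | a new, attacker-controlled request.)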
        | The details about particular server pairs having issues are
        | great information, as is the fuzzing setup (great use of
        | differential fuzzing), but I think what matters more is being
        | able to take the attack avenues they had success with and run
        | them against your own deployments. Given how many applications
        | internally are running their own stacks, there is still a lot
        | of room for potential issues. I can imagine people running with
        | some of these for bounties in the near future.
       | 
        | A brief summary of the manipulations they had some success with
        | is on pages 7-8. Though if you don't feel like reading, the
        | "headline" version of the list gives you a pretty decent idea
        | of what was having an impact:
       | 
       | Request Line Mutations
       | 
        | - Mangled Method
        | - Distorted Protocol
        | - Invalid Version
        | - Manipulated Termination
        | - Embedded Request Lines
       | 
       | Request Headers Mutations
       | 
        | - Distorted Header Value
        | - Manipulated Termination
        | - Expect Header
        | - Identity Encoding
        | - V1.0 Chunked Encoding
        | - Double Transfer-Encoding
       | 
       | Request Body Mutations
       | 
        | - Chunk-Size Chunk-Data Mismatch
        | - Manipulated Chunk-Size Termination
        | - Manipulated Chunk-Extension Termination
        | - Manipulated Chunk-Data Termination
        | - Mangled Last-Chunk
       | 
       | ---
       | 
       | And a bit of self-promotion but we talked about this paper on
       | Monday on the DAY[0] podcast I cohost (24m53s - 35m30s)
       | https://www.youtube.com/watch?v=GmOuX8nHZuc&t=1497s
        
       | Joker_vD wrote:
        | You know, articles like this make me wish an OS would actually
        | have a built-in fast, reliable, fully-featured HTTP parser. I've
        | written a couple of (very strict) HTTP parsers on my own, and
        | this whole "request smuggling" business is possible only because
        | HTTP messages have rather delicate, fragile, totally non-robust
        | framing and structure. Miss a letter in one of the relevant RFCs
        | (IIRC, you need to read at least the first 3 RFCs to learn the
        | complete wire format of an HTTP message) and you'll end up with
        | a subtly non-compliant and vulnerable parser.
       | 
        | And yet, every single programming language/platform builds its
        | own HTTP-handling library, usually several, of very varying
        | quality and feature support. Again, it would not be as bad if
        | HTTP were a robust format where you could skip recognizing and
        | correctly dealing with the half of the features you don't
        | intend to support, but it is not: even if you don't want to
        | accept e.g. trailers, you still have to be aware of them. We
        | have OpenSSL, why not also have OpenHTTP (in sans-io style)?
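        | (A sans-io design could look something like this rough Python
        | sketch -- all names here are hypothetical. The parser owns no
        | sockets; it just consumes bytes and emits events, which is
        | exactly the shape that ports cleanly across languages or an
        | FFI boundary.)

```python
# Hypothetical sans-io style core: feed() takes bytes from *any*
# transport; the caller drains events. No sockets, no threads.
class LineParser:
    def __init__(self):
        self.buf = b""
        self.events = []

    def feed(self, data):
        """Buffer incoming bytes and emit one event per complete line."""
        self.buf += data
        while b"\r\n" in self.buf:
            line, self.buf = self.buf.split(b"\r\n", 1)
            self.events.append(("line", line))

p = LineParser()
p.feed(b"GET / HT")              # partial data is fine: it just buffers
p.feed(b"TP/1.1\r\nHost: a\r\n")
print(p.events)  # [('line', b'GET / HTTP/1.1'), ('line', b'Host: a')]
```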
        
         | staticassertion wrote:
         | I think FFI, as you mention with OpenSSL, would be the better
         | approach. And I think this would be a good idea in general. But
         | most languages don't make FFI easy on either side.
        
         | silvestrov wrote:
         | The best solution would be to make an http version 4 with a
         | non-fragile format, e.g. json.
         | 
         | Otherwise we will keep chasing bugs forever.
        
           | dtech wrote:
            | http 2 and 3 have much more strictly defined binary
            | formats. JSON would be a step back in terms of spec and
            | performance.
        
           | josephg wrote:
           | A lot of the new http bugs aren't caused by ambiguities in
           | http1 headers, or ambiguities in http2 headers. They happen
           | when an http2 message gets rewritten into http1 and "valid"
           | http header characters (like new lines) show up as header
           | separators in http1.
           | 
           | The problem isn't that we don't have a good header format.
           | The problem is we have too many.
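            | (A rough sketch of that downgrade problem -- header names
            | and values here are made up. HTTP/2 is binary-framed, so a
            | header value never needs escaping; pasting it into
            | HTTP/1.1's text framing is where the injection happens.)

```python
# Hypothetical naive h2 -> h1 downgrade. In h2's length-prefixed
# framing this value is just bytes; in h1's text framing the embedded
# CRLF becomes a header separator.
h2_headers = [
    (":method", "GET"),
    (":path", "/"),
    ("x-note", "hello\r\nTransfer-Encoding: chunked"),
]

def naive_downgrade(headers):
    lines = ["GET / HTTP/1.1"]
    for name, value in headers:
        if not name.startswith(":"):          # drop h2 pseudo-headers
            lines.append(f"{name}: {value}")  # no CR/LF sanitization!
    return "\r\n".join(lines) + "\r\n\r\n"

request = naive_downgrade(h2_headers)
print(request)
# The x-note value's embedded CRLF now reads downstream as a
# standalone Transfer-Encoding: chunked header.
```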
        
           | nmadden wrote:
           | Configuring your webserver/reverse proxy to talk HTTP/2 to
           | backend appservers is a good improvement against request
           | smuggling. (If they support it, sadly not guaranteed). The
           | binary format is much less ambiguous.
           | 
           | There is a talk by James Kettle about request smuggling with
           | HTTP/2, but it is largely about attacks when the frontend
           | talks HTTP/2 and then converts to HTTP/1.1 to talk to backend
           | servers [1]. That said, it does also highlight some
           | HTTP/2-only quirks, so it's not completely perfect, but it's
           | so much better than HTTP/1.1.
           | 
           | [1]: https://portswigger.net/kb/papers/rfekn2Uv/HTTP2whitepap
           | er.p...
        
         | richdougherty wrote:
         | I've implemented a few things from RFCs and I always wish that
         | for each RFC there was a library of test cases to test your
         | implementation.
         | 
         | Does anyone know if there is anything like this for HTTP or
         | associated RFCs?
         | 
            | E.g., for HTTP header parameters, names can have a * suffix
            | to change the character encoding of the parameter value. How
            | many implementations test this? Or tests for decoding of URI
            | paths that contain escaped / characters, to make sure
            | they're not confused with the /s that are the actual path
            | separators.
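            | (The escaped-slash case is easy to demonstrate -- a small
            | Python sketch with a made-up path. The bug is decoding %2F
            | *before* splitting on '/', which turns an encoded slash
            | into a real path separator.)

```python
# Percent-decoding order matters for URI paths containing %2F.
from urllib.parse import unquote

path = "/files/reports%2F..%2Fsecret.txt"

# Correct order: split on the raw '/', decode each segment afterwards.
segments_safe = [unquote(s) for s in path.split("/")]

# Buggy order: decode first, then split -- the encoded %2F now acts
# as a path separator and introduces traversal segments.
segments_buggy = unquote(path).split("/")

print(segments_safe)   # ['', 'files', 'reports/../secret.txt']
print(segments_buggy)  # ['', 'files', 'reports', '..', 'secret.txt']
```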
        
           | Joker_vD wrote:
           | Or at least a bunch of examples in the RFC itself. Don't you
           | just love reading a long description of a convoluted data
           | format with literally zero examples of how the full thing
           | looks and what it is supposed to mean? Sadly, leaving the
           | validation undocumented is pretty common across
           | formats/protocol descriptions, and RFCs actually seem to
            | generally be on the "more specific" end of the scale, thanks
            | to the ubiquitous use of MUST/SHOULD language. But I recently
            | wrote a toy ELF parser and it's amazing how many things in
           | its spec are left implicit: e.g. you probably should check
           | that calculating a segment's end (base+size) doesn't overflow
           | and wrap over zero... should you? Maybe you're supposed to
           | support segments that span over the MAX_MEMORY_ADDRESS into
           | the lower memory, who knows? The spec does not say.
        
             | toomim wrote:
             | You know, the IETF is an open group, and you can write some
             | examples and submit them as a pull request:
             | 
             | https://httpwg.org/CONTRIBUTING.html
        
           | aaaaaaaaaaab wrote:
           | >Eg, for HTTP header parameter, names can have a * to change
           | the character encoding of the parameter value
           | 
           | Where did you read this? HTTP header fields may contain MIME-
           | encoded values using the encoding scheme outlined in rfc2047,
           | but I haven't heard of the asterisk having any special
           | meaning...
        
         | pid-1 wrote:
         | Why is HTTP so complex? The base use case (hypermedia request-
         | response) sounds really simple.
        
           | dtech wrote:
            | HTTP 1.1 is an old protocol; over time, new requirements
            | made modifications necessary, some things fell out of use,
            | and some changes turned out to be mistakes. That it's a
            | text-based format doesn't help either.
           | 
           | The basis is simple, but then add Cookies, HTTPS,
           | Authentication, Redirecting, Host headers, caching, chunked
           | encoding, WebDAV, CORS, etc etc. All justifiable but all
           | adding complexity.
        
           | bawolff wrote:
            | Http/0.9 is pretty simple, but for a fast web we need more
            | complexity.
        
             | Joker_vD wrote:
             | More parsing and data processing = faster web. It all makes
             | sense, really!
             | 
              | Joking aside, some "features" in HTTP/1.1 are _really_
              | questionable. Trailing headers? 1xx responses? Comments
              | in chunked encoding? The headers that proxies _must_
              | strip in addition to those named in the "Connection"
              | header, even though the complete list of them is
              | specified nowhere? The methods that _prohibit_ the
              | request/response from having a body, where again the full
              | list is nowhere to be found?
             | 
             | All these features have justifications but the end result
             | is a protocol with rather baroque syntax and semantics.
             | 
             | P.S. By the way, the HTTP/1.1 specs allow a GET request to
             | have a body in chunked encoding -- guess how many existing
             | servers support that.
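              | (For anyone who hasn't seen them, here's what chunk
              | extensions and trailers actually look like on the wire --
              | a minimal Python sketch with made-up values. A parser
              | that doesn't expect either will misframe the body.)

```python
# Legal-but-rare HTTP/1.1 chunked body: a chunk-extension on the
# size line and a trailer header after the last chunk.
raw = (
    b"4;note=an-extension\r\n"      # chunk-size 4, plus an extension
    b"Wiki\r\n"
    b"0\r\n"                        # last-chunk
    b"X-Trailer: allowed-here\r\n"  # trailer section
    b"\r\n"
)

def parse_chunked(data):
    """Minimal chunked parser that tolerates extensions and trailers."""
    out, rest = b"", data
    while True:
        line, _, rest = rest.partition(b"\r\n")
        size = int(line.split(b";")[0], 16)  # strip any chunk-extension
        if size == 0:
            trailers, _, _ = rest.partition(b"\r\n\r\n")
            return out, trailers
        out += rest[:size]
        rest = rest[size + 2:]  # skip the chunk's trailing CRLF

payload, trailers = parse_chunked(raw)
print(payload)   # b'Wiki'
print(trailers)  # b'X-Trailer: allowed-here'
```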
        
           | xxpor wrote:
           | Nearly all L7 protocols and their parsers are complex. HTTP
           | is kind of simple, relatively speaking.
        
         | goodthenandnow wrote:
         | Fuchsia has an http client component [0][1] which is part of
         | the platform and, given Fuchsia's component architecture, it's
         | accessed through a message-passing protocol [2] which is
         | programming language agnostic.
         | 
         | [0]:
         | https://cs.opensource.google/fuchsia/fuchsia/+/main:sdk/fidl...
         | [1]:
         | https://cs.opensource.google/fuchsia/fuchsia/+/main:src/conn...
         | [2]: https://fuchsia.dev/fuchsia-
         | src/reference/fidl/language/lang...
        
       | tdumitrescu wrote:
       | Nice. Link to the whitepaper:
       | https://bahruz.me/papers/ccs2021treqs.pdf
       | 
        | I only skimmed very quickly to look for which server setups
        | they found new vulnerabilities in, and it looks like they
        | tested a 2D matrix of popular webservers/caches/reverse-proxies
        | against each other. That's neat for automation, but in the real
        | world I'm not usually going to be running haproxy behind nginx
        | or vice versa. I'd be much more interested in findings for
        | popular webserver->appserver setups, e.g., nginx in front of
        | gunicorn/django.
        
         | capitainenemo wrote:
         | Tomcat in front of anything else (Apache, Nginx) is a common
         | combination they tested. This is for a Java application with a
         | webserver frontend that's enforcing
         | rules/caching/authentication.
        
         | dorianmariefr wrote:
         | the grammar is pretty great
        
         | bawolff wrote:
         | I've definitely seen people do nginx+haproxy setups in the real
         | world.
        
           | tdumitrescu wrote:
           | Sure, I'm not saying it doesn't happen or that there's no
           | reason to do it, I just think that in practical terms the
           | much more widespread attack surface area would be interaction
           | between one of these and common application servers.
        
       | staticassertion wrote:
       | This sort of thing is why it's nice to have authz throughout your
       | environment. A client request that gets incorrectly forwarded to
       | a proxy should be rejected by the downstream service.
       | 
       | The problem is that people keep all of their authn/authz at the
       | boundary and then, once you're past that, it's a free for all.
       | 
       | Every service needs to validate authorization of the request.
        
       ___________________________________________________________________
       (page generated 2021-11-25 23:00 UTC)