[HN Gopher] I found (and fixed) a vulnerability in Python
       ___________________________________________________________________
        
       I found (and fixed) a vulnerability in Python
        
       Author : signa11
       Score  : 89 points
       Date   : 2021-12-26 13:43 UTC (9 hours ago)
        
 (HTM) web link (tldr.engineering)
 (TXT) w3m dump (tldr.engineering)
        
       | earth2mars wrote:
       | I like the side navigation on your site. it would be nice if you
       | add keyboard shortcuts (Arrows) to navigate the site.
        
       | tyingq wrote:
       | I wondered why this bug wouldn't raise itself as a data
       | corruption problem, where url parameter keys or values had a
       | semicolon in them on purpose, which would then corrupt the data.
       | 
       | It seems it's because you're supposed to encode semicolons. So
       | while urllib.parse is doing the wrong thing, it doesn't come up
       | much in practice because urllib.parse.urlencode() encodes the
       | semi-colon as %3B.
        
       | jepler wrote:
       | This broke my blog, but then I fixed it.
        
       | hk1337 wrote:
       | It was brought up once of the possibility it was intended to
       | support matrix URIs.
       | 
       | https://www.reddit.com/r/programming/comments/rosx5i/comment...
       | 
       | I hope they didn't fix one bug without considering its impact to
       | other features.
        
       | charcircuit wrote:
       | Isn't this a vulnerability with the proxies who don't understand
       | ; can be used as a separator?
        
         | tyingq wrote:
         | In this case, he initially presumed that the problem would be
         | with Flask, Bottle, and Tornado still honoring a semi-colon as
         | a separator in a uri. You're not supposed to, according to 2014
         | W3C guidance:
         | 
         |  _" Let strings be the result of strictly splitting the string
         | payload on U+0026 AMPERSAND characters"_
         | 
         | From: https://www.w3.org/TR/2014/REC-
         | html5-20141028/forms.html#url...
         | 
         | They removed the old 1999 guidance that said _" in particular,
         | CGI implementors support the use of ";" in place of "&" to save
         | authors the trouble of escaping "&" characters in this
         | manner."_
         | 
         | As it turns out, the issue was in urllib.parse, a library that
         | ships with python.
        
           | charcircuit wrote:
           | This seems like a breaking change that can break links
           | throughout the web. It seems better for me to keep the links
           | working and fix the proxies then to break links and change
           | python.
        
             | tyingq wrote:
             | I'm not sure why you keep referencing proxies. Lots of
             | different software uses urllib, so it would affect web
             | servers too...many of them without fronting proxies. Some
             | client side software projects use it too, like scrapers,
             | for example.
        
               | charcircuit wrote:
               | >I'm not sure why you keep referencing proxies
               | 
               | That's what I thought the article was about.
        
               | tyingq wrote:
               | Ah, okay. It is about web cache poisoning, but there are
               | many web caches that would use urllib.parse without a
               | proxy being involved. Like an app directly counting on
               | urllib.parse to generate a key for a redis cache. And the
               | base problem happens to have broader implications than
               | just cache poisoning. It could cause, for example, plain
               | old data corruption for url parameter values.
        
           | jasonhansel wrote:
           | From that spec:
           | 
           | > If any of the name-value pairs in pairs have a name
           | component consisting of the string "_charset_" encoded in US-
           | ASCII, and the value component of the first such pair, when
           | decoded as US-ASCII, is the name of a supported character
           | encoding, then let encoding be that character encoding
           | (replacing the default passed to the algorithm).
           | 
           | Ouch. I hope nobody actually implements that.
        
             | tyingq wrote:
             | I couldn't find support for that in urllib or python
             | requests.
             | 
             | But googling a bit does show it's supported in a lot of
             | software.
             | 
             | For example:
             | https://bugzilla.mozilla.org/show_bug.cgi?id=18643
        
           | the_mitsuhiko wrote:
           | > In this case, he initially presumed that the problem would
           | be with Flask, Bottle, and Tornado still honoring a semi-
           | colon as a separator in a uri
           | 
           | Flask (or rather Werkzeug what it's based on) removed
           | implicit semicolon support for URL 13 years ago: https://gith
           | ub.com/pallets/werkzeug/commit/0ea28bbc6f5f05eef...
        
             | tyingq wrote:
             | Yeah, it appears Flask doesn't use urllib either. I think
             | he was just outlining his approach though. That he started
             | with popular frameworks, assuming any issues would be
             | there, but ended up in urllib.
             | 
             | Bottle does use parts of urllib but appears to do it's own
             | splitting on ampersands: https://github.com/bottlepy/bottle
             | /blob/d1413a81ead7a6f130a0...
        
         | dmurray wrote:
         | A quick look at the W3C source on the matter only mentions & as
         | a separator, so I think it's correct to say allowing something
         | else is non-standard.
         | 
         | https://www.w3.org/TR/REC-html40/interact/forms.html#form-co...
        
           | oblvious-earth wrote:
           | ";" is mentioned in the appendix as an implementation side
           | note: https://www.w3.org/TR/1999/REC-
           | html401-19991224/appendix/not...
        
             | jbverschoor wrote:
             | > We recommend that HTTP server implementors, and in
             | particular, CGI implementors support the use of ";" in
             | place of "&" to save authors the trouble of escaping "&"
             | characters in this manner.
             | 
             | Thankfully that did not happen (widespread), and now people
             | understand they have to escape.
             | 
             | I do remember jsessionid being after the semi-colon..
        
             | tyingq wrote:
             | That's the old 1999 spec. You would want the newer, 2014
             | one: https://www.w3.org/TR/2014/REC-
             | html5-20141028/forms.html#url...
        
               | btilly wrote:
               | But at the time that urllib.parse was written, that 1999
               | spec WAS the official spec.
        
         | ploxiln wrote:
         | ... and only if the proxy attempted to interpret the query
         | parameters. If they just route based on path, and do not modify
         | the query at all, and do not try to use a query parameter as a
         | cache key, then there's no problem.
         | 
         | This is really about caching proxies. If the proxy attempts to
         | cache based on some attributes of the request, then it has a
         | lot of request normalizing it needs to do. Some more examples
         | of things that some languages (in this case Go) may parse
         | differently than some web proxies:
         | https://github.com/golang/go/issues/25192#issuecomment-80625...
         | 
         | And a more overall argument that putting this "vulnerability"
         | on the language library (in Python and Go) does not make the
         | most sense:
         | https://github.com/golang/go/issues/25192#issuecomment-79685...
        
       ___________________________________________________________________
       (page generated 2021-12-26 23:01 UTC)