[HN Gopher] I found (and fixed) a vulnerability in Python
___________________________________________________________________
I found (and fixed) a vulnerability in Python
Author : signa11
Score : 89 points
Date : 2021-12-26 13:43 UTC (9 hours ago)
(HTM) web link (tldr.engineering)
(TXT) w3m dump (tldr.engineering)
| earth2mars wrote:
| I like the side navigation on your site. it would be nice if you
| add keyboard shortcuts (Arrows) to navigate the site.
| tyingq wrote:
| I wondered why this bug wouldn't raise itself as a data
| corruption problem, where url parameter keys or values had a
| semicolon in them on purpose, which would then corrupt the data.
|
| It seems it's because you're supposed to encode semicolons. So
| while urllib.parse is doing the wrong thing, it doesn't come up
| much in practice because urllib.parse.urlencode() encodes the
| semi-colon as %3B.
| jepler wrote:
| This broke my blog, but then I fixed it.
| hk1337 wrote:
| It was brought up once of the possibility it was intended to
| support matrix URIs.
|
| https://www.reddit.com/r/programming/comments/rosx5i/comment...
|
| I hope they didn't fix one bug without considering its impact to
| other features.
| charcircuit wrote:
| Isn't this a vulnerability with the proxies who don't understand
| ; can be used as a separator?
| tyingq wrote:
| In this case, he initially presumed that the problem would be
| with Flask, Bottle, and Tornado still honoring a semi-colon as
| a separator in a uri. You're not supposed to, according to 2014
| W3C guidance:
|
| _" Let strings be the result of strictly splitting the string
| payload on U+0026 AMPERSAND characters"_
|
| From: https://www.w3.org/TR/2014/REC-
| html5-20141028/forms.html#url...
|
| They removed the old 1999 guidance that said _" in particular,
| CGI implementors support the use of ";" in place of "&" to save
| authors the trouble of escaping "&" characters in this
| manner."_
|
| As it turns out, the issue was in urllib.parse, a library that
| ships with python.
| charcircuit wrote:
| This seems like a breaking change that can break links
| throughout the web. It seems better for me to keep the links
| working and fix the proxies then to break links and change
| python.
| tyingq wrote:
| I'm not sure why you keep referencing proxies. Lots of
| different software uses urllib, so it would affect web
| servers too...many of them without fronting proxies. Some
| client side software projects use it too, like scrapers,
| for example.
| charcircuit wrote:
| >I'm not sure why you keep referencing proxies
|
| That's what I thought the article was about.
| tyingq wrote:
| Ah, okay. It is about web cache poisoning, but there are
| many web caches that would use urllib.parse without a
| proxy being involved. Like an app directly counting on
| urllib.parse to generate a key for a redis cache. And the
| base problem happens to have broader implications than
| just cache poisoning. It could cause, for example, plain
| old data corruption for url parameter values.
| jasonhansel wrote:
| From that spec:
|
| > If any of the name-value pairs in pairs have a name
| component consisting of the string "_charset_" encoded in US-
| ASCII, and the value component of the first such pair, when
| decoded as US-ASCII, is the name of a supported character
| encoding, then let encoding be that character encoding
| (replacing the default passed to the algorithm).
|
| Ouch. I hope nobody actually implements that.
| tyingq wrote:
| I couldn't find support for that in urllib or python
| requests.
|
| But googling a bit does show it's supported in a lot of
| software.
|
| For example:
| https://bugzilla.mozilla.org/show_bug.cgi?id=18643
| the_mitsuhiko wrote:
| > In this case, he initially presumed that the problem would
| be with Flask, Bottle, and Tornado still honoring a semi-
| colon as a separator in a uri
|
| Flask (or rather Werkzeug what it's based on) removed
| implicit semicolon support for URL 13 years ago: https://gith
| ub.com/pallets/werkzeug/commit/0ea28bbc6f5f05eef...
| tyingq wrote:
| Yeah, it appears Flask doesn't use urllib either. I think
| he was just outlining his approach though. That he started
| with popular frameworks, assuming any issues would be
| there, but ended up in urllib.
|
| Bottle does use parts of urllib but appears to do it's own
| splitting on ampersands: https://github.com/bottlepy/bottle
| /blob/d1413a81ead7a6f130a0...
| dmurray wrote:
| A quick look at the W3C source on the matter only mentions & as
| a separator, so I think it's correct to say allowing something
| else is non-standard.
|
| https://www.w3.org/TR/REC-html40/interact/forms.html#form-co...
| oblvious-earth wrote:
| ";" is mentioned in the appendix as an implementation side
| note: https://www.w3.org/TR/1999/REC-
| html401-19991224/appendix/not...
| jbverschoor wrote:
| > We recommend that HTTP server implementors, and in
| particular, CGI implementors support the use of ";" in
| place of "&" to save authors the trouble of escaping "&"
| characters in this manner.
|
| Thankfully that did not happen (widespread), and now people
| understand they have to escape.
|
| I do remember jsessionid being after the semi-colon..
| tyingq wrote:
| That's the old 1999 spec. You would want the newer, 2014
| one: https://www.w3.org/TR/2014/REC-
| html5-20141028/forms.html#url...
| btilly wrote:
| But at the time that urllib.parse was written, that 1999
| spec WAS the official spec.
| ploxiln wrote:
| ... and only if the proxy attempted to interpret the query
| parameters. If they just route based on path, and do not modify
| the query at all, and do not try to use a query parameter as a
| cache key, then there's no problem.
|
| This is really about caching proxies. If the proxy attempts to
| cache based on some attributes of the request, then it has a
| lot of request normalizing it needs to do. Some more examples
| of things that some languages (in this case Go) may parse
| differently than some web proxies:
| https://github.com/golang/go/issues/25192#issuecomment-80625...
|
| And a more overall argument that putting this "vulnerability"
| on the language library (in Python and Go) does not make the
| most sense:
| https://github.com/golang/go/issues/25192#issuecomment-79685...
___________________________________________________________________
(page generated 2021-12-26 23:01 UTC)