[HN Gopher] Web hacking techniques of 2021
___________________________________________________________________
Web hacking techniques of 2021
Author : adrianomartins
Score : 456 points
Date : 2022-02-10 09:23 UTC (13 hours ago)
(HTM) web link (portswigger.net)
(TXT) w3m dump (portswigger.net)
| furstenheim wrote:
| It got me thinking, is client side rendering intrinsically safer
| than SSR.
|
| SQL queries with params are safer because data and code flow
| separately. Similarly, if you query backend for data and then do
| textContent = response, that cannot do xss, right?
| chrismorgan wrote:
| No, client-side rendering is not intrinsically safer than
| server-side rendering, provided all outputs of serialisation
| are parsed identically (as is the case for _valid_ HTML trees).
|
| The problems start when you try to manipulate serialised data,
| which is not safe to do this in the general case. You should
| instead construct a proper representation of what you desire,
| and then serialise _that_ , depending on the serialiser to take
| care of all of this sort of stuff. This approach has always
| been fairly popular in compiled languages and languages that
| like types, but dynamic languages have historically
| significantly preferred to manipulate strings, I suspect
| because they don't have good ergonomics on the other approach,
| and it's probably slower in interpreted languages--you'll note
| that React felt the need to extend JavaScript to make its
| approach acceptable to people.
|
| Most JavaScript stuff that supports server-side rendering now
| is working in this way, crafting a DOM tree and then
| serialising that. Svelte is a notable exception in that it
| takes a declarative DOM tree and essentially serialises what it
| can at compile time, thereby still retaining the required
| safety guarantees.
|
| There are definitely downsides to strict adherence to the model
| of crafting a data structure and then serialising it; most
| significantly, you can't start streaming a response until
| you're done. The solution for this is to use an append-only
| data structure (or possibly one that allows you to "commit" the
| document up to a given point, while still allowing mutations in
| anything that occurs later in the document); thus serialisation
| can begin before you finish writing the document.
|
| You know the old favourite about parsing HTML with regular
| expressions?
| <https://stackoverflow.com/questions/1732348/regex-match-
| open...> (If not, enjoy!) This is the thing people need to
| understand and realise in the general case: serialised data
| should be treated as _opaque_ , and only interacted with after
| real parsing and before real serialisation.
|
| HTTP headers aren't strings; "Date: Tue, 15 Nov 1994 08:12:31
| GMT" is a _serialised_ HTTP header, representing the actual
| header that's more like {Date, 1994-11-15T08:12:31Z}. And that
| latter is the form you should interact with it in.
|
| HTML isn't strings; "<p>Hello, world!</p>" is the _serialised_
| form of a paragraph element containing a text node with data
| "Hello, world!". And that's the form you should interact with
| it in.
|
| Yes, I am presenting a strongly-opinionated position that lacks
| any shade of pragmatism. Yes, my website is generated with
| templates that manipulate serialised HTML. Eventually I'll
| replace it with something more sound.
|
| One last note: at the start I said _valid_ HTML, because it's
| not enough to just serialise an arbitrary HTML DOM tree, as you
| can easily craft invalid HTML DOM trees, like nesting
| hyperlinks. In most regards, the XML syntax of HTML (still a
| thing) is actually a safer target to serialise to because then
| you don't even need to validate your tree to be confident it
| won't get mangled by the serialise /parse round-trip.
| furstenheim wrote:
| Sorry, what do you mean by parsed identically? In CSR you can
| have data displayed into the front-end without ever be parsed
| as HTML. You do some http call to the backend, get a json get
| the property and do, element.textContent = myData. If that's
| unsafe there would be a bug in the browser, ain't it?
| chrismorgan wrote:
| I was going to use optional start tags and tbody as my
| example, but on checking the spec it turns out that tr _is_
| actually valid as a direct child of table, even if the HTML
| syntax will prevent you from creating it by inserting a
| tbody around it. (XHTML 1.0 validation also confirms that
| tbody is genuinely optional there.) This actually
| undermines my "as is the case in _valid_ HTML"--but never
| mind, I'll demonstrate what the point was, and what is at
| least _generally_ the case.
|
| So let's go with a more egregious invalidity: nested links.
| Which browsers _do_ actually support, but HTML syntax
| doesn't. Suppose you produce this DOM tree (server side or
| client side, I don't care): p + #text
| "Look at this " + a href="https://a.example" |
| + #text "link with " | + a href="https://b.example"
| | | + #text "nesting" | + #text " like so" +
| #text "!"
|
| (Client-side, you could generate it like this:
| let p = document.createElement("p"); let a1 =
| document.createElement("a"); let a2 =
| document.createElement("a"); a1.href =
| "https://a.example"; a2.href = "https://b.example";
| a2.append("nesting"); a1.append("link with ", a2, "
| like so"); p.append("Look at this ", a1, "!");
|
| )
|
| That serialises to this in both HTML and XML syntaxes:
| <p>Look at this <a href="https://a.example">link with <a
| href="https://b.example">nesting</a> like so</a>!</p>
|
| (Client-side, `p.outerHTML`; `new
| XMLSerializer().serializeToString(p)` shows the XML syntax,
| which is the same modulo an xmlns attribute for XML
| reasons. Incidentally, `p.outerHTML` gives you HTML syntax
| for an HTML-syntax document and XML syntax for an XML-
| syntax document, which mostly means if you served the file
| with the application/xhtml+xml MIME type.)
|
| But parse _that_ with the HTML syntax, and the nested links
| break (e.g. `document.body.innerHTML = p.outerHTML`):
| p + #text "Look at this " + a
| href="https://a.example" | + #text "link with "
| + a href="https://b.example" | + #text "nesting"
| + #text " like so!"
|
| And _that_ is the steady state (meaning you can round-trip
| it again as much as you like and it will no longer change):
| <p>Look at this <a href="https://a.example">link with
| </a><a href="https://b.example">nesting</a> like so!</p>
|
| Returning to the initial remark you're asking about: I
| wrote that having more than just HTML in mind (kind of why
| I brought HTTP into it later on, and because other formats
| like Markdown may be being used, and who knows about it;
| and in the parent comment, SQL parameters had been
| mentioned, which is also a good example of the issue in
| hand), that this is a general remark about stability and
| safety: that interpolating strings raw is just dangerous,
| and that you should parse and serialise-- _provided_ the
| format has been designed so that that's a safe operation.
| As it happens, the typical DOM tree representation of HTML
| _doesn't_ protect you enough, so you need to work with
| _valid_ HTML for it to be fully robust.
|
| Actually, I've just thought of the perfect example of why
| valid HTML is important when you're crafting a tree for
| serialisation, because it actually _would_ introduce an
| injection vulnerability: comments. Contemplate this:
| document.createComment('--><script>alert("pwnd")</script><!
| --') #comment
| "--><script>alert("pwnd")</script><!--" <!--
| --><script>alert("pwnd")</script><!-- -->
|
| Or you could break scripts by injecting </script> or
| stylesheets by injecting </style>, given that they don't
| use HTML entity escaping. I _think_ these are the only
| cases where invalid HTML could actually be _harmful_ ; most
| places (not that there are many--optional start tags, link
| nesting and paragraph nesting are just about it) it'll just
| shuffle the DOM slightly.
|
| Y'know what? I'm starting to think even the _tree_ form is
| rather dangerous to work in for HTML. XML syntax protects
| you from almost all inconsistency, but doesn't guard
| against that comment attack (that's literally the only
| thing it'll miss) and loses the <noscript> element.
|
| I'm tempted to retract my position that client-side
| rendering is not intrinsically safer than server-side, but
| so long as you have a step that _validates_ your HTML
| before you serialise it, you're still OK (and even the
| breakages depend on injecting arbitrary content into a
| comment, script tag or style tag, which are all extremely
| unlikely), so I retain my position, now hanging
| precariously from that delicate thread of the word
| "intrinsically". I think there's a gaping chasm below me.
| Hopefully there's something soft to land on.
| jcims wrote:
| Anytime is possible for the data that returns to be
| interpolated by the client, you could have xss or related
| attack.
|
| Client side rendering does help but mistakes are still
| regularly made. Sometimes by the app dev, sometimes by the
| framework dev.
|
| You could probably go to an extreme and return all of your
| application data as sprites.
| furstenheim wrote:
| Of course you can still do <div> + input + </div> in CSR, but
| you can definitely not do myelement.textContent =
| whateverIGot in SSR, right?
| asddubs wrote:
| you can use a template engine that escapes all variables by
| default. in either case, it's just about coding defensively
| and being secure by default
| furstenheim wrote:
| Then why is parameter query safer? And not just escapes
| variables? Escaping is hard, as shown in the article
| alcover wrote:
| > textContent = response
|
| Good question (that none of the replies seem to address). That
| is exactly what I would do if rendering 'tainted' text.
|
| Can someone please tell us how it could be defeated ?
| asddubs wrote:
| if you use a template engine with sane defaults, you can
| achieve the same level of safety.
| TheAdamist wrote:
| The hn title needs updating as it's misleading, even if it
| reflects the title on the website. The first sentence even
| clarifies it's only new techniques.
|
| "Welcome to the Top 10 (new) Web Hacking Techniques of 2021, the
| latest iteration of our annual community-powered effort to
| identify the most significant web security research released in
| the last year".
|
| The top web hacking techniques used and the top new ones I would
| expect to be very different lists.
| badrabbit wrote:
| This guy's work always impresses me. He had a nice Blackhat brief
| as well.
|
| This list is great and all for redteamers but as a defender, I
| would like to know if any actual threat actors used these
| techniques even after publication. Even with all the
| secret/private and public threat intel I am aware of, none of
| them register. Not knocking down on threat research, I am
| honestly curious because I can't tell if I should be on the look
| out for any real threat actors using these techniques.
| FastEatSlow wrote:
| Yes, actual threat actors use these techniques even after
| publication. There is a lot of outdated/misconfigured systems
| in the wild. A fairly recent example is the defacing of
| multiple Ukrainian government websites[1], through exploiting a
| vulnerability fixed and publicised in august 2021. There's also
| around 10,000 (can't remember where that statistic is from)
| Huawei routers on the internet vulnerable to an issue from
| 2015, which are constantly being infected with botnet worms.
|
| [1] https://www.bleepingcomputer.com/news/security/multiple-
| ukra...
| badrabbit wrote:
| I know web exploits happen all the time first hand.
|
| > all 15 compromised Ukrainian sites were using an outdated
| version of the October CMS, vulnerable to CVE-2021-32648.
|
| That cve looks like it was caused by someone doing == instead
| of === in php.
|
| My question was things like request smuggling and protocol
| abuse attacks have ever been seen in the "wild".
| fendy3002 wrote:
| Man the JSON inconsistency one is creative. I know it's not
| consistent implementation across languages, but I don't know it
| can be used to such attacks.
| FabHK wrote:
| Yes. The big take-away for me, whether it's JSON or YAML or XML
| or whatever: never parse anything more than once (and
| definitely not with different parsers).
| formerly_proven wrote:
| Five out of ten new techniques are langsec, which makes them
| inherently difficult to fix, yet we keep using unreasonably
| complex languages for protocols and keep stapling on more
| complexity, resulting in formally assured insecurity.
| nyanpasu64 wrote:
| http://langsec.org/ does a spectacularly poor job of
| introducing langsec to the uninitiated. It appears to be a list
| of conferences and papers for academics, followed by
| http://langsec.org/bof-handout.pdf which makes unsubstantiated
| assertions and doesn't elaborate. I think more people would
| learn about langsec if the homepage contained an introduction
| followed by a guided tour of articles which incrementally teach
| the current state of the field in an organized accessible
| fashion.
|
| EDIT: I found https://scribe.rip/1b92451d4764 which purports to
| be an "introduction followed by a tour", which links to
| "Security Applications of Formal Language Theory" and "The
| Seven Turrets of Babel: A Taxonomy of LangSec Errors and How to
| Expunge Them". The second seems not very practical/applied or
| hands-on, and the first is quite long and academic (I haven't
| read it yet). It _might_ be useful as reference material, but I
| 'd be interested to see examples of designing/refactoring
| systems to be more secure based on langsec.
| rank0 wrote:
| The paper linked in your EDIT is awesome. I'm an AppSec
| engineer and I had never encountered a term like "shotgun
| parser". What the authors describe as shotgun parsing is
| exactly what I've seen from reviewing validation logic across
| hundreds of enterprise applications. It's nice to have a name
| for the pattern.
|
| The worst part of shotgun parsing and loosely defined input
| structure is the difficulty of remediation. I constantly
| receive pushback from dev teams when I ask them to use regex-
| based validation per field. What sounds like a simple task
| actually becomes extremely difficult because lots of apps
| populate datasets via convoluted monolithic endpoints. Dev
| teams would have to change the way in which shared services
| structure and output information. Those shared services are
| frequently maintained by other teams and any other
| application which consume the same data would also need to be
| modified.
|
| In the end, it becomes a compromise where the ad-hoc parsing
| is tightened/modified to be "good enough". This
| bubblegum/duct-tape fix only further cements the ad-hoc
| parsing throughout the org.
| jcims wrote:
| I was full time infosec from 1998 until 2015 then moved into
| an adjacent role that is still technically infosec but is
| more infrastructure/platform controls. This is the first time
| i recall ever seeing the term.
|
| Based on reading the two sentence synopsis in Google results
| it's largely indistinguishable from the more familiar "formal
| methods" or "formal verification".
| the-alt-one wrote:
| KTH has a course called Language based security (
| https://www.kth.se/student/kurser/kurs/DD2525?l=en ) which
| indeed does come from people involved in formal methods.
|
| Formal methods is a huge area though, but in essence it's
| about establishing proofs of correctness.
| ooedemis wrote:
| What about GWT-Google Web Toolkit its actually not so many
| updated and under top news but the idea is implement in a prooven
| language java both frontend and backend
| scanr wrote:
| The work on exploiting prototype pollution was excellent
| https://blog.s1r1us.ninja/research/PP
|
| I didn't know about the --disable-proto option in node or the
| Document Policy proposal for dealing with it.
|
| Amazing that 80% of nested query parameter parsers were
| susceptible to prototype pollution.
| adrianomartins wrote:
| Interesting community built list of the top 10 web hacking
| vulnerabilities used in 2021. If you're making a web product you
| might want your team to quickly run over these.
| TheAdamist wrote:
| It's not the top 10 used, it's the top 10 new for 2021
| techniques, and specifically excludes older techniques.
| Agamus wrote:
| I'm not an expert here, but truly interested to hear responses to
| this question.
|
| To say that 1+1=2 is "true", does that not require a corollary in
| "reality" to something fundamental that can be called a "one"
| object? I believe this is called mathematical constructivism.
|
| Imagine, hypothetically, that we cannot identify something that
| is physically fundamental and individual. My question is whether
| any mathematics in that scenario could be considered "true"
| without such constructivism, in other words, without a physical
| correspondence to an unquestionably, physically fundamental "one"
| object.
| [deleted]
| hbn wrote:
| Not super on topic, but every time this site is linked, I never
| properly read the URL correctly. My brain immediately thinks the
| space is between the 's' and 'w'
| mywacaday wrote:
| Same with expertsexchange.com
| thefreeman wrote:
| thanks, now i'll never be able to read it properly again. :(
| losthobbies wrote:
| The dependency confusion article on Medium was a great read.
| airstrike wrote:
| It's a really good article and apologies to the author for
| nitpicking but even as a bona fide Python fanboy I had to raise
| my eyebrows at this statement:
|
| _> Some programming languages, like Python, come with an easy,
| more or less official method of installing dependencies for
| your projects._
| nawgz wrote:
| I mean, have you ever used a language like Java? Python has a
| bad package manager story, sure, but it has a package manager
| story - that's not actually particularly global afaik
| remus wrote:
| Beautifully simple! Exfiltrating data via a DNS request was a
| nice little trick too.
| [deleted]
| baobabKoodaa wrote:
| It's amazing that such a simple vulnerability can be leveraged
| in practice to gain access to so many machines on so many
| different organizations. Props to the researcher!
| clarnaskirq wrote:
| As a web programmer, for whom the majority of this article is not
| only new, but difficult to comprehend, it makes me yearn to
| improve my web security knowledge. Any pointers?
| orangepurple wrote:
| Go through each line item in the article and create a proof of
| concept for yourself. You will learn a lot along the way too.
| doopy1 wrote:
| You can look at the disclosed reports on hackerone and get a
| feel for the kind of stuff that's being exploited and how it's
| being addressed.
| ipnon wrote:
| Do some of your own hacking on hackthebox.com. It is shocking
| what can be done with only a week of security training by an
| already experienced programmer. It becomes clear that the
| typical software engineer doesn't give a _single_ thought to
| security.
| ridiculous_leke wrote:
| I suggest going through cheatsheets on OWASP. Most of it is
| comprehensible to any web programmer. Here's one example:
|
| https://cheatsheetseries.owasp.org/cheatsheets/PHP_Configura...
| icare_1er wrote:
| It baffles me how convoluted and complex the webapp attacks have
| become over the past few years.
|
| I think this is an effect of bug-bounty hunting, which has pretty
| much opened the research on those topics to a massive community.
| bawolff wrote:
| Kind of feels a little repetitive to have request smugguling on
| the list 3 different times.
| ackbar03 wrote:
| Anyone here that works on these kind of deep-dive type of
| security research? Can you give a TLDR of how do you usually set
| everything up to find these results?
|
| As in, do you set up some sort of test environment/website with
| full debug logs and take if one step at a time from there? If so,
| how to you ensure that it is realistic and relevant to real world
| use since real-world architecture might differ from a setup that
| worked in your experiments?
|
| I ask this because I used to do some bug bounties and it
| consisted of a lot of painful trial and error. I can't imagine
| anything new and profound can be found that way.
|
| (PS in case it isn't obvious I didn't open up the research links
| and read in detail, hence a tldr)
| EdOverflow wrote:
| I am a security researcher referenced in the winning web-
| hacking technique on that list ("Dependency Confusion" by Alex
| Birsan [1]) and was ranked 7th in Portswigger's 2019 issue
| [2,3]. My motto has always been "Learn to make it; then break
| it." In other words, I invest a lot of time familiarising
| myself with technologies and specifications before examining
| how their implementation might lead to security flaws. This
| process usually requires reading a lot of technical
| documentation and source code, and becoming acquainted with how
| organisations implement said technologies.
|
| Once I feel comfortable with my understanding of the subject
| material, I start to think about how certain aspects of the
| technology could lead to security flaws or interesting areas of
| research. At times this may require out-of-the-box thinking or
| can even be the result of pure luck.
|
| The "bug bounty" aspect of this all tends to come into play
| once I want to find case studies for my research.
|
| [1]: https://medium.com/@alex.birsan/dependency-
| confusion-4a5d60f...
|
| [2]: https://portswigger.net/research/top-10-web-hacking-
| techniqu...
|
| [3]: https://edoverflow.com/2019/ci-knew-there-would-be-bugs-
| here...
___________________________________________________________________
(page generated 2022-02-10 23:00 UTC)