[HN Gopher] Defusedxml - defusing XML bombs and other exploits
___________________________________________________________________
Defusedxml - defusing XML bombs and other exploits
Author : gudzpoz
Score : 53 points
Date : 2024-09-12 17:11 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| slau wrote:
| DefusedXML is an amazing piece of code.
|
| That being said, many of the mitigations it enables are now also
| available by default in many "standard" libraries. For example,
| bandit will often tell you not to use lxml in Python, but to use
| defusedxml instead. However, modern versions of lxml don't suffer
| from the same issues at all, and this is a case where automatically
| following the advice of the linter/SCA is not a great idea.
| metafunctor wrote:
| Do you mean that it is, in fact, a mistake to use defusedxml
| instead of lxml in Python?
| slau wrote:
| From the author themselves, 6 years ago:
|
| > defusedxml.lxml is no longer needed and supported. Nowadays
| libxml2 has builtin limitation for entity expansion.
|
| https://github.com/tiran/defusedxml/issues/25#issuecomment-4...
| masklinn wrote:
| Note that this is not enabled by default, although there is
| an upper bound on tree size which does limit the reach of
| the issue.
|
| See https://lxml.de/FAQ.html#is-lxml-vulnerable-to-xml-bombs
| for more about the tuning knobs.
| JonChesterfield wrote:
| libxml2 segfaults on me whenever I give it vaguely
| complicated xsl templates so I'm doubtful about how
| effective that handling will be.
| masklinn wrote:
| If you're trying to use defusedxml for lxml then yes: that support
| was only ever experimental and has been deprecated (it also failed
| to define some interfaces correctly, causing issues).
|
| If you're using it over the stdlib then no.
| mjfisher wrote:
| Fascinating reading:
|
| > The majority of developers are unacquainted with features such
| as processing instructions and entity expansions that XML
| inherited from SGML. At best they know about <!DOCTYPE> from
| experience with HTML but they are not aware that a document type
| definition (DTD) can generate an HTTP request or load a file from
| the file system.
|
| I was one of them!
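
To make the quoted point concrete, here is a minimal sketch of internal
entity expansion using only Python's standard library. The document, the
entity names (a, b, c) and the small 16x fan-out are made up for
illustration; a real "billion laughs" payload nests the same trick many
more levels deep.

```python
import xml.etree.ElementTree as ET

# Illustrative document: each entity expands to four copies of the previous
# one, so the expanded text grows geometrically with nesting depth.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;&b;">
]>
<note>&c;</note>"""

root = ET.fromstring(DOC)
expanded = root.text  # the parser has already substituted &c; recursively
print(len(DOC), "characters of markup ->", len(expanded), "characters of text")
```

Two nesting levels already turn a 2-character entity into 32 characters;
every extra level multiplies that by the fan-out again.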
| tannhaeuser wrote:
| Developers are even less aware that SGML has (and has always had)
| _quantities_ in the SGML declaration, which among other things
| allow restricting the nesting/expansion level of entities (and
| hence countering EE attacks without resorting to heuristics).
|
| Regarding DOCTYPE and DTDs, browsers at best made use of those
| to switch into or out of "quirks mode", on seeing special
| hardcoded public identifiers but ignored any declarations.
| WHATWG's cargo cult "<!DOCTYPE html>" is just telling an SGML
| parser that the "internal and external subset is empty",
| meaning there are no markup declarations necessary to parse
| HTML which is of course bogus when HTML makes abundant use of
| empty elements (aka void/self-closing elements in HTML
| parlance), tag omission, attribute shortforms, and other
| features that need per-element declarations for parsing. Btw,
| that's what defines the XML subset of SGML: XML can always be
| parsed without a DTD, unlike HTML or other vocabularies that use
| the features mentioned above.
|
| Keep in mind SGML is a markup language for text authoring, and
| it would be pretty lame for a markup language to not have text
| macros (entities). In fact, the lack of such a basic feature is
| frequently complained about in browsers. The problems came when
| people misused XML for service payloads or other generic data
| exchange. Note SOAP did forbid DTDs, and stacks checked for
| presence of DTDs in payloads. That said, XML and XML Schema,
| with their extensive types for money/decimals, dates, hashes,
| etc., are heavily used in e.g. ISO 20022 payments and other
| financial messages, and to this day no competitor with the same
| coverage and scope has emerged (with the potential exception of
| ASN.1, which is even older and certainly more baroque).
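
The "check for the presence of a DTD" approach mentioned for SOAP stacks
can be sketched with the standard library's expat bindings. This is an
illustration under assumptions, not how any particular SOAP stack did it:
`DTDForbidden` and `reject_if_dtd` are hypothetical names chosen for this
example.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Hypothetical error type for payloads that carry a DOCTYPE."""


def reject_if_dtd(payload: bytes) -> None:
    """Parse payload once; raise DTDForbidden if it declares a DOCTYPE."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here makes Parse() fail with this exception, so the
        # caller rejects the payload instead of consuming its content.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(payload, True)  # True marks this as the final chunk


# A payload without a DOCTYPE passes silently.
reject_if_dtd(b"<order><amount>12.50</amount></order>")
```

Rejecting the DOCTYPE outright is blunter than limiting expansion depth,
but it needs no heuristics, which fits data-exchange payloads that have no
legitimate use for entities.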
| redbell wrote:
| > I was one of them!
|
| I'm _still_ one of them!
| Lance_ET_Compte wrote:
| Does `lxml` match `etree` in the table?
| move-on-by wrote:
| I've always appreciated their drop-in replacement support. It's
| so nice to just change an import and move on. I've used it on
| multiple legacy projects with great success: never a single
| compatibility issue. Great project!
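
The "just change an import" workflow looks roughly like the sketch below.
`defusedxml.ElementTree` is the package's drop-in module; the fallback to
the standard library is an assumption of this sketch, added only so the
snippet runs where defusedxml is not installed. It is not a
recommendation: in real code you would likely let the ImportError
propagate.

```python
# Before the change: import xml.etree.ElementTree as ET
try:
    import defusedxml.ElementTree as ET  # third-party: pip install defusedxml
except ImportError:
    # Sketch-only fallback so the example runs without the package;
    # it silently loses the protections, so avoid this in production.
    import xml.etree.ElementTree as ET

# Call sites stay unchanged after the import swap.
root = ET.fromstring("<config><name>demo</name></config>")
name = root.find("name").text
print(name)
```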
| redbell wrote:
| > XML Bomb
|
| This reminds me of _Zip Bomb_ [1], aka, _Zip of Death_ (ZOD) [2]
|
| 1. https://en.m.wikipedia.org/wiki/Zip_bomb
|
| 2. https://github.com/iamtraction/ZOD
___________________________________________________________________
(page generated 2024-09-12 23:00 UTC)