[HN Gopher] Defusedxml - defusing XML bombs and other exploits
       ___________________________________________________________________
        
       Defusedxml - defusing XML bombs and other exploits
        
       Author : gudzpoz
       Score  : 53 points
       Date   : 2024-09-12 17:11 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | slau wrote:
       | DefusedXML is an amazing piece of code.
       | 
       | This being said, many of the mitigations it enables are now also
       | available by default in many "standard" libraries. For example,
       | bandit will often tell you to not use lxml in Python, but instead
       | use defusedxml. However, modern versions of lxml don't suffer the
       | same issues at all, and this is a case where automatically following
       | the advice of the linter/SCA is not a great idea.
        
         | metafunctor wrote:
         | Do you mean that it is, in fact, a mistake to use defusedxml
         | instead of lxml in Python?
        
           | slau wrote:
           | From the author themselves, 6 years ago:
           | 
           | > defusedxml.lxml is no longer needed and supported. Nowadays
           | libxml2 has builtin limitation for entity expansion.
           | 
           | https://github.com/tiran/defusedxml/issues/25#issuecomment-4...
        
             | masklinn wrote:
             | Note that this is not enabled by default, although there is
             | an upper bound on tree size which does limit the reach of
             | the issue.
             | 
             | See https://lxml.de/FAQ.html#is-lxml-vulnerable-to-xml-bombs
             | for more about the tuning knobs.
        
             | JonChesterfield wrote:
             | libxml2 segfaults on me whenever I give it vaguely
             | complicated xsl templates so I'm doubtful about how
             | effective that handling will be.
        
           | masklinn wrote:
           | If you're trying to use it for lxml then yes, it was only
           | ever experimental and has been deprecated (it also failed to
           | define some interfaces correctly causing issues).
           | 
           | If you're using it over the stdlib then no.
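
A minimal stdlib-only sketch of the kind of guard that is still missing when parsing with the standard library: refuse any document that declares an entity before handing it to ElementTree. This is not defusedxml's actual implementation. The names `EntitiesForbidden`, `_reject_entity`, and `guarded_fromstring` are hypothetical, and the pre-scan parses the input twice, which a real library would likely avoid.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class EntitiesForbidden(ValueError):
    """Raised when a document declares an entity (hypothetical name)."""


def _reject_entity(name, is_parameter_entity, value, base,
                   system_id, public_id, notation_name):
    # expat calls this once per <!ENTITY ...> declaration, internal or
    # external, before any reference to the entity is expanded.
    raise EntitiesForbidden(f"entity declaration not allowed: {name!r}")


def guarded_fromstring(data: bytes) -> ET.Element:
    """Parse `data` with the stdlib, refusing documents that declare entities."""
    scanner = expat.ParserCreate()
    scanner.EntityDeclHandler = _reject_entity
    scanner.Parse(data, True)   # exceptions raised in the handler propagate
    return ET.fromstring(data)  # only reached if no entity was declared


# A tiny nested-entity document of the kind an XML bomb is built from.
SUSPICIOUS = b'<!DOCTYPE r [<!ENTITY a "aaaa"><!ENTITY b "&a;&a;&a;">]><r>&b;</r>'

try:
    guarded_fromstring(SUSPICIOUS)
except EntitiesForbidden as exc:
    print("rejected:", exc)

print(guarded_fromstring(b"<r><c>ok</c></r>").find("c").text)
```

Rejecting at the declaration, rather than counting expansions, means the parser never starts expanding anything, at the cost of also refusing harmless documents that define their own entities.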
        
       | mjfisher wrote:
       | Fascinating reading:
       | 
       | > The majority of developers are unacquainted with features such
       | as processing instructions and entity expansions that XML
       | inherited from SGML. At best they know about <!DOCTYPE> from
       | experience with HTML but they are not aware that a document type
       | definition (DTD) can generate an HTTP request or load a file from
       | the file system.
       | 
       | I was one of them!
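
To make the entity-expansion part of that quote concrete, here is a small and deliberately harmless sketch using only the standard library. The helper `build_nested_entity_doc` and the entity names `e0`..`eN` are made up for illustration, and the sizes are kept tiny so that nothing dangerous happens and any expansion limits built into the underlying parser should not be reached.

```python
import xml.etree.ElementTree as ET


def build_nested_entity_doc(levels: int, fanout: int) -> str:
    """Build a document whose DTD defines `levels` layers of nested entities.

    Each layer references the previous one `fanout` times, so the expanded
    text grows as fanout ** levels while the document grows only linearly.
    """
    decls = ['<!ENTITY e0 "lol">']
    for i in range(1, levels + 1):
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n  ".join(decls)
    return f"<!DOCTYPE root [\n  {dtd}\n]>\n<root>&e{levels};</root>"


doc = build_nested_entity_doc(levels=3, fanout=10)
root = ET.fromstring(doc)

# The parser expands the entities declared in the internal DTD subset,
# so the element text is much longer than the document that produced it.
print(len(doc), "characters in,", len(root.text), "characters out")
```

Raising `levels` is what turns this pattern into a "billion laughs" style bomb, which is why the example stops at three.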
        
         | tannhaeuser wrote:
         | Developers are even less aware that SGML has (and always had)
         | _quantities_ in the SGML declaration, allowing among other
         | things to restrict the nesting/expansion level of entities
         | (and hence to counter EE attacks without resorting to
         | heuristics).
         | 
         | Regarding DOCTYPE and DTDs, browsers at best made use of those
         | to switch into or out of "quirks mode", on seeing special
         | hardcoded public identifiers but ignored any declarations.
         | WHATWG's cargo cult "<!DOCTYPE html>" is just telling an SGML
         | parser that the "internal and external subset is empty",
         | meaning there are no markup declarations necessary to parse
         | HTML which is of course bogus when HTML makes abundant use of
         | empty elements (aka void/self-closing elements in HTML
         | parlance), tag omission, attribute shortforms, and other
         | features that need per-element declarations for parsing. Btw
         | that's what defines the XML subset of SGML: that XML can always
         | be parsed without a DTD, unlike HTML or other vocabularies
         | making use of above stated features.
         | 
         | Keep in mind SGML is a markup language for text authoring, and
         | it would be pretty lame for a markup language to not have text
         | macros (entities). In fact, the lack of such a basic feature is
         | frequently complained about in browsers. The problems came when
         | people misused XML for service payloads or other generic data
         | exchange. Note SOAP did forbid DTDs, and stacks checked for
         | presence of DTDs in payloads. That said, XML and XML Schema
         | with extensive types for money/decimals, dates, hashes, etc. is
         | heavily used in e.g. ISO 20022 payments and other financial
         | messages, and to this date, there hasn't evolved a single
         | competitor with the same coverage and scope (with the potential
         | exception of ASN.1 which is even older and certainly more
         | baroque).
        
         | redbell wrote:
         | > I was one of them!
         | 
         | I'm _still_ one of them!
        
       | Lance_ET_Compte wrote:
       | Does `lxml` match `etree` in the table?
        
       | move-on-by wrote:
       | I've always appreciated their drop-in replacement support. It's
       | so nice to just change an import and move on. I've used it on
       | multiple legacy projects with great success: never a single
       | compatibility issue. Great project!
        
       | redbell wrote:
       | > XML Bomb
       | 
       | This reminds me of _Zip Bomb_ [1], aka, _Zip of Death_ (ZOD) [2]
       | 
       | 1. https://en.m.wikipedia.org/wiki/Zip_bomb
       | 
       | 2. https://github.com/iamtraction/ZOD
        
       ___________________________________________________________________
       (page generated 2024-09-12 23:00 UTC)