[HN Gopher] Google, Mozilla Close to Finalizing Sanitizer API fo...
       ___________________________________________________________________
        
       Google, Mozilla Close to Finalizing Sanitizer API for Chrome and
       Firefox Browsers
        
       Author : todsacerdoti
       Score  : 179 points
       Date   : 2021-10-20 05:47 UTC (17 hours ago)
        
 (HTM) web link (portswigger.net)
 (TXT) w3m dump (portswigger.net)
        
       | tmikaeld wrote:
       | This is good of course; however, it will probably take at
       | least 5+ years until a majority of users are on browsers that
       | have this feature built-in.
        
         | lloydatkinson wrote:
         | Why would it take five years for Chrome to implement it?
        
           | 19870213 wrote:
           | You're forgetting 'legacy' devices, as in, older than a
           | couple of minutes. I maintain an application that is used in
           | primary education in the Netherlands, and the oldest device
           | thus far with issues is an iPad4 with iOS10.3, which the
           | school only invested in a couple of years ago (I don't know
           | any further details). And in their infinite wisdom Apple
           | tied the Safari version to the iOS version, and no
           | alternative browsers are allowed. /rant
        
           | fabiospampinato wrote:
           | It's already implemented in Chrome latest, available under a
           | flag. Although support for <svg> and <math> elements is not
           | in yet.
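For readers who want to experiment, here is a minimal sketch of what using the draft API might look like, with a library fallback. The names follow the 2021 WICG draft and may change; using DOMPurify as the fallback is an assumption for illustration, not part of the spec.

```javascript
// Sketch: prefer the proposed native Sanitizer API when present,
// fall back to a sanitizing library otherwise (DOMPurify assumed).
function pickSanitizerStrategy(g) {
  // Pure helper: decide which path to take for a given global object.
  return typeof g.Sanitizer === "function" ? "native" : "library";
}

function renderUntrusted(el, dirty, g = globalThis) {
  if (pickSanitizerStrategy(g) === "native") {
    // Draft API: setHTML parses and sanitizes in one step, using
    // the same parser the browser will render with.
    el.setHTML(dirty, { sanitizer: new g.Sanitizer() });
  } else {
    // Fallback for browsers without the API.
    el.innerHTML = g.DOMPurify.sanitize(dirty);
  }
}
```

Keeping the strategy check separate from the DOM calls makes the decision logic testable outside a browser.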
        
           | skrebbel wrote:
           | It won't, but it will take a while (not sure about five
           | years) for everybody to be on the latest version. Chrome has
           | nice auto-update but not everybody has it enabled, for all
           | kinds of good and bad reasons.
        
             | wongarsu wrote:
             | According to statcounter.com any new Chrome version is
             | adopted by a large majority of users within 30 days [1].
             | 
             | Sure, some corporate users hold out much longer, but
             | those seem to be a tiny minority.
             | 
             | 1: https://gs.statcounter.com/browser-version-market-share
        
               | codedokode wrote:
               | There are other browsers besides Chrome. For example,
               | there are built-in browsers in smartphones that will
               | never be updated.
               | 
               | It is wrong to rely on everyone using the latest
               | version of a browser. Every site should support
               | browsers at least 5 years old, and a good site will be
               | usable in a 10-year-old browser, including the
               | built-in browser on Windows.
               | 
               | Sadly, in reality most sites are made so poorly that
               | they don't open on a 5-year-old smartphone. It shows
               | how low-qualified modern web developers have become.
        
               | wongarsu wrote:
               | I agree that apps will need a fallback for a couple
               | years. So far only Chrome and Firefox are implementing
               | it, and given that Safari took 2.5 years for
               | IntersectionObserver I wouldn't hold my breath until we
               | can even claim that all major browsers implement this.
               | 
               | But this API doesn't implement anything we couldn't do
               | before; it's a more correct and much faster
               | implementation of something we already have libraries
               | for. The vast majority of users will see an almost
               | immediate benefit from sites using this API, in both
               | speed and security.
               | 
               | That's what I take this comment chain to be about, since
               | talking about "a majority" doesn't make sense otherwise.
               | Supporting browser versions used by at most 2% of all
               | users is the name of the game in webdev, that's what made
               | IE so annoying.
        
         | [deleted]
        
         | dwheeler wrote:
         | The problem isn't that you need a majority; in many such
         | situations you need a supermajority. It's usually not okay if
         | your website can't be safely used by 49% of your users. In
         | particular, Apple is notoriously slow at improving its
         | JavaScript support in iOS, and they don't allow competing
         | JavaScript implementations to run on iOS, so on iOS you're
         | stuck.
         | 
         | In this case, as long as there is an easily-available OSS
         | polyfill, it'll be okay. Ideally sites will only load the
         | polyfill when they need to (primarily only if they're on iOS).
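The load-only-when-needed idea can be sketched with a dynamic import. The `dompurify` module name and the `Sanitizer` feature check are assumptions based on the draft API; the loader is injectable so the logic can be exercised anywhere.

```javascript
// Load a sanitizer polyfill only when the native API is missing, so
// up-to-date browsers never pay the download cost. Module name
// "dompurify" is assumed; any sanitizing library works here.
async function loadSanitizerFallbackIfNeeded(g = globalThis, load = (m) => import(m)) {
  if (typeof g.Sanitizer === "function") {
    return null; // native support, nothing to load
  }
  const { default: DOMPurify } = await load("dompurify");
  return DOMPurify;
}
```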
        
         | fleddr wrote:
         | Even after 5 years, you will continue to need to implement a
         | fallback, basically forever.
         | 
         | An attacker will always be able to use an older browser version
         | that does not have the built-in feature.
        
           | jacobmischka wrote:
           | It doesn't matter if attackers can intentionally use older
           | browsers; they can also use other tools like curl, or even
           | an intentionally malicious browser application that lacks
           | these features.
           | 
           | It matters if users use the secure browsers, and with Chrome,
           | Firefox, and (hopefully) Safari implementing it the vast
           | majority of them will within a few release cycles.
        
           | matsemann wrote:
           | That's not how XSS generally works. You need a victim to
           | visit a trusted page where you've managed to insert some
           | html/scripts, and then execute that in the context of the
           | user (cookies, read sensitive data etc). If you can trick a
           | user to use a different browser, you probably already have
           | full control.
        
           | tlamponi wrote:
           | > An attacker will always be able to use an older browser
           | version that does not have the built-in feature.
           | 
           | You do not purify the DOM for the attackers' browser; they
           | can just open the dev console and execute arbitrary JS
           | anyway. You purify it so that user input you render is
           | safe for other users to see, without allowing attacker-
           | controlled scripts to execute or DOM elements that leak
           | user info on load to be inserted.
           | 
           | And you can always start showing a "content blocked,
           | upgrade your browser" banner in a few years, once a big
           | enough majority of your target user base has upgraded to a
           | browser that supports it.
        
       | xeyownt wrote:
       | The day Google and Mozilla merge, they will make Goozilla,
       | even more frightening than Godzilla.
        
       | 0xy wrote:
       | Almost every new web API throws a bone to the ad companies who
       | use it for fingerprinting.
       | 
       | The AudioContext API dishes out sensitive and specific audio
       | device latency information that can be used to identify you to a
       | high degree of accuracy, even if the web page in question never
       | plays audio.
       | 
       | If you're ever bored one day, have a go reverse engineering some
       | adware JS code to see what they're up to.
       | 
       | AudioContext for a static ad? Canvas fingerprinting? DRM
       | fingerprinting? All of this has been enabled by both Google and
       | Mozilla, who serve the same ad masters.
       | 
       | Mozilla is entirely dependent on ad money, as is Google. So they
       | turn a blind eye to the security disasters being rolled out every
       | month -- which Google's customers subsequently abuse.
       | 
       | This will be no different, it's yet another datapoint to identify
       | you when perfectly good userspace solutions have existed forever.
        
         | pimterry wrote:
         | This makes no sense for this example. It's an always enabled
         | browser security API. It doesn't expose anything about the
         | device, it doesn't even have state, they're just proposing a
         | new Sanitizer API with methods to sanitize DOM objects. All you
         | could detect is its presence, which provides far less info than
         | the browser version alone.
         | 
         | There are other fingerprinting issues on the web, sure, but
         | this is not one of them. The knee-jerk "all changes on the web
         | are bad" responses are not helpful.
        
           | 0xy wrote:
           | The change is redundant because there are userspace solutions
           | today, and they will continue to be used for decades.
           | 
           | So what use does this serve? Another fingerprinting vector to
           | narrow down browser versions, or worse -- a security hole.
           | 
           | When the "Web Audio API" was rolled out by the Chrome team
           | not only was it fundamentally flawed, but it also contained
           | memory leak CVEs.
           | 
           | Every time they add more trash to the browser nobody asks for
           | (except their ad clients), they introduce more security
           | problems.
           | 
           | So -- why are they adding more APIs when they could fix the
           | old ones that are utterly broken and abused daily by the ad
           | industry? $$$.
        
             | maple3142 wrote:
             | Because DOMPurify is not perfect. Due to quirks of HTML
             | parsing, there have been ways to bypass it:
             | https://research.securitum.com/mutation-xss-via-mathml-
             | mutat...
             | 
             | Having a built-in XSS sanitizer means a single parser is
             | used throughout, preventing such bypasses.
        
             | pimterry wrote:
             | The browser version is always already accessible, and even
             | in future plans to reduce user agent info
             | (https://www.chromestatus.com/feature/5704553745874944) the
             | major version number is never going to be hidden.
             | 
             | This API exposes zero new fingerprinting bits.
             | 
             | Meanwhile userspace solutions to the same problem are a)
             | occasionally buggy, b) not as widely used as they should
             | be, c) not as performant as they could be if implemented
             | directly within the browser, and d) not automatically
             | updated as new browser features are released.
             | Standardizing this will improve all of that.
             | 
             | It is valuable to standardize APIs and build them into the
             | web for features like this where there's a clearly correct
             | approach that's required by a large percentage of modern
             | sites (anything with client side dynamic content). Having
             | browser vendors implement this, embedded in the browser
             | itself and supported by browser implementers directly, is a
             | free security win for the web.
        
             | bugmen0t wrote:
             | The intent is to shift the responsibility to the
             | browser. Decades' worth of userspace solutions have
             | failed us. The browser is pretty good at HTML parsing.
        
         | prox wrote:
         | Yeah I would love to hear someone from Mozilla or Google
         | respond to this.
        
           | hoten wrote:
           | There's literally nothing to respond to. There's no sensitive
           | information exposed by this API...or any information, for
           | that matter.
        
         | jiggawatts wrote:
         | I don't think the fingerprinting capabilities are caused by
         | malice on the behalf of Mozilla or Google.
         | 
         | Even if they made $0.00 from ad revenue, they would still be on
         | the losing side of the battle against tracking. You can't have
         | many features without exposing _something_ about the client.
         | As soon as you have things like multiple versions, optional
         | features, plugins, and UI metrics, you've lost the battle
         | already. Just your fonts alone can identify you to a
         | reasonable accuracy.
         | 
         | How would _you_ solve this problem? Have _everyone_ run the
         | exact same browser in a virtual machine sandbox, with all
         | traffic running through a common VPN? No plugins, ever? No
         | i18n? No preferences of any type? Upscale a fixed-size image
         | to your screen's physical resolution and just learn to live
         | with the blur?
         | 
         | That's where you'd have to _begin_, but I guarantee the ad
         | people would find a way around it. Key press cadence timing.
         | Mouse movement patterns. Something. They'll find a way.
        
           | mathnmusic wrote:
           | The first principle should be to separate "documents" from
           | "webapps". An article on NYTimes should be classified as a
           | document which comes with a sandbox with minimal data
           | collection. Of course, as things stand, every site wants to
           | become an app because that's how the incentives are set up.
           | "Apps" - which can collect more data - should come with
           | significant user friction: permissions dialog, standardized
           | ToS, disclosures etc. Similarly, sites that offer "documents"
           | (i.e. no tracking), should be incentivized in other ways
           | (share button, micropayments etc).
           | 
           | There's a lot that can be done.
        
           | 0xy wrote:
           | Every data point represents more bits of information used to
           | identify users.
           | 
           | I would believe Mozilla and Google were good actors if they
           | went back and cleaned up their security vulnerability "Web
           | APIs" when they get used almost exclusively for
           | fingerprinting.
           | 
           | They don't make any attempt to fix the vulnerabilities, they
           | simply add more. Coincidentally, their ad clients directly
           | benefit. Ain't that something.
        
         | chrismorgan wrote:
         | I haven't delved, but this shouldn't be a fingerprinting vector
         | (except for the one bit of whether it's implemented), as all
         | browsers will be implementing the same thing, like with HTML
         | parsing.
         | 
         | As for the other cases you describe, I'd say the problem isn't
         | so much that fingerprinting vectors exist as it is that ad
         | providers allow arbitrary unsandboxed code execution, which is
         | an obviously-terrible idea that never should have happened.
        
       | fabiospampinato wrote:
       | I'm pretty excited about this for two reasons:
       | 
       | - First of all it makes sense that this feature is provided by
       | the browser itself and they take some responsibility if it
       | doesn't work right.
       | 
       | - Currently the best library for sanitization is probably
       | DOMPurify, and the native Sanitizer API is around 100x faster
       | than DOMPurify, so that would speed up some things dramatically.
       | 
       | I just hope it won't take years for Safari to implement this.
        
         | pimterry wrote:
         | > I just hope it won't take years for Safari to implement this.
         | 
         | 1000%. Safari likes to talk big about rejecting new APIs to
         | protect security & privacy, but there's a long list of APIs
         | they haven't implemented just like this, that are strictly
         | beneficial for users.
         | 
         | That both Firefox & Chrome have shipped working
         | implementations of this (a serious fix for a top-10 OWASP
         | security issue) before Safari has even shown any intent to
         | look at it says a lot imo.
        
           | afavour wrote:
           | > Safari likes to talk big about rejecting new APIs to
           | protect security & privacy
           | 
           | Not only that, they restrict existing functionality. For
           | example, all local storage is destroyed if you don't access a
           | site in seven days. At first blush that makes sense but it
           | means there's no way to reliably persist data to disk. If you
           | run a web app you more or less _have_ to create a backend,
         | account signups, etc. Not only is it a lot of extra work,
         | it's also going to be a huge security vulnerability. The
           | result ends up being entirely counter to Apple's stated
           | intent.
        
             | KarlKemp wrote:
             | It's pretty obvious how that is helpful to protect users'
             | privacy, isn't it?
             | 
             | Or why would they do it, considering it's extra work
             | compared to the status quo?
        
               | afavour wrote:
               | Of course, I absolutely understand how it prevents
               | illegitimate uses of local data storage to violate
               | privacy. My concern is that it also destroys entirely
               | legitimate use cases for local storage and the only way
               | to mitigate that is to open users to a whole new class of
               | security vulnerability they can do very little to protect
               | themselves from.
        
               | KarlKemp wrote:
               | I believe the criteria are more complex than just "7
               | days". There's something about AI or ML in the Safari
               | "experiments" settings, and IIRC first- vs. third-party
               | data is handled differently, and data may also be
               | protected for more than a week if you previously had
               | regular interactions with the domain.
        
               | javitury wrote:
               | That uncertainty is still a blocker for many apps.
        
             | skybrian wrote:
             | It seems like that's one way to prevent lock-in to a single
             | device, or a single browser on that device.
        
           | gbrown wrote:
           | And then iPhone users are stuck due to anticompetitive lock-
           | in.
        
           | krono wrote:
           | Or in April 2021 they finally decide to implement a date
           | input field with a picker, but then half-arse it and not
           | support the min and max properties [0].
           | 
           | A feature not being supported is clear-cut and workable.
           | The current mess, where a feature might be supported with
           | different parts of the spec available only to Safari 14.1
           | on Big Sur and up but not 14.1 on Catalina, is just
           | tiresome.
           | 
           | [0] https://caniuse.com/input-datetime
        
             | tehbeard wrote:
             | No minmax, thanks Apple...
             | 
             | Aren't we overdue for another indexedDB fuck up by the
             | Safari Dev team?
        
               | jessaustin wrote:
               | Sshhhh! Don't remind them!
        
       | encryptluks2 wrote:
       | Sounds like a positive... although, at this point, what looks
       | good may end up actually being bad, like FLoC. That said, I
       | don't understand the uproar over people saying adblock was
       | being removed from Chrome; it still works for me. I think this
       | is a sign that Chromium is actually willing to work with
       | developers to improve APIs.
        
         | Semaphor wrote:
         | > what looks good may end up actually being bad, like FLoC.
         | 
         | FLoC was a google project (this is FF and Google + library
         | author), and it looked bad from the start.
         | 
         | > adblock was being removed from Chrome, which still works for
         | me.
         | 
         | Adblock, in a way, will still work, just even worse than now
         | (where uBlock Origin on FF is better than on Chrome). The
         | Manifest V3 change was postponed by Google; currently [0]
         | they plan to stop supporting V2 in January 2023.
         | 
         | [0]:
         | https://developer.chrome.com/docs/extensions/mv3/mv2-sunset/
        
           | encryptluks2 wrote:
           | https://developer.chrome.com/blog/mv2-transition/
           | 
           | > In the meantime, we will continue to add new capabilities
           | to Manifest V3 based on the needs and voices of our developer
           | community. Even in the last few months, there have been a
           | number of exciting expansions of the extension platform.
           | 
           | I have yet to see where Chrome is explicitly telling anyone
           | they plan to phase out support for adblockers, nor where they
           | are making it clear that is their intention. V3 is not yet
           | completed, and is actively being worked on. If they actually
           | do disable adblockers then that is a different story.
        
             | rndgermandude wrote:
             | >I have yet to see where Chrome is explicitly telling
             | anyone they plan to phase out support for adblockers
             | 
             | Why would they ever want to do that? It would be a PR
             | nightmare if they came out and explicitly said "fuck you".
             | 
             | Instead they opted to take away capabilities from their
             | APIs, with the result of severely limiting adblocker
             | capabilities, under the guise that this improves
             | security and performance. That is not entirely wrong,
             | but it hides the fact that there were easy enough
             | alternatives that preserve the capabilities of
             | adblockers and some other API users while still
             | preventing the security threats they stated they were
             | concerned about[0]. Yet they didn't even really look at
             | what was proposed, and insisted on crippling their API
             | in a way that cripples adblockers.
             | 
             | I can only conclude that improving security and performance
             | is just one of the engineering goals of their solution,
             | while the other (unstated) goal is to fuck with adblockers.
             | 
             | [0] They were particularly worried about extensions being
             | able to intercept requests, examine requests and exfiltrate
             | sensitive data. One way this can be easily solved is by
             | adding a special sandbox for request blocking that has no
             | accessible output to the extension or anywhere else (no
             | backchannel, no access to the network or file system). You
             | can load scripts (and data) into it, but it may only ever
             | talk to the browser itself during request handling. This
             | breaks the "exfiltrate" part.
        
               | encryptluks2 wrote:
               | They are still accepting proposals and the changes aren't
               | being forced until 2023. I know everyone likes to think
               | Google is always evil lately, but there is still a lot of
               | time for the new API to be revised and improve on the
               | features you mention. You can even make the suggestions
               | yourself or work on the code to fix it.
        
               | rndgermandude wrote:
               | >They are still accepting proposals
               | 
               | V3 is finalized. And while they say they accept proposals
               | for future changes, they already did not accept or even
               | consider proposals in the timeline leading to V3.
               | 
               | >changes aren't being forced until 2023
               | 
               | They pushed the timeline back because of all the
               | pushback they got. And also, the deadline for _new_
               | extensions using V2 is Jan 2022, a few months from
               | now.
               | 
               | >but there is still a lot of time for the new API to be
               | revised and improve on the features you mention.
               | 
               | Not true either, you have until Jan 2022, a few months
               | from now, to spec and implement and roll out such a
               | revised or new API.
               | 
               | This is not going to happen. They had enough time to do
               | all that when the issues were first raised, but didn't.
               | They had enough time to do all this when the first
               | serious proposals for better solutions were made, but
               | they didn't. Why at this point I am wondering: will they
               | ever?
               | 
               | Sure, the already established extensions will get a
               | little bit of a longer grace period where things would
               | theoretically happen. That doesn't help you if you want
               | to create something new, tho.
               | 
               | >You can even make the suggestions yourself or work on
               | the code to fix it.
               | 
               | Have you ever tried to get code into chrome(ium)? That's
               | hard enough by itself. Now try to get code in that
               | affects something google considers important... and they
               | consider this important at least now, for the mere fact
               | it was "news".
               | 
               | Trying to work with them is what gorhill did, and a lot
               | of other people too, before he figured out they were set
               | on going the cripple-adblockers route and sounded the
               | alarms.
        
             | Arnavion wrote:
             | >I have yet to see where Chrome is explicitly telling
             | anyone they plan to phase out support for adblockers, nor
             | where they are making it clear that is their intention.
             | 
             | That was never their intention, nor what the uproar was
             | about, so it's to be expected that you're not seeing any
             | evidence of it. The problem with Manifest V3 is not that
             | it disables adblockers; not sure where you got that idea
             | from.
             | 
             | The problem is that it severely restricts how effective
             | they can be. Many of the things uBO does cannot be done in
             | v3. That's what the uproar is about.
             | 
             | https://github.com/uBlockOrigin/uBlock-
             | issues/issues/338#iss...
             | 
             | https://github.com/uBlockOrigin/uBlock-
             | issues/issues/338#iss...
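For context on what changed: under Manifest V3, request blocking is expressed as a static declarative rule list that the browser evaluates itself, rather than extension JavaScript observing each request. A minimal illustrative rule (my own example, not taken from uBO) looks roughly like this:

```json
[
  {
    "id": 1,
    "priority": 1,
    "action": { "type": "block" },
    "condition": {
      "urlFilter": "||ads.example.com^",
      "resourceTypes": ["script", "image", "xmlhttprequest"]
    }
  }
]
```

Because the extension only ships a rule list and never sees the requests themselves, filtering logic that needs to inspect request contents at run time (much of what uBO does) has no equivalent.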
        
               | encryptluks2 wrote:
               | Again, these changes aren't taking place until 2023. They
               | are still accepting new features. Yes, everyone is aware
               | that the developer of uBlock threw a fit. It was all over
               | the news, and it was portrayed by the media as Chrome
               | disabling adblock, because that was essentially the
               | message from Raymond Hill at the time. I think it is good
               | to be aware of these changes, and contribute feedback and
               | try to get compatible or comparable APIs implemented. At
               | no point has Google or Chromium said they are unwilling.
               | If anything, it looked like a prime opportunity for them
               | to scream fire when there was no fire.
        
       | codedokode wrote:
       | There is something wrong with this idea. Sanitizing HTML should
       | be done on the server, not on the client side.
       | 
       | Looks like an absolutely useless feature that will just make
       | bloated browsers more bloated.
        
         | nightpool wrote:
         | The benefit of doing this client-side instead of server-side is
         | that you can stay up to date with any changes that the client
         | may make to how it's processing HTML that may have security
         | implications. Additionally, you get to use the exact same code
         | that the browser is ultimately using to parse the HTML, so a
         | browser parsing bug, spec nuance, or un-specced legacy
         | behavior that your backend developer didn't consider doesn't
         | turn into a serious security flaw.
         | 
         | Additionally, the Sanitizer API does a much better job of
         | handling contextual parsing than many other similar backend
         | APIs. What happens when you parse an HTML fragment assuming
         | it will live in a `div`, and then it actually gets inserted
         | into a `table` cell? The spec goes into this in more detail
         | here: https://wicg.github.io/sanitizer-api/#strings
         | 
         | The downsides, of course, are those associated with any thick-
         | client/thin-server API design--more logic on the front-end
         | means more logic to reimplement for different consumers.
         | 
         | Personally, I would probably still stick with Nokogiri for my
         | own applications, but I can see both sides of the trade-off.
        
         | mftb wrote:
         | The article states a couple times in the opening paragraphs
         | that the API is about sanitizing dynamically generated HTML,
         | "Many websites rely on dynamically generated content in the
         | browser. Often, the generated markup includes content provided
         | by outside sources, such as user-provided input, which can
         | include malicious JavaScript code.". So the server would never
         | see this HTML.
        
           | codedokode wrote:
           | This is absolutely unclear. If the user enters HTML and
           | that HTML never gets to the server, then why sanitize it?
           | To protect users from hacking themselves?
        
             | alanfranz wrote:
             | Google for "reflected XSS". Sometimes a parameter in the
             | URL can be rendered in the user's browser.
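A toy illustration of the reflected-XSS risk: a URL parameter echoed into markup unescaped becomes code. The `escapeHTML` helper below is a minimal hand-rolled defence (my own function, not from any library) of the kind that a sanitizer generalizes.

```javascript
// Minimal HTML escaping: neutralizes the five characters that can
// change markup structure or break out of attribute values.
function escapeHTML(s) {
  return s.replace(/[&<>"']/g, (c) => ({
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
  }[c]));
}

// In a browser, rendering ?q=<img src=x onerror=alert(1)> via
//   el.innerHTML = new URLSearchParams(location.search).get("q")
// runs the payload; escaping (or sanitizing) the value first does not.
```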
        
             | oh_sigh wrote:
           | Sure, why not? Most users expect that if they paste
           | something into a textbox on a site, their browser won't
           | send their cookies and browsing history to some random
           | third party.
        
             | playpause wrote:
             | Users may 'hack themselves' when an attacker persuades them
           | to paste something into a website, for example. These are
           | very basic XSS questions, by the way; you don't seem to
           | know enough about the subject to be this incredulous.
        
             | dzaima wrote:
             | In addition to the other replies, there could be server-
             | provided HTML that the user has the option to change, and
             | initiating a change, activating the vulnerability, could be
             | one click. (this happened to Google's own search bar!)
             | 
             | Then there are cases of different browsers parsing things
             | differently and/or bad sanitization/serializing giving
             | different results on repeated invocation or just being
              | broken on the server side. A built-in client-side option
              | is going to be a lot simpler.
        
           | tdeck wrote:
           | After reading some of these comments I still wonder what the
           | concrete use cases are. What are these websites that allow
           | users to paste in HTML, and why? Is that even a good idea? I
           | can understand when it's developer tools like jsFiddle and
           | the like, but when should a normal consumer website be
           | hosting untrusted code in the frontend?
        
       | tannhaeuser wrote:
        | Have they addressed the points we discussed 4 months ago [1]
        | (e.g. where they're reinventing SGML, badly and hard-coded to
        | HTML)?
       | 
       | [1]: https://news.ycombinator.com/item?id=27061020
        
         | nightpool wrote:
         | Seeing as [nobody seems to have brought it up to
         | them](https://github.com/WICG/sanitizer-api/issues?q=sgml), I'm
         | not surprised that they haven't addressed it.
         | 
         | But, as always, specific & easy-to-use APIs are going to win
         | out over more "fully general" ones. Are you suggesting that
          | everybody learn DSSSL and write queries like
          | ((match-element? nd '(section title))
          |  (format-number-list
          |   (hierarchical-number '("chapter" "section") nd)
          |   "1" "."))
         | 
         | Simply to be able to safely display some markup? I for one
         | would much rather work on an AST with normal javascript instead
         | of having to learn another DSL.
        
           | tannhaeuser wrote:
           | > _Are you suggesting that everybody learn DSSSL [...]?_
           | 
            | Hell, no ;) Just that they pick up SGML insertion contexts
            | - the concept of where to escape which chars when - which
            | was known in the late 1970s already (ISO 8879 was published
            | in 1986, but took a long time through the committees). It's
            | incredibly lame that they haven't figured out DTD/markup
            | grammars and can only handle hard-coded HTML insertion
            | contexts - one more thing to fall off the cliff as HTML
            | evolves, and unnecessarily so.
           | 
           | OTOH, it always is fun to show HNers what could've would've
           | been using DSSSL/Scheme in browsers ...
        
             | nightpool wrote:
             | This has nothing to do with "escaping what chars when".
             | It's simply a structural whitelist for DOM nodes that
             | prevents JS execution, coupled with a contextual parser
             | that was already available, but a little hard to find.
             | Maybe I'm not understanding your point, because googling
             | for "SGML insertion contexts" doesn't bring up anything
             | that looks relevant, but there are many, many drawbacks
             | that came from using XML to define HTML, and the browser
             | community moved away from it for a good reason. My guess is
             | that SGML had a similar story.
        
           | ampdepolymerase wrote:
           | The Lisp evangelism team on HN will burn you for that comment
           | :)
        
             | [deleted]
        
             | floatingatoll wrote:
             | The Lisp evangelism team can speak for themselves :)
        
       | alanfranz wrote:
       | Risky. How can you distinguish between an intentionally set
       | script and an attack?
       | 
        | Why can't HTML be composed client-side using proper, contextual
        | APIs instead of "sanitizing" it afterwards? Sanitizing after
        | the fact won't work. It reminds me of PHP magic quotes - they
        | didn't work either.
        | 
        | We'd still need a sanitizer for URLs, of course; those are one
        | of the pesky parts of the web specs.
        
         | wccrawford wrote:
         | Because far, far too many web apps need to display user-entered
         | data, and it needs to be sanitized. When markdown is converted
          | to HTML, as it is on this forum, it _still_ should be sanitized
         | afterwards to deal with any vulnerabilities that were
         | discovered after the user entered the data, even years later.
        
           | alanfranz wrote:
           | "and it needs to be sanitized..." clarify your point. If I
           | need to display user-controlled data, I can use a proper API
           | - e.g. var x = document.createElement(); x.textContent =
           | "<script></script>". You can put _anything_ inside
           | textContent. It works because it is contextual; you 're
           | creating an element and telling the browser what to do with
           | it (display as text). If you needed better formatting, you
           | would compose the various html elements, you would NOT use
           | innerHTML.
           | 
           | DOMPurify performs a string->string conversion, so it's got
           | no context information. I don't understand how this can work.
           | It didn't work for PHP magic quotes. It doesn't work for SQL
           | queries. Why can and should it work for HTML?
           | 
           | Remember that "work" implies not just "safe". It implies "it
           | must show what the user wanted to see". Otherwise
           | var sanitize = function(input){ return "<p>";}
           | 
           | Would "just work" perfectly.
        
             | nightpool wrote:
             | Well, it's a good thing we're not talking about DOMPurify,
              | because the spec we're talking about (the Sanitizer API)
             | has lots of context information and does not provide a
             | string -> string API: https://wicg.github.io/sanitizer-
             | api/#strings
             | 
             | This API is simply a DOM-based whitelist for preventing
             | script execution coupled with a contextual parser. No more,
             | no less. It doesn't solve every problem with accepting
             | untrusted HTML, sure, but it's good enough for a wide
             | variety of use-cases (One example that comes up frequently
             | is embedded markdown, another good example is a mail client
             | --both situations where formatted, user-controlled data is
             | important). The benefit of doing this client-side instead
             | of server-side is that you can stay up to date with any
             | changes that the client may make to how it's processing
             | HTML that may have security implications. (The downsides,
             | of course, are those associated with any thick-client/thin-
             | server API design--more logic on the front-end means more
             | logic to reimplement for different consumers)
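As a rough sketch of how the proposal is meant to be used: the method names follow the WICG draft at the time of writing, while the `renderUntrusted` wrapper and its text-only fallback branch are illustrative additions, not part of the spec.

```javascript
// Hedged sketch of the draft Sanitizer API, with a conservative
// fallback for environments that don't ship it yet.
function renderUntrusted(el, html) {
  if (typeof Sanitizer !== "undefined" && typeof el.setHTML === "function") {
    // The browser parses `html` in el's own context and drops
    // script-executing nodes/attributes per the spec's defaults.
    el.setHTML(html, { sanitizer: new Sanitizer() });
  } else {
    // Fallback: render the markup as inert text instead.
    el.textContent = html;
  }
}
```

In a browser without the API, the markup simply shows up as text rather than being parsed, which is safe if unhelpful; a real app would likely fall back to a library like DOMPurify instead.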
        
       | danShumway wrote:
       | This is good.
       | 
       | All of this stuff is possible to do already with 3rd-party
       | frameworks with varying levels of performance/reliability, and
       | there may be other methods you use to sanitize your output. If
       | you're a responsible developer, if you have experience trying to
       | guard against XSS, or if you're working with a framework like
       | React/Vue/whatever, this may not change much about your life.
       | 
       | However, I think it's understated how many sites struggle with
       | this, and how many of them do sanitization poorly, and it is much
       | better to be able to point them towards a single API and say,
       | "look, just call this function."
       | 
       | One of the bigger vulnerabilities I've ever found in a website
       | (https://danshumway.com/blog/gamasutra-vulnerabilities/) was in
       | part a problem directly caused by bad sanitization. When I
       | reported that vulnerability, I didn't advise them to fix their
       | sanitizer because I didn't feel confident it would really fix the
       | problem or that there wouldn't be other issues in the future. I
       | could have pointed them towards something like DOMPurify, but
       | instead I advised that they start embedding posts in iFrames with
        | reduced permissions so any scripts that did run wouldn't be able
       | to get at user data.
       | 
        | I wonder whether, if there had been more of a standard around
        | this stuff, some engineer on their team might have caught the
        | problem earlier.
       | 
       | Similar to the SameSite changes to cookies, getting a native
        | sanitizer isn't about forcing you to stop doing server-side
       | sanitization or even changing your workflow much at all. It's
       | about making the defaults safer and making it easier for hobby
       | developers (and companies too) to avoid messing up because they
       | don't understand the risks of calling otherwise innocuous
       | functions.
       | 
       | What's even more exciting is some of the work going into trusted
       | types (https://web.dev/trusted-types/). The sanitizer API gives
       | you a function you can call to get reasonably safe output without
       | a lot of configuration or a 3rd-party framework or knowing what
       | to download. The trusted types API is designed to make it harder
       | to accidentally go outside of that sanitizer. It's not a silver
       | bullet, but it's a big deal for companies where you're trying to
       | set org-wide policies and avoid XSS attacks sneaking in through a
       | random feature by a team that you're not watching closely enough.
       | 
       | It's promising to see Firefox/Google working together on this
       | stuff, I hope they continue with other APIs. The ideas are good,
       | they just need iteration and more input, and these kinds of API
       | areas are imo where Google/Mozilla tend to work together pretty
       | well. I'm fairly optimistic about them.
        
       | chrisweekly wrote:
       | Mods: Title typo - last word should be "Browsers" not "Browse"
        
         | floatingatoll wrote:
         | Only way the mods will see that in a timely manner is if you or
         | someone emails them, using the footer Contact link. Otherwise
         | they might not ever realize you posted this.
        
         | lordgrenville wrote:
         | 80 char title limit
         | https://github.com/wting/hackernews/blob/master/news.arc#L14...
        
       | tinus_hn wrote:
       | This is solving an issue in the browser that should be solved
       | server side.
       | 
       | Where is this data they are sanitizing coming from? Why would you
       | want every browser treating this differently?
        
         | jorangreef wrote:
          | In fact, this is an issue that can only be properly solved in
          | the browser, hence the need for security projects like
          | DOMPurify in the past.
          | 
          | The reason is that every browser parses potential XSS (and
          | mXSS) content slightly differently, so a server-side
          | sanitizer will by definition never parse content in exactly
          | the same way as a specific version of a given browser, since
          | it isn't sharing exactly the same runtime.
         | 
         | This semantic gap between browser rendering quirks and server
         | side approximations can be exploited by an attacker to slip
         | content past the server side sanitizer.
         | 
         | And that's why the browser vendors are actively working with
         | Cure53, the author of DOMPurify, in order to ship Sanitizer
         | API.
         | 
         | I'm just surprised this has taken so long.
        
           | cxr wrote:
           | A reminder: people, please don't take your cues on language
           | theoretic security (or anything security-related) from
           | internet comments containing unverified claims, even if those
           | comments aren't nameless/faceless and are moderately-to-
           | highly upvoted. The parent comment is an example, appearing
           | to have the approval of the community, but there are several
           | subtle things wrong (or just odd) with it. It's short of
           | misleading people into doing anything harmful, though, so
           | stopping the world to hash them out here isn't critical, and
           | would be tedious besides.
           | 
           | Aside from that, a subset of people using DOMPurify--maybe
           | even a plurality--seem to treat it as a talisman and can't
           | explain how or if they've configured it correctly to provide
           | the kind of protections they need for their use case.
           | Security is not a separable concern.
        
             | jorangreef wrote:
             | It would be better for your comment to provide reasonable
             | objections as to why DOMPurify should not be considered
             | state of the art, without spreading fear, uncertainty or
             | doubt.
        
             | mcherm wrote:
             | > The parent comment is an example, appearing to have the
             | approval of the community, but there are several subtle
             | things wrong (or just odd) with it. It's short of
             | misleading people into doing anything harmful, though, so
             | stopping the world to hash them out here isn't critical,
             | and would be tedious besides.
             | 
             | It is not clear to me what subtle things are wrong with the
             | parent comment. But I'm pretty sure I can see the flaw in
             | claiming that something is wrong without saying how. It's
             | an impossible-to-disprove accusation which therefore does
             | not advance the conversation.
        
             | acdha wrote:
             | > please don't take your cues on language theoretic
             | security (or anything security-related) from internet
             | comments containing unverified claims
             | 
             | Your comment is nothing but vague, unverified claims. Why
              | should you be given the benefit of the doubt but not the
             | person you're replying to? If you have a real concern,
             | share it so that it can be evaluated and everyone,
             | including the original poster, can learn from it.
        
             | nawgz wrote:
             | What a useless comment. "Shite bad but I won't say a thing
             | about what or how"
             | 
             | And then you proceed to imply developers who have UIs
             | susceptible to XSS are to blame, even though this is a
             | known attack vector where browsers parse adversarial trees
             | in an exploitable way that is completely opaque to
             | everyone?
             | 
             | This is silly. Hold your tongue or speak clearly, don't
             | jeer from a high horse
        
         | fabiospampinato wrote:
         | > This is solving an issue in the browser that should be solved
         | server side.
         | 
         | This doesn't make much sense to me, my app doesn't even rely on
         | any servers, what do you suggest I do then?
         | 
         | > Where is this data they are sanitizing coming from?
         | 
         | It's untrusted, what does it matter where exactly it comes
         | from?
         | 
         | > Why would you want every browser treating this differently?
         | 
          | Why do you say that each browser would treat this differently?
         | There's a spec for how this should work:
         | https://wicg.github.io/sanitizer-api/
        
           | resonious wrote:
           | > my app doesn't even rely on any servers, what do you
           | suggest I do then?
           | 
            | Although I agree that there is nothing wrong with client-side
            | sanitization, does this even apply to non-networked apps?
            | What are you protecting your user from?
           | 
           | Or perhaps by "doesn't rely on any servers" you mean
           | "serverless", which doesn't literally mean no servers. If
           | your app's client hits a database-as-a-service somewhere then
           | that is a server and it very well can sanitize for you.
        
             | pjerem wrote:
              | I think it's "doesn't rely on any servers _that you own_".
             | 
             | You can have no back-end code but still can call external
             | APIs hosted on servers you don't control or which can be
             | compromised by malicious user generated content.
        
             | esrauch wrote:
             | Besides the other comments there are legitimate security
             | reasons to be worried about sanitizing in general even in
             | zero-rpc-app cases; malicious files are an attack vector.
             | Eg if you were making Calibre you do still want to be
             | hardened against malicious local files from hijacking the
             | app.
        
             | fabiospampinato wrote:
              | > Although I agree that there is nothing wrong with client-
              | side sanitization, does this even apply to non-networked
              | apps? What are you protecting your user from?
             | 
              | Kind of from themselves: the app can open arbitrary files,
              | and you can't have people nuke themselves by opening a
              | random file they got from the internet, or by pasting in
              | something they don't understand. Also, a plugin could
              | escape the boundaries put on it by XSS-ing its user, by
              | outputting some malicious markup.
             | 
             | > Or perhaps by "doesn't rely on any servers" you mean
             | "serverless", which doesn't literally mean no servers. If
             | your app's client hits a database-as-a-service somewhere
             | then that is a server and it very well can sanitize for
             | you.
             | 
             | No I actually meant no servers at all, it's a 100% local
             | Electron app.
        
             | wongarsu wrote:
              | > does this even apply to non-networked apps? What are you
              | protecting your user from?
             | 
             | Some pages render data from GET parameters, which allows
             | XSS by giving people a link. There's also scenarios where
             | data from localstorage, cookies or manually copy-pasted
             | data is an attack vector. Imagine I get you to paste my
             | document into your text editor and that allows me to
             | extract all your data.
        
             | runarberg wrote:
             | XSS is a known vulnerability in social hacking as well as
             | shared data. Imagine an app which allows you to paste an
             | input which will later be rendered as HTML. An attacker can
             | create a malicious script and persuade the victims to copy
             | it and paste in their app. This script might include
             | `fetch` calls to a server the attacker controls which the
             | attacker can use to spy or install malware on the victims
             | machine.
             | 
             | The above example is a vulnerability in the app just the
             | same, it just requires the attacker to consider different
             | avenues of delivery (such as sharing the malicious script
             | on a forum) then in traditional server connected apps.
        
           | tinus_hn wrote:
           | What is your 'app' and how does it get untrusted input
           | without any server being involved?
           | 
           | There is also a spec for how html should work.
           | 
           | And finally, 100% of the useful functionality of this 'api'
           | could be implemented using a Javascript library or server
           | side. Yes, browsers have slight disagreements over handling
            | broken HTML. So the first step of server-side sanitization
            | is to remove the broken stuff. This also removes a lot of
            | cross-browser incompatibility. And then you select tags and
            | only allow the tags you want. Basic XSS sanitization like
            | it's 1999.
           | 
           | Nobody needs any built in api for this. This proposal is
           | about solving a solved problem.
        
             | fabiospampinato wrote:
             | It's an Electron 'app' that can open arbitrary files.
             | 
             | This isn't the sort of API that unlocks some previously
             | impossible feature, but DOMPurify is incredibly slow and I
             | would trust the browser's implementation more, obviously.
             | Rewriting DOMPurify better has always been an option, but
             | that's not an easy value proposition for most developers,
             | hence the best we've got is still DOMPurify.
        
               | tinus_hn wrote:
               | - you don't need to add apis to Firefox to use them in an
               | Electron app
               | 
                | - It isn't hard to do what this API does in JavaScript
                | as fast as the browser can do it. There is already an
                | API to convert strings to DOM and it isn't exactly
                | rocket science to iterate over the result and drop tags
                | that aren't on a list. I'm sure there can be a library
                | that does it slowly, but it doesn't have to.
        
               | fabiospampinato wrote:
               | > - you don't need to add apis to Firefox to use them in
               | an Electron app
               | 
               | Right, so what?
               | 
                | > - It isn't hard to do what this API does in JavaScript
                | as fast as the browser can do it. There is already an
                | API to convert strings to DOM and it isn't exactly
                | rocket science to iterate over the result and drop tags
                | that aren't on a list. I'm sure there can be a library
                | that does it slowly, but it doesn't have to.
               | 
               | Of course it's DOMPurify's fault for being that slow,
               | nobody is arguing that it must be rewritten in C++ to be
               | fast.
               | 
                | It's not as simple as you put it though, unless you have
                | an extremely strict whitelist containing no tags at all,
                | in which case it's trivial but also of very little use.
               | 
               | Since your comment sounds overly arrogant to my ears I'd
               | like to point out that by following the rules that you
               | mentioned for making a sanitizer (DOMParser + drop non-
               | whitelisted tags basically) you can only produce either a
               | useless or a broken sanitizer, it's impossible to make a
               | useful and working sanitizer that way. Proof-ish: either
               | you drop <img> nodes, in which case you've made a pretty
               | useless sanitizer in my book, or you leave it in, in
               | which case you leave yourself open to XSS via stuff like
               | the onerror attribute.
               | 
                | It's not trivial to write a sanitizer that both works
                | and is useful; almost no developers should need to learn
                | all the nuances necessary for writing one, hence the
                | platform itself should provide it.
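The hole described above can be made concrete with a deliberately naive sketch. The whitelist and regex here are illustrative toys, not a recommended approach:

```javascript
// Toy tag-whitelist "sanitizer": keeps whitelisted tags verbatim and
// drops everything else. Deliberately naive, to show where it fails.
const ALLOWED = new Set(["b", "i", "p", "img"]);

function naiveSanitize(html) {
  return html.replace(/<\/?([a-zA-Z0-9]+)[^>]*>/g, (tag, name) =>
    ALLOWED.has(name.toLowerCase()) ? tag : ""
  );
}

// Script tags are dropped (though their text content remains)...
naiveSanitize("<script>alert(1)</script>"); // -> "alert(1)"

// ...but a whitelisted <img> sails through with its event handler,
// so the "sanitized" output is still an XSS payload:
naiveSanitize('<img src=x onerror=alert(1)>');
// -> '<img src=x onerror=alert(1)>'
```

Plugging the hole means also filtering attributes, attribute values, URL schemes, and CSS, which is exactly the nuance the comment says most developers shouldn't have to learn.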
        
         | dobin wrote:
          | It doesn't matter where the data is coming from, it matters what
         | it is able to do. As always with security, if the server
         | attempts to protect the client app by emulating its behaviour,
         | it will go wrong (as the server is never able to emulate the
         | client perfectly). This is a problem in most of the magic black
         | security boxes (WAF, IPS, DLP etc.).
         | 
         | The browser knows if a certain piece of data will perform
         | execution or not, as it is the software implementing the
         | functionality. It is the correct app to ask, as it is the one
         | being exploited.
        
         | masa331 wrote:
          | This is for sanitizing content generated client-side, which
          | might never touch a server at all. You can create HTML and put
          | it into the DOM on the client.
        
           | silon42 wrote:
           | If you are generating, you should have a whitelist of safe
            | html/css.
           | 
           | Apart from performance this smells of not using a whitelist
           | mechanism (I hope this is not the case).
        
             | masa331 wrote:
              | What? Whitelisting is one technique you can use when
              | sanitizing content generated from unknown sources. If you
              | need to generate such content then it's probably a special
              | feature of your software and not a smell.
        
             | kapep wrote:
             | > this smells of not using a whitelist mechanism
             | 
             | What makes you think that? I just skimmed the draft and it
             | seems to use a sensible whitelist as default. Developers
             | can allow or deny additional elements/attributes as they
             | like.
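The layered configuration kapep describes (a sensible default list plus per-app allow/deny overrides) can be mimicked in a pure sketch. The key names `allowElements`/`blockElements` follow my reading of the WICG draft, but the resolution order and default set below are simplifications, not the spec algorithm:

```javascript
// Simplified sketch of allow/block resolution over element names.
// Not the spec algorithm - just the general idea of layered lists.
const DEFAULTS = new Set(["p", "b", "i", "a", "ul", "li", "img"]);

function isAllowed(name, config = {}) {
  const n = name.toLowerCase();
  // An explicit block wins over everything else.
  if (config.blockElements?.includes(n)) return false;
  // An explicit allow list replaces the defaults entirely.
  if (config.allowElements) return config.allowElements.includes(n);
  // Otherwise fall back to the built-in safe list.
  return DEFAULTS.has(n);
}

isAllowed("script");                                  // -> false
isAllowed("img", { blockElements: ["img"] });         // -> false
isAllowed("marquee", { allowElements: ["marquee"] }); // -> true
```

The real spec additionally constrains what an allow list may re-enable, so developers can widen formatting options without re-opening script execution.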
        
         | bugmen0t wrote:
          | If you sanitize on the server, you are sanitizing for a
          | theoretical browser and for how _you_ think it parses HTML.
          | Any kind of parsing ambiguity will lead to XSS.
         | 
         | That's why you should be using an API that relies on the
         | browser's parser.
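One way to see the parsing-ambiguity point is with a toy "server-side" filter. The function name is illustrative; the payload shape is the well-known `<noscript>` mutation trick:

```javascript
// Toy server-side check: strip quoted attribute values, then look for
// inline event handlers. Looks reasonable - and is wrong.
function looksSafe(html) {
  const withoutAttrValues = html.replace(/"[^"]*"/g, '""');
  return !/on\w+\s*=/i.test(withoutAttrValues);
}

// To this filter, everything after title=" is just an attribute value:
const payload =
  '<noscript><p title="</noscript><img src=x onerror=alert(1)>">';
looksSafe(payload); // -> true, filter approves

// A plain payload is caught, so the filter *seems* to work:
looksSafe('<img src=x onerror=alert(1)>'); // -> false

// But a browser with scripting enabled ends <noscript> at the literal
// "</noscript>", so the <img onerror=...> becomes a real, live element.
```

The server and the browser disagree about where the attribute value ends, and that disagreement is the entire attack surface; a sanitizer running on the browser's own parser cannot be fooled this way.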
        
         | zerkten wrote:
          | The right answer here is probably that there are some apps
          | which have no backend, or a limited one, where the
          | possibility of handling dangerous input exists, and this
          | provides a standard solution.
         | 
         | Further, it's not a panacea. Defense-in-depth still applies, so
         | server-side and other mitigations will still be appropriate.
         | Build and use threat models to understand what is appropriate.
        
       ___________________________________________________________________
       (page generated 2021-10-20 23:02 UTC)