[HN Gopher] Google, Mozilla Close to Finalizing Sanitizer API fo...
___________________________________________________________________
Google, Mozilla Close to Finalizing Sanitizer API for Chrome and
Firefox Browse
Author : todsacerdoti
Score : 179 points
Date : 2021-10-20 05:47 UTC (17 hours ago)
(HTM) web link (portswigger.net)
(TXT) w3m dump (portswigger.net)
| tmikaeld wrote:
  | This is good of course, however, it will probably take at least
  | 5+ years until a majority of users are on browsers that have
  | this feature built in.
| lloydatkinson wrote:
| Why would it take five years for Chrome to implement it?
| 19870213 wrote:
      | You're forgetting 'legacy' devices, as in, older than a
      | couple of minutes. I maintain an application that is used in
      | primary education in the Netherlands, and the oldest device
      | thus far with issues is an iPad 4 with iOS 10.3, which the
      | school only invested in a couple of years ago (I don't know
      | any further details). And in their infinite wisdom Apple
      | tied the Safari version to the iOS version, and no
      | alternative browsers are allowed. /rant
| fabiospampinato wrote:
| It's already implemented in Chrome latest, available under a
| flag. Although support for <svg> and <math> elements is not
| in yet.
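A minimal feature-detection sketch of how a page might call the draft API when it is present. The `Sanitizer` constructor and `Element.setHTML` names come from the WICG draft as it stood at the time and have shifted between drafts, so treat them as illustrative rather than authoritative:

```javascript
// Hedged sketch: use the native Sanitizer API when the browser ships it,
// otherwise fall back (here: render the input inert as plain text; a real
// app would call a library such as DOMPurify instead).
function setSanitizedHTML(el, dirty) {
  if (typeof Sanitizer === "function" && typeof el.setHTML === "function") {
    // Native path: the browser parses and filters in a single step,
    // with the same parser it will use to render the result.
    el.setHTML(dirty, { sanitizer: new Sanitizer() });
    return "native";
  }
  // Fallback path: textContent never creates elements, so nothing executes.
  el.textContent = dirty;
  return "fallback";
}
```

Behind the flag mentioned above the native branch is taken; everywhere else the fallback runs, which is why library-based sanitizers will stay around for a while.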
| skrebbel wrote:
| It won't, but it will take a while (not sure about five
| years) for everybody to be on the latest version. Chrome has
| nice auto-update but not everybody has it enabled, for all
| kinds of good and bad reasons.
| wongarsu wrote:
| According to statcounter.com any new Chrome version is
| adopted by a large majority of users within 30 days [1].
|
        | Sure, some corporate users hold out much longer, but those
        | seem to be a tiny minority.
|
| 1: https://gs.statcounter.com/browser-version-market-share
| codedokode wrote:
          | There are other browsers besides Chrome. For example,
          | there are built-in browsers in smartphones that will
          | never be updated.
|
          | It is wrong to rely on everyone using the latest version
          | of a browser. Every site should support browsers at least
          | 5 years old, and a good site will be usable in a browser
          | 10 years old, including the built-in browser of Windows.
          |
          | Sadly, in reality most sites are made so poorly that they
          | don't open on a 5-year-old smartphone. It shows how poorly
          | qualified modern web developers have become.
| wongarsu wrote:
            | I agree that apps will need a fallback for a couple
            | of years. So far only Chrome and Firefox are
            | implementing it, and given that Safari took 2.5 years
            | for IntersectionObserver, I wouldn't hold my breath
            | waiting for all major browsers to implement this.
|
| But this API doesn't implement anything we couldn't do
| before, it's a more correct and much faster
| implementation of something we already have libraries
| for. The vast majority of users will have an almost
| immediate benefit from sites using this API, both in
| speed and security.
|
| That's what I take this comment chain to be about, since
| talking about "a majority" doesn't make sense otherwise.
| Supporting browser versions used by at most 2% of all
| users is the name of the game in webdev, that's what made
| IE so annoying.
| [deleted]
| dwheeler wrote:
| The problem isn't that you need a majority; in many such
| situations you need a supermajority. It's usually not okay if
| your website can't be safely used by 49% of your users. In
| particular, Apple is notoriously slow at improving its
| JavaScript support in iOS, and they don't allow competing
| JavaScript implementations to run on iOS, so on iOS you're
| stuck.
|
| In this case, as long as there is an easily-available OSS
| polyfill, it'll be okay. Ideally sites will only load the
| polyfill when they need to (primarily only if they're on iOS).
| fleddr wrote:
| Even after 5 years, you will continue to need to implement a
| fallback, basically forever.
|
| An attacker will always be able to use an older browser version
| that does not have the built-in feature.
| jacobmischka wrote:
      | It doesn't matter if attackers can intentionally use older
      | browsers; they can also use other tools like curl or even an
| intentionally malicious browser application that doesn't have
| these features either.
|
| It matters if users use the secure browsers, and with Chrome,
| Firefox, and (hopefully) Safari implementing it the vast
| majority of them will within a few release cycles.
| matsemann wrote:
| That's not how XSS generally works. You need a victim to
| visit a trusted page where you've managed to insert some
| html/scripts, and then execute that in the context of the
| user (cookies, read sensitive data etc). If you can trick a
| user to use a different browser, you probably already have
| full control.
| tlamponi wrote:
| > An attacker will always be able to use an older browser
| version that does not have the built-in feature.
|
      | You do not purify the DOM for the attacker's browser; they
      | can just open the dev console and execute arbitrary JS
      | anyway. You purify it so that user input that one renders is
      | also safe for other users to see, without allowing attacker-
      | controlled scripts to be executed or DOM elements that leak
      | user info on load to be inserted.
      |
      | And you can always simply start to show a banner that says
      | "content blocked, upgrade your browser" in a few years, once
      | a big enough majority of your target user base has upgraded
      | to a browser that supports it.
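The scenario described above (stored input rendered for other users) comes down to neutralizing markup at render time. A hand-rolled escape function, shown purely as an illustration of the idea rather than as the Sanitizer API itself:

```javascript
// Escape the five HTML-significant characters so attacker-controlled text
// is displayed to other users as text, never parsed as markup.
function escapeHTML(s) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return s.replace(/[&<>"']/g, (c) => map[c]);
}
```

Escaping flattens everything to text; the Sanitizer API exists for the harder case where some user-supplied markup should be kept.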
| xeyownt wrote:
  | The day Google and Mozilla merge, they will make Goozilla, even
  | more frightening than Godzilla.
| 0xy wrote:
| Almost every new web API throws a bone to the ad companies who
| use it for fingerprinting.
|
| The AudioContext API dishes out sensitive and specific audio
| device latency information that can be used to identify you to a
| high degree of accuracy, even if the web page in question never
| plays audio.
|
| If you're ever bored one day, have a go reverse engineering some
| adware JS code to see what they're up to.
|
| AudioContext for a static ad? Canvas fingerprinting? DRM
| fingerprinting? All of this has been enabled by both Google and
| Mozilla, who serve the same ad masters.
|
| Mozilla is entirely dependent on ad money, as is Google. So they
| turn a blind eye to the security disasters being rolled out every
| month -- which Google's customers subsequently abuse.
|
| This will be no different, it's yet another datapoint to identify
| you when perfectly good userspace solutions have existed forever.
| pimterry wrote:
| This makes no sense for this example. It's an always enabled
| browser security API. It doesn't expose anything about the
| device, it doesn't even have state, they're just proposing a
| new Sanitizer API with methods to sanitize DOM objects. All you
| could detect is its presence, which provides far less info than
| the browser version alone.
|
| There are other fingerprinting issues on the web, sure, but
| this is not one of them. The knee-jerk "all changes on the web
| are bad" responses are not helpful.
| 0xy wrote:
| The change is redundant because there are userspace solutions
| today, and they will continue to be used for decades.
|
| So what use does this serve? Another fingerprinting vector to
| narrow down browser versions, or worse -- a security hole.
|
| When the "Web Audio API" was rolled out by the Chrome team
| not only was it fundamentally flawed, but it also contained
| memory leak CVEs.
|
| Every time they add more trash to the browser nobody asks for
| (except their ad clients), they introduce more security
| problems.
|
| So -- why are they adding more APIs when they could fix the
| old ones that are utterly broken and abused daily by the ad
| industry? $$$.
| maple3142 wrote:
        | Because DOMPurify is not perfect. Due to quirks of HTML
        | parsing, there have been ways to bypass it:
        | https://research.securitum.com/mutation-xss-via-mathml-
        | mutat...
        |
        | Having a built-in XSS sanitizer means a single parser is
        | used throughout, which prevents such bypasses.
| pimterry wrote:
| The browser version is always already accessible, and even
| in future plans to reduce user agent info
| (https://www.chromestatus.com/feature/5704553745874944) the
| major version number is never going to be hidden.
|
| This API exposes zero new fingerprinting bits.
|
        | Meanwhile userspace solutions to the same problem are a)
| occasionally buggy b) not as widely used as they should be
| c) not as performant as they could be if implemented
| directly within the browser and d) not automatically
| updated as new browser features are released. Standardizing
| this will improve all of that.
|
| It is valuable to standardize APIs and build them into the
| web for features like this where there's a clearly correct
| approach that's required by a large percentage of modern
| sites (anything with client side dynamic content). Having
| browser vendors implement this, embedded in the browser
| itself and supported by browser implementers directly, is a
| free security win for the web.
| bugmen0t wrote:
| The intent is to shift the responsibility to the browser.
        | Decades' worth of userspace solutions have failed us. The
| browser is pretty good at HTML parsing.
| prox wrote:
| Yeah I would love to hear someone from Mozilla or Google
| respond to this.
| hoten wrote:
| There's literally nothing to respond to. There's no sensitive
| information exposed by this API...or any information, for
| that matter.
| jiggawatts wrote:
| I don't think the fingerprinting capabilities are caused by
| malice on the behalf of Mozilla or Google.
|
    | Even if they made $0.00 from ad revenue, they would still be on
    | the losing side of the battle against tracking. You can't have
    | many features without exposing _something_ about the client. As
    | soon as you have things like multiple versions, optional
    | features, plugins, and UI metrics, you've lost the battle
    | already. Your fonts alone can identify you with reasonable
    | accuracy.
|
    | How would _you_ solve this problem? Have _everyone_ run the
    | exact same browser in a virtual machine sandbox, with all
    | traffic running through a common VPN? No plugins, ever? No
    | i18n? No preferences of any type? Upscale a fixed-size image
    | to your screen's physical resolution and just learn to live
    | with the blur?
    |
    | That's where you'd have to _begin_, but I guarantee the ad
    | people would find a way around it. Keypress cadence timing.
    | Mouse movement patterns. Something. They'll find a way.
| mathnmusic wrote:
| The first principle should be to separate "documents" from
| "webapps". An article on NYTimes should be classified as a
| document which comes with a sandbox with minimal data
| collection. Of course, as things stand, every site wants to
| become an app because that's how the incentives are set up.
| "Apps" - which can collect more data - should come with
| significant user friction: permissions dialog, standardized
| ToS, disclosures etc. Similarly, sites that offer "documents"
| (i.e. no tracking), should be incentivized in other ways
| (share button, micropayments etc).
|
| There's a lot that can be done.
| 0xy wrote:
| Every data point represents more bits of information used to
| identify users.
|
| I would believe Mozilla and Google were good actors if they
| went back and cleaned up their security vulnerability "Web
| APIs" when they get used almost exclusively for
| fingerprinting.
|
| They don't make any attempt to fix the vulnerabilities, they
| simply add more. Coincidentally, their ad clients directly
| benefit. Ain't that something.
| chrismorgan wrote:
| I haven't delved, but this shouldn't be a fingerprinting vector
| (except for the one bit of whether it's implemented), as all
| browsers will be implementing the same thing, like with HTML
| parsing.
|
| As for the other cases you describe, I'd say the problem isn't
| so much that fingerprinting vectors exist as it is that ad
| providers allow arbitrary unsandboxed code execution, which is
| an obviously-terrible idea that never should have happened.
| fabiospampinato wrote:
| I'm pretty excited about this for two reasons:
|
| - First of all it makes sense that this feature is provided by
| the browser itself and they take some responsibility if it
| doesn't work right.
|
| - Currently the best library for sanitization is probably
| DOMPurify, and the native Sanitizer API is around 100x faster
| than DOMPurify, so that would speed up some things dramatically.
|
| I just hope it won't take years for Safari to implement this.
| pimterry wrote:
| > I just hope it won't take years for Safari to implement this.
|
    | 1000%. Safari likes to talk big about rejecting new APIs to
    | protect security & privacy, but there's a long list of APIs
    | just like this that they haven't implemented, ones that are
    | strictly beneficial for users.
|
    | That both Firefox & Chrome have shipped working implementations
    | of this (a serious fix for a top 10 OWASP security issue)
    | before Safari has even shown any intent to look at it says a
    | lot imo.
| afavour wrote:
| > Safari likes to talk big about rejecting new APIs to
| protect security & privacy
|
| Not only that, they restrict existing functionality. For
| example, all local storage is destroyed if you don't access a
| site in seven days. At first blush that makes sense but it
| means there's no way to reliably persist data to disk. If you
| run a web app you more or less _have_ to create a backend,
      | account signups, etc etc. Not only is it a lot of extra
      | work, it's also going to be a huge security vulnerability. The
| result ends up being entirely counter to Apple's stated
| intent.
| KarlKemp wrote:
| It's pretty obvious how that is helpful to protect users'
| privacy, isn't it?
|
| Or why would they do it, considering it's extra work
| compared to the status quo?
| afavour wrote:
| Of course, I absolutely understand how it prevents
| illegitimate uses of local data storage to violate
| privacy. My concern is that it also destroys entirely
| legitimate use cases for local storage and the only way
| to mitigate that is to open users to a whole new class of
| security vulnerability they can do very little to protect
| themselves from.
| KarlKemp wrote:
| I believe the criteria are more complex than just "7
| days". There's something about AI or ML in the Safari
| "experiments" settings, and IIRC first- vs. third-party
| data is handled differently, and data may also be
| protected for more than a week if you previously had
| regular interactions with the domain.
| javitury wrote:
| That uncertainty is still a blocker for many apps.
| skybrian wrote:
| It seems like that's one way to prevent lock-in to a single
| device, or a single browser on that device.
| gbrown wrote:
| And then iPhone users are stuck due to anticompetitive lock-
| in.
| krono wrote:
      | Or in April 2021 they finally decide to implement a date
      | input field with a picker, but then half-arse it and don't
      | support the min and max properties [0].
      |
      | A feature not being supported is clear-cut and workable. This
      | current mess where a feature might be supported, with
      | different parts of the spec available only to Safari 14.1 on
      | Big Sur and up but not 14.1 on Catalina, is just tiresome.
|
| [0] https://caniuse.com/input-datetime
| tehbeard wrote:
| No minmax, thanks Apple...
|
| Aren't we overdue for another indexedDB fuck up by the
| Safari Dev team?
| jessaustin wrote:
| Sshhhh! Don't remind them!
| encryptluks2 wrote:
| Sounds like a positive... although, at this point, what looks
| good may end up actually being bad, like FLoC. Although, I don't
| understand the uproar over people saying adblock was being
| removed from Chrome, which still works for me. I think this is a
| sign that Chromium is actually willing to work with developers to
| improve APIs.
| Semaphor wrote:
| > what looks good may end up actually being bad, like FLoC.
|
| FLoC was a google project (this is FF and Google + library
| author), and it looked bad from the start.
|
| > adblock was being removed from Chrome, which still works for
| me.
|
    | Adblock, in a way, will still work, just even worse than now
    | (where uBlock Origin on FF is better than on Chrome). The
    | Manifest V3 change was postponed by Google; currently [0] they
    | plan to stop supporting V2 in January 2023.
|
| [0]:
| https://developer.chrome.com/docs/extensions/mv3/mv2-sunset/
| encryptluks2 wrote:
| https://developer.chrome.com/blog/mv2-transition/
|
| > In the meantime, we will continue to add new capabilities
| to Manifest V3 based on the needs and voices of our developer
| community. Even in the last few months, there have been a
| number of exciting expansions of the extension platform.
|
| I have yet to see where Chrome is explicitly telling anyone
| they plan to phase out support for adblockers, nor where they
| are making it clear that is their intention. V3 is not yet
| completed, and is actively being worked on. If they actually
| do disable adblockers then that is a different story.
| rndgermandude wrote:
| >I have yet to see where Chrome is explicitly telling
| anyone they plan to phase out support for adblockers
|
| Why would they ever want to do that? It would be a PR
| nightmare if they came out and explicitly said "fuck you".
|
| Instead they opted to take away capabilities from their
| APIs with the result of severely limiting adblocker
| capabilities, under the guise that this improves security
| and performance, which is not entirely wrong, but at the
| same time hides the fact that there would have been easy
| enough alternatives that preserve the capabilities of
| adblockers and some other API users while making sure the
| security threats they stated they are concerned about can
| be prevented[0]. Yet they didn't even really look at what
        | was proposed and insisted on crippling their API in a way
        | that cripples adblockers.
|
| I can only conclude that improving security and performance
| is just one of the engineering goals of their solution,
| while the other (unstated) goal is to fuck with adblockers.
|
| [0] They were particularly worried about extensions being
| able to intercept requests, examine requests and exfiltrate
| sensitive data. One way this can be easily solved is by
| adding a special sandbox for request blocking that has no
| accessible output to the extension or anywhere else (no
| backchannel, no access to the network or file system). You
| can load scripts (and data) into it, but it may only ever
| talk to the browser itself during request handling. This
| breaks the "exfiltrate" part.
| encryptluks2 wrote:
| They are still accepting proposals and the changes aren't
| being forced until 2023. I know everyone likes to think
| Google is always evil lately, but there is still a lot of
| time for the new API to be revised and improve on the
| features you mention. You can even make the suggestions
| yourself or work on the code to fix it.
| rndgermandude wrote:
| >They are still accepting proposals
|
| V3 is finalized. And while they say they accept proposals
| for future changes, they already did not accept or even
| consider proposals in the timeline leading to V3.
|
| >changes aren't being forced until 2023
|
            | They pushed the timeline back because of all the
            | pushback they got. Also, the deadline for _new_
            | extensions using V2 is Jan 2022, a few months from
            | now.
|
| >but there is still a lot of time for the new API to be
| revised and improve on the features you mention.
|
| Not true either, you have until Jan 2022, a few months
| from now, to spec and implement and roll out such a
| revised or new API.
|
| This is not going to happen. They had enough time to do
| all that when the issues were first raised, but didn't.
| They had enough time to do all this when the first
| serious proposals for better solutions were made, but
            | they didn't. At this point I am wondering: will they
            | ever?
|
| Sure, the already established extensions will get a
| little bit of a longer grace period where things would
| theoretically happen. That doesn't help you if you want
| to create something new, tho.
|
| >You can even make the suggestions yourself or work on
| the code to fix it.
|
| Have you ever tried to get code into chrome(ium)? That's
| hard enough by itself. Now try to get code in that
| affects something google considers important... and they
| consider this important at least now, for the mere fact
| it was "news".
|
| Trying to work with them is what gorhill did, and a lot
| of other people too, before he figured out they were set
| on going the cripple-adblockers route and sounded the
| alarms.
| Arnavion wrote:
| >I have yet to see where Chrome is explicitly telling
| anyone they plan to phase out support for adblockers, nor
| where they are making it clear that is their intention.
|
        | That was never their intention, nor what the uproar was
        | about, so it's to be expected that you're not seeing any
        | evidence of it. The problem with Manifest V3 is not that it
        | disables adblockers; not sure where you got that idea from.
|
| The problem is that it severely restricts how effective
| they can be. Many of the things uBO does cannot be done in
| v3. That's what the uproar is about.
|
| https://github.com/uBlockOrigin/uBlock-
| issues/issues/338#iss...
|
| https://github.com/uBlockOrigin/uBlock-
| issues/issues/338#iss...
| encryptluks2 wrote:
| Again, these changes aren't taking place until 2023. They
| are still accepting new features. Yes, everyone is aware
| that the developer of uBlock threw a fit. It was all over
          | the news, and it was portrayed by the media as Chrome
          | disabling adblock, because that was essentially the
          | message from Raymond Hill at the time.
| to be aware of these changes, and contribute feedback and
| try to get compatible or comparable APIs implemented. At
| no point has Google or Chromium said they are unwilling.
| If anything, it looked like a prime opportunity for them
| to scream fire when there was no fire.
| codedokode wrote:
| There is something wrong with this idea. Sanitizing HTML should
| be done on the server, not on the client side.
|
  | Looks like an absolutely useless feature that will just make
  | bloated browsers more bloated.
| nightpool wrote:
| The benefit of doing this client-side instead of server-side is
| that you can stay up to date with any changes that the client
| may make to how it's processing HTML that may have security
| implications. Additionally, you get to use the exact same code
| that the browser is ultimately using to parse the HTML, so a
    | browser parsing bug, spec nuance, or un-specced legacy behavior
    | that your backend developer didn't consider doesn't turn into
    | a serious security flaw.
|
    | Additionally, the Sanitizer API does a much better job of
    | handling contextual parsing than many other similar backend
    | APIs. What happens when you parse an HTML fragment assuming it
    | will live in a `div`, and then it actually gets inserted into a
    | `table` cell? The spec goes into this in more detail here:
    | https://wicg.github.io/sanitizer-api/#strings
|
| The downsides, of course, are those associated with any thick-
| client/thin-server API design--more logic on the front-end
| means more logic to reimplement for different consumers.
|
| Personally, I would probably still stick with Nokogiri for my
| own applications, but I can see both sides of the trade-off.
| mftb wrote:
| The article states a couple times in the opening paragraphs
| that the API is about sanitizing dynamically generated HTML,
| "Many websites rely on dynamically generated content in the
| browser. Often, the generated markup includes content provided
| by outside sources, such as user-provided input, which can
| include malicious JavaScript code.". So the server would never
| see this HTML.
| codedokode wrote:
| This is absolutely unclear. If the user enters HTML and that
| HTML never gets to the server then why sanitize it? To
| protect user from hacking themselves?
| alanfranz wrote:
| Google for "reflected XSS". Sometimes a parameter in the
| URL can be rendered in the user's browser.
| oh_sigh wrote:
| Sure, why not? Most users expect that if they paste
| something into a textbox on a site, that their browser
| won't send their cookies and browsing history to some
| random 3rd party
| playpause wrote:
| Users may 'hack themselves' when an attacker persuades them
| to paste something into a website, for example. These are
| very basic XSS questions by the way, you don't seem to know
| enough about the subject to be this incredulous.
| dzaima wrote:
| In addition to the other replies, there could be server-
| provided HTML that the user has the option to change, and
| initiating a change, activating the vulnerability, could be
| one click. (this happened to Google's own search bar!)
|
| Then there are cases of different browsers parsing things
| differently and/or bad sanitization/serializing giving
| different results on repeated invocation or just being
        | broken on the server side. A built-in client-side option
        | is going to be a lot simpler.
| tdeck wrote:
| After reading some of these comments I still wonder what the
| concrete use cases are. What are these websites that allow
| users to paste in HTML, and why? Is that even a good idea? I
| can understand when it's developer tools like jsFiddle and
| the like, but when should a normal consumer website be
| hosting untrusted code in the frontend?
| tannhaeuser wrote:
| Have they addressed the points we've discussed 4 months ago [1]
| (eg where they're reinventing SGML, badly and hard-coded to
| HTML):
|
| [1]: https://news.ycombinator.com/item?id=27061020
| nightpool wrote:
| Seeing as [nobody seems to have brought it up to
| them](https://github.com/WICG/sanitizer-api/issues?q=sgml), I'm
| not surprised that they haven't addressed it.
|
| But, as always, specific & easy-to-use APIs are going to win
| out over more "fully general" ones. Are you suggesting that
| everybody learn DSSSL and write queries like
| ((match-element? nd '(section title)) (format-number-
| list (hierarchical-number '("chapter" "section") nd)
| "1" "."))
|
| Simply to be able to safely display some markup? I for one
| would much rather work on an AST with normal javascript instead
| of having to learn another DSL.
| tannhaeuser wrote:
| > _Are you suggesting that everybody learn DSSSL [...]?_
|
      | Hell, no ;) Just that they pick up SGML insertion contexts
      | as a concept for where to escape which chars when, something
      | that was known in the late 1970s already (ISO 8879 was
      | published in 1986, but took a loong time through the
      | committees). It's incredibly lame they haven't figured out
      | DTD/markup grammars and can only handle hard-coded HTML
      | insertion contexts - one more thing to fall off the cliff as
      | HTML evolves, and unnecessarily so.
|
| OTOH, it always is fun to show HNers what could've would've
| been using DSSSL/Scheme in browsers ...
| nightpool wrote:
| This has nothing to do with "escaping what chars when".
| It's simply a structural whitelist for DOM nodes that
| prevents JS execution, coupled with a contextual parser
| that was already available, but a little hard to find.
| Maybe I'm not understanding your point, because googling
| for "SGML insertion contexts" doesn't bring up anything
| that looks relevant, but there are many, many drawbacks
| that came from using XML to define HTML, and the browser
| community moved away from it for a good reason. My guess is
| that SGML had a similar story.
| ampdepolymerase wrote:
| The Lisp evangelism team on HN will burn you for that comment
| :)
| [deleted]
| floatingatoll wrote:
| The Lisp evangelism team can speak for themselves :)
| alanfranz wrote:
| Risky. How can you distinguish between an intentionally set
| script and an attack?
|
| Why can't HTML be composed client-side using proper, contextual
| APIs instead of "sanitizing" it afterwards? It won't work. It
| reminds me of PHP magic quotes - they didn't work.
|
| We'd still need a sanitizer for URLs, of course, those are one of
| the pesky parts of the web specs.
| wccrawford wrote:
| Because far, far too many web apps need to display user-entered
| data, and it needs to be sanitized. When markdown is converted
    | to HTML, as it is on this forum, it _still_ should be sanitized
| afterwards to deal with any vulnerabilities that were
| discovered after the user entered the data, even years later.
| alanfranz wrote:
| "and it needs to be sanitized..." clarify your point. If I
| need to display user-controlled data, I can use a proper API
| - e.g. var x = document.createElement(); x.textContent =
| "<script></script>". You can put _anything_ inside
| textContent. It works because it is contextual; you 're
| creating an element and telling the browser what to do with
| it (display as text). If you needed better formatting, you
| would compose the various html elements, you would NOT use
| innerHTML.
|
| DOMPurify performs a string->string conversion, so it's got
| no context information. I don't understand how this can work.
| It didn't work for PHP magic quotes. It doesn't work for SQL
| queries. Why can and should it work for HTML?
|
| Remember that "work" implies not just "safe". It implies "it
| must show what the user wanted to see". Otherwise
| var sanitize = function(input){ return "<p>";}
|
| Would "just work" perfectly.
| nightpool wrote:
        | Well, it's a good thing we're not talking about DOMPurify,
        | because the spec we're talking about (the Sanitizer API)
        | has lots of context information and does not provide a
        | string -> string API:
        | https://wicg.github.io/sanitizer-api/#strings
|
| This API is simply a DOM-based whitelist for preventing
| script execution coupled with a contextual parser. No more,
| no less. It doesn't solve every problem with accepting
| untrusted HTML, sure, but it's good enough for a wide
| variety of use-cases (One example that comes up frequently
| is embedded markdown, another good example is a mail client
| --both situations where formatted, user-controlled data is
| important). The benefit of doing this client-side instead
| of server-side is that you can stay up to date with any
| changes that the client may make to how it's processing
| HTML that may have security implications. (The downsides,
| of course, are those associated with any thick-client/thin-
| server API design--more logic on the front-end means more
| logic to reimplement for different consumers)
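The "DOM-based whitelist" idea in the comment above can be sketched over a plain node tree. The node shape (`type`, `tag`, `children`) and the allow list here are assumptions for the toy model; the real API walks actual DOM nodes with a configurable allow/block configuration:

```javascript
// Toy structural allowlist: keep text nodes and permitted elements,
// drop everything else (notably <script>) along with its subtree.
const ALLOWED = new Set(["div", "p", "b", "i", "a", "ul", "li"]);

function sanitizeTree(node) {
  if (node.type === "text") return node; // text passes through untouched
  if (!ALLOWED.has(node.tag)) return null; // disallowed element: drop it
  return {
    type: "element",
    tag: node.tag,
    children: (node.children || []).map(sanitizeTree).filter(Boolean),
  };
}
```

The point of doing this over parsed nodes rather than strings is that there is no way for dropped markup to "reassemble" into something executable, unlike the string-rewriting approaches discussed elsewhere in the thread.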
| danShumway wrote:
| This is good.
|
| All of this stuff is possible to do already with 3rd-party
| frameworks with varying levels of performance/reliability, and
| there may be other methods you use to sanitize your output. If
| you're a responsible developer, if you have experience trying to
| guard against XSS, or if you're working with a framework like
| React/Vue/whatever, this may not change much about your life.
|
| However, I think it's understated how many sites struggle with
| this, and how many of them do sanitization poorly, and it is much
| better to be able to point them towards a single API and say,
| "look, just call this function."
|
| One of the bigger vulnerabilities I've ever found in a website
| (https://danshumway.com/blog/gamasutra-vulnerabilities/) was in
| part a problem directly caused by bad sanitization. When I
| reported that vulnerability, I didn't advise them to fix their
| sanitizer because I didn't feel confident it would really fix the
| problem or that there wouldn't be other issues in the future. I
| could have pointed them towards something like DOMPurify, but
  | instead I advised that they start embedding posts in iframes with
  | reduced permissions so any scripts that did run wouldn't be able
| to get at user data.
|
  | I wonder whether, had there been more of a standard around this
  | stuff, some engineer on their team might have caught the problem
  | earlier.
|
| Similar to the SameSite changes to cookies, getting a native
| sanitizer isn't about forcing you to stop doing serverside
| sanitization or even changing your workflow much at all. It's
| about making the defaults safer and making it easier for hobby
| developers (and companies too) to avoid messing up because they
| don't understand the risks of calling otherwise innocuous
| functions.
|
| What's even more exciting is some of the work going into trusted
| types (https://web.dev/trusted-types/). The sanitizer API gives
| you a function you can call to get reasonably safe output without
| a lot of configuration or a 3rd-party framework or knowing what
| to download. The trusted types API is designed to make it harder
| to accidentally go outside of that sanitizer. It's not a silver
| bullet, but it's a big deal for companies where you're trying to
| set org-wide policies and avoid XSS attacks sneaking in through a
| random feature by a team that you're not watching closely enough.
|
| It's promising to see Firefox/Google working together on this
| stuff, I hope they continue with other APIs. The ideas are good,
| they just need iteration and more input, and these kinds of API
| areas are imo where Google/Mozilla tend to work together pretty
| well. I'm fairly optimistic about them.
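A minimal sketch of how the draft Sanitizer API is meant to be used, following the 2021 WICG draft (https://wicg.github.io/sanitizer-api/). Names and signatures may change before the spec is finalized, and `renderComment` plus its runtime guard are illustrative, not part of any spec:

```javascript
// Illustrative sketch of the draft Sanitizer API (2021 WICG draft);
// names and signatures may change before the spec is finalized.
// renderComment is a hypothetical helper, not part of any spec.
function renderComment(target, untrustedHtml) {
  if (typeof Sanitizer === 'undefined') {
    // Only browsers shipping the (flagged) feature have this global.
    throw new Error('Sanitizer API not available in this runtime');
  }
  // Default configuration: a browser-maintained safe allow-list.
  const sanitizer = new Sanitizer();
  // sanitize() parses with the browser's own HTML parser and returns
  // a DocumentFragment with disallowed nodes/attributes dropped.
  target.replaceChildren(sanitizer.sanitize(untrustedHtml));
}
```

The point made in the comment above is the call shape: one function with safe defaults, backed by the browser's own parser, rather than per-site configuration.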
| chrisweekly wrote:
| Mods: Title typo - last word should be "Browsers" not "Browse"
| floatingatoll wrote:
| Only way the mods will see that in a timely manner is if you or
| someone emails them, using the footer Contact link. Otherwise
| they might not ever realize you posted this.
| lordgrenville wrote:
| 80 char title limit
| https://github.com/wting/hackernews/blob/master/news.arc#L14...
| tinus_hn wrote:
| This is solving an issue in the browser that should be solved
| server side.
|
| Where is this data they are sanitizing coming from? Why would you
| want every browser treating this differently?
| jorangreef wrote:
| In fact, this is an issue that can only be properly solved in
| the browser, hence the need for security projects like
| DOMPurify in the past.
|
| The reason is that every browser parses potential XSS (and
| mXSS) content slightly differently, so a server side
| sanitizer will by definition never parse content in exactly
| the same way as a specific version of a given browser, since
| it's not sharing exactly the same runtime.
|
| This semantic gap between browser rendering quirks and server
| side approximations can be exploited by an attacker to slip
| content past the server side sanitizer.
|
| And that's why the browser vendors are actively working with
| Cure53, the author of DOMPurify, to ship the Sanitizer API.
|
| I'm just surprised this has taken so long.
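A toy illustration of the parsing gap described above. The filter below is deliberately naive and hypothetical; real server-side sanitizers are much smarter, but the same class of mismatch between a server-side approximation and the browser's actual parser applies:

```javascript
// A hypothetical, deliberately naive server-side check: reject
// anything containing a literal "<script". Browsers, however, parse
// tag names case-insensitively and execute event handlers on
// non-script elements, so both payloads below pass the check yet
// would run code when rendered by a browser.
const looksSafe = (html) => !html.includes('<script');

console.log(looksSafe('<ScRiPt>alert(1)</ScRiPt>'));     // true (bypassed)
console.log(looksSafe('<img src=x onerror=alert(1)>'));  // true (bypassed)
console.log(looksSafe('<script>alert(1)</script>'));     // false
```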
| cxr wrote:
| A reminder: people, please don't take your cues on language
| theoretic security (or anything security-related) from
| internet comments containing unverified claims, even if those
| comments aren't nameless/faceless and are moderately-to-
| highly upvoted. The parent comment is an example, appearing
| to have the approval of the community, but there are several
| subtle things wrong (or just odd) with it. It's short of
| misleading people into doing anything harmful, though, so
| stopping the world to hash them out here isn't critical, and
| would be tedious besides.
|
| Aside from that, a subset of people using DOMPurify--maybe
| even a plurality--seem to treat it as a talisman and can't
| explain how or if they've configured it correctly to provide
| the kind of protections they need for their use case.
| Security is not a separable concern.
| jorangreef wrote:
| It would be better for your comment to provide reasonable
| objections as to why DOMPurify should not be considered
| state of the art, without spreading fear, uncertainty or
| doubt.
| mcherm wrote:
| > The parent comment is an example, appearing to have the
| approval of the community, but there are several subtle
| things wrong (or just odd) with it. It's short of
| misleading people into doing anything harmful, though, so
| stopping the world to hash them out here isn't critical,
| and would be tedious besides.
|
| It is not clear to me what subtle things are wrong with the
| parent comment. But I'm pretty sure I can see the flaw in
| claiming that something is wrong without saying how. It's
| an impossible-to-disprove accusation which therefore does
| not advance the conversation.
| acdha wrote:
| > please don't take your cues on language theoretic
| security (or anything security-related) from internet
| comments containing unverified claims
|
| Your comment is nothing but vague, unverified claims. Why
| should you be given the benefit of the doubt but not the
| person you're replying to? If you have a real concern,
| share it so that it can be evaluated and everyone,
| including the original poster, can learn from it.
| nawgz wrote:
| What a useless comment. "Shite bad but I won't say a thing
| about what or how"
|
| And then you proceed to imply developers who have UIs
| susceptible to XSS are to blame, even though this is a
| known attack vector where browsers parse adversarial trees
| in an exploitable way that is completely opaque to
| everyone?
|
| This is silly. Hold your tongue or speak clearly, don't
| jeer from a high horse
| fabiospampinato wrote:
| > This is solving an issue in the browser that should be solved
| server side.
|
| This doesn't make much sense to me, my app doesn't even rely on
| any servers, what do you suggest I do then?
|
| > Where is this data they are sanitizing coming from?
|
| It's untrusted, what does it matter where exactly it comes
| from?
|
| > Why would you want every browser treating this differently?
|
| Why do you say that each browser would treat this differently?
| There's a spec for how this should work:
| https://wicg.github.io/sanitizer-api/
| resonious wrote:
| > my app doesn't even rely on any servers, what do you
| suggest I do then?
|
| Although I agree that there is nothing wrong with client-side
| sanitization, does this even apply to non-networked apps?
| What are you protecting your user from?
|
| Or perhaps by "doesn't rely on any servers" you mean
| "serverless", which doesn't literally mean no servers. If
| your app's client hits a database-as-a-service somewhere then
| that is a server and it very well can sanitize for you.
| pjerem wrote:
| I think it's "doesn't rely on any servers _that you own_".
|
| You can have no back-end code but still can call external
| APIs hosted on servers you don't control or which can be
| compromised by malicious user generated content.
| esrauch wrote:
| Besides the other comments there are legitimate security
| reasons to be worried about sanitizing in general even in
| zero-rpc-app cases; malicious files are an attack vector.
| E.g. if you were making Calibre, you would still want to be
| hardened against malicious local files hijacking the
| app.
| fabiospampinato wrote:
| > Although I agree that there is nothing wrong with client-
| side sanitization, does this even apply to non-networked
| apps? What are you protecting your user from?
|
| Kind of from themselves: the app can open arbitrary files,
| and you can't have people nuke themselves by opening a random
| file they got from the internet, or by pasting in something
| they don't understand. A plugin could also escape the
| boundaries placed on it by XSS-ing its own user through
| malicious output.
|
| > Or perhaps by "doesn't rely on any servers" you mean
| "serverless", which doesn't literally mean no servers. If
| your app's client hits a database-as-a-service somewhere
| then that is a server and it very well can sanitize for
| you.
|
| No I actually meant no servers at all, it's a 100% local
| Electron app.
| wongarsu wrote:
| > does this even apply to non-networked apps? What are you
| protecting your user from?
|
| Some pages render data from GET parameters, which allows
| XSS by giving people a link. There's also scenarios where
| data from localstorage, cookies or manually copy-pasted
| data is an attack vector. Imagine I get you to paste my
| document into your text editor and that allows me to
| extract all your data.
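The GET-parameter vector mentioned above can be sketched in a few lines. `greet` is a hypothetical page-fragment builder, not real code from any site:

```javascript
// Hypothetical example of the GET-parameter vector: a page that
// reflects ?name= into its markup without escaping.
function greet(queryString) {
  const name = new URLSearchParams(queryString).get('name');
  return `<h1>Hello, ${name}!</h1>`; // unsafe: name may contain markup
}

// A crafted link delivers the payload; no server needs to be
// compromised, the victim just has to click.
console.log(greet('name=<img src=x onerror=alert(1)>'));
// <h1>Hello, <img src=x onerror=alert(1)>!</h1>
```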
| runarberg wrote:
| XSS is a known vector in social engineering attacks as well
| as in shared data. Imagine an app which allows you to paste
| an input which will later be rendered as HTML. An attacker
| can create a malicious script and persuade victims to copy it
| and paste it into their app. This script might include
| `fetch` calls to a server the attacker controls, which the
| attacker can use to spy on victims or install malware on
| their machines.
|
| The above example is a vulnerability in the app just the
| same; it just requires the attacker to consider different
| avenues of delivery (such as sharing the malicious script on
| a forum) than in traditional server-connected apps.
| tinus_hn wrote:
| What is your 'app' and how does it get untrusted input
| without any server being involved?
|
| There is also a spec for how html should work.
|
| And finally, 100% of the useful functionality of this 'api'
| could be implemented using a Javascript library or server
| side. Yes, browsers have slight disagreements over handling
| broken html. So the first step of server side sanitization is
| to remove the broken stuff. This also removes a lot of cross-
| browser incompatibility. And then you select tags and only
| allow the tags you want. Basic XSS sanitization like it's 1999.
|
| Nobody needs any built-in API for this. This proposal is
| about solving a solved problem.
| fabiospampinato wrote:
| It's an Electron 'app' that can open arbitrary files.
|
| This isn't the sort of API that unlocks some previously
| impossible feature, but DOMPurify is incredibly slow and I
| would trust the browser's implementation more, obviously.
| Writing a better DOMPurify has always been an option, but
| that's not an easy value proposition for most developers,
| hence the best we've got is still DOMPurify.
| tinus_hn wrote:
| - you don't need to add apis to Firefox to use them in an
| Electron app
|
| - It isn't hard to do what this api does in Javascript as
| fast as the browser can do it. There already is api to
| convert strings to DOM and it isn't exactly rocket
| science to iterate over the result and drop tags that
| aren't on a list. I'm sure there can be a library that
| does it slowly but it doesn't have to.
| fabiospampinato wrote:
| > - you don't need to add apis to Firefox to use them in
| an Electron app
|
| Right, so what?
|
| > - It isn't hard to do what this api does in Javascript
| as fast as the browser can do it. There already is api to
| convert strings to DOM and it isn't exactly rocket
| science to iterate over the result and drop tags that
| aren't on a list. I'm sure there can be a library that
| does it slowly but it doesn't have to.
|
| Of course it's DOMPurify's fault for being that slow; nobody
| is arguing that it must be rewritten in C++ to be fast.
|
| It's not as simple as you put it though, unless you have an
| extremely strict list of tags containing no tags at all, in
| which case it's trivial, but also of very little use.
|
| Since your comment sounds overly arrogant to my ears, I'd
| like to point out that by following the rules you mentioned
| for making a sanitizer (DOMParser + drop non-whitelisted
| tags, basically) you can only produce either a useless or a
| broken sanitizer; it's impossible to make a useful, working
| sanitizer that way. Proof-ish: either you drop <img> nodes,
| in which case you've made a pretty useless sanitizer in my
| book, or you leave them in, in which case you leave yourself
| open to XSS via stuff like the onerror attribute.
|
| It's not trivial to write a sanitizer that both works and is
| useful. Almost no developer should need to learn all the
| nuances necessary for writing one, hence the platform itself
| should provide it.
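The "proof-ish" argument above can be made concrete with a toy tag-whitelist sanitizer of exactly the kind described. This is hypothetical, deliberately naive code, not anyone's real library:

```javascript
// Toy tag-whitelist "sanitizer": strips tags whose names aren't on
// the allow-list, but leaves allowed tags (and all of their
// attributes!) untouched. Deliberately naive, for illustration only.
function naiveTagWhitelist(html, allowed) {
  return html.replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, (tag, name) =>
    allowed.includes(name.toLowerCase()) ? tag : ''
  );
}

const payload = '<script>alert(1)</script><img src=x onerror="alert(2)">';
const out = naiveTagWhitelist(payload, ['img']);
console.log(out); // alert(1)<img src=x onerror="alert(2)">
// The <script> tags are gone, but keeping <img> keeps onerror too:
// the sanitizer is either useless (drop <img>) or broken (keep it).
```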
| dobin wrote:
| It doesn't matter where the data is coming from, it matters what
| it is able to do. As always with security, if the server
| attempts to protect the client app by emulating its behaviour,
| it will go wrong (as the server is never able to emulate the
| client perfectly). This is a problem in most of the magic
| black-box security appliances (WAF, IPS, DLP, etc.).
|
| The browser knows if a certain piece of data will perform
| execution or not, as it is the software implementing the
| functionality. It is the correct app to ask, as it is the one
| being exploited.
| masa331 wrote:
| This is for sanitizing content generated client-side, which
| might not touch a server at all. You can create HTML and put
| it into the DOM entirely on the client.
| silon42 wrote:
| If you are generating, you should have a whitelist of safe
| html/css.
|
| Apart from performance this smells of not using a whitelist
| mechanism (I hope this is not the case).
| masa331 wrote:
| What? Whitelisting is one technique which you can use when
| sanitizing content generated from unknown sources. If you
| need to generate such content then it's probably a special
| feature of your software and no smell.
| kapep wrote:
| > this smells of not using a whitelist mechanism
|
| What makes you think that? I just skimmed the draft and it
| seems to use a sensible whitelist as default. Developers
| can allow or deny additional elements/attributes as they
| like.
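For reference, the configuration shape in the draft looks roughly like this. Option names follow the 2021 WICG draft and may change before finalization, so treat this as a sketch, not a stable API:

```javascript
// Configuration sketch per the 2021 WICG draft (names may change):
// allowElements replaces the default element allow-list;
// dropAttributes maps attribute names to the elements ("*" = all)
// they should be stripped from.
const config = {
  allowElements: ['b', 'i', 'a', 'img', 'p'],
  dropAttributes: { onerror: ['*'], onclick: ['*'] },
};
// In a supporting browser this would be passed as: new Sanitizer(config)
console.log(config.allowElements.join(','));
```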
| bugmen0t wrote:
| If you sanitize on the server, you are sanitizing for a
| theoretical browser and for how _you_ think it parses HTML.
| Any kind of parsing ambiguity will lead to XSS.
|
| That's why you should be using an API that relies on the
| browser's parser.
| zerkten wrote:
| The right answer here is probably that there are some apps
| which have no backend, or a limited one, where the
| possibility of handling dangerous input exists, and this
| provides a standard solution.
|
| Further, it's not a panacea. Defense-in-depth still applies, so
| server-side and other mitigations will still be appropriate.
| Build and use threat models to understand what is appropriate.
___________________________________________________________________
(page generated 2021-10-20 23:02 UTC)