[HN Gopher] How to safely escape JSON inside HTML SCRIPT elements
___________________________________________________________________
How to safely escape JSON inside HTML SCRIPT elements
Author : dmsnell
Score : 58 points
Date : 2025-08-08 22:42 UTC (4 days ago)
(HTM) web link (sirre.al)
(TXT) w3m dump (sirre.al)
| dmsnell wrote:
| Discussing why parsing HTML SCRIPT elements is so complicated,
| the history of why it became the way it is, and how to safely and
| securely embed JSON content inside of a SCRIPT element today.
| dmsnell wrote:
| This was my first submission, and the above comment was what I
| added to the text box. It wasn't clear to me what the purpose
| was, but it seemed like it would want an excerpt. I only
| discovered after submitting that it created this comment.
|
| I guess people just generally don't add those?
|
| Still, to help me out, could someone clarify why this was down-
| voted? I don't want to mess up again if I did, but I don't
| understand what that was.
| shakna wrote:
| > Leave url blank to submit a question for discussion. If
| there is no url, text will appear at the top of the thread.
| If there is a url, text is optional.
|
| Most people will opt for text to be optional with a link -
| unless they're showing their own product (Show HN). Because
| there is an expectation that you will attempt to read an
| article, before conversing about it.
| flomo wrote:
| I don't know, but I see early posts which look like AI bot
| summaries (presumably to collect karma). Probably not
| necessary for a link.
| bawolff wrote:
| I think it's just because as a comment it looks pretty random
| and somewhat off topic, since it's a summary of the article
| instead of an opinion on it.
|
| I think most of the time people don't add a comment to
| submissions, but if they do it's more of the form: I found X
| interesting because of [insert non-obvious reason why X is
| interesting] or some additional non-obvious context needed.
|
| In any case, I don't think there is any reason to worry too
| much. There was no ill intent and at the end of the day it's
| all just fake internet points.
| comex wrote:
| If you're evaluating JSON as JavaScript, you also need to make
| sure none of the objects have a key named "__proto__", or else
| you can end up with some strange results.
|
| (This is related to the 'prototype pollution' attack, although
| searching that phrase will mostly give you information about the
| more-dangerous variant where two objects are being merged
| together with some JS library. If __proto__ is just part of a
| literal, the behavior is not as dangerous, but still surprising.)
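
A minimal sketch of the surprise described above, assuming the same text is once parsed with JSON.parse and once evaluated as a JavaScript object literal. The sample payload is invented for illustration, and indirect eval merely stands in for an inline script that contains the literal.

```javascript
// The same text, once parsed as JSON and once evaluated as a JS object literal.
const text = '{"__proto__": {"admin": true}, "name": "example"}';

// JSON.parse creates an ordinary own property named "__proto__".
const parsed = JSON.parse(text);

// Evaluated as JavaScript, the "__proto__" key sets the object's prototype
// instead of creating a property. Indirect eval stands in for an inline script.
const evaluated = (0, eval)("(" + text + ")");

console.log(Object.keys(parsed));    // [ '__proto__', 'name' ]
console.log(Object.keys(evaluated)); // [ 'name' ]
console.log(parsed.admin);           // undefined
console.log(evaluated.admin);        // true, inherited from the new prototype
```

The key silently disappears from the evaluated object and its value becomes the prototype, which is why the two ways of reading the same text can disagree.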
| o11c wrote:
| But note that there's also `<script type="application/json">`
| these days (usually only useful with `id=`) ... and `importmap`
| I guess.
| themafia wrote:
| It's even more general:
|
|     type
|       This attribute indicates the type of script represented. The
|       value of this attribute will be one of the following:
|       [...]
|       Any other value
|         The embedded content is treated as a data block, and won't
|         be processed by the browser. Developers must use a valid
|         MIME type that is not a JavaScript MIME type to denote data
|         blocks. All of the other attributes will be ignored,
|         including the src attribute.
|
| That said, 'importmap' has specific functionality, as does
| 'speculationrules', and they operate similarly. My
| favorite is type="module" which competes with the higher
| level attribute nomodule="true". Anyways it looks like
| <script> has taken a lot of abuse over the years:
|
| https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
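
A rough sketch of how the type attribute values discussed above might be sorted into categories. The function name and the category labels are hypothetical, the list of JavaScript MIME types is reproduced from memory of the MIME Sniffing standard, and the sketch assumes a type attribute is present (it ignores the legacy language attribute), so treat it as an illustration rather than the browser's exact algorithm.

```javascript
// JavaScript MIME type essences (assumed list; check the MIME Sniffing standard).
const JS_MIME_TYPES = new Set([
  "application/ecmascript", "application/javascript",
  "application/x-ecmascript", "application/x-javascript",
  "text/ecmascript", "text/javascript", "text/javascript1.0",
  "text/javascript1.1", "text/javascript1.2", "text/javascript1.3",
  "text/javascript1.4", "text/javascript1.5", "text/jscript",
  "text/livescript", "text/x-ecmascript", "text/x-javascript",
]);

// Hypothetical helper: classify the value of a <script type="..."> attribute.
function classifyScriptType(typeAttr) {
  // An empty type attribute is treated as classic JavaScript.
  if (typeAttr === "") return "classic";

  // Strip leading/trailing ASCII whitespace and lowercase ASCII letters only.
  const type = typeAttr
    .replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "")
    .replace(/[A-Z]/g, (c) => c.toLowerCase());

  if (JS_MIME_TYPES.has(type)) return "classic";
  if (type === "module") return "module";
  if (type === "importmap") return "importmap";
  if (type === "speculationrules") return "speculationrules";

  // Anything else is a data block: never executed by the browser.
  return "data";
}

console.log(classifyScriptType("application/json")); // data
console.log(classifyScriptType(" Text/JavaScript ")); // classic
console.log(classifyScriptType("module"));            // module
```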
| masklinn wrote:
| > My favorite is type="module" which competes with the
| higher level attribute nomodule="true". Anyways it looks
| like <script> has taken a lot of abuse over the years:
|
| It "conflicts" in the same way noscript[1] and script
| "conflict" no? They're basically related features, but
| can't really be made exclusive because the mere act of
| trying to do so wouldn't work: as the link indicates,
| executing code in a !module browser reserves the type
| (requires a specific set of types) so you can't use that as
| a way to opt in !module browsers.
|
| [1] an other fun element with wonky parsing rules besides
| themafia wrote:
| You can write:
|
|     <script nomodule="true" type="module"></script>
|
| Which is a little weird. At the very least I'd expect the
| type="module" documentation to say that `charset`,
| `defer` and `nomodule` attributes have no effect.
| domenicd wrote:
| It does?
| https://html.spec.whatwg.org/multipage/scripting.html#attr-s...
| themafia wrote:
| It specifies it in the abstract. Did you mean to link
| here instead of to the 'src' attribute documentation?
|
| https://html.spec.whatwg.org/multipage/scripting.html#attr-s....
|
| My expectation was that this condition would have been
| reflected in MDN's documentation where it breaks out the
| conditions for 'charset' and 'defer'.
| masklinn wrote:
| Why? MDN does not purport to be exhaustive, that's the
| spec's job.
| themafia wrote:
| MDN does a pretty good job anyways. Perhaps I feel that
| it would be in keeping with that spirit to have this
| condition documented. This is partly because MDN is far
| easier to read for the purposes of _reference_ than the
| spec which is easier to read for the purposes of
| _implementing_. It's also easier to search and to share
| links to, as the link you presented earlier was both
| wrong and confusing, and there was no natural way to link
| to the part of the document you intended.
|
| Perhaps the spec isn't the right tool for every job?
| That's why, for me, at least.
| moron4hire wrote:
| Submit a change, then. MDN isn't written by some secret
| cabal. It's written by all of us.
| masklinn wrote:
| > You can write:
|
| Yes, and you can write
|
|     <noscript>
|       <script> ... </script>
|     </noscript>
| jgalt212 wrote:
| Why does the author ignore this method? Django docs show this
| as a best practice via a built in tag.
| minitech wrote:
| Yes, that option is the real "just do this".
|
| - escape `<` as `\u003c`
|
|     <script id="my-json" type="application/json">{{ escaped_json }}</script>
|
|     JSON.parse(document.getElementById('my-json').textContent)
|
| No __proto__ issue, and no dynamic code at all, so you can
| use a strict CSP.
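
The server-side half of that recipe could look roughly like this. It is a sketch only: `jsonForScriptTag` and the sample data are hypothetical names, and it assumes the value is JSON-serializable (JSON.stringify returns undefined for things like a bare function, which would make the replace call throw).

```javascript
// Hypothetical helper: serialize a value for embedding inside
// <script type="application/json">. In JSON text, "<" can only occur inside
// string values, so replacing every "<" with \u003c keeps the JSON valid.
function jsonForScriptTag(value) {
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

const data = {
  note: "</script><script>alert(1)</script>",
  comment: "<!-- looks like an HTML comment",
};

const escaped = jsonForScriptTag(data);
const html = `<script id="my-json" type="application/json">${escaped}</script>`;
console.log(html);

// The browser-side JSON.parse sees the original strings again.
console.log(JSON.parse(escaped).note === data.note); // true
```

Because no `<` survives in the payload, none of the `<!--`, `<script`, or `</script` sequences that affect script parsing can appear in it.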
| dullcrisp wrote:
| Wait can someone explain why a script tag inside a comment inside
| a script tag needs to be closed, while a script tag inside a
| script tag without a comment does not? They explained why
| comments inside script tags are a thing, but nothing further than
| that.
| AdieuToLogic wrote:
| From the post:
|
|     Everything until the tag closer </script> is inside the
|     script element.
|
| And:
|
|     In fact, script tags can contain any language (not
|     necessarily JavaScript) or even arbitrary data. In order to
|     support this behavior, script tags have special parsing
|     rules. For the most part, the browser accepts whatever is
|     inside the script tag until it finds the script close tag
|     </script>.
|
| Note the sentence fragment "even arbitrary data." This explains
| the second part of your question as to why nested script tags
| without HTML comments do not require matching closing tags.
| Similar compatibility hacks exist for other closing tags
| (search for Chrome closing tags being optional for a fun ride
| down a rabbit hole).
|
| As to:
|
|     why a script tag inside a comment inside a script tag needs
|     to be closed ...
|
| Well, this again is due to maximizing backward compatibility in
| order to support broken browsers (thanks IE4, you bastard!). As
| the article states:
|
|     When JavaScript was first introduced, many browsers did not
|     support it. So they would render the content of the script
|     tag - the JavaScript code itself. The normal way to get
|     around that was to put the script into a comment ...
|
| HTH
| dullcrisp wrote:
| So did these older browsers also check for the presence of a
| comment before turning on double-escaping mode?
|
| Or did they always have two levels of script tag escaping but
| that behavior only got preserved when inside an HTML comment?
|
| No other JavaScript behavior is different inside an HTML
| comment, and I'm still missing the connection between the
| HTML comment and the embedded </script> not closing the tag
| besides that they were two things that older browsers might
| have done.
| dmsnell wrote:
| The other comment explains this, but I think it can also be
| viewed differently.
|
| It's helpful to recognize that the inner script tags are not
| actual script tags. Yes, once entering a script element, the
| browser switches parsers and wants to skip everything until a
| closing script tag appears. The STYLE element, TITLE, TEXTAREA,
| and a few others do this. Once they chop up the HTML like this
| they send the contents to the separate inner parser (in this
| case, the JS engine). SCRIPT is unique due to the legacy
| behavior^1.
|
| HTML5 specifies these "inner" tags as transitions into escape
| modes. The entire goal is to allow JavaScript to contain the
| string "</script>" without it leaking to the outer parser. The
| early pattern of hiding inside an HTML comment is what
| determined the escaping mechanism rather than making some
| special syntax (which today does exist as noted in the post).
|
| The opening script tag inside the comment is actually what
| triggers the escaping mode, and so it's less an HTML tag and
| more some kind of pseudo JS syntax. The inner closing tag is
| therefore the escaped string value and simultaneously closes
| the escaped mode.
|
| Consider the use of double quotes inside a string. We have to
| close the outer quote, but if the inner quote is escaped like
| `\"` then we don't have to close it -- it's merely data and not
| syntax.
|
| There is only one level of nesting, and eight opening tags
| would still be "closed" by the single closing tag.
|
| ^1: (edit) This is one reason HTML and XML (XHTML) are
| incompatible. The content of SCRIPT and STYLE elements are
| essentially just bytes. In XML they must be well-formed markup.
| XML parsers cannot parse HTML.
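
The escape modes described above can be modeled with a small scanner. This is a simplified sketch of the tokenizer's script data, escaped, and double-escaped states, not the full HTML parsing algorithm; `findScriptEnd` is a hypothetical name, and its input is the text that follows an opening script tag.

```javascript
// A tag name only counts when followed by whitespace, "/" or ">".
const TAG_END = /[\t\n\f\r \/>]/;

function isTagAt(lower, i, tag) {
  return lower.startsWith(tag, i) && TAG_END.test(lower.charAt(i + tag.length));
}

// Returns the index of the "</script" that closes the element, or -1.
function findScriptEnd(content) {
  // Lowercase ASCII letters only, so indices still match the input.
  const lower = content.replace(/[A-Z]/g, (c) => c.toLowerCase());
  let state = "data";
  let i = 0;
  while (i < lower.length) {
    if (state === "data") {
      if (isTagAt(lower, i, "</script")) return i;
      // "<!--" enters the escaped mode; its dashes may also start "-->".
      if (lower.startsWith("<!--", i)) { state = "escaped"; i += 2; continue; }
    } else if (lower.startsWith("-->", i)) {
      // "-->" leaves both the escaped and the double-escaped mode.
      state = "data"; i += 3; continue;
    } else if (state === "escaped") {
      if (isTagAt(lower, i, "</script")) return i;
      // An opening script tag inside the "comment" starts double escaping.
      if (isTagAt(lower, i, "<script")) { state = "double-escaped"; i += 7; continue; }
    } else if (isTagAt(lower, i, "</script")) {
      // Double-escaped: the closing tag is data and only ends this mode.
      state = "escaped"; i += 8; continue;
    }
    i += 1;
  }
  return -1;
}

// Without a comment, the inner opening tag is ignored: closes at index 8.
console.log(findScriptEnd("<script></script>"));
// Inside a comment, the first closing tag is swallowed: prints -1.
console.log(findScriptEnd("<!-- <script> x </script> y"));
```

Running it on `"<!--" + "<script>".repeat(8) + "x</script>y</script>"` returns the index of the last closing tag, which matches the point that there is only one level of nesting.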
| tannhaeuser wrote:
| Whoever the idiot was who came up with piling inline CSS and
| JS into the already heavy SGML syntax of HTML should've
| considered his career choices. It would've been perfectly
| adequate to require script and CSS to be put into external
| "resources" linked via src/href, especially since the spec
| proposals operated under the assumption there would be
| multiple script and styling languages going forward (like,
| hey, if we have one markup and styling language, why not have
| two or multiple?). When in fact the rules were quite simple:
| in SGML, text rendered to the reader goes into content,
| everything else, including formatting properties, goes into
| attributes. The reason for introducing this inlining
| misfeature was probably the desire to avoid network
| roundtrip, which would've later been made bogusly obsolete by
| Google's withdrawn HTTP/2 push spec, but also the bizarre
| idea anyone except webdev bloggers would be editing HTML+CSS
| by hand. To think there was a committee overseeing such
| blunders as "W3C recommendations" - actually, they screwed up
| again with CSS when they allowed unencoded inline data URLs
| such as used for SVG backgrounds and the like. The alarm
| bells should've been ringing at the latest the moment they
| seriously considered storing markup within CSS like with the
| abovementioned misfeature but also with the "content:" CSS
| property. You know, as in "recommendation" which is how W3C
| final stage specs were called.
| socalgal2 wrote:
| All of those are features, not bugs and I'm glad they are
| there. Uploading and dealing with 1 file is much nicer than
| dealing with several.
| jiggawatts wrote:
| > much nicer than dealing with several.
|
| _"My momentary convenience trumps the man-millennia of
| effort required to protect billions of people from script
| injection attacks."_
| porridgeraisin wrote:
| Not just his convenience. Man-millennia of convenience, if
| you will ;) I too love the fact that many things can be
| single index.html's, no need of a zip file then. It's
| double-click to view. One of the best things about the
| web platform.
|
| Edit: and "effort", please. The spec has a simple and
| clear note:
|
| > The easiest and safest way to avoid the rather strange
| restrictions described in this section is to always
| escape an ASCII case-insensitive match for "<!--" as
| "\x3C!--", "<script" as "\x3Cscript", and "</script" as
| "\x3C/script" when these sequences appear in literals in
| scripts (e.g. in strings, regular expressions, or
| comments), and to avoid writing code that uses such
| constructs in expressions. Doing so avoids the pitfalls
| that the restrictions in this section are prone to
| triggering.
|
| Backwards compatibility is easily and completely worth
| this small amount of effort. It's a one-liner in most
| languages.
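
As a sketch of that one-liner, assuming the input is text that will end up inside a JavaScript string, regular expression, or comment in an inline script (`escapeForInlineScript` is a hypothetical name). Note that `\x3C` is a JavaScript escape and not valid JSON, so JSON data blocks would need `\u003c` instead.

```javascript
// Hypothetical helper following the spec note quoted above: escape the "<" of
// "<!--", "<script" and "</script", matched ASCII case-insensitively.
function escapeForInlineScript(literalText) {
  return literalText.replace(/<(!--|\/?script)/gi, "\\x3C$1");
}

const risky = 'document.write("<script src=x.js></script>"); // <!-- legacy';
console.log(escapeForInlineScript(risky));
// document.write("\x3Cscript src=x.js>\x3C/script>"); // \x3C!-- legacy
```

Inside a string literal the JavaScript engine turns `\x3C` back into `<`, so the program's behavior is unchanged while the HTML parser no longer sees the sequences.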
| tannhaeuser wrote:
| The easiest and safest way to avoid the rather strange
| restrictions described is to not make use of inline
| script in a way that makes those restrictions necessary,
| though. And a "recommendation" should reflect that (from
| back when HTML recommendations were actually published
| rather than random Google shills writing whatever on
| github). The suggested workaround is also not without
| criticism (eg [1]).
|
| [1]: https://uploadcare.com/blog/vulnerability-in-html-design/
| robocat wrote:
| > It would've been perfectly adequate to require script and
| CSS to be put into external "resources" linked via src/href
|
| Bullshit - Navigator and IE didn't have HTTP/2. I'm
| guessing you didn't use dialup where your external CSS or
| JavaScript regularly failed to load. You didn't add extra
| dependencies because IE would only have two concurrent
| connections to load files.
|
| It's easy to criticize past mistakes from your armchair:
| but I suggest you try and be a little more fair towards the
| people that made decisions especially when overall HTML has
| been a resounding success.
| tannhaeuser wrote:
| I suggest you try and check what the people you're
| accusing of armchair attitudes in fact were and are doing
| to solve problems.
|
| Have you done even a single thing in the markup
| community?
| robocat wrote:
| Sorry - I shouldn't be so flippant.
|
| Engineers hate bad compromises, and the core of
| engineering is making good compromises. Creating anything
| makes you your own critic.
| edoceo wrote:
| Time makes a fool of everyone.
| dullcrisp wrote:
| Huh, it's still confusing to me why they would have this
| double-escaping behavior only inside an HTML comment. Why not
| have it always behave one way or the other? At what point did
| the parsing behavior inside and outside HTML comments split
| and why?
| dmsnell wrote:
| At some point I think I read a more complete justification,
| but I can't find it now. There is evidence that it came
| about as a byproduct of the interaction of the HTML parser
| and JS parsers in early browsers.
|
| In this link we can see the expectation that the HTML
| comment surrounds a call to document.write() which inserts
| a new SCRIPT element. The tags are balanced.
|
| https://stackoverflow.com/questions/236073/why-split-the-scr...
|
| In this HTML 4.01 spec, it's noted to use HTML comments to
| hide the script contents from render, which is where we
| start to get the notion of using these to hide markup from
| display.
|
| https://www.w3.org/TR/html401/interact/scripts.html
|
| Some drafts of the HTML standard attempted to escape
| differently and didn't have the double escape state.
|
| https://www.w3.org/TR/2016/WD-html52-20161206/semantics-scri...
|
| My guess is that at some point the parsers looked for
| balanced tags, as evidenced in the note in the last link
| above, but then practical issues with improperly-generated
| scripts led to the idea that a single SCRIPT closing tag
| ends the escaping. Maybe people were attempting to
| concatenate script contents wrong and getting stacks of
| opening tags that were never closed. I don't know, but I
| suppose it's recorded somewhere.
|
| Many things in today's HTML arose because of widespread
| issues with how people generated the content. The same is
| true of XML and XHTML by the way. Early XML mailing lists
| were full of people parsing XML with naive Perl regular
| expressions and suggesting that when someone wants to "fix"
| broken markup, that they do it with string-based find-and-
| replace.
|
| The main difference is that the HTML spec went in the
| direction of saying, _if we can agree how to handle these
| errors then in the face of some errors we can display some
| content_ and we can all do it in the same way. XML is worse
| in some regards: certain kinds of errors are still
| ambiguous and up to the parser to determine how to handle,
| whether they are non-recoverable or recoverable. For those
| non-recoverable, the presence of a single error destroys
| the entire document, like being refused a withdrawal at the
| bank because you didn't cross a 7.
|
| At least with HTML5, it's agreed upon what to do when
| errors are present and all parsers can produce the same
| output document; XML parsers routinely handle malformed
| content and do so in different ways (though most at least
| provide or default to a strict mode). It's better than the
| early web, but not that much better.
| TOGoS wrote:
| > Not so fast, things are about to get messy
|
| That ship sailed several paragraphs ago, when <script> got
| special treatment by the HTML parser. Too bad we couldn't all
| agree to parse <![CDATA[...]]> consistently, or, you know, just
| &-escape the text like we do /everywhere else/ in HTML.
| forty wrote:
| What's wrong with CDATA? Do you have concrete examples when
| that would not work?
| TOGoS wrote:
| As per the 'special parsing rules for script tags', browsers
| don't actually treat it as what you'd expect it means.
|
|     <script>console.log("<![CDATA[Hello, this string content in a CDATA section!]]>");</script>
|
| Results in this being output to the console:
|
|     <![CDATA[Hello, this string content in a CDATA section!]]>
|
| Browsers don't do what you intend if you wrap the whole
| script in CDATA, either. They treat the "<![CDATA[" sequence
| as literally part of the script! Which of course throws a
| syntax error.
|
| I tend to use them anyway, as sort of a HTML/XHTML polyglot
| thing, because deep in my heart I still think HTML should be
| valid XML:
|
|     <script>/* <![CDATA[ */
|     // my script here, and you *still* need to be careful not
|     // to include close-script or close-cdata sequences
|     /* ]]> */</script>
|
| In summary, the 'special parsing rules for script tags' add a
| great amount of complexity not just to the parsing code, but
| for anybody who has to emit markup, especially if different
| parsers disagree on what kind of escaping rules are active
| within a given section. Yes, the HTML5 spec codified the
| neurotypical "I would rather make you guess what I mean than
| just use the proper words to say it clearly" behavior, so at
| least browsers agree on it, but it's a mess and a pain to
| deal with because now you have to remember 1000 exceptions to
| what would have been simple rules.
___________________________________________________________________
(page generated 2025-08-12 23:00 UTC)