https://sirre.al/2025/08/06/safe-json-in-script-tags-how-not-to-break-a-site/
Safe JSON in script tags: How not to break a site
Written by Jon Surrell
<script>console.log( 1 > 0 && 0 < 1 )</script>
This is great, JavaScript can be embedded directly. Imagine if script
tags required HTML escaping:
HTML
<script>console.log( 1 &gt; 0 &amp;&amp; 0 &lt; 1 )</script>
In fact, script tags can contain any language (not necessarily
JavaScript) or even arbitrary data. In order to support this
behavior, script tags have special parsing rules. For the most part,
the browser accepts whatever is inside the script tag until it finds
the script close tag </script> ^1.
So, what happens when we embed this perfectly valid JavaScript that
contains a script close tag?
HTML
<script>console.log('</script>')</script>
Oops! We can see that </script> was part of a JavaScript string, but
the browser is just parsing the HTML. This script element closes
prematurely, resulting in the following tree:
+-SCRIPT
| +-#text console.log('
+-#text ')
Ok, let's use json_encode() and we should be all set:
PHP
<script>
console.log( <?php echo json_encode( '</script>' ); ?> )
</script>
Now we've got this HTML:
HTML
<script>
console.log( "<\/script>" )
</script>
</script> has become <\/script>. The JavaScript string value is
preserved and the script element does not close prematurely. Perfect,
right?
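To see why the escaped slash is harmless, here is a small JavaScript sketch (not from the original example). It takes the text that json_encode() emits by default for the string '</script>' and checks two things: the text no longer contains a literal script close tag, and parsing it still yields the original value.

```javascript
// The text PHP's json_encode() emits by default for the string '</script>'.
const encoded = '"<\\/script>"';

// JSON (like JavaScript string literals) reads \/ as a plain slash.
const decoded = JSON.parse(encoded);

console.log(encoded);                       // "<\/script>"
console.log(decoded);                       // </script>
console.log(encoded.includes('</script>')); // false
```

The HTML parser never sees the characters </script in a row, while the JavaScript engine still gets the intended string.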
Not so fast, things are about to get messy
Let's expand with a more complex example. Here's some data used by an
imaginary HTML library. We'll escape the JSON again with
json_encode ^2:
PHP
…
'closeScript' => '</script>',
'openComment' => '<!--',
…
"closeScript": "<\/script>",
"openComment": "<!--",
…
This kind of practice was commonplace on the web. As the web evolved,
browsers continued to support the behavior so they wouldn't break
existing pages. Then, HTML5 came along and standardized the behavior
so folks knew what to expect, even if it's surprising. We can see
other remnants of this practice in the HTML scripting specification:
for related historical reasons, the string "<!--" …
… or a newline (\n, \f, \r). For example, … does not close a script
element from the script data double escaped state.
I encourage you to pause for a moment and play with this example to
get a feel for how the script tag escaped states work.
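If you'd rather experiment in code, here is a rough JavaScript sketch of the three states. It is a simplified model, not the tokenizer algorithm from the HTML specification, and the names findScriptEnd, matchesAt and tagAt are made up for this illustration. It assumes <!-- enters the escaped state, <script inside it enters the double escaped state, --> returns to the plain script data state, and </script only closes the element outside the double escaped state.

```javascript
// Characters that may end a tag name: whitespace, "/" or ">".
const TAG_NAME_END = /[\t\n\f\r \/>]/;

// Case-insensitive check that `token` (given in lowercase) starts at index i.
function matchesAt(text, i, token) {
  return text.slice(i, i + token.length).toLowerCase() === token;
}

// Like matchesAt, but the tag name must be followed by a terminator.
function tagAt(text, i, token) {
  const next = text[i + token.length];
  return matchesAt(text, i, token) && next !== undefined && TAG_NAME_END.test(next);
}

// Returns the index where the closing tag of the script element starts,
// or -1 if the content never closes the element.
function findScriptEnd(content) {
  let state = 'data'; // 'data' | 'escaped' | 'doubleEscaped'
  for (let i = 0; i < content.length; i++) {
    if (state === 'data') {
      if (tagAt(content, i, '</script')) return i;
      if (matchesAt(content, i, '<!--')) {
        state = 'escaped';
        i += 1; // resume at the first "-" so "<!-->" leaves the state again
      }
    } else if (matchesAt(content, i, '-->')) {
      state = 'data';
      i += 2;
    } else if (state === 'escaped') {
      if (tagAt(content, i, '</script')) return i;
      if (tagAt(content, i, '<script')) state = 'doubleEscaped';
    } else if (tagAt(content, i, '</script')) {
      state = 'escaped'; // only leaves the double escaped state
    }
  }
  return -1;
}

console.log(findScriptEnd("console.log('</script>')")); // 13
console.log(findScriptEnd('<!-- <script> </script> ')); // -1
```

In the second call the </script> only steps back from the double escaped state to the escaped state, so the element stays open.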
Avoid the double escaped state
The complexity of script tag parsing and escaping comes from the
escaped states. Avoid the script data double escaped state and script
tags become simple. Everything until the tag closer </script> is
inside the script element.
How can we avoid the double escaped state? Script tag parsing always
starts in the script data state and there's a pattern in its
transitions:
* …
… \u003C and \u003E. This will escape much more than is strictly
necessary, but it's sufficient and is provided by the language. Perfect!
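The same idea can be sketched in JavaScript, where JSON.stringify has no equivalent flag. The helper name jsonForScriptTag is hypothetical, and the sketch assumes the value is JSON-serializable. Since < and > can only occur inside strings in JSON text, replacing every occurrence with its \uXXXX escape leaves the data unchanged.

```javascript
// Hypothetical helper: serialize a value as JSON that is safe to print
// inside a script tag by escaping every "<" and ">".
function jsonForScriptTag(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003C')
    .replace(/>/g, '\\u003E');
}

const data = { closeScript: '</script>', openComment: '<!--' };
const json = jsonForScriptTag(data);

console.log(json);
// {"closeScript":"\u003C/script\u003E","openComment":"\u003C!--"}

// Parsing restores the original strings.
console.log(JSON.parse(json).closeScript); // </script>
```

With no < left in the output, none of the sequences that switch parser states can appear.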
How to escape JSON in PHP
For JSON that will be printed in a script tag, use the following
flags:
* JSON_HEX_TAG
All < and > are converted to \u003C and \u003E.
* JSON_UNESCAPED_SLASHES
Don't escape /.
If everything is UTF-8 (both the data and the charset of the page)
you can add these flags for cleaner and shorter JSON:
* JSON_UNESCAPED_UNICODE
Encode multibyte Unicode characters literally (default is to
escape as \uXXXX).
* JSON_UNESCAPED_LINE_TERMINATORS
The line terminators are kept unescaped when
JSON_UNESCAPED_UNICODE is supplied. It uses the same behaviour as
it was before PHP 7.1 without this constant. Available as of PHP
7.1.0.
JSON_UNESCAPED_LINE_TERMINATORS is a fun one. Before ES2019,
JavaScript strings did not accept two characters, U+2028 (LINE
SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR), that JSON strings do
allow. Some valid JSON was invalid JavaScript. Since the "JavaScript
is a superset of JSON" proposal landed in ES2019, that's no longer the
case and those characters no longer require escaping. Phew! Browser
support today is very good.
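A quick way to check this in your own environment is the sketch below. It builds a JSON string containing a raw U+2028, parses it as JSON, then evaluates the same text as a JavaScript string literal. The second step is expected to succeed only on engines that implement the ES2019 change.

```javascript
const LS = '\u2028'; // LINE SEPARATOR
const jsonText = '"' + LS + '"'; // a JSON string containing a raw U+2028

// JSON has always accepted this character inside strings.
const fromJson = JSON.parse(jsonText);

// Evaluate the same text as a JavaScript string literal.
let fromJs = null;
try {
  fromJs = new Function('return ' + jsonText)();
} catch (e) {
  // Engines older than ES2019 throw a SyntaxError here.
}

console.log(fromJson === LS);
console.log(fromJs === LS); // true on ES2019+ engines
```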
JSON escaping in action
Here's the problematic example again, now with the recommended flags:
PHP
…
'closeScript' => '</script>',
'openComment' => '<!--',
…