From: Timo Tijhof Date: Mon, 20 Aug 2018 00:42:15 +0000 (+0100) Subject: Html: Reject from inlineScript() and leave rest unescaped X-Git-Tag: 1.34.0-rc.0~4254^2 X-Git-Url: http://git.cyclocoop.org/data/Luca_Pacioli_%28Gemaelde%29.jpeg?a=commitdiff_plain;h=15d6bffb56ac24ea7aaa3c9b4e161b228894b477;p=lhc%2Fweb%2Fwiklou.git Html: Reject from inlineScript() and leave rest unescaped There are three problems with the CDATA approach: 1. It doesn't work. HTML5 already interprets the contents of would be a syntax error. Indicating it is not interpreted as text, but as CDATA. Effectively, the only thing an HTML parser looks for is . And that's exactly the problem. Producing an inline script containing the characters "" for legitimate reasons, is currently broken. No alternate wrapping or setting can make it work, either. See also: https://people.wikimedia.org/~krinkle/200506-html-inlinescript.html which contains: tag (original)'); } /*]]>*/ In a browser, the script is terminated by the first "", leaving the code unfinished, throwing a SyntaxError, and outputting the rest of the script as plain text on the page. 2. CDATA is only for XML mode, whereas MediaWiki does not support the XML/XHTML output mode (since MediaWiki 1.22). Instead, we only output HTML (5). Code that does need to produce XML, should use the class from Xml.php instead. 3. It gives a false sense of security. We could just remove the CDATA code as-is and that in itself would be an improvement per point 2 and 3, and would break nothing per point 1. However, this commit attempts to address the underlying bug by rejecting the characters "" from input. If this is needed in a literal, it is the responsibility of the caller to escape it in a way that is appropiate for how it is used (string, comment, regex, etc.). There are two ways this can be used currently in core: * User input as exported through JSON (e.g. mw.config, or mw.messages). This is already fine as both FormatJson::encode and json_encode handle escape either < or / in the string by default already. * Previews of edits to user scripts. This is currently already broken and causes the script to end early and produce arbitrary HTML on the page. This commit limits the impact by refusing to output such script in a broken way. I will further address that use case in a follow-up. Bug: T200506 Change-Id: I67ceb34eabf2f62fd3f3841b8f1459289fad28fb --- diff --git a/includes/Html.php b/includes/Html.php index dba4c67a72..aac492c921 100644 --- a/includes/Html.php +++ b/includes/Html.php @@ -552,10 +552,13 @@ class Html { } /** - * Output a "" or (for XML) literal "]]>". + * It is unsupported for the contents to contain the sequence `setUserLang( $langObj ); $this->setContentLang( $langObj ); + $this->restoreWarnings = false; + } + + protected function tearDown() { + if ( $this->restoreWarnings ) { + $this->restoreWarnings = false; + Wikimedia\restoreWarnings(); + } + parent::tearDown(); } /** @@ -802,21 +812,30 @@ class HtmlTest extends MediaWikiTestCase { ], 'Ampersand' => [ 'EXAMPLE.is(a && b);', - '' + '' ], 'HTML' => [ 'EXAMPLE.label("");', - '' + '' ], - 'Script closing string' => [ + 'Script closing string (lower)' => [ 'EXAMPLE.label("");', - // Broken: First ends the script in HTML - '");/*]]>*/' + '', + true, ], - 'CDATA string' => [ - 'EXAMPLE.label("&> CDATA ]]>");', - // Broken: Works in HTML, but is invalid XML. - '' + 'Script closing with non-standard attributes (mixed)' => [ + 'EXAMPLE.label("");', + '', + true, + ], + 'HTML-comment-open and script-open' => [ + // In HTML, , + // there are levels of escaping modes, and the below sequence puts an + // HTML parser in a state where would *not* close the script. + // https://html.spec.whatwg.org/multipage/parsing.html#script-data-double-escape-end-state + 'var a = "