From: Ilmari Karonen
Date: Sun, 4 Jan 2009 02:29:00 +0000 (+0000)
Subject: Add special case handling of the XHTML character entity "&apos;" to normalizeEntity...
X-Git-Tag: 1.31.0-rc.0~43572
X-Git-Url: https://git.cyclocoop.org/%7B%24www_url%7Dadmin/compta/banques/?a=commitdiff_plain;h=153e741bc3ca92b9fee13664e195b57352f2d28c;p=lhc%2Fweb%2Fwiklou.git

Add special case handling of the XHTML character entity "&apos;" to normalizeEntity() and decodeEntity(). This should resolve the remainder of bug 14365. It might seem cleaner to just add the appropriate entry to $wgHtmlEntityAliases, but this would break decodeEntity() as currently written. Explicitly note this in the comments.
---

diff --git a/RELEASE-NOTES b/RELEASE-NOTES
index a5bc0cd6f1..2ccaa04394 100644
--- a/RELEASE-NOTES
+++ b/RELEASE-NOTES
@@ -466,6 +466,8 @@ The following extensions are migrated into MediaWiki 1.14:
   local URLs
 * (bug 16376) Mention in deleteBatch.php and moveBatch.php maintenance scripts
   that STDIN can be used for page list
+* Sanitizer::decodeCharReferences() now decodes the XHTML "&apos;" character
+  entity (loosely related to bug 14365)
 
 === API changes in 1.14 ===
 
diff --git a/includes/Sanitizer.php b/includes/Sanitizer.php
index 6caded3d73..e89633f22f 100644
--- a/includes/Sanitizer.php
+++ b/includes/Sanitizer.php
@@ -59,6 +59,9 @@ define( 'MW_ATTRIBS_REGEX',
 /**
  * List of all named character entities defined in HTML 4.01
  * http://www.w3.org/TR/html4/sgml/entities.html
+ * This list does *not* include &apos;, which is part of XHTML
+ * 1.0 but not HTML 4.01. It is handled as a special case in
+ * the code.
  * @private
  */
 global $wgHtmlEntities;
@@ -318,6 +321,7 @@ $wgHtmlEntities = array(
 
 /**
  * Character entity aliases accepted by MediaWiki
+ * XXX: decodeEntity() assumes that all values in this array are valid keys to $wgHtmlEntities
  */
 global $wgHtmlEntityAliases;
 $wgHtmlEntityAliases = array(
@@ -954,7 +958,7 @@ class Sanitizer {
 	 * encoded text for an attribute value.
 	 *
 	 * See http://www.w3.org/TR/REC-xml/#AVNormalize for background,
-	 * but note that we're not returning the value, but are returning
+	 * but note that we are not returning the value, but are returning
 	 * XML source fragments that will be slapped into output.
 	 *
 	 * @param string $text
@@ -1032,6 +1036,8 @@ class Sanitizer {
 			return "&{$wgHtmlEntityAliases[$name]};";
 		} elseif( isset( $wgHtmlEntities[$name] ) ) {
 			return "&$name;";
+		} elseif( $name == 'apos' ) {
+			return "&#39;"; // "&apos;" is valid in XHTML, but not in HTML4
 		} else {
 			return "&amp;$name;";
 		}
@@ -1133,6 +1139,8 @@ class Sanitizer {
 		}
 		if( isset( $wgHtmlEntities[$name] ) ) {
 			return codepointToUtf8( $wgHtmlEntities[$name] );
+		} elseif( $name == 'apos' ) {
+			return "'"; // "&apos;" is not in $wgHtmlEntities, but it's still valid XHTML
 		} else {
 			return "&$name;";
 		}
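
Note on the alias approach: the commit message says that an entry in $wgHtmlEntityAliases would break decodeEntity(), and the XXX comment gives the reason (alias values must be valid keys of $wgHtmlEntities). The standalone sketch below illustrates that reasoning and is not the MediaWiki implementation. The tables are tiny hypothetical stand-ins, decodeEntitySketch() and the 'less' alias are invented for illustration, and the alias lookup at the top of the function is an assumption, since that part of decodeEntity() lies outside the diff context. chr() stands in for codepointToUtf8() only because the sample code points are ASCII.

```php
<?php
// Hypothetical, trimmed-down stand-ins for $wgHtmlEntities and
// $wgHtmlEntityAliases. The 'less' alias is invented for this example.
$entities = array( 'amp' => 38, 'lt' => 60, 'gt' => 62 );
$aliases  = array( 'less' => 'lt' );

// Assumed lookup order: resolve an alias first, then require the resulting
// name to be a key of the entity table, then fall back to the special case.
function decodeEntitySketch( $name, $entities, $aliases ) {
	if ( isset( $aliases[$name] ) ) {
		$name = $aliases[$name];
	}
	if ( isset( $entities[$name] ) ) {
		// chr() is enough here because the sample code points are ASCII.
		return chr( $entities[$name] );
	} elseif ( $name == 'apos' ) {
		return "'"; // special case, as in the patch
	} else {
		return "&$name;"; // unknown name: leave the reference undecoded
	}
}

// With the special case, "apos" decodes to a plain apostrophe.
echo decodeEntitySketch( 'apos', $entities, $aliases ), "\n";

// With a hypothetical alias 'apos' => '#39', the name is rewritten to '#39',
// which is not a key of the entity table, so nothing is decoded.
$brokenAliases = $aliases + array( 'apos' => '#39' );
echo decodeEntitySketch( 'apos', $entities, $brokenAliases ), "\n"; // prints &#39;
```

In this sketch the alias rewrites the name before either check runs, so the reference comes back undecoded even though the special case exists. That is consistent with the choice to handle "apos" in code rather than in the alias table.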