From: Thiemo Kreuz
Date: Mon, 13 May 2019 09:28:30 +0000 (+0200)
Subject: title: Convert binary regexp to use Unicode code points
X-Git-Tag: 1.34.0-rc.0~1688^2
X-Git-Url: http://git.cyclocoop.org/%22%20.%20generer_url_ecrire%28%22suivi_revisions%22%29%20.%20%22?a=commitdiff_plain;h=c00c1f0b21ec4b07ab317e410945fb9cd336317a;p=lhc%2Fweb%2Fwiklou.git

title: Convert binary regexp to use Unicode code points

The hex sequences are the raw UTF-8 bytes of the Unicode code points
U+200E, U+200F and U+202A to U+202E. Now that we have a more modern
PHP at hand, we can write these characters as \x{FFFF} code points
together with the /u modifier.

I believe the /S modifier is not needed any more. It tells PCRE to
spend extra time analyzing ("studying") the pattern to speed up later
matches. But this is a pretty trivial regular expression, and the extra
analysis most probably costs more than it saves.

Change-Id: I49435114b3bc31dcce8aa4e48091d509844a2a07
---

diff --git a/includes/title/MediaWikiTitleCodec.php b/includes/title/MediaWikiTitleCodec.php
index 31a022255c..7af0c1e9c5 100644
--- a/includes/title/MediaWikiTitleCodec.php
+++ b/includes/title/MediaWikiTitleCodec.php
@@ -284,7 +284,7 @@ class MediaWikiTitleCodec implements TitleFormatter, TitleParser {
 		# Strip Unicode bidi override characters.
 		# Sometimes they slip into cut-n-pasted page titles, where the
 		# override chars get included in list displays.
-		$dbkey = preg_replace( '/\xE2\x80[\x8E\x8F\xAA-\xAE]/S', '', $dbkey );
+		$dbkey = preg_replace( '/[\x{200E}\x{200F}\x{202A}-\x{202E}]+/u', '', $dbkey );
 
 		# Clean up whitespace
 		# Note: use of the /u option on preg_replace here will cause
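The following minimal PHP sketch checks that the old and new patterns are equivalent on valid UTF-8 input. Each `\xE2\x80..` byte triple in the old pattern is the UTF-8 encoding of one of the code points in the new character class. The two regular expressions are copied from the diff. The helper names `stripBidiBytes()` and `stripBidiCodePoints()` and the sample title are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helpers comparing the byte-based pattern with the
// code point pattern. The sample title below is made up.

/** Old form: match the raw UTF-8 bytes (E2 80 xx) of each bidi character. */
function stripBidiBytes( string $s ): string {
	return preg_replace( '/\xE2\x80[\x8E\x8F\xAA-\xAE]/S', '', $s );
}

/** New form: with /u, \x{...} denotes a code point rather than a single byte. */
function stripBidiCodePoints( string $s ): ?string {
	// With /u, preg_replace() returns null if $s is not valid UTF-8.
	return preg_replace( '/[\x{200E}\x{200F}\x{202A}-\x{202E}]+/u', '', $s );
}

// Each code point encodes to exactly the bytes the old pattern spelled out.
echo bin2hex( "\u{200E}" ), ' ', bin2hex( "\u{202A}" ), ' ', bin2hex( "\u{202E}" ), "\n";

// U+202E, U+200F and U+200E are embedded in an otherwise plain title.
$title = "Foo\u{202E}bar\u{200F}\u{200E}_baz";
echo stripBidiBytes( $title ), "\n";
echo stripBidiCodePoints( $title ), "\n";
```

Running it should print `e2808e e280aa e280ae`, then `Foobar_baz` twice. The sketch also exposes one difference between the two forms. With `/u`, `preg_replace()` returns `null` when the subject is not valid UTF-8. The byte pattern, by contrast, passes such input through unchanged.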