Protect legacy URL parameter syntax in link and alt options
HTML doesn't decode certain semicolon-less HTML entities in attribute
values, to avoid breaking legacy markup like:
<a href="http://example.com?foo&param=bar">...</a>
(Note that the & in that URL is not properly entity-escaped as `&amp;`.)
Unlike wikitext, HTML generally allows semicolon-less legacy entities
in text.
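The text-versus-attribute distinction can be sketched in a few lines of
Python. This is a minimal illustration, not Remex or Sanitizer code: it
handles named references only (numeric ones are left alone), takes its
entity table from the standard library's `html.entities.html5`, and the
function name `decode_named_refs` is hypothetical.

```python
import re
from html.entities import html5

# Legacy names that HTML5 also accepts without a trailing semicolon
# (e.g. "para", "amp", "lt"), sorted longest first so the longest prefix wins.
LEGACY_NAMES = sorted((k for k in html5 if not k.endswith(";")), key=len, reverse=True)
NAMED_REF = re.compile(r"[A-Za-z][A-Za-z0-9]*;")


def decode_named_refs(text: str, in_attribute: bool) -> str:
    """Decode named character references using HTML5 text or attribute rules.

    Hypothetical helper for illustration; numeric references are not handled.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue
        rest = text[i + 1:]
        # A properly terminated reference such as "&para;" always decodes.
        m = NAMED_REF.match(rest)
        if m and m.group(0) in html5:
            out.append(html5[m.group(0)])
            i += 1 + m.end()
            continue
        # Otherwise look for a semicolon-less legacy name such as "&para".
        name = next((n for n in LEGACY_NAMES if rest.startswith(n)), None)
        if name is not None:
            following = rest[len(name):len(name) + 1]
            # In attribute values HTML leaves the reference undecoded when it
            # is followed by "=" or an ASCII alphanumeric, e.g. "&param=bar".
            blocked = in_attribute and (
                following == "=" or (following.isascii() and following.isalnum())
            )
            if not blocked:
                out.append(html5[name])
                i += 1 + len(name)
                continue
        # Not decodable here: keep the ampersand literally.
        out.append("&")
        i += 1
    return "".join(out)


if __name__ == "__main__":
    url = "http://example.com?foo&param=bar"
    print(decode_named_refs(url, in_attribute=False))  # http://example.com?foo¶m=bar
    print(decode_named_refs(url, in_attribute=True))   # http://example.com?foo&param=bar
```

The same input decodes differently depending on context, which is why text
that ends up in a URL must not be run through a text-context decoder as-is.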
Our alt and link option processing shoves text through
Sanitizer::stripAllTags, which does entity decoding including these
legacy semicolon-less entities. Wikitext doesn't allow semicolon-less
entities, so escape & characters where appropriate to protect alt/link
options and avoid breaking URLs.
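The escaping idea can be sketched as follows, assuming "where
appropriate" means every & that does not begin a semicolon-terminated
reference. The regex and the name `protect_bare_ampersands` are
hypothetical, not MediaWiki's actual code, and Python's `html.unescape`
stands in only for the entity-decoding step of Sanitizer::stripAllTags
(like an HTML text-context decoder, it also decodes semicolon-less legacy
entities such as `&para`).

```python
import html
import re

# Matches an "&" that does NOT start a semicolon-terminated reference such as
# "&amp;", "&#182;" or "&#xB6;". Illustrative assumption only.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def protect_bare_ampersands(text: str) -> str:
    """Escape bare '&' so a text-context decoder cannot read '&para' etc."""
    return _BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    url = "http://example.com?foo&param=bar"
    # Decoding directly mangles the URL via the legacy "&para" entity.
    print(html.unescape(url))                           # http://example.com?foo¶m=bar
    # Escaping first round-trips the URL unchanged.
    print(html.unescape(protect_bare_ampersands(url)))  # http://example.com?foo&param=bar
    # Proper wikitext entities still decode as before.
    print(html.unescape(protect_bare_ampersands("a &lt; b")))  # a < b
```

Because wikitext only recognises entities with a trailing semicolon, leaving
those untouched while escaping everything else keeps valid entities working
without letting legacy forms through.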
This was a "regression" in how alt options were handled starting in
ddb4913f53624c8ee0a2a91bd44bf750e378569d when we switched to using
Remex for Sanitizer::stripAllTags -- semicolon-less entities (previously
invalid in wikitext) were now being decoded when stripAllTags was
called on alt text. This change became a problem when
ad80f0bca27c2b0905b2b137977586bfab80db34 sent link option text through
Sanitizer::stripAllTags (with its new semicolon-less entity decoding)
instead of PHP's strip_tags (which, in addition to its other faults,
doesn't decode entities at all). This suddenly started decoding
"non-wikitext" entities like `&para` inside URLs, breaking links.
Filed T210437 as a follow-up to consider changing the behavior
of Sanitizer::stripAllTags() globally to prevent it from decoding
semicolon-less entities for all callers.
Bug: T209236
Change-Id: I5925e110e335d83eafa9de935c4e06806322f4a9