Provisionally reverting r80430 (bug 529, 12974)
authorBrion Vibber <brion@users.mediawiki.org>
Wed, 26 Jan 2011 01:16:18 +0000 (01:16 +0000)
committerBrion Vibber <brion@users.mediawiki.org>
Wed, 26 Jan 2011 01:16:18 +0000 (01:16 +0000)
As noted in CR there are some issues where we're trading one set of weird behavior for a different, but previously unknown, set of weird behavior which breaks existing markup that works around the old behavior.
I'd recommend keeping this in store for after the 1.17 stuff calms down so unexpected parser changes aren't cropping up in the middle of things for people working with trunk.

If these are the right changes to make then great -- but they should be done after the consequences are better understood and folks can prepare for changes.

RELEASE-NOTES
includes/parser/Parser.php
tests/parser/parserTests.txt

index d571625..62ea8b9 100644 (file)
@@ -91,11 +91,6 @@ it from source control: http://www.mediawiki.org/wiki/Download_from_SVN
   members.
 * (bug 2585) Image pages should send 404 if no image, no shared image and no
   description page.
-* (bug 12974) The behaviour of wikitable and template parsing was altered.  Previously,
-  all templates were treated as being at the start of a block context for the purposes
-  of evaluating wikitext elements like #, :, *, and tables (bug 529).  Now this 
-  behaviour is only applied to wikitable-start {|, but the first line of a wikitable
-  cell is now treated as a linestart.
 * Custom editintro's using the editintro url parameter will no longer show <noinclude>
   sections on pages they are included on.
 * (bug 26449) Keep underlines from headings outside of tables and thumbs by
index 3f04d31..89bd76d 100644 (file)
@@ -814,22 +814,7 @@ class Parser {
                $has_opened_tr = array(); # Did this table open a <tr> element?
                $indent_level = 0; # indent level of the table
 
-               # Keep pulling lines off the front of the array until they're all gone.
-               # we want to be able to push lines back on to the front of the stream,
-               # but StringUtils::explode() returns funky optimised Iterators which don't
-               # support insertion.  So maintain a separate buffer and draw on that first if
-               # there's anything in it
-               $extraLines = array();
-               $lines->rewind();
-               do {
-                       if( $extraLines ){
-                               $outLine = array_shift( $extraLines );
-                       } elseif( $lines->valid() ) {
-                               $outLine = $lines->current();
-                               $lines->next();
-                       } else {
-                               break;
-                       }
+               foreach ( $lines as $outLine ) {
                        $line = trim( $outLine );
 
                        if ( $line === '' ) { # empty line, go to next line
@@ -905,10 +890,11 @@ class Parser {
                        } elseif ( $first_character === '|' || $first_character === '!' || substr( $line , 0 , 2 )  === '|+' ) {
                                # This might be cell elements, td, th or captions
                                if ( substr( $line , 0 , 2 ) === '|+' ) {
-                                       $first_character = '|+';
+                                       $first_character = '+';
+                                       $line = substr( $line , 1 );
                                }
 
-                               $line = substr( $line , strlen( $first_character ) );
+                               $line = substr( $line , 1 );
 
                                if ( $first_character === '!' ) {
                                        $line = str_replace( '!!' , '||' , $line );
@@ -919,84 +905,62 @@ class Parser {
                                # by earlier parser steps, but should avoid splitting up eg
                                # attribute values containing literal "||".
                                $cells = StringUtils::explodeMarkup( '||' , $line );
-                               $cell = array_shift( $cells );
-
-                               # Inject cells back into the stream to be dealt with later
-                               # TODO: really we should do the whole thing as a stream...
-                               # but that would be too much like a sensible implementation :P
-                               if( count( $cells ) ){
-                                       foreach( array_reverse( $cells ) as $extraCell ){
-                                               array_unshift( $extraLines, $first_character . $extraCell );
-                                       }
-                               }
 
                                $outLine = '';
 
-                               $previous = '';
-                               if ( $first_character !== '|+' ) {
-                                       $tr_after = array_pop( $tr_attributes );
-                                       if ( !array_pop( $tr_history ) ) {
-                                               $previous = "<tr{$tr_after}>\n";
+                               # Loop through each table cell
+                               foreach ( $cells as $cell ) {
+                                       $previous = '';
+                                       if ( $first_character !== '+' ) {
+                                               $tr_after = array_pop( $tr_attributes );
+                                               if ( !array_pop( $tr_history ) ) {
+                                                       $previous = "<tr{$tr_after}>\n";
+                                               }
+                                               array_push( $tr_history , true );
+                                               array_push( $tr_attributes , '' );
+                                               array_pop( $has_opened_tr );
+                                               array_push( $has_opened_tr , true );
                                        }
-                                       array_push( $tr_history , true );
-                                       array_push( $tr_attributes , '' );
-                                       array_pop( $has_opened_tr );
-                                       array_push( $has_opened_tr , true );
-                               }
 
-                               $last_tag = array_pop( $last_tag_history );
+                                       $last_tag = array_pop( $last_tag_history );
 
-                               if ( array_pop( $td_history ) ) {
-                                       $previous = "</{$last_tag}>\n{$previous}";
-                               }
+                                       if ( array_pop( $td_history ) ) {
+                                               $previous = "</{$last_tag}>\n{$previous}";
+                                       }
 
-                               if ( $first_character === '|' ) {
-                                       $last_tag = 'td';
-                               } elseif ( $first_character === '!' ) {
-                                       $last_tag = 'th';
-                               } elseif ( $first_character === '|+' ) {
-                                       $last_tag = 'caption';
-                               } else {
-                                       $last_tag = '';
-                               }
+                                       if ( $first_character === '|' ) {
+                                               $last_tag = 'td';
+                                       } elseif ( $first_character === '!' ) {
+                                               $last_tag = 'th';
+                                       } elseif ( $first_character === '+' ) {
+                                               $last_tag = 'caption';
+                                       } else {
+                                               $last_tag = '';
+                                       }
 
-                               array_push( $last_tag_history , $last_tag );
-
-                               # A cell could contain both parameters and data... but the pipe could
-                               # also be the start of a nested table, or a raw pipe inside an invalid
-                               # link (bug 553).  
-                               $cell_data = preg_split( '/(?<!\{)\|/', $cell, 2 );
-
-                               # Bug 553: a '|' inside an invalid link should not
-                               # be mistaken as delimiting cell parameters
-                               if ( strpos( $cell_data[0], '[[' ) !== false ) {
-                                       $data = $cell;
-                                       $cell = "{$previous}<{$last_tag}>";
-                               } elseif ( count( $cell_data ) == 1 ) {
-                                       $cell = "{$previous}<{$last_tag}>";
-                                       $data = $cell_data[0];
-                               } else {
-                                       $attributes = $this->mStripState->unstripBoth( $cell_data[0] );
-                                       $attributes = Sanitizer::fixTagAttributes( $attributes , $last_tag );
-                                       $cell = "{$previous}<{$last_tag}{$attributes}>";
-                                       $data = $cell_data[1];
-                               }
+                                       array_push( $last_tag_history , $last_tag );
 
-                               # Bug 529: the start of a table cell should be a linestart context for
-                               # processing other block markup, including nested tables.  The original
-                               # implementation of this was to add a newline before every brace construct,
-                               # which broke all manner of other things.  Instead, push the contents
-                               # of the cell back into the stream and come back to it later.  But don't
-                               # do that if the first line is empty, or you may get extra whitespace
-                               if( $data ){
-                                       array_unshift( $extraLines, trim( $data ) );
-                               }
+                                       # A cell could contain both parameters and data
+                                       $cell_data = explode( '|' , $cell , 2 );
 
-                               $outLine .= $cell;
-                               array_push( $td_history , true );
+                                       # Bug 553: Note that a '|' inside an invalid link should not
+                                       # be mistaken as delimiting cell parameters
+                                       if ( strpos( $cell_data[0], '[[' ) !== false ) {
+                                               $cell = "{$previous}<{$last_tag}>{$cell}";
+                                       } elseif ( count( $cell_data ) == 1 ) {
+                                               $cell = "{$previous}<{$last_tag}>{$cell_data[0]}";
+                                       } else {
+                                               $attributes = $this->mStripState->unstripBoth( $cell_data[0] );
+                                               $attributes = Sanitizer::fixTagAttributes( $attributes , $last_tag );
+                                               $cell = "{$previous}<{$last_tag}{$attributes}>{$cell_data[1]}";
+                                       }
+
+                                       $outLine .= $cell;
+                                       array_push( $td_history , true );
+                               }
                        }
                        $out .= $outLine . "\n";
-               } while( $lines->valid() || count( $extraLines ) );
+               }
 
                # Closing open td, tr && table
                while ( count( $td_history ) > 0 ) {
@@ -2260,7 +2224,6 @@ class Parser {
                                        '/(?:<\\/table|<\\/blockquote|<\\/h1|<\\/h2|<\\/h3|<\\/h4|<\\/h5|<\\/h6|'.
                                        '<td|<th|<\\/?div|<hr|<\\/pre|<\\/p|'.$this->mUniqPrefix.'-pre|<\\/li|<\\/ul|<\\/ol|<\\/?center)/iS', $t );
                                if ( $openmatch or $closematch ) {
-
                                        $paragraphStack = false;
                                        # TODO bug 5718: paragraph closed
                                        $output .= $this->closeParagraph();
@@ -3251,11 +3214,11 @@ class Parser {
                        $text = wfEscapeWikiText( $text );
                } elseif ( is_string( $text )
                        && !$piece['lineStart']
-                       && preg_match( '/^{\\|/', $text ) )
+                       && preg_match( '/^(?:{\\||:|;|#|\*)/', $text ) )
                {
-                       # Bug 529: if the template begins with a table, it should be treated as
-                       # beginning a new line.  This previously handled other block-level elements
-                       # such as #, :, etc, but these have many false-positives (bug 12974).
+                       # Bug 529: if the template begins with a table or block-level
+                       # element, it should be treated as beginning a new line.
+                       # This behaviour is somewhat controversial.
                        $text = "\n" . $text;
                }
 
@@ -3314,7 +3277,7 @@ class Parser {
 
                if ( !$title->equals( $cacheTitle ) ) {
                        $this->mTplRedirCache[$cacheTitle->getPrefixedDBkey()] =
-                               array( $title->getNamespace(), $title->getDBkey() );
+                               array( $title->getNamespace(), $cdb = $title->getDBkey() );
                }
 
                return array( $dom, $title );
index 20e1c77..b13339c 100644 (file)
@@ -1210,8 +1210,7 @@ A table with nothing but a caption
 |}
 !! result
 <table>
-<caption>
-caption
+<caption> caption
 </caption><tr><td></td></tr></table>
 
 !! end
@@ -1227,22 +1226,12 @@ Simple table
 !! result
 <table>
 <tr>
-<td>
-<p>1
-</p>
-</td>
-<td>
-<p>2
-</p>
+<td> 1 </td>
+<td> 2
 </td></tr>
 <tr>
-<td>
-<p>3
-</p>
-</td>
-<td>
-<p>4
-</p>
+<td> 3 </td>
+<td> 4
 </td></tr></table>
 
 !! end
@@ -1272,110 +1261,48 @@ Multiplication table
 |}
 !! result
 <table border="1" cellpadding="2">
-<caption>
-Multiplication table
+<caption>Multiplication table
 </caption>
 <tr>
-<th>
-<p>&times;
-</p>
-</th>
-<th>
-<p>1
-</p>
-</th>
-<th>
-<p>2
-</p>
-</th>
-<th>
-<p>3
-</p>
+<th> &times; </th>
+<th> 1 </th>
+<th> 2 </th>
+<th> 3
 </th></tr>
 <tr>
-<th>
-<p>1
-</p>
+<th> 1
 </th>
-<td>
-<p>1
-</p>
-</td>
-<td>
-<p>2
-</p>
-</td>
-<td>
-<p>3
-</p>
+<td> 1 </td>
+<td> 2 </td>
+<td> 3
 </td></tr>
 <tr>
-<th>
-<p>2
-</p>
+<th> 2
 </th>
-<td>
-<p>2
-</p>
-</td>
-<td>
-<p>4
-</p>
-</td>
-<td>
-<p>6
-</p>
+<td> 2 </td>
+<td> 4 </td>
+<td> 6
 </td></tr>
 <tr>
-<th>
-<p>3
-</p>
+<th> 3
 </th>
-<td>
-<p>3
-</p>
-</td>
-<td>
-<p>6
-</p>
-</td>
-<td>
-<p>9
-</p>
+<td> 3 </td>
+<td> 6 </td>
+<td> 9
 </td></tr>
 <tr>
-<th>
-<p>4
-</p>
+<th> 4
 </th>
-<td>
-<p>4
-</p>
-</td>
-<td>
-<p>8
-</p>
-</td>
-<td>
-<p>12
-</p>
+<td> 4 </td>
+<td> 8 </td>
+<td> 12
 </td></tr>
 <tr>
-<th>
-<p>5
-</p>
+<th> 5
 </th>
-<td>
-<p>5
-</p>
-</td>
-<td>
-<p>10
-</p>
-</td>
-<td>
-<p>15
-</p>
+<td> 5 </td>
+<td> 10 </td>
+<td> 15
 </td></tr></table>
 
 !! end
@@ -1394,26 +1321,16 @@ Table rowspan
 !! result
 <table align="right" border="1">
 <tr>
-<td>
-<p>Cell 1, row 1
-</p>
+<td> Cell 1, row 1
 </td>
-<td rowspan="2">
-<p>Cell 2, row 1 (and 2)
-</p>
+<td rowspan="2"> Cell 2, row 1 (and 2)
 </td>
-<td>
-<p>Cell 3, row 1
-</p>
+<td> Cell 3, row 1
 </td></tr>
 <tr>
-<td>
-<p>Cell 1, row 2
-</p>
+<td> Cell 1, row 2
 </td>
-<td>
-<p>Cell 3, row 2
-</p>
+<td> Cell 3, row 2
 </td></tr></table>
 
 !! end
@@ -1434,26 +1351,18 @@ Nested table
 !! result
 <table border="1">
 <tr>
-<td>
-<p>&alpha;
-</p>
+<td> &alpha;
 </td>
 <td>
 <table bgcolor="#ABCDEF" border="2">
 <tr>
-<td>
-<p>nested
-</p>
+<td>nested
 </td></tr>
 <tr>
-<td>
-<p>table
-</p>
+<td>table
 </td></tr></table>
 </td>
-<td>
-<p>the original table again
-</p>
+<td>the original table again
 </td></tr></table>
 
 !! end
@@ -1467,9 +1376,7 @@ Invalid attributes in table cell (bug 1830)
 !! result
 <table>
 <tr>
-<td>
-<p>broken
-</p>
+<td>broken
 </td></tr></table>
 
 !! end
@@ -1483,13 +1390,8 @@ Table security: embedded pipes (http://lists.wikimedia.org/mailman/htdig/wikitec
 !! result
 <table>
 <tr>
-<td>
-<p>[<a href="ftp://%7Cx" class="external free" rel="nofollow">ftp://%7Cx</a>
-</p>
-</td>
-<td>
-<p>]" onmouseover="alert(document.cookie)"&gt;test
-</p>
+<td>[<a href="ftp://%7Cx" class="external free" rel="nofollow">ftp://%7Cx</a></td>
+<td>]" onmouseover="alert(document.cookie)"&gt;test
 </td>
 </tr>
 </table>
@@ -2702,9 +2604,7 @@ BUG 553: link with two variables in a piped link
 !! result
 <table>
 <tr>
-<td>
-<p>[[{{{1}}}|{{{2}}}]]
-</p>
+<td>[[{{{1}}}|{{{2}}}]]
 </td></tr></table>
 
 !! end
@@ -2814,22 +2714,12 @@ foo {{table}}
 </p>
 <table>
 <tr>
-<td>
-<p>1
-</p>
-</td>
-<td>
-<p>2
-</p>
+<td> 1 </td>
+<td> 2
 </td></tr>
 <tr>
-<td>
-<p>3
-</p>
-</td>
-<td>
-<p>4
-</p>
+<td> 3 </td>
+<td> 4
 </td></tr></table>
 
 !! end
@@ -2844,22 +2734,12 @@ foo
 </p>
 <table>
 <tr>
-<td>
-<p>1
-</p>
-</td>
-<td>
-<p>2
-</p>
+<td> 1 </td>
+<td> 2
 </td></tr>
 <tr>
-<td>
-<p>3
-</p>
-</td>
-<td>
-<p>4
-</p>
+<td> 3 </td>
+<td> 4
 </td></tr></table>
 
 !! end
@@ -4440,9 +4320,7 @@ Table multiple attributes correction
 !! result
 <table>
 <tr>
-<th class="awesome">
-<p>status
-</p>
+<th class="awesome"> status
 </th></tr></table>
 
 !!end
@@ -4896,9 +4774,7 @@ Table attribute legitimate extension
 !! result
 <table>
 <tr>
-<th style="color:blue">
-<p>status
-</p>
+<th style="color:blue"> status
 </th></tr></table>
 
 !!end
@@ -4912,9 +4788,7 @@ Table attribute safety
 !! result
 <table>
 <tr>
-<th style="/* insecure input */">
-<p>status
-</p>
+<th style="/* insecure input */"> status
 </th></tr></table>
 
 !! end
@@ -5583,14 +5457,9 @@ noxml
 !! result
 <table>
 <tr>
-<th>
-<p>https://
-</p>
-</th>
-<th>
-</th>
-<th>
-</th>
+<th>https://</th>
+<th></th>
+<th></th>
 <th>
 </td>
 </tr>
@@ -5607,9 +5476,7 @@ Fuzz testing: Parser21
 !! result
 <table>
 <tr>
-<th>
-<p><a href="irc://{{ftp://a" class="external free" rel="nofollow">irc://{{ftp://a</a>" onmouseover="alert('hello world');"
-</p>
+<th> <a href="irc://{{ftp://a" class="external free" rel="nofollow">irc://{{ftp://a</a>" onmouseover="alert('hello world');"
 </th>
 <td>
 </td>
@@ -5655,9 +5522,7 @@ MOVE YOUR MOUSE CURSOR OVER THIS TEXT
 
 MOVE YOUR MOUSE CURSOR OVER THIS TEXT
 <tr>
-<td>
-<p></u>
-</p>
+<td></u>
 </td>
 </tr>
 </table>
@@ -7772,17 +7637,18 @@ Template:Bullet
 !!endarticle
 
 !! test
-Bug 529/12974: Uncovered bullet
+Bug 529: Uncovered bullet
 !! input
 * Foo {{bullet}}
 !! result
-<ul><li> Foo * Bar
+<ul><li> Foo 
+</li><li> Bar
 </li></ul>
 
 !! end
 
 !! test
-Bug 529/12974: Uncovered table already at line-start
+Bug 529: Uncovered table already at line-start
 !! input
 x
 
 </p>
 <table>
 <tr>
-<td>
-<p>1
-</p>
-</td>
-<td>
-<p>2
-</p>
+<td> 1 </td>
+<td> 2
 </td></tr>
 <tr>
-<td>
-<p>3
-</p>
-</td>
-<td>
-<p>4
-</p>
+<td> 3 </td>
+<td> 4
 </td></tr></table>
 <p>y
 </p>
 !! end
 
 !! test
-Bug 529/12974: Uncovered bullet in parser function result
+Bug 529: Uncovered bullet in parser function result
 !! input
 * Foo {{lc:{{bullet}} }}
 !! result
-<ul><li> Foo * bar
+<ul><li> Foo 
+</li><li> bar
 </li></ul>
 
 !! end