PostgreSQL: Improve speed of page edits in imports
Whenever a new revision is added, a deferred update gets enqueued.
When it is fired, it clears the searchable text from all earlier
revisions for the article. This becomes very slow for articles
with long revision histories, as it re-clears the textvector even
when it has already been cleared by earlier actions. This leads to
very high load in the database for runs of importDump.php.
This patch improves this situation by adding a condition to the WHERE
clause such that it does not update rows in which the textvector
is already NULL.
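The guard can be illustrated with a small sketch (not the actual MediaWiki code): it uses SQLite via Python's sqlite3, and the table and column names (pagecontent, old_id, page, textvector) are simplified assumptions standing in for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pagecontent "
    "(old_id INTEGER PRIMARY KEY, page INTEGER, textvector TEXT)")
conn.executemany(
    "INSERT INTO pagecontent VALUES (?, ?, ?)",
    [(i, 1, "searchable text") for i in range(1, 6)])

def clear_old_textvectors(page_id, latest_rev):
    # The added "AND textvector IS NOT NULL" condition means rows
    # that were already cleared are never rewritten on later edits.
    cur = conn.execute(
        "UPDATE pagecontent SET textvector = NULL "
        "WHERE page = ? AND old_id <> ? AND textvector IS NOT NULL",
        (page_id, latest_rev))
    return cur.rowcount  # rows actually modified

first = clear_old_textvectors(1, 5)   # clears revisions 1 through 4
second = clear_old_textvectors(1, 5)  # nothing left to clear
print(first, second)  # 4 0
```

Without the extra condition, the second call would rewrite all four rows again, which is exactly the repeated work that piles up during a bulk import.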
PostgreSQL cannot automatically remove such degenerate updates
in general because the updated rows must be locked and have their
transaction markers increased. However, in this particular case
those things are unimportant.
This change improves the performance of importDump.php on a
wiki with long revision histories by 7-fold, and moves the
major bottleneck from the database to PHP. It might also
improve the performance of ordinary page edits, but that
was not tested.
There are more improvements that could be made here. For example,
a partial index or expression index could make it so that already
cleared rows do not have to be visited at all. Or the deferred
update mechanism could have a notion of "idempotency" so that many
identical updates enqueued during bulk loading would be condensed
to only a single execution. However, this change is very much
simpler and is sufficient to shift the bottleneck elsewhere.
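For reference, the partial-index idea can be sketched too. This is a hypothetical illustration in SQLite via Python's sqlite3 (the same simplified pagecontent schema as assumed above, not the real one); PostgreSQL supports the same CREATE INDEX ... WHERE syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pagecontent "
    "(old_id INTEGER PRIMARY KEY, page INTEGER, textvector TEXT)")
# Index only the rows whose textvector is still set, so the cleanup
# UPDATE can locate its candidates without visiting cleared rows.
conn.execute(
    "CREATE INDEX pc_uncleared ON pagecontent (page) "
    "WHERE textvector IS NOT NULL")
conn.executemany(
    "INSERT INTO pagecontent VALUES (?, ?, ?)",
    [(1, 1, None), (2, 1, None), (3, 1, "searchable text")])

# The plan typically reports a SEARCH using pc_uncleared here.
plan = conn.execute(
    "EXPLAIN QUERY PLAN UPDATE pagecontent SET textvector = NULL "
    "WHERE page = 1 AND textvector IS NOT NULL").fetchall()
print(plan)

changed = conn.execute(
    "UPDATE pagecontent SET textvector = NULL "
    "WHERE page = 1 AND textvector IS NOT NULL").rowcount
print(changed)  # only the one uncleared row is touched
```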
Change-Id: I458603767c6a86425010d02ffc1f8079c4b3c9a0