From 96c7f7eaa68c8532366bd94bcfc5222196537f1e Mon Sep 17 00:00:00 2001 From: Jeff Date: Sun, 20 Apr 2014 21:31:07 -0700 Subject: [PATCH] PostgreSQL: Improve speed for page edit in imports Whenever a new revision is added, a deferred update gets enqueued. When it is fired, it clears the searchable text from all earlier revisions for the article. This becomes very slow for articles with long revision histories, as it re-clears the textvector even when it has already been cleared by earlier actions. This leads to very high load in the database for runs of importDump.php This patch improves this situation by adding a condition to the WHERE clause such that it does not update rows in which the textvector is already NULL. PostgreSQL cannot automatically remove such degenerate updates in general because the updated rows must be locked and have their transaction markers increased. However, in this particular case those things are unimportant. This change improves the performance of importDump.php on a wiki with long revision histories by 7 fold, and moves the major bottleneck from the database to PHP. It might also improve the performance of ordinary page edits, but that was not tested. There are more improvements that could be made here. For example, a partial index or expression index could make it so that already cleared rows do not have to be visited at all. Or the deferred update mechanism could have a notion of "idempotency" so that many indentical updates enqueued during bulk loading would be condensed to only a single execution. However, this change is very much simpler and is sufficient to shift the bottleneck elsewhere. Change-Id: I458603767c6a86425010d02ffc1f8079c4b3c9a0 --- includes/search/SearchPostgres.php | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/includes/search/SearchPostgres.php b/includes/search/SearchPostgres.php index 4da63b3a28..e4f4b5583b 100644 --- a/includes/search/SearchPostgres.php +++ b/includes/search/SearchPostgres.php @@ -191,7 +191,7 @@ class SearchPostgres extends SearchDatabase { function update( $pageid, $title, $text ) { ## We don't want to index older revisions - $sql = "UPDATE pagecontent SET textvector = NULL WHERE old_id IN " . + $sql = "UPDATE pagecontent SET textvector = NULL WHERE textvector IS NOT NULL and old_id IN " . "(SELECT rev_text_id FROM revision WHERE rev_page = " . intval( $pageid ) . " ORDER BY rev_text_id DESC OFFSET 1)"; $this->db->query( $sql ); -- 2.20.1