1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-11-10 08:52:39 +01:00

Improve efficiency of worker task GC for huge loads

Summary:
Fixes T9808.

An instance imported a very large repository, generating approximately 4 million tasks over the course of a few days. A week later, these tasks started expiring and became candidates for garbage collection.

The GC works by deleting 100 rows at at time over and over again. It finds the rows it's going to delete by querying for old rows.

Currently, this query generates a `WHERE dateCreated < X ORDER BY id DESC` query. This query can not efficiently execute using a single key, because it relies on `dateCreated` order to find the rows, then on `id` order to sort them. With a table with 4M rows, this is slow.

This would still be OK, except that the query has to execute a lot of times since it only deletes 100 rows each time. Particularly, it needs to execute a total of ~40K times.

Instead, generate `WHERE dateCreated < X ORDER BY dateCreated DESC, id DESC`. This should have the same effect in general and the GC definitely doesn't care about the difference, but it should be more efficient at large scales.

Test Plan:
I had to `TRUNCATE` the problem table so I don't have a perfect repro to completely convincingly test this anymore. Both queries behave fine at small scales, which is why we haven't seen this before.

I was able to run the newer query in production before I nuked the table and have it complete in a reasonable amount of time, while the old query hung longer than I wanted to wait (several minutes?). The query plan for the new query was also a good one, while the query plan for the old query was terrible.

I loaded the daemon console and ran `bin/garbage collect --collector worker.tasks --trace`. I verified the queries looked reasonable and produced reasonable results in production.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9808

Differential Revision: https://secure.phabricator.com/D14505
This commit is contained in:
epriestley 2015-11-17 16:47:58 -08:00
parent 59c5cd95e7
commit 2e09a93dc1

View file

@ -41,9 +41,10 @@ final class PhabricatorWorkerArchiveTaskQuery
$rows = queryfx_all(
$conn_r,
'SELECT * FROM %T %Q ORDER BY id DESC %Q',
'SELECT * FROM %T %Q %Q %Q',
$task_table->getTableName(),
$this->buildWhereClause($conn_r),
$this->buildOrderClause($conn_r),
$this->buildLimitClause($conn_r));
return $task_table->loadAllFromArray($rows);
@ -83,6 +84,18 @@ final class PhabricatorWorkerArchiveTaskQuery
return $this->formatWhereClause($where);
}
private function buildOrderClause(AphrontDatabaseConnection $conn_r) {
// NOTE: The garbage collector executes this query with a date constraint,
// and the query is inefficient if we don't use the same key for ordering.
// See T9808 for discussion.
if ($this->dateCreatedBefore) {
return qsprintf($conn_r, 'ORDER BY dateCreated DESC, id DESC');
} else {
return qsprintf($conn_r, 'ORDER BY id DESC');
}
}
private function buildLimitClause(AphrontDatabaseConnection $conn_r) {
$clause = '';
if ($this->limit) {