2015-12-21 18:02:55 +01:00
<?php

final class PhabricatorFulltextIndexEngineExtension
  extends PhabricatorIndexEngineExtension {

  const EXTENSIONKEY = 'fulltext';

  public function getExtensionName() {
    return pht('Fulltext Engine');
  }

Allow index extensions to skip indexing if the object has not changed
Summary:
Fixes T9890. This allows IndexExtensions to emit an object version.
Before we build indexes, we check if the indexed version is the same as the current version. If it is, we just don't call that extension.
T9890 has a case where this is useful: a script went crazy and posted thousands of comments to a single task.
Without versioning, that results in the same comments being indexed over and over again. With versioning, most of the queue could just exit without doing any work.
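The version check described in this summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this revision: the `$stored` array stands in for the index version store, and `should_skip_extension()` and `index_with_versioning()` are hypothetical helper names. It also assumes that a `null` version means "no version available, always index".

```php
<?php

// Hypothetical sketch: an in-memory array stands in for the index version
// store. Keys are "<objectPHID>/<extensionKey>"; values are the version
// string that was current the last time the extension ran.

function should_skip_extension(array $stored, $key, $current_version) {
  // Assumption: a null version means the extension cannot be versioned,
  // so it must always run.
  if ($current_version === null) {
    return false;
  }
  return isset($stored[$key]) && $stored[$key] === $current_version;
}

function index_with_versioning(
  array &$stored,
  $key,
  $current_version,
  callable $indexer) {

  if (should_skip_extension($stored, $key, $current_version)) {
    return false;
  }

  $indexer();

  // Record the version only after the indexer has run.
  if ($current_version !== null) {
    $stored[$key] = $current_version;
  }
  return true;
}

$stored = array();
$runs = 0;
$indexer = function () use (&$runs) {
  $runs++;
};

// Queue the same object at the same version many times, as in the
// runaway-comment case from T9890.
for ($i = 0; $i < 1000; $i++) {
  index_with_versioning($stored, 'PHID-TASK-1/fulltext', '42:17', $indexer);
}

echo $runs."\n"; // prints 1
```

Only the first queued task does real work; the remaining 999 see a matching stored version and return immediately.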
Test Plan:
- Added a `sleep(1)` to the actual indexing, used `bin/search index --background` to queue up a lot of tasks, ran them with `bin/phd debug task`, saw them complete very quickly with only one actual index operation performed.
- Used `bin/search index --trace` and `bin/search index --trace --background` to observe the behavior of queries against the index version store, which looked sensible.
- Made comments/transactions, saw versions update.
- Used `bin/remove destroy`, verified index versions were purged.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9890
Differential Revision: https://secure.phabricator.com/D14845
2015-12-21 20:04:08 +01:00
  public function getIndexVersion($object) {
    $version = array();

    if ($object instanceof PhabricatorApplicationTransactionInterface) {
      // If this is a normal object with transactions, we only need to
      // reindex it if there are new transactions (or comment edits).
      $version[] = $this->getTransactionVersion($object);
      $version[] = $this->getCommentVersion($object);
    }

    if (!$version) {
      return null;
    }

    return implode(':', $version);
  }

2015-12-21 18:02:55 +01:00
  public function shouldIndexObject($object) {
    return ($object instanceof PhabricatorFulltextInterface);
  }

  public function indexObject(
    PhabricatorIndexEngine $engine,
    $object) {

    $engine = $object->newFulltextEngine();
    if (!$engine) {
      return;
    }

    $engine->setObject($object);

    $engine->buildFulltextIndexes();
  }

2015-12-21 20:04:08 +01:00
  private function getTransactionVersion($object) {
    $xaction = $object->getApplicationTransactionTemplate();

    $xaction_row = queryfx_one(
      $xaction->establishConnection('r'),
      'SELECT id FROM %T WHERE objectPHID = %s
        ORDER BY id DESC LIMIT 1',
      $xaction->getTableName(),
      $object->getPHID());
    if (!$xaction_row) {
      return 'none';
    }

    return $xaction_row['id'];
  }

  private function getCommentVersion($object) {
    $xaction = $object->getApplicationTransactionTemplate();

    try {
      $comment = $xaction->getApplicationTransactionCommentObject();
Implement basic ngram search for Owners Package names
Summary:
Ref T9979. This uses ngrams (specifically, trigrams) to build a reasonably efficient index for substring matching. Specifically, for a package like "Example", with ID 123, we store rows like this:
```
< ex, 123>
<exa, 123>
<xam, 123>
<amp, 123>
<mpl, 123>
<ple, 123>
<le , 123>
```
When the user searches for `exam`, we join this table for packages with tokens `exa` and `xam`. MySQL can do this a lot more efficiently than it can process a `LIKE "%exam%"` query against a huge table.
When the user searches for a one-letter or two-letter string, we only search the beginnings of words. This is probably what they want, the only thing we can do quickly, and a reasonable/expected behavior for typeaheads.
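The trigram scheme described in this summary can be illustrated with a small sketch. It is a simplification and not the code from this revision: `name_trigrams()`, `query_trigrams()`, and `matches_all_trigrams()` are hypothetical names, an in-memory array stands in for the MySQL ngram table and its joins, names are treated as single-word ASCII strings, and the prefix-only handling of one- and two-letter queries is not covered.

```php
<?php

// Hypothetical helper: trigrams stored for a package name. The name is
// lowercased and padded with one space on each side, so "Example" yields
// " ex", "exa", "xam", "amp", "mpl", "ple", "le ".
function name_trigrams($name) {
  $padded = ' '.strtolower($name).' ';
  $grams = array();
  for ($i = 0; $i + 3 <= strlen($padded); $i++) {
    $grams[] = substr($padded, $i, 3);
  }
  return $grams;
}

// Hypothetical helper: trigrams required by a substring query. No padding
// is added, so "exam" yields "exa" and "xam".
function query_trigrams($query) {
  $query = strtolower($query);
  $grams = array();
  for ($i = 0; $i + 3 <= strlen($query); $i++) {
    $grams[] = substr($query, $i, 3);
  }
  return $grams;
}

// Stand-in for the joins: keep the IDs whose stored trigrams include every
// trigram the query needs.
function matches_all_trigrams(array $index, $query) {
  $needed = query_trigrams($query);
  if (!$needed) {
    // Queries shorter than three characters are out of scope for this sketch.
    return array();
  }

  $ids = array();
  foreach ($index as $id => $grams) {
    if (!array_diff($needed, $grams)) {
      $ids[] = $id;
    }
  }
  return $ids;
}

$index = array(
  123 => name_trigrams('Example'),
  456 => name_trigrams('Sample'),
);

print_r(matches_all_trigrams($index, 'exam')); // only package 123
print_r(matches_all_trigrams($index, 'ampl')); // packages 123 and 456
```

Each trigram lookup corresponds to one join against the ngram table, which is why a longer query needs more joins.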
Test Plan:
- Ran storage upgrades and search indexer.
- Searched for stuff with "name contains".
- Used typeahead and got sensible results.
- Searched for `aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz` and saw only 16 joins.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9979
Differential Revision: https://secure.phabricator.com/D14846
2015-12-21 21:22:07 +01:00
      if (!$comment) {
        return 'none';
      }
2015-12-21 20:04:08 +01:00
    } catch (Exception $ex) {
      return 'none';
    }

    $comment_row = queryfx_one(
      $comment->establishConnection('r'),
      'SELECT c.id FROM %T x JOIN %T c
        ON x.phid = c.transactionPHID
        WHERE x.objectPHID = %s
        ORDER BY c.id DESC LIMIT 1',
      $xaction->getTableName(),
      $comment->getTableName(),
      $object->getPHID());
    if (!$comment_row) {
      return 'none';
    }

    return $comment_row['id'];
  }

2015-12-21 18:02:55 +01:00
}