Improve Search architecture
Summary:
The search indexing API has several problems right now:
- Always runs in-process.
- It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it.
- Being able to use the task queue will also make rebuilding indexes faster.
- Instead, make the API phid-oriented.
- No uniform indexing API.
- Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird.
- Instead, provide a uniform API.
- No uniform CLI.
- We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers.
- Instead, let indexers expose documents for indexing.
- Not application-oriented.
- All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world.
- Instead, move indexers to applications and load them with SymbolLoader.
Test Plan:
- `bin/search index`
- Indexed one revision, one task.
- Indexed `--type TASK`, `--type DREV`, etc., for all types.
- Indexed `--all`.
- Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it.
- Creating users is a pain; searched for a user after indexing.
- Creating commits is a pain; searched for a commit after indexing.
- Mocks aren't currently loadable in the result view, so their indexing is moot.
Reviewers: btrahan, vrana
Reviewed By: btrahan
CC: 20after4, aran
Maniphest Tasks: T1991, T2104
Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
|
|
|
<?php
|
|
|
|
|
|
|
|
/**
|
|
|
|
* @group search
|
|
|
|
*/
|
|
|
|
abstract class PhabricatorSearchDocumentIndexer {
|
|
|
|
|
|
|
|
abstract public function getIndexableObject();
|
|
|
|
abstract protected function buildAbstractDocumentByPHID($phid);
|
|
|
|
|
2013-07-29 17:20:06 +02:00
|
|
|
protected function getViewer() {
|
|
|
|
return PhabricatorUser::getOmnipotentUser();
|
|
|
|
}
|
|
|
|
|
Improve Search architecture
Summary:
The search indexing API has several problems right now:
- Always runs in-process.
- It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it.
- Being able to use the task queue will also make rebuilding indexes faster.
- Instead, make the API phid-oriented.
- No uniform indexing API.
- Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird.
- Instead, provide a uniform API.
- No uniform CLI.
- We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers.
- Instead, let indexers expose documents for indexing.
- Not application-oriented.
- All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world.
- Instead, move indexers to applications and load them with SymbolLoader.
Test Plan:
- `bin/search index`
- Indexed one revision, one task.
- Indexed `--type TASK`, `--type DREV`, etc., for all types.
- Indexed `--all`.
- Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it.
- Creating users is a pain; searched for a user after indexing.
- Creating commits is a pain; searched for a commit after indexing.
- Mocks aren't currently loadable in the result view, so their indexing is moot.
Reviewers: btrahan, vrana
Reviewed By: btrahan
CC: 20after4, aran
Maniphest Tasks: T1991, T2104
Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
|
|
|
public function shouldIndexDocumentByPHID($phid) {
|
|
|
|
$object = $this->getIndexableObject();
|
|
|
|
return (phid_get_type($phid) == phid_get_type($object->generatePHID()));
|
|
|
|
}
|
|
|
|
|
|
|
|
public function getIndexIterator() {
|
|
|
|
$object = $this->getIndexableObject();
|
|
|
|
return new LiskMigrationIterator($object);
|
|
|
|
}
|
|
|
|
|
|
|
|
protected function loadDocumentByPHID($phid) {
|
2013-07-29 17:20:06 +02:00
|
|
|
$object = id(new PhabricatorObjectQuery())
|
|
|
|
->setViewer($this->getViewer())
|
|
|
|
->withPHIDs(array($phid))
|
|
|
|
->executeOne();
|
|
|
|
if (!$object) {
|
|
|
|
throw new Exception("Unable to load object by phid '{$phid}'!");
|
Improve Search architecture
Summary:
The search indexing API has several problems right now:
- Always runs in-process.
- It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it.
- Being able to use the task queue will also make rebuilding indexes faster.
- Instead, make the API phid-oriented.
- No uniform indexing API.
- Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird.
- Instead, provide a uniform API.
- No uniform CLI.
- We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers.
- Instead, let indexers expose documents for indexing.
- Not application-oriented.
- All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world.
- Instead, move indexers to applications and load them with SymbolLoader.
Test Plan:
- `bin/search index`
- Indexed one revision, one task.
- Indexed `--type TASK`, `--type DREV`, etc., for all types.
- Indexed `--all`.
- Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it.
- Creating users is a pain; searched for a user after indexing.
- Creating commits is a pain; searched for a commit after indexing.
- Mocks aren't currently loadable in the result view, so their indexing is moot.
Reviewers: btrahan, vrana
Reviewed By: btrahan
CC: 20after4, aran
Maniphest Tasks: T1991, T2104
Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
|
|
|
}
|
2013-07-29 17:20:06 +02:00
|
|
|
return $object;
|
Improve Search architecture
Summary:
The search indexing API has several problems right now:
- Always runs in-process.
- It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it.
- Being able to use the task queue will also make rebuilding indexes faster.
- Instead, make the API phid-oriented.
- No uniform indexing API.
- Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird.
- Instead, provide a uniform API.
- No uniform CLI.
- We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers.
- Instead, let indexers expose documents for indexing.
- Not application-oriented.
- All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world.
- Instead, move indexers to applications and load them with SymbolLoader.
Test Plan:
- `bin/search index`
- Indexed one revision, one task.
- Indexed `--type TASK`, `--type DREV`, etc., for all types.
- Indexed `--all`.
- Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it.
- Creating users is a pain; searched for a user after indexing.
- Creating commits is a pain; searched for a commit after indexing.
- Mocks aren't currently loadable in the result view, so their indexing is moot.
Reviewers: btrahan, vrana
Reviewed By: btrahan
CC: 20after4, aran
Maniphest Tasks: T1991, T2104
Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
public function indexDocumentByPHID($phid) {
|
|
|
|
try {
|
|
|
|
$document = $this->buildAbstractDocumentByPHID($phid);
|
|
|
|
|
|
|
|
$engine = PhabricatorSearchEngineSelector::newSelector()->newEngine();
|
|
|
|
try {
|
|
|
|
$engine->reindexAbstractDocument($document);
|
|
|
|
} catch (Exception $ex) {
|
|
|
|
$phid = $document->getPHID();
|
|
|
|
$class = get_class($engine);
|
|
|
|
|
2013-09-12 22:05:54 +02:00
|
|
|
phlog("Unable to index document {$phid} with engine {$class}.");
|
Improve Search architecture
Summary:
The search indexing API has several problems right now:
- Always runs in-process.
- It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it.
- Being able to use the task queue will also make rebuilding indexes faster.
- Instead, make the API phid-oriented.
- No uniform indexing API.
- Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird.
- Instead, provide a uniform API.
- No uniform CLI.
- We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers.
- Instead, let indexers expose documents for indexing.
- Not application-oriented.
- All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world.
- Instead, move indexers to applications and load them with SymbolLoader.
Test Plan:
- `bin/search index`
- Indexed one revision, one task.
- Indexed `--type TASK`, `--type DREV`, etc., for all types.
- Indexed `--all`.
- Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it.
- Creating users is a pain; searched for a user after indexing.
- Creating commits is a pain; searched for a commit after indexing.
- Mocks aren't currently loadable in the result view, so their indexing is moot.
Reviewers: btrahan, vrana
Reviewed By: btrahan
CC: 20after4, aran
Maniphest Tasks: T1991, T2104
Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
|
|
|
phlog($ex);
|
|
|
|
}
|
|
|
|
|
2013-09-12 22:05:54 +02:00
|
|
|
$this->dispatchDidUpdateIndexEvent($phid, $document);
|
Improve Search architecture
Summary:
The search indexing API has several problems right now:
- Always runs in-process.
- It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it.
- Being able to use the task queue will also make rebuilding indexes faster.
- Instead, make the API phid-oriented.
- No uniform indexing API.
- Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird.
- Instead, provide a uniform API.
- No uniform CLI.
- We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers.
- Instead, let indexers expose documents for indexing.
- Not application-oriented.
- All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world.
- Instead, move indexers to applications and load them with SymbolLoader.
Test Plan:
- `bin/search index`
- Indexed one revision, one task.
- Indexed `--type TASK`, `--type DREV`, etc., for all types.
- Indexed `--all`.
- Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it.
- Creating users is a pain; searched for a user after indexing.
- Creating commits is a pain; searched for a commit after indexing.
- Mocks aren't currently loadable in the result view, so their indexing is moot.
Reviewers: btrahan, vrana
Reviewed By: btrahan
CC: 20after4, aran
Maniphest Tasks: T1991, T2104
Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
|
|
|
} catch (Exception $ex) {
|
|
|
|
$class = get_class($this);
|
2013-09-12 22:05:54 +02:00
|
|
|
phlog("Unable to build document {$phid} with indexer {$class}.");
|
Improve Search architecture
Summary:
The search indexing API has several problems right now:
- Always runs in-process.
- It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it.
- Being able to use the task queue will also make rebuilding indexes faster.
- Instead, make the API phid-oriented.
- No uniform indexing API.
- Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird.
- Instead, provide a uniform API.
- No uniform CLI.
- We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers.
- Instead, let indexers expose documents for indexing.
- Not application-oriented.
- All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world.
- Instead, move indexers to applications and load them with SymbolLoader.
Test Plan:
- `bin/search index`
- Indexed one revision, one task.
- Indexed `--type TASK`, `--type DREV`, etc., for all types.
- Indexed `--all`.
- Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it.
- Creating users is a pain; searched for a user after indexing.
- Creating commits is a pain; searched for a commit after indexing.
- Mocks aren't currently loadable in the result view, so their indexing is moot.
Reviewers: btrahan, vrana
Reviewed By: btrahan
CC: 20after4, aran
Maniphest Tasks: T1991, T2104
Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
|
|
|
phlog($ex);
|
|
|
|
}
|
|
|
|
|
|
|
|
return $this;
|
|
|
|
}
|
|
|
|
|
2013-07-29 17:20:06 +02:00
|
|
|
protected function newDocument($phid) {
|
|
|
|
return id(new PhabricatorSearchAbstractDocument())
|
|
|
|
->setPHID($phid)
|
|
|
|
->setDocumentType(phid_get_type($phid));
|
|
|
|
}
|
|
|
|
|
|
|
|
protected function indexSubscribers(
|
|
|
|
PhabricatorSearchAbstractDocument $doc) {
|
|
|
|
|
|
|
|
$subscribers = PhabricatorSubscribersQuery::loadSubscribersForPHID(
|
|
|
|
$doc->getPHID());
|
|
|
|
$handles = id(new PhabricatorHandleQuery())
|
|
|
|
->setViewer($this->getViewer())
|
|
|
|
->withPHIDs($subscribers)
|
|
|
|
->execute();
|
|
|
|
|
|
|
|
foreach ($handles as $phid => $handle) {
|
|
|
|
$doc->addRelationship(
|
|
|
|
PhabricatorSearchRelationship::RELATIONSHIP_SUBSCRIBER,
|
|
|
|
$phid,
|
|
|
|
$handle->getType(),
|
|
|
|
$doc->getDocumentModified()); // Bogus timestamp.
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
protected function indexTransactions(
|
|
|
|
PhabricatorSearchAbstractDocument $doc,
|
|
|
|
PhabricatorApplicationTransactionQuery $query,
|
|
|
|
array $phids) {
|
|
|
|
|
|
|
|
$xactions = id(clone $query)
|
|
|
|
->setViewer($this->getViewer())
|
|
|
|
->withObjectPHIDs($phids)
|
|
|
|
->withTransactionTypes(array(PhabricatorTransactions::TYPE_COMMENT))
|
|
|
|
->execute();
|
|
|
|
|
|
|
|
foreach ($xactions as $xaction) {
|
|
|
|
if (!$xaction->hasComment()) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
$comment = $xaction->getComment();
|
|
|
|
$doc->addField(
|
|
|
|
PhabricatorSearchField::FIELD_COMMENT,
|
|
|
|
$comment->getContent());
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Rebuild custom field indexes when rebuilding standard/fulltext search indexes
Summary:
Ref T4379. Fixes T4359. Currently, `bin/search index` does not rebuild CustomField indexes. This is because they aren't really part of the main search index. However, from a user's point of view this is by far the most logical place to look for index rebuilds, and it's straightforward for us to write into this secondary store.
At some point, it might be nice to let you specify fields as "fulltext" too, although no one has asked for that yet. We could then dump the text of those fields into the fulltext index. Ref T418.
Test Plan: Used `bin/search index --type proj --trace`, etc., and examination of the database to verify that indexes rebuilt. Reindexed users, tasks, projects.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4359, T418, T4379
Differential Revision: https://secure.phabricator.com/D8185
2014-02-10 23:32:45 +01:00
|
|
|
protected function indexCustomFields(
|
|
|
|
PhabricatorSearchAbstractDocument $doc,
|
|
|
|
PhabricatorCustomFieldInterface $object) {
|
|
|
|
|
|
|
|
// Rebuild the ApplicationSearch indexes. These are internal and not part of
|
|
|
|
// the fulltext search, but putting them in this workflow allows users to
|
|
|
|
// use the same tools to rebuild the indexes, which is easy to understand.
|
|
|
|
|
|
|
|
$field_list = PhabricatorCustomField::getObjectFields(
|
|
|
|
$object,
|
|
|
|
PhabricatorCustomField::ROLE_APPLICATIONSEARCH);
|
|
|
|
|
|
|
|
$field_list->setViewer($this->getViewer());
|
|
|
|
$field_list->readFieldsFromStorage($object);
|
|
|
|
$field_list->rebuildIndexes($object);
|
|
|
|
|
|
|
|
// We could also allow fields to provide fulltext content, and index it
|
|
|
|
// here on the document. No one has asked for this yet, though, and the
|
|
|
|
// existing "search" key isn't a good fit to interpret to mean we should
|
|
|
|
// index stuff here, since it can be set on a lot of fields which don't
|
|
|
|
// contain anything resembling fulltext.
|
|
|
|
}
|
|
|
|
|
2013-09-12 22:05:54 +02:00
|
|
|
private function dispatchDidUpdateIndexEvent(
|
|
|
|
$phid,
|
|
|
|
PhabricatorSearchAbstractDocument $document) {
|
|
|
|
|
|
|
|
$event = new PhabricatorEvent(
|
|
|
|
PhabricatorEventType::TYPE_SEARCH_DIDUPDATEINDEX,
|
|
|
|
array(
|
|
|
|
'phid' => $phid,
|
|
|
|
'object' => $this->loadDocumentByPHID($phid),
|
|
|
|
'document' => $document,
|
|
|
|
));
|
|
|
|
$event->setUser($this->getViewer());
|
|
|
|
PhutilEventEngine::dispatchEvent($event);
|
|
|
|
}
|
|
|
|
|
Improve Search architecture
Summary:
The search indexing API has several problems right now:
- Always runs in-process.
- It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it.
- Being able to use the task queue will also make rebuilding indexes faster.
- Instead, make the API phid-oriented.
- No uniform indexing API.
- Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird.
- Instead, provide a uniform API.
- No uniform CLI.
- We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers.
- Instead, let indexers expose documents for indexing.
- Not application-oriented.
- All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world.
- Instead, move indexers to applications and load them with SymbolLoader.
Test Plan:
- `bin/search index`
- Indexed one revision, one task.
- Indexed `--type TASK`, `--type DREV`, etc., for all types.
- Indexed `--all`.
- Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it.
- Creating users is a pain; searched for a user after indexing.
- Creating commits is a pain; searched for a commit after indexing.
- Mocks aren't currently loadable in the result view, so their indexing is moot.
Reviewers: btrahan, vrana
Reviewed By: btrahan
CC: 20after4, aran
Maniphest Tasks: T1991, T2104
Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
|
|
|
}
|