1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-14 01:26:13 +01:00
phorge-phorge/src/applications/search/index/PhabricatorSearchDocumentIndexer.php

179 lines
5.2 KiB
PHP
Raw Normal View History

Improve Search architecture Summary: The search indexing API has several problems right now: - Always runs in-process. - It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it. - Being able to use the task queue will also make rebuilding indexes faster. - Instead, make the API phid-oriented. - No uniform indexing API. - Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird. - Instead, provide a uniform API. - No uniform CLI. - We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers. - Instead, let indexers expose documents for indexing. - Not application-oriented. - All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world. - Instead, move indexers to applications and load them with SymbolLoader. Test Plan: - `bin/search index` - Indexed one revision, one task. - Indexed `--type TASK`, `--type DREV`, etc., for all types. - Indexed `--all`. - Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it. - Creating users is a pain; searched for a user after indexing. - Creating commits is a pain; searched for a commit after indexing. - Mocks aren't currently loadable in the result view, so their indexing is moot. Reviewers: btrahan, vrana Reviewed By: btrahan CC: 20after4, aran Maniphest Tasks: T1991, T2104 Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
<?php
abstract class PhabricatorSearchDocumentIndexer {
abstract public function getIndexableObject();
abstract protected function buildAbstractDocumentByPHID($phid);
protected function getViewer() {
return PhabricatorUser::getOmnipotentUser();
}
Improve Search architecture Summary: The search indexing API has several problems right now: - Always runs in-process. - It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it. - Being able to use the task queue will also make rebuilding indexes faster. - Instead, make the API phid-oriented. - No uniform indexing API. - Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird. - Instead, provide a uniform API. - No uniform CLI. - We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers. - Instead, let indexers expose documents for indexing. - Not application-oriented. - All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world. - Instead, move indexers to applications and load them with SymbolLoader. Test Plan: - `bin/search index` - Indexed one revision, one task. - Indexed `--type TASK`, `--type DREV`, etc., for all types. - Indexed `--all`. - Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it. - Creating users is a pain; searched for a user after indexing. - Creating commits is a pain; searched for a commit after indexing. - Mocks aren't currently loadable in the result view, so their indexing is moot. Reviewers: btrahan, vrana Reviewed By: btrahan CC: 20after4, aran Maniphest Tasks: T1991, T2104 Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
public function shouldIndexDocumentByPHID($phid) {
$object = $this->getIndexableObject();
return (phid_get_type($phid) == phid_get_type($object->generatePHID()));
}
public function getIndexIterator() {
$object = $this->getIndexableObject();
return new LiskMigrationIterator($object);
}
protected function loadDocumentByPHID($phid) {
$object = id(new PhabricatorObjectQuery())
->setViewer($this->getViewer())
->withPHIDs(array($phid))
->executeOne();
if (!$object) {
throw new Exception("Unable to load object by phid '{$phid}'!");
Improve Search architecture Summary: The search indexing API has several problems right now: - Always runs in-process. - It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it. - Being able to use the task queue will also make rebuilding indexes faster. - Instead, make the API phid-oriented. - No uniform indexing API. - Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird. - Instead, provide a uniform API. - No uniform CLI. - We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers. - Instead, let indexers expose documents for indexing. - Not application-oriented. - All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world. - Instead, move indexers to applications and load them with SymbolLoader. Test Plan: - `bin/search index` - Indexed one revision, one task. - Indexed `--type TASK`, `--type DREV`, etc., for all types. - Indexed `--all`. - Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it. - Creating users is a pain; searched for a user after indexing. - Creating commits is a pain; searched for a commit after indexing. - Mocks aren't currently loadable in the result view, so their indexing is moot. Reviewers: btrahan, vrana Reviewed By: btrahan CC: 20after4, aran Maniphest Tasks: T1991, T2104 Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
}
return $object;
Improve Search architecture Summary: The search indexing API has several problems right now: - Always runs in-process. - It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it. - Being able to use the task queue will also make rebuilding indexes faster. - Instead, make the API phid-oriented. - No uniform indexing API. - Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird. - Instead, provide a uniform API. - No uniform CLI. - We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers. - Instead, let indexers expose documents for indexing. - Not application-oriented. - All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world. - Instead, move indexers to applications and load them with SymbolLoader. Test Plan: - `bin/search index` - Indexed one revision, one task. - Indexed `--type TASK`, `--type DREV`, etc., for all types. - Indexed `--all`. - Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it. - Creating users is a pain; searched for a user after indexing. - Creating commits is a pain; searched for a commit after indexing. - Mocks aren't currently loadable in the result view, so their indexing is moot. Reviewers: btrahan, vrana Reviewed By: btrahan CC: 20after4, aran Maniphest Tasks: T1991, T2104 Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
}
public function indexDocumentByPHID($phid) {
try {
$document = $this->buildAbstractDocumentByPHID($phid);
$object = $this->loadDocumentByPHID($phid);
// Automatically rebuild CustomField indexes if the object uses custom
// fields.
if ($object instanceof PhabricatorCustomFieldInterface) {
$this->indexCustomFields($document, $object);
}
// Automatically rebuild subscriber indexes if the object is subscribable.
if ($object instanceof PhabricatorSubscribableInterface) {
$this->indexSubscribers($document);
}
// Automatically build project relationships
if ($object instanceof PhabricatorProjectInterface) {
$this->indexProjects($document, $object);
}
Improve Search architecture Summary: The search indexing API has several problems right now: - Always runs in-process. - It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it. - Being able to use the task queue will also make rebuilding indexes faster. - Instead, make the API phid-oriented. - No uniform indexing API. - Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird. - Instead, provide a uniform API. - No uniform CLI. - We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers. - Instead, let indexers expose documents for indexing. - Not application-oriented. - All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world. - Instead, move indexers to applications and load them with SymbolLoader. Test Plan: - `bin/search index` - Indexed one revision, one task. - Indexed `--type TASK`, `--type DREV`, etc., for all types. - Indexed `--all`. - Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it. - Creating users is a pain; searched for a user after indexing. - Creating commits is a pain; searched for a commit after indexing. - Mocks aren't currently loadable in the result view, so their indexing is moot. Reviewers: btrahan, vrana Reviewed By: btrahan CC: 20after4, aran Maniphest Tasks: T1991, T2104 Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
$engine = PhabricatorSearchEngineSelector::newSelector()->newEngine();
try {
$engine->reindexAbstractDocument($document);
} catch (Exception $ex) {
$phid = $document->getPHID();
$class = get_class($engine);
phlog("Unable to index document {$phid} with engine {$class}.");
Improve Search architecture Summary: The search indexing API has several problems right now: - Always runs in-process. - It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it. - Being able to use the task queue will also make rebuilding indexes faster. - Instead, make the API phid-oriented. - No uniform indexing API. - Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird. - Instead, provide a uniform API. - No uniform CLI. - We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers. - Instead, let indexers expose documents for indexing. - Not application-oriented. - All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world. - Instead, move indexers to applications and load them with SymbolLoader. Test Plan: - `bin/search index` - Indexed one revision, one task. - Indexed `--type TASK`, `--type DREV`, etc., for all types. - Indexed `--all`. - Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it. - Creating users is a pain; searched for a user after indexing. - Creating commits is a pain; searched for a commit after indexing. - Mocks aren't currently loadable in the result view, so their indexing is moot. Reviewers: btrahan, vrana Reviewed By: btrahan CC: 20after4, aran Maniphest Tasks: T1991, T2104 Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
phlog($ex);
}
$this->dispatchDidUpdateIndexEvent($phid, $document);
Improve Search architecture Summary: The search indexing API has several problems right now: - Always runs in-process. - It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it. - Being able to use the task queue will also make rebuilding indexes faster. - Instead, make the API phid-oriented. - No uniform indexing API. - Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird. - Instead, provide a uniform API. - No uniform CLI. - We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers. - Instead, let indexers expose documents for indexing. - Not application-oriented. - All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world. - Instead, move indexers to applications and load them with SymbolLoader. Test Plan: - `bin/search index` - Indexed one revision, one task. - Indexed `--type TASK`, `--type DREV`, etc., for all types. - Indexed `--all`. - Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it. - Creating users is a pain; searched for a user after indexing. - Creating commits is a pain; searched for a commit after indexing. - Mocks aren't currently loadable in the result view, so their indexing is moot. Reviewers: btrahan, vrana Reviewed By: btrahan CC: 20after4, aran Maniphest Tasks: T1991, T2104 Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
} catch (Exception $ex) {
$class = get_class($this);
phlog("Unable to build document {$phid} with indexer {$class}.");
Improve Search architecture Summary: The search indexing API has several problems right now: - Always runs in-process. - It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it. - Being able to use the task queue will also make rebuilding indexes faster. - Instead, make the API phid-oriented. - No uniform indexing API. - Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird. - Instead, provide a uniform API. - No uniform CLI. - We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers. - Instead, let indexers expose documents for indexing. - Not application-oriented. - All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world. - Instead, move indexers to applications and load them with SymbolLoader. Test Plan: - `bin/search index` - Indexed one revision, one task. - Indexed `--type TASK`, `--type DREV`, etc., for all types. - Indexed `--all`. - Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it. - Creating users is a pain; searched for a user after indexing. - Creating commits is a pain; searched for a commit after indexing. - Mocks aren't currently loadable in the result view, so their indexing is moot. Reviewers: btrahan, vrana Reviewed By: btrahan CC: 20after4, aran Maniphest Tasks: T1991, T2104 Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
phlog($ex);
}
return $this;
}
protected function newDocument($phid) {
return id(new PhabricatorSearchAbstractDocument())
->setPHID($phid)
->setDocumentType(phid_get_type($phid));
}
protected function indexSubscribers(
PhabricatorSearchAbstractDocument $doc) {
$subscribers = PhabricatorSubscribersQuery::loadSubscribersForPHID(
$doc->getPHID());
$handles = id(new PhabricatorHandleQuery())
->setViewer($this->getViewer())
->withPHIDs($subscribers)
->execute();
foreach ($handles as $phid => $handle) {
$doc->addRelationship(
PhabricatorSearchRelationship::RELATIONSHIP_SUBSCRIBER,
$phid,
$handle->getType(),
$doc->getDocumentModified()); // Bogus timestamp.
}
}
protected function indexProjects(
PhabricatorSearchAbstractDocument $doc,
PhabricatorProjectInterface $object) {
$project_phids = PhabricatorEdgeQuery::loadDestinationPHIDs(
$object->getPHID(),
PhabricatorProjectObjectHasProjectEdgeType::EDGECONST);
if ($project_phids) {
foreach ($project_phids as $project_phid) {
$doc->addRelationship(
PhabricatorSearchRelationship::RELATIONSHIP_PROJECT,
$project_phid,
PhabricatorProjectProjectPHIDType::TYPECONST,
$doc->getDocumentModified()); // Bogus timestamp.
}
}
}
protected function indexTransactions(
PhabricatorSearchAbstractDocument $doc,
PhabricatorApplicationTransactionQuery $query,
array $phids) {
$xactions = id(clone $query)
->setViewer($this->getViewer())
->withObjectPHIDs($phids)
->execute();
foreach ($xactions as $xaction) {
if (!$xaction->hasComment()) {
continue;
}
$comment = $xaction->getComment();
$doc->addField(
PhabricatorSearchField::FIELD_COMMENT,
$comment->getContent());
}
}
protected function indexCustomFields(
PhabricatorSearchAbstractDocument $document,
PhabricatorCustomFieldInterface $object) {
// Rebuild the ApplicationSearch indexes. These are internal and not part of
// the fulltext search, but putting them in this workflow allows users to
// use the same tools to rebuild the indexes, which is easy to understand.
$field_list = PhabricatorCustomField::getObjectFields(
$object,
PhabricatorCustomField::ROLE_DEFAULT);
$field_list->setViewer($this->getViewer());
$field_list->readFieldsFromStorage($object);
// Rebuild ApplicationSearch indexes.
$field_list->rebuildIndexes($object);
// Rebuild global search indexes.
$field_list->updateAbstractDocument($document);
}
private function dispatchDidUpdateIndexEvent(
$phid,
PhabricatorSearchAbstractDocument $document) {
$event = new PhabricatorEvent(
PhabricatorEventType::TYPE_SEARCH_DIDUPDATEINDEX,
array(
'phid' => $phid,
'object' => $this->loadDocumentByPHID($phid),
'document' => $document,
));
$event->setUser($this->getViewer());
PhutilEventEngine::dispatchEvent($event);
}
Improve Search architecture Summary: The search indexing API has several problems right now: - Always runs in-process. - It would be nice to push this into the task queue for performance. However, the API currently passses an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it. - Being able to use the task queue will also make rebuilding indexes faster. - Instead, make the API phid-oriented. - No uniform indexing API. - Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird. - Instead, provide a uniform API. - No uniform CLI. - We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers. - Instead, let indexers expose documents for indexing. - Not application-oriented. - All the indexers live in search/ right now, which isn't the right organization in an application-orietned view of the world. - Instead, move indexers to applications and load them with SymbolLoader. Test Plan: - `bin/search index` - Indexed one revision, one task. - Indexed `--type TASK`, `--type DREV`, etc., for all types. - Indexed `--all`. - Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it. - Creating users is a pain; searched for a user after indexing. - Creating commits is a pain; searched for a commit after indexing. - Mocks aren't currently loadable in the result view, so their indexing is moot. Reviewers: btrahan, vrana Reviewed By: btrahan CC: 20after4, aran Maniphest Tasks: T1991, T2104 Differential Revision: https://secure.phabricator.com/D4261
2012-12-21 23:21:31 +01:00
}