phorge-phorge

mirror of https://we.phorge.it/source/phorge.git synced 2024-11-30 18:52:42 +01:00

Author	SHA1	Message	Date
epriestley	d36f98a15a	Clarify acceptable values for `--threshold` in `search ngrams` Summary: See D18710. Test Plan: o_O Reviewers: amckinley Reviewed By: amckinley Differential Revision: https://secure.phabricator.com/D18712	2017-10-17 14:32:25 -07:00
epriestley	63d1230ade	Parameterize the common ngrams threshold Summary: Ref T13000. Since other changes have generally made the ngrams table manageable, I'm not planning to enable common ngrams by default at this time. Instead, make the threshold configurable with "--threshold" so we can guide installs through tuning this if they want (e.g. PHI110), and tune hosted instances. (This might eventually become automatic, but just smoothing this bit off for now feels reasonable to me.) Test Plan: Ran with `--reset`, and with various invalid and valid `--threshold` arguments. Reviewers: amckinley Reviewed By: amckinley Maniphest Tasks: T13000 Differential Revision: https://secure.phabricator.com/D18710	2017-10-17 14:13:49 -07:00
Dmitri Iouchtchenko	9bd6a37055	Fix spelling Summary: Noticed a couple of typos in the docs, and then things got out of hand. Test Plan: - Stared at the words until my eyes watered and the letters began to swim on the screen. - Consulted a dictionary. Reviewers: #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam Differential Revision: https://secure.phabricator.com/D18693	2017-10-09 10:48:04 -07:00
epriestley	17e83b53d5	Add "bin/search query" for debugging query execution Summary: Ref T13000. Currently, queries can only be executed from the web UI, which requires logging in as a user. I really want to avoid doing that wherever we can, but being able to execute queries on an instance (and, particularly, see the ngrams and timings on the underlying lookups) would have been helpful in several cases. Improve tooling a bit in advance of the "common ngrams" stuff going out since it seems likely that it will be useful if issues arise. Test Plan: Ran `bin/search query --query ...`, got useful minimal output. Ran with `--trace` to get internals. Reviewers: amckinley Reviewed By: amckinley Maniphest Tasks: T13000 Differential Revision: https://secure.phabricator.com/D18690	2017-10-06 08:50:34 -07:00
epriestley	66df5b1493	Add a garbage collector for common ngrams Summary: Ref T13000. After an ngram is marked as "common", we can delete it from the storage table. Currently, the only way to get ngrams marked as "common" is to manually run `bin/search ngrams`, so this has no impact on normal installs. Test Plan: Ran `bin/garbage collect`, saw it start chewing through my local Maniphest ngrams table and removing common ngrams. Reviewers: amckinley Reviewed By: amckinley Maniphest Tasks: T13000 Differential Revision: https://secure.phabricator.com/D18687	2017-10-05 11:41:18 -07:00
epriestley	3e589cdd73	Add a workflow for populating (or depopulating) the common ngrams table Summary: Depends on D18672. Ref T13000. This does an on-demand build of the common ngrams table. Plan here is: - Push to `secure`. - Build the common ngrams table here. - See if stuff breaks? If it looks okay on this dataset, we can build out the GC support and try it in production. Test Plan: - Locally, my dataset has a bunch of `bin/lipsum` tasks with similar, common words. - Verified that ipsum terms now skip ngrams. For "lorem ipsum" search performance actually IMPROVED by skipping the ngrams table (12s to 9s). - Queried for normal terms, got very fast results using the ngram table, as normal. Reviewers: amckinley Reviewed By: amckinley Maniphest Tasks: T13000 Differential Revision: https://secure.phabricator.com/D18673	2017-10-03 13:28:19 -07:00
epriestley	b46e2bb4cc	Convert cluster/projects config options to newer modular structure Summary: Ref T12845. Converts the cluster and project config options to the new stuff; this is mostly just shifting boilerplate around. Test Plan: Edited, deleted, and mangled these options from the web UI and CLI. Reviewers: chad, amckinley Reviewed By: amckinley Maniphest Tasks: T12845 Differential Revision: https://secure.phabricator.com/D18166	2017-06-27 12:35:54 -07:00
epriestley	6052bc1933	Extend "fulltext" and "ngrams" interfaces from "indexable" interface Summary: Ref T8788. See D17702. This allows `bin/search index` to index stuff which only implements `Ngrams`, not `Fulltext`. Test Plan: Kinda poked around `bin/search index` a bit, yell if you hit more issues deeper down the stack? Reviewers: amckinley Reviewed By: amckinley Maniphest Tasks: T8788 Differential Revision: https://secure.phabricator.com/D17704	2017-04-17 12:59:41 -07:00
epriestley	bd93978200	Count and report skipped documents from "bin/search index" Summary: Ref T12450. There's currently a bad behavior where inserting a document into one search service marks it as up to date everywhere. This isn't nearly as obvious as it should be because `bin/search index` doesn't make it terribly clear when a document was skipped because the index version was already up to date. When running `bin/search index` without `--force` or `--background`, keep track of updated vs not-updated documents and print out some guidance. In other configurations, try to provide more help too. Test Plan: {F4452134} Reviewers: chad, 20after4 Reviewed By: 20after4 Maniphest Tasks: T12450 Differential Revision: https://secure.phabricator.com/D17597	2017-04-02 13:45:30 -07:00
epriestley	5f939dcce0	Re-run config validation from `bin/search` Summary: Ref T12450. Normally, we validate config when: - You restart the webserver. - You edit it with `bin/config set ...`. - You edit it with the web UI. However, you can also change config by editing `local.json`, `some_env.conf.php`, a `SiteConfig` class, etc. In these cases, you may miss config warnings. Explicitly re-run search config checks from `bin/search`, similar to the additional database checks we run from `bin/storage`, to try to produce a better error message if the user has made a configuration error. Test Plan: ``` $ ./bin/search init Usage Exception: Setting "cluster.search" is misconfigured: Invalid search engine type: elastic. Valid types are: elasticsearch, mysql. ``` Reviewers: chad, 20after4 Reviewed By: 20after4 Maniphest Tasks: T12450 Differential Revision: https://secure.phabricator.com/D17574	2017-03-28 14:53:26 -07:00
epriestley	e7c76d92d5	Make `bin/search init` messaging a little more consistent Summary: Ref T12450. This mostly just smooths out the text a little to improve consistency. Also: - Use `isWritable()`. - Make the "skipping because not writable" message more clear and tailored. - Try not to use the word "index" too much to avoid confusion with `bin/search index` -- instead, talk about "initialize a service". Test Plan: Ran `bin/search init` with a couple of different (writable / not writable) configs, saw slightly clearer messaging. Reviewers: chad, 20after4 Reviewed By: 20after4 Maniphest Tasks: T12450 Differential Revision: https://secure.phabricator.com/D17572	2017-03-28 13:57:37 -07:00
Mukunda Modell	9e2f263bb4	Add repositories to fulltext search index. Summary: This implements a simplistic `PhabricatorRepositoryFulltextEngine` Currently only the repository name, description, timestamps and status are indexed. Note: I had to change the `search index` workflow to disambiguate PhabricatorRepository from PhabricatorRepositoryCommit Test Plan: * ran `./bin/search index --type PhabricatorRepository --force` * searched for some repositories. Saw reasonable results matching on either title or description. * Edited a repository in the web ui * Added unique key words to the repo description. * I was then able to find that repo by searching for the new keywords. Reviewers: #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: Korvin Tags: #search, #diffusion Differential Revision: https://secure.phabricator.com/D17300	2017-03-28 07:58:22 +00:00
Mukunda Modell	e41c25de50	Support multiple fulltext search clusters with 'cluster.search' config Summary: The goal is to make fulltext search back-ends more extensible, configurable and robust. When this is finished it will be possible to have multiple search storage back-ends and potentially multiple instances of each. Individual instances can be configured with roles such as 'read', 'write' which control which hosts will receive writes to the index and which hosts will respond to queries. These two roles make it possible to have any combination of: * read-only * write-only * read-write * disabled This 'roles' mechanism is extensible to add new roles should that be needed in the future. In addition to supporting multiple elasticsearch and mysql search instances, this refactors the connection health monitoring infrastructure from PhabricatorDatabaseHealthRecord and utilizes the same system for monitoring the health of elasticsearch nodes. This will allow Wikimedia's phabricator to be redundant across data centers (mysql already is, elasticsearch should be as well). The real-world use-case I have in mind here is writing to two indexes (two elasticsearch clusters in different data centers) but reading from only one. Then toggling the 'read' property when we want to migrate to the other data center (and when we migrate from elasticsearch 2.x to 5.x) Hopefully this is useful in the upstream as well. Remaining TODO: * test cases * documentation Test Plan: (WARNING) This will most likely require the elasticsearch index to be deleted and re-created due to schema changes. 
Tested with elasticsearch versions 2.4 and 5.2 using the following config: ```lang=json "cluster.search": [ { "type": "elasticsearch", "hosts": [ { "host": "localhost", "roles": { "read": true, "write": true } } ], "port": 9200, "protocol": "http", "path": "/phabricator", "version": 5 }, { "type": "mysql", "roles": { "write": true } } ] ``` Also deployed the same changes to Wikimedia's production Phabricator instance without any issues whatsoever. Reviewers: epriestley, #blessed_reviewers Reviewed By: epriestley, #blessed_reviewers Subscribers: Korvin, epriestley Tags: #elasticsearch, #clusters, #wikimedia Differential Revision: https://secure.phabricator.com/D17384	2017-03-26 08:16:47 +00:00
epriestley	23c42486e4	Rename "SearchEngine" to "FulltextStorageEngine" Summary: Ref T9979. I picked this name long before the advent of modern "Engine" architecture and it ended up being pretty confusing. Rename "SearchEngine" (currently: mysql or elasticsearch, used to store and query fulltext indexes) to "FulltextStorageEngine" to make it more clear what it does and disambiguate it from ApplicationSearch, which also has a bunch of stuff called "SearchEngine", "SearchEngineExtension", etc. Test Plan: Grepped for `phabricatorsearchengine`. Reviewers: chad Reviewed By: chad Maniphest Tasks: T9979 Differential Revision: https://secure.phabricator.com/D14843	2015-12-21 17:26:19 -08:00
epriestley	99c9df96b4	Convert all "DocumentIndexers" into "FulltextEngines" Summary: Ref T9979. This simplifies/standardizes the code a bit, but mostly gives us more consistent class names and structure. Test Plan: - Used `bin/search index --type ...` to index documents of every indexable type. - Searched for documents by unique text, found them. Reviewers: chad Reviewed By: chad Maniphest Tasks: T9979 Differential Revision: https://secure.phabricator.com/D14842	2015-12-21 17:25:23 -08:00
epriestley	99bd12b98d	Lift Conpherence indexing up out of the Fulltext index Summary: Ref T9979. There are currently some hacks around Conpherence indexing: it does not really use the fulltext index, but its own specialized index. However, it's kind of hacked up so it can get reindexed by the normal indexing pipeline. Lift it up into IndexEngine, instead of FulltextEngine. Specifically, the new stuff is going to look like this: - IndexEngine: Rebuild all indexes. - ConpherenceIndexExtension: Rebuild thread indexes. - ProjectMemberIndexExtension: Rebuild project membership views. - NgramIndexExtension: Rebuild ngram indexes. - FulltextIndexExtension / FulltextEngine: Rebuild fulltext indexes, a special type of index. - FulltextCommentExtension: Rebuild comment fulltext indexes. - FulltextProjectExtension: Rebuild project fulltext indexes. - etc. Most of this is at least sort-of-in-place as of this diff, although some of the part in the middle is still pretty rough. Test Plan: - Made a unique comment in a Conpherence thread. - Used `bin/search index --force` to rebuild the index. - Searched for the comment. - Found the thread. Reviewers: chad Reviewed By: chad Maniphest Tasks: T9979 Differential Revision: https://secure.phabricator.com/D14841	2015-12-21 17:25:05 -08:00
epriestley	2447d9bdf2	Begin improving modularity of IndexEngine, add locks Summary: Ref T9890. Ref T9979. Several adjacent goals: - The `SearchEngine` vs `ApplicationSearchEngine` thing is really confusing. There are also a bunch of confusing class names and class relationships within the fulltext indexing. I want to rename these classes to be more standard (`IndexEngine`, `IndexEngineExtension`, etc). Rename `SearchIndexer` to `IndexEngine`. A future change will rename `SearchEngine`. - Add the index locks described in T9890. - Structure things a little more normally so future diffs can do the "EngineExtension" thing more cleanly. Test Plan: Indexing: - Renamed a task to have a unique word in the title. - Ran `bin/search index Txxx`. - Searched for unique word. - Found task. Locking: - Added a `sleep(10)` after the `lock()` call. - Ran `bin/search index Txxx` in two windows. - Saw first one lock, sleep 10 seconds, index. - Saw second one give up temporarily after failing to grab the lock. Reviewers: chad Reviewed By: chad Maniphest Tasks: T9890, T9979 Differential Revision: https://secure.phabricator.com/D14834	2015-12-21 17:04:10 -08:00
epriestley	24845c70b9	Refine error behavior of `bin/search index` Summary: Fixes T5991. If //all requested documents// failed to index, consider this a catastrophic failure and exit with an error code. Test Plan: - Ran `bin/search index --type TASK`, observed successful exit despite a small number of un-indexable documents. - Ran `bin/search index PHID-TASK-xxx` for an invalid task, observed exception on exit after complete failure. - Ran normal indexing through daemons. Reviewers: chad Reviewed By: chad Maniphest Tasks: T5991 Differential Revision: https://secure.phabricator.com/D14174	2015-09-27 13:11:11 -07:00
Joshua Spence	368f359114	Use PhutilClassMapQuery instead of PhutilSymbolLoader Summary: Use `PhutilClassMapQuery` instead of `PhutilSymbolLoader`, mostly for consistency. Depends on D13588. Test Plan: Poked around a bunch of pages. Reviewers: #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: epriestley, Korvin Differential Revision: https://secure.phabricator.com/D13589	2015-08-14 07:49:01 +10:00
Joshua Spence	36e2d02d6e	phtize all the things Summary: `pht`ize a whole bunch of strings in rP. Test Plan: Intense eyeballing. Reviewers: #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: hach-que, Korvin, epriestley Differential Revision: https://secure.phabricator.com/D12797	2015-05-22 21:16:39 +10:00
Joshua Spence	16a8ed72bd	Modernize search engine selection Summary: Remove the `PhabricatorDefaultSearchEngineSelector` class. This is quite similar to D12053. Test Plan: Went to `/view/PhabricatorSearchApplication/` and saw the storage engine configuration. Set `search.elastic.host` and saw the highlighted storage engine change. Reviewers: #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: Korvin, epriestley Differential Revision: https://secure.phabricator.com/D12670	2015-05-20 06:59:59 +10:00
epriestley	c0e15f2c65	Fix bad ancestor classname Summary: Derped this up in D11234. Test Plan: Ran `bin/search index --all`. Reviewers: joshuaspence Reviewed By: joshuaspence Subscribers: epriestley Differential Revision: https://secure.phabricator.com/D11273	2015-01-07 16:13:20 -08:00
epriestley	a455e50e29	Build a Conpherence thread index Summary: Ref T3165. Builds a dedicated index for Conpherence to avoid scale/policy filtering concerns. - This is pretty one-off but I think it's generally OK. - There's no UI for it. - `ConpherenceFulltextQuery` is very low-level. You would need to do another query on the PHIDs it returns to actually show anything to the user. - The `previousTransactionPHID` is so you can load chat context efficiently. Specifically, if you want to show results like this: > previous line of context > line of chat that matches the query > next line of context ...you can read the previous lines out of `previousTransactionPHID` directly, and the next lines by issuing one query with `WHERE previousTransactionPHID IN (...)`. I'm not 100% sure this is useful, but it seemed like a reasonable thing to provide, since there's no way to query this efficiently otherwise and I figure a lot of chat might make way more sense with a couple of lines of context. Test Plan: - Indexed a thread manually (whole thing indexed). - Indexed a thread by updating it (just the new comment indexed). - Wrote a hacky test script and got reasonable-looking query results. Reviewers: btrahan Reviewed By: btrahan Subscribers: epriestley Maniphest Tasks: T3165 Differential Revision: https://secure.phabricator.com/D11234	2015-01-06 10:24:30 -08:00
Chad Horohoe	a366f85c11	Properly create Elasticsearch index Summary: When the index does not exist and auto_create_index isn't enabled, running ./bin/index results in a failure. That's T5990. Instead, create the index properly. This also allows us to do nice things like set up a proper mapping and analysis, e.g. for substring matching as outlined by @fabe in T6552. Test Plan: Deleted and created index multiple times to verify proper index creation and usage. Reviewers: epriestley, #blessed_reviewers Reviewed By: epriestley, #blessed_reviewers Subscribers: Korvin, manybubbles, chasemp, fabe, epriestley Differential Revision: https://secure.phabricator.com/D10955	2014-12-22 13:10:52 -08:00
Joshua Spence	0151c38b10	Apply some autofix linter rules Summary: Self-explanatory. Test Plan: Eyeball it. Reviewers: #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: epriestley, Korvin Differential Revision: https://secure.phabricator.com/D10454	2014-09-10 06:55:05 +10:00
Joshua Spence	8756d82cf6	Remove `@group` annotations Summary: I'm pretty sure that `@group` annotations are useless now... see D9855. Also fixed various other minor issues. Test Plan: Eye-ball it. Reviewers: #blessed_reviewers, epriestley, chad Reviewed By: #blessed_reviewers, epriestley Subscribers: epriestley, Korvin, hach-que Differential Revision: https://secure.phabricator.com/D9859	2014-07-10 08:12:48 +10:00
Joshua Spence	c86604bad8	Reduce the verbosity of the `./bin/search index` script. Summary: Currently, the `./bin/search index` script produces a lot of output (one line for every indexed object). Instead, use a `PhutilConsoleProgressBar` to indicate progress. This is much less verbose and gives a real indication of how long the script should take to complete. Test Plan: Ran `./bin/search index` and verified that a progress bar was output. Reviewers: epriestley, #blessed_reviewers Reviewed By: epriestley, #blessed_reviewers Subscribers: epriestley, Korvin Differential Revision: https://secure.phabricator.com/D9364	2014-06-03 11:46:43 -07:00
epriestley	a716fe99f3	Perform search indexing in the worker queue and respect `bin/search index --background` Summary: Fixes T3857. Earlier work made this trivial and just left product questions, which I've answered by requiring the daemons to run on reasonable installs. Test Plan: Ran `bin/search index` and `bin/search index --background`. Observed indexes write in the former case and tasks queue in the latter case. Commented with a unique string on a revision and searched for it a moment later, got exactly one result (that revision), verifying that reindexing works correctly. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T3857 Differential Revision: https://secure.phabricator.com/D7966	2014-01-14 13:22:56 -08:00
epriestley	e397103bf2	Extend all "ManagementWorkflow" classes from a base class Summary: Ref T2015. Not directly related to Drydock, but I've wanted to do this for a bit. Introduce a common base class for all the workflows in the scripts in `bin/*`. This slightly reduces code duplication by moving `isExecutable()` to the base, but also provides `getViewer()`. This is a little nicer than `PhabricatorUser::getOmnipotentUser()` and gives us a layer of indirection if we ever want to introduce more general viewer mechanisms in scripts. Test Plan: Lint; ran some of the scripts. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T2015 Differential Revision: https://secure.phabricator.com/D7838	2013-12-27 13:15:40 -08:00
epriestley	e96201773d	Index projects in the main search index Summary: Part one of a large and complicated plot: - The last filter for Maniphest "pro" queries is "Group By". - This is currently executed in a convoluted and ridiculous way, loading massive amounts of data. - The primary reason it works like it does is that we don't have a project name index available in Maniphest, so we can't sort in the DB. - So, I want to provide a name index to Maniphest and push this work to the DB. To do that, my plan is: - Index projects in Search. - Add a "did update index" event. - Have Maniphest listen for it. - When projects are updated, update their indexes in Maniphest. - Rewrite the giant mess of "group by: project" to be somewhat reasonable. - This may also extend to some future "group by: assignee". This is the first small step down this path, which just indexes projects in search. Test Plan: Ran `bin/search index --type project`, then searched for projects. Reviewers: btrahan Reviewed By: btrahan CC: aran Differential Revision: https://secure.phabricator.com/D6955	2013-09-12 13:05:19 -07:00
epriestley	2845d11962	Remove PhabricatorPHID::fromObjectName Summary: Ref T2715. This only ever supported like 10% of object types; get rid of it in favor of the new infra. Test Plan: - Ran `bin/search index D12`; `bin/search index <some valid phid>`, `bin/search index derp`. - Turned off Search jump, searched for `D12`. - Used `phid.lookup`. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T2715 Differential Revision: https://secure.phabricator.com/D6519	2013-07-22 12:17:37 -07:00
epriestley	7d771b4ff7	Support "M" in phid.lookup and ircbot Summary: Fixes T2651. This could be further generalized but it's a bit out of the way. Test Plan: See chatlog. Reviewers: chad Reviewed By: chad CC: aran Maniphest Tasks: T2651 Differential Revision: https://secure.phabricator.com/D5236	2013-03-05 12:31:52 -08:00
epriestley	a22bea2a74	Apply lint rules to Phabricator Summary: Mostly applies a new call spacing rule; also a few things that have slipped through via pull requests and such Test Plan: `find src/ -type f -name '*.php' \| xargs -n16 arc lint --output summary --apply-patches` Reviewers: chad Reviewed By: chad CC: aran Differential Revision: https://secure.phabricator.com/D5002	2013-02-19 13:33:10 -08:00
epriestley	f6b1964740	Improve Search architecture Summary: The search indexing API has several problems right now: - Always runs in-process. - It would be nice to push this into the task queue for performance. However, the API currently passes an object all the way through (and some indexers depend on preloaded object attributes), so it can't be dumped into the task queue at any stage since we can't serialize it. - Being able to use the task queue will also make rebuilding indexes faster. - Instead, make the API phid-oriented. - No uniform indexing API. - Each "Editor" currently calls SomeCustomIndexer::indexThing(). This won't work with AbstractTransactions. The API is also just weird. - Instead, provide a uniform API. - No uniform CLI. - We have `scripts/search/reindex_everything.php`, but it doesn't actually index everything. Each new document type needs to be separately added to it, leading to stuff like D3839. Third-party applications can't provide indexers. - Instead, let indexers expose documents for indexing. - Not application-oriented. - All the indexers live in search/ right now, which isn't the right organization in an application-oriented view of the world. - Instead, move indexers to applications and load them with SymbolLoader. Test Plan: - `bin/search index` - Indexed one revision, one task. - Indexed `--type TASK`, `--type DREV`, etc., for all types. - Indexed `--all`. - Added the word "saboteur" to a revision, task, wiki page, and question and then searched for it. - Creating users is a pain; searched for a user after indexing. - Creating commits is a pain; searched for a commit after indexing. - Mocks aren't currently loadable in the result view, so their indexing is moot. Reviewers: btrahan, vrana Reviewed By: btrahan CC: 20after4, aran Maniphest Tasks: T1991, T2104 Differential Revision: https://secure.phabricator.com/D4261	2012-12-21 14:21:31 -08:00

34 commits