Summary:
Ref T13000. Since other changes have generally made the ngrams table manageable, I'm not planning to enable common ngrams by default at this time.
Instead, make the threshold configurable with "--threshold" so we can guide installs through tuning this if they want (e.g. PHI110), and tune hosted instances.
(This might eventually become automatic, but just smoothing this bit off for now feels reasonable to me.)
Test Plan: Ran with `--reset`, and with various invalid and valid `--threshold` arguments.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18710
Summary:
Ref T13000. After an ngram is marked as "common", we can delete it from the storage table.
Currently, the only way to get ngrams marked as "common" is to manually run `bin/search ngrams`, so this has no impact on normal installs.
Test Plan: Ran `bin/garbage collect`, saw it start chewing through my local Maniphest ngrams table and removing common ngrams.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18687
Summary:
Depends on D18672. Ref T13000. This does an on-demand build of the common ngrams table.
Plan here is:
- Push to `secure`.
- Build the common ngrams table here.
- See if stuff breaks?
If it looks okay on this dataset, we can build out the GC support and try it in production.
Test Plan:
- Locally, my dataset has a bunch of `bin/lipsum` tasks with similar, common words.
- Verified that ipsum terms now skip ngrams. For "lorem ipsum" search performance actually IMPROVED by skipping the ngrams table (12s to 9s).
- Queried for normal terms, got very fast results using the ngram table, as normal.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13000
Differential Revision: https://secure.phabricator.com/D18673