1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-11-25 16:22:43 +01:00
No description
Find a file
epriestley 1de130c9f5 Allow the Ferret engine to remove "common" ngrams from the index
Summary:
Ref T13000. This adds support for tracking "common" ngrams, which occur in too many documents to be useful as part of the ngram index.

If an ngram is listed in the "common" table, it won't be written when indexing documents, or queried for when searching for them.

In this change, nothing actually writes to the "common" table. I'll start writing to the table in a followup change.

Specifically, I plan to do this:

  - A new GC process updates the "common" table periodically, by writing ngrams which appear in more than X% of documents to it, for some value of X, if there are at least a minimum number of documents (maybe like 4,000).
  - A new GC process deletes ngrams that have been added to the common table from the existing indexes.

Hopefully, this will pare down the ngrams index to something reasonable over time without requiring any manual tuning.

Test Plan:
  - Ran some queries and indexes.
  - Manually inserted ngrams `xxx` and `yyy` into the ngrams table, searched and indexed, saw them ignored as viable ngrams for search/index.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13000

Differential Revision: https://secure.phabricator.com/D18672
2017-10-03 13:27:42 -07:00
bin Add a profileimage generation workflow for the cli 2017-03-04 15:43:13 -08:00
conf Support "ssl.chain" in Aphlict configuration 2016-04-14 10:41:21 -07:00
externals Add profile images to Repositories 2017-06-12 07:51:39 -07:00
resources Allow the Ferret engine to remove "common" ngrams from the index 2017-10-03 13:27:42 -07:00
scripts Treat commit hook execution in observed repositories as a no-op, not an error 2017-08-01 08:32:42 -07:00
src Allow the Ferret engine to remove "common" ngrams from the index 2017-10-03 13:27:42 -07:00
support Update AphlictClientServer to support ws2 or ws3 2017-08-11 15:17:07 -07:00
webroot Make PHUITwoColumnView a little more printable 2017-09-25 19:56:22 +00:00
.arcconfig Set "history.immutable" to "false" explicitly in .arcconfig 2016-08-03 08:12:49 -07:00
.arclint Begin adding test coverage to GitHub Events API parsers 2016-03-09 09:30:07 -08:00
.arcunit Use the configuration driven unit test engine 2015-08-11 07:57:11 +10:00
.editorconfig Fix text lint issues 2015-02-12 07:00:13 +11:00
.gitignore Make i18n string extraction faster and more flexible 2016-07-04 10:23:30 -07:00
LICENSE Fix text lint issues 2015-02-12 07:00:13 +11:00
NOTICE Update Phabricator NOTICE file to reflect modern legal circumstances 2014-06-25 13:42:13 -07:00
README.md Remove push to IRC from "readme.md" too 2015-10-24 18:39:16 -07:00

Phabricator is a collection of web applications which help software companies build better software.

Phabricator includes applications for:

  • reviewing and auditing source code;
  • hosting and browsing repositories;
  • tracking bugs;
  • managing projects;
  • conversing with team members;
  • assembling a party to venture forth;
  • writing stuff down and reading it later;
  • hiding stuff from coworkers; and
  • also some other things.

You can learn more about the project (and find links to documentation and resources) at Phabricator.org

Phabricator is developed and maintained by Phacility.


SUPPORT RESOURCES

For resources on filing bugs, requesting features, reporting security issues, and getting other kinds of support, see Support Resources.

NO PULL REQUESTS!

We do not accept pull requests through GitHub. If you would like to contribute code, please read our Contributor's Guide.

LICENSE

Phabricator is released under the Apache 2.0 license except as otherwise noted.