mirror of https://we.phorge.it/source/phorge.git synced 2024-11-22 14:52:41 +01:00

Fix an issue with selecting the right stemmed ngrams with Ferret engine queries

Summary:
Ref T12819. In D18581, I corrected one bug (ngram selection for terms) but introduced a minor new bug. We now pass `' query '` (term corpus with boundary spaces) to the stemmer, but it bails out on this since English words don't start with spaces.

Trim these extra boundary spaces off before invoking the stemmer.

The practical effect is that searching for a non-stem variation of a word ("detection") now finds the stemmed variation ("detect") again. Before this fix, searching for the stem could find longer variations, but not the other way around.

Test Plan: Searched for "detection", found results matching "detect" after patch (and saw same results for "detect" and "detection").

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18593
epriestley 2017-09-12 07:44:29 -07:00
parent e6f0f86518
commit fdc0d8c2f6


@@ -1683,6 +1683,9 @@ abstract class PhabricatorCursorPagedPolicyAwareQuery
 // If this is a stemmed term, only look for ngrams present in both the
 // unstemmed and stemmed variations.
 if ($is_stemmed) {
+  // Trim the boundary space characters so the stemmer recognizes this
+  // is (or, at least, may be) a normal word and activates.
+  $terms_value = trim($terms_value, ' ');
   $stem_value = $stemmer->stemToken($terms_value);
   $stem_ngrams = $engine->getTermNgramsFromString($stem_value);
   $ngrams = array_intersect($ngrams, $stem_ngrams);
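
The effect of the trim can be illustrated with a small standalone sketch. This is not the Ferret engine code: `toy_stem_token()`, `toy_term_ngrams()`, and `toy_select_ngrams()` are hypothetical stand-ins, and the toy stemmer only strips an "ion" suffix and bails out on anything that is not purely lowercase letters, which is an assumption made to mimic the behavior described in the summary.

```php
<?php

// Hypothetical stand-in for the real stemmer. It only handles tokens made
// entirely of lowercase letters and returns anything else unchanged.
function toy_stem_token(string $token): string {
  if (!preg_match('/^[a-z]+$/', $token)) {
    // Bail out: this does not look like a normal word.
    return $token;
  }
  // Strip a trailing "ion"; a real stemmer applies many more rules.
  return preg_replace('/ion$/', '', $token);
}

// Hypothetical stand-in for term ngram extraction: pad the term with one
// boundary space on each side and return its unique trigrams in order.
function toy_term_ngrams(string $term): array {
  $padded = ' '.trim($term, ' ').' ';
  $ngrams = array();
  for ($ii = 0; $ii + 3 <= strlen($padded); $ii++) {
    $ngrams[] = substr($padded, $ii, 3);
  }
  return array_values(array_unique($ngrams));
}

// Mirrors the shape of the patched block: optionally trim the boundary
// spaces, stem the term, then keep only ngrams present in both variations.
function toy_select_ngrams(string $terms_value, bool $trim): array {
  $ngrams = toy_term_ngrams($terms_value);
  if ($trim) {
    $terms_value = trim($terms_value, ' ');
  }
  $stem_value = toy_stem_token($terms_value);
  $stem_ngrams = toy_term_ngrams($stem_value);
  return array_values(array_intersect($ngrams, $stem_ngrams));
}

$before = toy_select_ngrams(' detection ', false);
$after = toy_select_ngrams(' detection ', true);

echo count($before)." ngrams without trim, ".count($after)." with trim\n";
```

In this toy model, the untrimmed call keeps all nine trigrams of "detection" because the stemmer never activates, so a document containing only "detect" cannot match trigrams such as "tio" or "ion". With the trim, only the five trigrams shared with the stem remain, and both words can match.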