mirror of
https://we.phorge.it/source/phorge.git
synced 2024-11-22 23:02:42 +01:00
Fix an issue with selecting the right stemmed ngrams with Ferret engine queries
Summary: Ref T12819. In D18581, I corrected one bug (ngram selection for terms) but introduced a minor new bug. We now pass `' query '` (term corpus with boundary spaces) to the stemmer, but it bails out on this since English words don't start with spaces. Trim these extra boundary spaces off before invoking the stemmer. The practical effect of this is that searching for non-stem variations of a word ("detection") now finds stemmed variations again ("detect"). Prior to fixing this bug, the stem could find longer variations but not the other way around.

Test Plan: Searched for "detection", found results matching "detect" after patch (and saw same results for "detect" and "detection").

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18593
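The effect described in the summary can be illustrated with a small, self-contained sketch. It does not use the real Ferret engine or stemmer: `toy_stem_token()` and `toy_term_ngrams()` are hypothetical stand-ins, assuming only that the stemmer leaves anything that is not a plain word unchanged and that term ngrams are trigrams of the space-padded term.

```php
<?php
// Hypothetical stand-in for the real stemmer: it only stems tokens made
// entirely of lowercase ASCII letters and returns anything else unchanged.
function toy_stem_token(string $token): string {
  if (!preg_match('/^[a-z]+$/', $token)) {
    return $token;
  }
  // Toy rule for illustration only: strip a trailing "ion".
  return preg_replace('/ion$/', '', $token);
}

// Hypothetical stand-in for term ngram extraction: trigrams of the term
// padded with one boundary space on each side.
function toy_term_ngrams(string $term): array {
  $padded = ' '.trim($term, ' ').' ';
  $ngrams = array();
  for ($i = 0; $i + 3 <= strlen($padded); $i++) {
    $ngrams[] = substr($padded, $i, 3);
  }
  return array_values(array_unique($ngrams));
}

$terms_value = ' detection ';
$ngrams = toy_term_ngrams($terms_value);

// Without trimming, the padded value is not recognized as a word, so the
// toy stemmer returns it unchanged.
$stem_before = toy_stem_token($terms_value);

// With the boundary spaces trimmed first, the toy stemmer activates.
$stem_after = toy_stem_token(trim($terms_value, ' '));

// Keep only the ngrams shared by the unstemmed and stemmed variations.
$common = array_values(
  array_intersect($ngrams, toy_term_ngrams($stem_after)));
```

In this sketch the untrimmed input passes through the stemmer unchanged, so intersecting with its ngrams would narrow nothing. After trimming, only the ngrams shared with the stem remain, which is what lets a query for the longer word match documents containing the shorter one.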
parent e6f0f86518
commit fdc0d8c2f6
1 changed file with 3 additions and 0 deletions
@@ -1683,6 +1683,9 @@ abstract class PhabricatorCursorPagedPolicyAwareQuery
         // If this is a stemmed term, only look for ngrams present in both the
         // unstemmed and stemmed variations.
         if ($is_stemmed) {
+          // Trim the boundary space characters so the stemmer recognizes this
+          // is (or, at least, may be) a normal word and activates.
+          $terms_value = trim($terms_value, ' ');
           $stem_value = $stemmer->stemToken($terms_value);
           $stem_ngrams = $engine->getTermNgramsFromString($stem_value);
           $ngrams = array_intersect($ngrams, $stem_ngrams);