Fix an issue with selecting the right stemmed ngrams with Ferret engine queries

Summary:
Ref T12819. In D18581, I corrected one bug (ngram selection for terms) but introduced a minor new bug.

We now pass `' query '` (term corpus with boundary spaces) to the stemmer, but it bails out on this since English words don't start with spaces.

Trim these extra boundary spaces off before invoking the stemmer.

The practical effect of this is that searching for non-stem variations of a word ("detection") now finds stemmed variations again ("detect"). Prior to fixing this bug, the stem could find longer variations but not the other way around.

Test Plan: Searched for "detection", found results matching "detect" after patch (and saw same results for "detect" and "detection").

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18593
2024-11-26 08:42:41 +01:00 · 2017-09-12 07:44:29 -07:00 · 2017-09-12 07:44:29 -07:00 · fdc0d8c2f6
commit fdc0d8c2f6
parent e6f0f86518
1 changed file with 3 additions and 0 deletions
--- a/src/infrastructure/query/policy/PhabricatorCursorPagedPolicyAwareQuery.php
+++ b/src/infrastructure/query/policy/PhabricatorCursorPagedPolicyAwareQuery.php
@@ -1683,6 +1683,9 @@ abstract class PhabricatorCursorPagedPolicyAwareQuery
        // If this is a stemmed term, only look for ngrams present in both the
        // unstemmed and stemmed variations.
        if ($is_stemmed) {
+          // Trim the boundary space characters so the stemmer recognizes this
+          // is (or, at least, may be) a normal word and activates.
+          $terms_value = trim($terms_value, ' ');
          $stem_value = $stemmer->stemToken($terms_value);
          $stem_ngrams = $engine->getTermNgramsFromString($stem_value);
          $ngrams = array_intersect($ngrams, $stem_ngrams);
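
The effect of the added `trim()` can be illustrated with a small standalone sketch. This is not Phabricator code: `toy_stem()`, `toy_term_ngrams()`, and `select_stemmed_ngrams()` are hypothetical stand-ins, the stemmer only strips a trailing "ion", and the ngram helper is assumed to produce trigrams over the space-padded term. The one behavior taken from the summary above is that the stemmer returns its input unchanged when the token does not look like a plain word.

```php
<?php

// Hypothetical toy stemmer: gives up on anything that is not a plain
// lowercase word, otherwise strips a trailing "ion".
function toy_stem(string $token): string {
  if (!preg_match('/^[a-z]+$/', $token)) {
    return $token; // bails out, e.g. on ' detection ' with boundary spaces
  }
  if (strlen($token) > 5 && substr($token, -3) === 'ion') {
    return substr($token, 0, -3);
  }
  return $token;
}

// Hypothetical ngram helper: trigrams of the term padded with exactly one
// boundary space on each side.
function toy_term_ngrams(string $term): array {
  $padded = ' '.trim($term, ' ').' ';
  $ngrams = array();
  for ($i = 0; $i + 3 <= strlen($padded); $i++) {
    $ngrams[] = substr($padded, $i, 3);
  }
  return array_values(array_unique($ngrams));
}

// Mirrors the shape of the patched block: keep only ngrams present in both
// the unstemmed and the stemmed variation of the term.
function select_stemmed_ngrams(string $terms_value, bool $trim_first): array {
  $ngrams = toy_term_ngrams($terms_value);
  if ($trim_first) {
    $terms_value = trim($terms_value, ' ');
  }
  $stem_value = toy_stem($terms_value);
  $stem_ngrams = toy_term_ngrams($stem_value);
  return array_values(array_intersect($ngrams, $stem_ngrams));
}

$before = select_stemmed_ngrams(' detection ', false);
$after = select_stemmed_ngrams(' detection ', true);

// Without the trim, the stemmer is a no-op and suffix ngrams such as "ion"
// stay in the required set; with the trim, only ngrams shared with
// "detect" remain.
echo implode('|', $before), "\n";
echo implode('|', $after), "\n";
```

In this sketch, the untrimmed call keeps every trigram of "detection", so a document containing only "detect" could not satisfy the query. The trimmed call keeps just the trigrams the two forms share, which is the behavior the commit message describes restoring.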