1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2025-02-08 12:58:31 +01:00
phorge-phorge/src/applications/search/ferret/__tests__
epriestley 086a125ad5 Improve performance of Ferret engine ngram extraction, particularly for large input strings
Summary:
See PHI87. Ref T12974. The `array_slice()` method of splitting the string apart can perform poorly for large input strings. I think this is mostly just the large number of calls plus building and returning an array being not entirely trivial.

We can just use `substr()` instead, as long as we're a little bit careful about keeping track of where we're slicing the string if it has UTF8 characters.

Test Plan:
  - Created a task with a single, unbroken blob of base64 encoded data as the description, roughly 100KB long.
  - Saw indexing performance improve from ~6s to ~1.5s after patch.
  - Before: https://secure.phabricator.com/xhprof/profile/PHID-FILE-nrxs4lwdvupbve5lhl6u/
  - After: https://secure.phabricator.com/xhprof/profile/PHID-FILE-6vs2akgjj5nbqt7yo7ul/

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T12974

Differential Revision: https://secure.phabricator.com/D18649
2017-09-27 10:41:39 -07:00
..
PhabricatorFerretEngineTestCase.php Improve performance of Ferret engine ngram extraction, particularly for large input strings 2017-09-27 10:41:39 -07:00