phorge-phorge/resources/sql/autopatches/20151221.search.2.ownersngrams.sql at 7bd4089a269490941236c020a040f104b9748ebe - revi-archive/phorge-phorge - SiliconForest Atelier

mirror of https://we.phorge.it/source/phorge.git synced 2024-12-11 08:06:13 +01:00

epriestley 96fe8c0b83 Implement basic ngram search for Owners Package names

Summary:
Ref T9979. This uses ngrams (specifically, trigrams) to build a reasonably efficient index for substring matching. For example, for a package named "Example" with ID 123, we store rows like this:

```
< ex, 123>
<exa, 123>
<xam, 123>
<amp, 123>
<mpl, 123>
<ple, 123>
<le , 123>
```
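The index side can be sketched in Python. This is a hedged illustration, not Phabricator's actual indexer. It assumes the name is lowercased, split on whitespace, and each word padded with one space on each side, which is consistent with the `< ex>` and `<le >` rows above. The function name `index_ngrams` is hypothetical.

```python
def index_ngrams(name: str) -> list[str]:
    """Return the trigrams stored for a name, in order, without duplicates."""
    ngrams: list[str] = []
    for word in name.lower().split():
        padded = f" {word} "  # one space on each side marks the word boundaries
        for i in range(len(padded) - 2):
            gram = padded[i : i + 3]
            if gram not in ngrams:
                ngrams.append(gram)
    return ngrams


if __name__ == "__main__":
    # Prints the seven <ngram, objectID> rows listed above for package 123.
    for gram in index_ngrams("Example"):
        print(f"<{gram}, 123>")
```

The padding is what lets the index distinguish word-initial and word-final trigrams (`" ex"`, `"le "`) from ones in the middle of a word.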

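When the user searches for `exam`, we join this table once for each of the tokens `exa` and `xam`, so only packages that have both tokens survive. MySQL can do this much more efficiently than it can process a `LIKE "%exam%"` query against a huge table, because a pattern with a leading wildcard cannot use an index and forces a scan of every row.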
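The query side can be sketched with Python's standard `sqlite3` module standing in for MySQL. This is a hedged illustration, not Phabricator's query code: the `owners_package` table, the helper names, the final `LIKE` re-check, and the `MAX_JOINS = 16` cap (inferred from the Test Plan's "only 16 joins") are all assumptions. Each query trigram adds one `JOIN` against the ngram table, so a package must match every trigram to be returned.

```python
import sqlite3

# Hypothetical cap on joined trigrams, inferred from the Test Plan's "only 16 joins".
MAX_JOINS = 16


def index_ngrams(name):
    """Trigrams stored at index time: each lowercased word, space-padded."""
    grams = set()
    for word in name.lower().split():
        padded = f" {word} "
        grams.update(padded[i : i + 3] for i in range(len(padded) - 2))
    return sorted(grams)


def query_ngrams(text):
    """Unpadded trigrams of a "contains" search, deduplicated and capped."""
    text = text.lower()
    grams = []
    for i in range(len(text) - 2):
        gram = text[i : i + 3]
        if gram not in grams:
            grams.append(gram)
    return grams[:MAX_JOINS]


def build_contains_query(text):
    """One JOIN per query trigram, then a LIKE re-check on the candidates."""
    grams = query_ngrams(text)
    joins = " ".join(
        f"JOIN owners_name_ngrams g{n} ON g{n}.objectID = p.id AND g{n}.ngram = ?"
        for n in range(len(grams))
    )
    # Escape LIKE wildcards so user input is matched literally.
    literal = text.lower().replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = (
        f"SELECT p.id FROM owners_package p {joins} "
        "WHERE LOWER(p.name) LIKE ? ESCAPE '\\' ORDER BY p.id"
    )
    return sql, [*grams, f"%{literal}%"]


def demo():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE owners_package (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE owners_name_ngrams ("
        "objectID INTEGER NOT NULL, ngram TEXT NOT NULL, "
        "PRIMARY KEY (ngram, objectID))"
    )
    packages = {123: "Example", 124: "Examination Board", 125: "Sample"}
    for pid, name in packages.items():
        db.execute("INSERT INTO owners_package VALUES (?, ?)", (pid, name))
        db.executemany(
            "INSERT INTO owners_name_ngrams VALUES (?, ?)",
            [(pid, gram) for gram in index_ngrams(name)],
        )
    sql, params = build_contains_query("exam")
    return [row[0] for row in db.execute(sql, params)]


if __name__ == "__main__":
    print(demo())  # [123, 124]
```

The `LIKE` re-check is a design choice of this sketch: trigram hits are necessary but not sufficient, since a name such as "exa xam" contains both `exa` and `xam` without containing "exam". Because it runs only on rows that already survived the joins, it avoids the full-table scan described above.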

When the user searches for a one-letter or two-letter string, we only search the beginnings of words. This is probably what they want, it is the only thing we can do quickly, and it is a reasonable, expected behavior for typeaheads.
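A hedged sketch of the short-query case, again with `sqlite3` standing in for MySQL. It assumes, beyond what the summary states, that "beginnings of words" is implemented by prefix-matching the space-padded, word-initial trigrams (for example `" ex%"` for the search `ex`). The helper `short_query_pattern` and the sample package names are hypothetical.

```python
import sqlite3


def index_ngrams(name):
    """Trigrams stored at index time: each lowercased word, space-padded."""
    grams = set()
    for word in name.lower().split():
        padded = f" {word} "
        grams.update(padded[i : i + 3] for i in range(len(padded) - 2))
    return sorted(grams)


def short_query_pattern(text):
    """LIKE pattern for word-initial trigrams, e.g. "Ex" -> " ex%"."""
    if not 1 <= len(text) <= 2:
        raise ValueError("prefix matching is meant for 1- or 2-character searches")
    # Escape LIKE wildcards so user input is matched literally.
    literal = text.lower().replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    # Word-initial trigrams start with the padding space.
    return " " + literal + "%"


def demo(text):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE owners_name_ngrams (objectID INTEGER, ngram TEXT)")
    names = {123: "Example", 124: "Text Tools", 125: "Sample"}
    for pid, name in names.items():
        db.executemany(
            "INSERT INTO owners_name_ngrams VALUES (?, ?)",
            [(pid, gram) for gram in index_ngrams(name)],
        )
    rows = db.execute(
        "SELECT DISTINCT objectID FROM owners_name_ngrams "
        "WHERE ngram LIKE ? ESCAPE '\\' ORDER BY objectID",
        (short_query_pattern(text),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(demo("ex"))  # [123]: "Text" contains "ex", but not at a word start
```

A pattern like `" ex%"` has no leading wildcard, so in MySQL it can be served by a range scan on an index that begins with `ngram`, such as the `key_ngram` index defined in this patch.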

Test Plan:
  - Ran storage upgrades and search indexer.
  - Searched for stuff with "name contains".
  - Used typeahead and got sensible results.
  - Searched for `aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz` and saw only 16 joins.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9979

Differential Revision: https://secure.phabricator.com/D14846

2015-12-22 08:00:33 -08:00

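```sql
-- {$NAMESPACE} and {$COLLATE_TEXT} are placeholders filled in by the storage upgrade.
CREATE TABLE {$NAMESPACE}_owners.owners_name_ngrams (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  objectID INT UNSIGNED NOT NULL,                  -- ID of the Owners package
  ngram CHAR(3) NOT NULL COLLATE {$COLLATE_TEXT},  -- one trigram, e.g. 'exa'
  KEY `key_object` (objectID),                     -- all rows for one package
  KEY `key_ngram` (ngram, objectID)                -- packages containing a trigram
) ENGINE=InnoDB, COLLATE {$COLLATE_TEXT};
```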