Summary:
Ref T6740. When we index a document, also save a copy of the stemmed version.
When querying, search the combined corpus for the terms.
(We may need to tune this a bit later since it's possible for literal, quoted terms to match in the stemmed section, but I think this wil rarely cause issues in practice.)
A downside here is that search sort of breaks if you upgrade into this and don't reindex. I wasn't able to find a way to issue the query that remained compatible with older indexes and didn't have awful performance, so my plan is:
- Put this on `secure`.
- Rebuild the index.
- If things look good after a couple of days, add a way that we can tell people they need to rebuild the search index with a setup warning.
We might get some reports between now and then, but if this is super awful we should know by the end of the weekend.
Test Plan:
WOW AMAZING
{F2021466}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T6740
Differential Revision: https://secure.phabricator.com/D16947
Summary: Ref T11741. I'll wait until the release cut to land this; it just adds a test for InnoDB FULLTEXT being available instead of always returning `false`.
Test Plan:
- Ran with InnoDB fulltext locally for a day and a half without issues.
- Ran `bin/storage upgrade`, saw it detect InnoDB fulltext.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11741
Differential Revision: https://secure.phabricator.com/D16946
Summary:
Ref T11741. This makes everything work if we switch to InnoDB, but never actually switches yet.
Since the default minimum word length (3) and stopword list (36 common English words) in InnoDB are generally pretty reasonable, I just didn't add any setup advice for them. I figure we're better off with simpler setup until we identify some real problem that the builtin stopwords create.
Test Plan: Swapped the `false` to `true`, ran `storage adjust`, got InnoDB fulltext indexes, searched for stuff, got default "AND" behavior.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11741
Differential Revision: https://secure.phabricator.com/D16942
Summary:
Ref T11741. InnoDB uses a stopwords table instead of a stopwords file.
During `storage upgrade`, synchronize the table from the stopwords file on disk.
Test Plan:
- Ran `storage upgrade`.
- Ran `select * from stopwords`, saw stopwords.
- Added some garbage to the table.
- Ran `storage upgrade`, saw it remove it.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11741
Differential Revision: https://secure.phabricator.com/D16940
Summary:
Ref T11741. Fixes T10642. Parse and compile user queries with a consistent ruleset, then submit queries to the backend using whatever ruleset MySQL is configured with.
This means that `ft_boolean_syntax` no longer needs to be configured (we'll just do the right thing in all cases).
This should improve behavior with RDS immediately (T10642), and allow us to improve behavior with InnoDB in the future (T11741).
Test Plan:
- Ran various queries in the UI, saw the expected results.
- Ran bad queries, got useful errors.
- Searched threads in Conpherence.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10642, T11741
Differential Revision: https://secure.phabricator.com/D16939
Summary:
Ref T4788. Fixes T10703.
In the longer term I want to put this on top of ApplicationSearch, but that's somewhat complex and we're at a fairly good point to pause this feature for feedback.
Inch toward that instead: provide more appropriate filters and defaults without rebuilding the underlying engine. Specifically:
- No "assigned" for commits (barely makes sense).
- No "assigned" for mocks (does not make sense).
- Default to "open" for parent tasks, subtasks, close as duplicate, and merge into.
Also, add a key to the `search_document` table to improve the performance of the "all open stuff of type X" query. "All Open Tasks" is about 100x faster on my machine with this key.
Test Plan:
- Clicked all object relationships, saw more sensible filters and defaults.
- Saw "open" query about 100x faster locally (300ms to 3ms).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4788, T10703
Differential Revision: https://secure.phabricator.com/D16202
Summary:
Ref T1191. Although I fixed some of the mutations earlier (in D10598), I missed the column mutations under old versions of MySQL. In particular, this isn't valid:
- `ALTER TABLE ... MODIFY columnName VARCHAR(64) COLLATE binary`
Issue the permitted version of this instead, which is:
- `ALTER TABLE ... MODIFY columnName VARBINARY(64)`
Also fixed an issue where a clean schema had the wrong nullability for a column in the draft table. Force it to the expected nullability.
The other trick here is around the one column with a FULLTEXT index on it, which needs a little massaging.
Test Plan:
- Forced my local install to return `false` for utf8mb4 support.
- Did a clean adjust into `binary` columns.
- Poked around, added emoji to things.
- Reverted the fake check and did a clean adjust into `utf8mb4` columns.
- Emoji survived.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: fabe, epriestley
Maniphest Tasks: T1191
Differential Revision: https://secure.phabricator.com/D10627
Summary: Ref T1191. The index's case sensitivity depends on the column type. Using `text` makes the search case-sensitive, which is not desirable.
Test Plan: After adjustment, searched for "PROJECTS" and found hits against "projects".
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T1191
Differential Revision: https://secure.phabricator.com/D10619
Summary:
Ref T1191. Notable:
- Drops a very old saved query table. See comments inline: plan was to remove it after a year. It's been ~a year and two weeks.
- This has our only fulltext index. I'm not supporting that formally for now, but left a note.
- This has our only MyISAM table. I'm not supporting that explicitly for now, but it shouldn't affect anything. I may deal with this in the future.
- These tables don't actually write directly via Lisk, so there's some fiddling to get the schemata right.
Test Plan: Down to ~250 warnings. No more surplus databases or tables.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T1191
Differential Revision: https://secure.phabricator.com/D10589
Summary: I'm pretty sure that `@group` annotations are useless now... see D9855. Also fixed various other minor issues.
Test Plan: Eye-ball it.
Reviewers: #blessed_reviewers, epriestley, chad
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin, hach-que
Differential Revision: https://secure.phabricator.com/D9859
Summary:
This commit doesn't change license of any file. It just makes the license implicit (inherited from LICENSE file in the root directory).
We are removing the headers for these reasons:
- It wastes space in editors, less code is visible in editor upon opening a file.
- It brings noise to diff of the first change of any file every year.
- It confuses Git file copy detection when creating small files.
- We don't have an explicit license header in other files (JS, CSS, images, documentation).
- Using license header in every file is not obligatory: http://www.apache.org/dev/apply-license.html#new.
This change is approved by Alma Chao (Lead Open Source and IP Counsel at Facebook).
Test Plan: Verified that the license survived only in LICENSE file and that it didn't modify externals.
Reviewers: epriestley, davidrecordon
Reviewed By: epriestley
CC: aran, Korvin
Maniphest Tasks: T2035
Differential Revision: https://secure.phabricator.com/D3886
Summary:
- `kill_init.php` said "Moving 1000 files" - I hope that this is not some limit in `FileFinder`.
- [src/infrastructure/celerity] `git mv utils.php map.php; git mv api/utils.php api.php`
- Comment `phutil_libraries` in `.arcconfig` and run `arc liberate`.
NOTE: `arc diff` timed out so I'm pushing it without review.
Test Plan:
/D1234
Browsed around, especially in `applications/repository/worker/commitchangeparser` and `applications/` in general.
Auditors: epriestley
Maniphest Tasks: T1103
Summary: Last of the big final patches. Left a few debatable classes (12 out of about 400) that I'll deal with individually eventually.
Test Plan: Ran testEverythingImplemented.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran, epriestley
Maniphest Tasks: T795
Differential Revision: https://secure.phabricator.com/D1881
Summary:
In preparation for adding another search engine (see T355):
- Rename "executor" to "engine".
- Move all engine-specific operations into the engine. Specifically, this
means that indexing moves out of the document store and into the engine (it was
sort of silly where it was before).
- Split choice of an engine into an overridable "selector" class, a base API,
and a concrete MySQL implementation (just like storage engine selection).
- Make all callers go through the indirection layer.
The default selector just unconditionally selects the MySQL engine, but now
(with D786) I can build an Elastic Search engine and you guys can build a
multi-target engine if you want and I don't get there fast enough.
Test Plan:
- Created a new document (task).
- Searched for and found it.
- Viewed index reconstruction.
Reviewed By: jungejason
Reviewers: jungejason, amckinley, tuomaspelkonen, aran
CC: aran, jungejason, epriestley
Differential Revision: 788