1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2025-01-22 04:31:13 +01:00
Commit graph

657 commits

Author SHA1 Message Date
epriestley
0409279595 Support Excel as a data export format
Summary:
Depends on D18954. Ref T13049. This brings over the existing Maniphest Excel export pipeline in a generic way.

The `<Type>ExportField` classes know directly that `PHPExcel` exists, which is a little sketchy, but writing an Excel indirection layer sounds like a lot of work and I don't anticipate us changing Excel backends anytime soon, so trying to abstract this feels YAGNI.

This doesn't bring over the install instructions for PHPExcel or the detection of whether or not it exists. I'll bring that over in a future change.

Test Plan: Exported users as Excel, opened them up, got a sensible-looking Excel sheet.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13049

Differential Revision: https://secure.phabricator.com/D18955
2018-01-29 16:00:41 -08:00
epriestley
a067f64ebb Support export engine extensions and implement an extension for custom fields
Summary:
Depends on D18953. Ref T13049. Allow applications and infrastructure to supplement exportable fields for objects.

Then, implement an extension for custom fields. Only a couple field types (int, string) are supported for now.

Test Plan: Added some custom fields to Users, populated them, exported users. Saw custom fields in the export.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13049

Differential Revision: https://secure.phabricator.com/D18954
2018-01-29 15:59:58 -08:00
epriestley
8b8a3142b3 Support export of data in files larger than 8MB
Summary:
Depends on D18952. Ref T13049. For files larger than 8MB, we need to engage the chunk storage engine. `PhabricatorFile::newFromFileData()` always writes a single chunk, and can't handle files larger than the mandatory chunk threshold (8MB).

Use `IteratorUploadSource`, which can, and "stream" the data into it. This should raise the limit from 8MB to 2GB (maximum size of a string in PHP).

If we need to go above 2GB we could stream CSV and text pretty easily, and JSON without too much trouble, but Excel might be trickier. Hopefully no one is trying to export 2GB+ datafiles, though.

Test Plan:
  - Changed the JSON exporter to just export 8MB of the letter "q": `return str_repeat('q', 1024 * 1024 * 9);`.
  - Before change: fatal, "no storage engine can store this file".
  - After change: export works cleanly.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13049

Differential Revision: https://secure.phabricator.com/D18953
2018-01-29 15:58:34 -08:00
epriestley
0de6210808 Give data exporters a header row
Summary:
Depends on D18951. Ref T13049. When we export to CSV or plain text, add a header row in the first line of the file to explain what each column means. This often isn't obvious with PHIDs, etc.

JSON has keys and is essentially self-labeling, so don't do anything special.

Test Plan: Exported CSV and text, saw new headers. Exported JSON, no changes.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13049

Differential Revision: https://secure.phabricator.com/D18952
2018-01-29 15:17:30 -08:00
epriestley
213eb8e93d Define common ID and PHID export fields in SearchEngine
Summary:
Ref T13049. All exportable objects should always have these fields, so make them builtins.

This also sets things up for extensions (like custom fields).

Test Plan: Exported user data, got the same export as before.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13049

Differential Revision: https://secure.phabricator.com/D18951
2018-01-29 15:17:00 -08:00
epriestley
9d69118664 Add a discovery format hint for date fields in SearchEngine UIs
Summary:
See PHI316. Maniphest and other applications currently have controls like `Created After: [_____]` where you just get an empty text field.

Although most formats work -- including relative formats like "3 days ago" -- and we validate inputs so you get an error if you enter something nonsensical, this still isn't very user friendly.

T8060 or some other approach is likely the long term of this control.

In the meantime, add placeholder text to suggest that `YYYY-MM-DD` or `X days ago` will work.

Test Plan: Viewed date inputs, saw placeholder text.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D18942
2018-01-26 13:11:10 -08:00
epriestley
0ec83132a8 Support basic export of user accounts
Summary:
Depends on D18934. Ref T13046. Add support for the new export flow to a second application.

My goal here is mostly just to make sure that this is general enough to work in more than one place, and exporting user accounts seems plausible as a useful feature, although we do see occasional requests for this feature exactly (like <https://discourse.phabricator-community.org/t/users-export-to-csv/968>).

The exported data may not truly be useful for much (no disabled/admin/verified/MFA flags, no external account data, no email addresses for policy reasons) but we can expand it as use cases arise.

Test Plan: Exported user accounts in several formats.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13046

Differential Revision: https://secure.phabricator.com/D18935
2018-01-26 11:17:44 -08:00
epriestley
a79bb55f3f Support CSV, JSON, and tab-separated text as export formats
Summary: Depends on D18919. Ref T13046. Adds some simple modular exporters.

Test Plan: Exported pull logs in each format.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13046

Differential Revision: https://secure.phabricator.com/D18934
2018-01-26 11:16:52 -08:00
epriestley
c0b8e4784b Add a basic, general-purpose export workflow for all objects with SearchEngine support
Summary:
Depends on D18918. Ref T13046. Ref T5954. Pull logs can currently be browsed in the web UI, but this isn't very powerful, especially if you have thousands of them.

Allow SearchEngine implementations to define exportable fields so that users can "Use Results > Export Data" on any query. In particular, they can use this workflow to download a file with pull logs.

In the future, this can replace the existing "Export to Excel" feature in Maniphest.

For now, we hard-code JSON as the only supported datatype and don't actually make any effort to format the data properly, but this leaves room to add more exporters (CSV, Excel) and data type awareness (integer casting, date formatting, etc) in the future.

For sufficiently large result sets, this will probably time out. At some point, I'll make this use the job queue (like bulk editing) when the export is "large" (affects more than 1K rows?).

Test Plan: Downloaded pull logs in JSON format.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13046, T5954

Differential Revision: https://secure.phabricator.com/D18919
2018-01-26 11:15:59 -08:00
Aviv Eyal
d8f2630d5c Modernize QuickSearch typeahead
Summary:
Use ClassQuery to find datasources for the quick-search.

Mostly, this allows extensions to add quicksearches.

Test Plan:
using `/typeahead/class/`, tested several search terms that make sense.
Removed the tag interface from a datasource, which removed it from results.

Reviewers: epriestley, amckinley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D18760
2017-11-30 15:07:49 +00:00
epriestley
d36f98a15a Clarify acceptable values for --threshold in search ngrams
Summary: See D18710.

Test Plan: o_O

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D18712
2017-10-17 14:32:25 -07:00
epriestley
63d1230ade Parameterize the common ngrams threshold
Summary:
Ref T13000. Since other changes have generally made the ngrams table manageable, I'm not planning to enable common ngrams by default at this time.

Instead, make the threshold configurable with "--threshold" so we can guide installs through tuning this if they want (e.g. PHI110), and tune hosted instances.

(This might eventually become automatic, but just smoothing this bit off for now feels reasonable to me.)

Test Plan: Ran with `--reset`, and with various invalid and valid `--threshold` arguments.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13000

Differential Revision: https://secure.phabricator.com/D18710
2017-10-17 14:13:49 -07:00
Dmitri Iouchtchenko
9bd6a37055 Fix spelling
Summary: Noticed a couple of typos in the docs, and then things got out of hand.

Test Plan:
  - Stared at the words until my eyes watered and the letters began to swim on the screen.
  - Consulted a dictionary.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Differential Revision: https://secure.phabricator.com/D18693
2017-10-09 10:48:04 -07:00
epriestley
17e83b53d5 Add "bin/search query" for debugging query execution
Summary:
Ref T13000. Currently, queries can only be executed from the web UI, which requires logging in as a user. I really want to avoid doing that wherever we can, but being able to execute queries on an instance (and, particularly, see the ngrams and timings on the underlying lookups) would have been helpful in several cases.

Improve tooling a bit in advance of the "common ngrams" stuff going out since it seems likely that it will be useful if issues arise.

Test Plan: Ran `bin/search query --query ...`, got useful minimal output. Ran with `--trace` to get internals.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13000

Differential Revision: https://secure.phabricator.com/D18690
2017-10-06 08:50:34 -07:00
epriestley
66df5b1493 Add a garbage collector for common ngrams
Summary:
Ref T13000. After an ngram is marked as "common", we can delete it from the storage table.

Currently, the only way to get ngrams marked as "common" is to manually run `bin/search ngrams`, so this has no impact on normal installs.

Test Plan: Ran `bin/garbage collect`, saw it start chewing through my local Maniphest ngrams table and removing common ngrams.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13000

Differential Revision: https://secure.phabricator.com/D18687
2017-10-05 11:41:18 -07:00
epriestley
3e589cdd73 Add a workflow for populating (or depopulating) the common ngrams table
Summary:
Depends on D18672. Ref T13000. This does an on-demand build of the common ngrams table.

Plan here is:

  - Push to `secure`.
  - Build the common ngrams table here.
  - See if stuff breaks?

If it looks okay on this dataset, we can build out the GC support and try it in production.

Test Plan:
  - Locally, my dataset has a bunch of `bin/lipsum` tasks with similar, common words.
  - Verified that ipsum terms now skip ngrams. For "lorem ipsum" search performance actually IMPROVED by skipping the ngrams table (12s to 9s).
  - Queried for normal terms, got very fast results using the ngram table, as normal.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13000

Differential Revision: https://secure.phabricator.com/D18673
2017-10-03 13:28:19 -07:00
epriestley
1de130c9f5 Allow the Ferret engine to remove "common" ngrams from the index
Summary:
Ref T13000. This adds support for tracking "common" ngrams, which occur in too many documents to be useful as part of the ngram index.

If an ngram is listed in the "common" table, it won't be written when indexing documents, or queried for when searching for them.

In this change, nothing actually writes to the "common" table. I'll start writing to the table in a followup change.

Specifically, I plan to do this:

  - A new GC process updates the "common" table periodically, by writing ngrams which appear in more than X% of documents to it, for some value of X, if there are at least a minimum number of documents (maybe like 4,000).
  - A new GC process deletes ngrams that have been added to the common table from the existing indexes.

Hopefully, this will pare down the ngrams index to something reasonable over time without requiring any manual tuning.

Test Plan:
  - Ran some queries and indexes.
  - Manually inserted ngrams `xxx` and `yyy` into the ngrams table, searched and indexed, saw them ignored as viable ngrams for search/index.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13000

Differential Revision: https://secure.phabricator.com/D18672
2017-10-03 13:27:42 -07:00
epriestley
a3a6c4ed2e Fix fatal when searching for "r matey prepare to be boarded"
Summary:
See <https://discourse.phabricator-community.org/t/unrecoverable-fatal-error-on-repository-search-in-top-search-bar/503/2>.

The Ferret engine replaced `withNameContains()`, but I missed this obscure callsite.

Test Plan:
  - Searched for `r matey prepare to be boarded`.
  - Before: fatal.
  - After: no fatal.
  - Also searched for `r <actual repository name>`, got repository.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D18661
2017-09-29 09:45:39 -07:00
epriestley
086a125ad5 Improve performance of Ferret engine ngram extraction, particularly for large input strings
Summary:
See PHI87. Ref T12974. The `array_slice()` method of splitting the string apart can perform poorly for large input strings. I think this is mostly just the large number of calls plus building and returning an array being not entirely trivial.

We can just use `substr()` instead, as long as we're a little bit careful about keeping track of where we're slicing the string if it has UTF8 characters.

Test Plan:
  - Created a task with a single, unbroken blob of base64 encoded data as the description, roughly 100KB long.
  - Saw indexing performance improve from ~6s to ~1.5s after patch.
  - Before: https://secure.phabricator.com/xhprof/profile/PHID-FILE-nrxs4lwdvupbve5lhl6u/
  - After: https://secure.phabricator.com/xhprof/profile/PHID-FILE-6vs2akgjj5nbqt7yo7ul/

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T12974

Differential Revision: https://secure.phabricator.com/D18649
2017-09-27 10:41:39 -07:00
epriestley
a1d9a2389d Improve Ferret engine indexing performance for large blocks of text
Summary:
See PHI87. Ref T12974. Currently, we do a lot more work here than we need to: we call `phutil_utf8_strtolower()` on each token, but can do it once at the beginning on the whole block.

Additionally, since ngrams don't care about order, we only need to convert unique tokens into ngrams. This saves us some `phutil_utf8v()`. These calls can be slow for large inputs.

Test Plan:
  - Created a ~4MB task description.
  - Ran `bin/search index Txxx --profile ...` to profile indexing performance before and after the change.
  - Saw total runtime drop form 38s to 9s.
  - Before: <https://secure.phabricator.com/xhprof/profile/PHID-FILE-wiht5d7lkyazaywwxovw/>
  - After: <https://secure.phabricator.com/xhprof/profile/PHID-FILE-efxv56q2hulr6kjrxbx6/>

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T12974

Differential Revision: https://secure.phabricator.com/D18647
2017-09-27 08:15:40 -07:00
epriestley
1ac52c09e7 Improve search highlighting for CJK and substring queries
Summary:
Fixes T12995. Currently, the result highlighter (which shows //where// terms matched) only works in "term" mode, not in "substring" mode.

Provide better feedback and behvaior:

  - When a term is a substring term, color it a little differently and add a tooltip. (This is partly to make it easier to debug/diagnose things, probably not enormously valuable to users.)
  - When a term is a substring term, highlight it anywhere in the results.

Test Plan:
Queried for latin and CJK terms.

Here is CJK being highlighted:

{F5192195}

Here is substring vs non-substring implicit behavior:

{F5192196}

Here's ONLY terms being highlighted:

{F5192198}

Here's terms and substrings, since the query now has a substring:

{F5192201}

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T12995

Differential Revision: https://secure.phabricator.com/D18635
2017-09-22 11:34:46 -07:00
epriestley
da0a08a7e1 Make "mysql" mean "Ferret engine" in Fulltext search
Summary: Ref T12819. Swaps constants so existing configurations that use a "mysql" engine now use the Ferret engine, not an InnoDB/MyISAM FULLTEXT engine.

Test Plan: Swapped my local config back to "mysql" (the default), saw Ferret engine results in the UI.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18590
2017-09-11 18:05:12 -07:00
epriestley
39b74572e6 Return fulltext tokens from the Ferret fulltext engine
Summary:
Ref T12819. These render the little "Searched For: X, Y, U V" hint about how something was parsed.

(This might get a "substring" color or "title only" color or something in the future.)

Test Plan: {F5178807}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18589
2017-09-11 18:04:56 -07:00
epriestley
6edf98eb3b Unprototype the Ferret UI fields
Summary:
Ref T12819. Show the new Ferret engine fields (and enable the indexer) unconditionally.

Also pull them to the top since they're fairly general-purpose and appear more broadly now, and also they actually work correctly (WOW).

Some redundant fields (like "Name Contains" in Repositories and Owners) could probably be removed now, I may clean those up in a followup.

Test Plan: Browsed around, saw Ferret fields in UI without "(Prototype)" suffix.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18587
2017-09-11 18:04:25 -07:00
epriestley
d67cc8e5c5 Remove some redundant information from the Ferret engine index
Summary:
Ref T12819. The "full" field has all other fields, and the "core" field has "title" and "body". Due to the way the "full" and "core" fields were being built, the "core" field also got included in the "full" field, so the "full" field has two copies of the title, two copies of the body, and then one copy of everything else.

Put only one copy of each distinct thing in each "full" and "core". Also, simplify the logic a little bit so we build these virtual fields in a more consistent way.

Test Plan: Ran `bin/search index` and looked at the fields in the database, saw less redundant information.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18580
2017-09-08 09:40:12 -07:00
epriestley
7ea6de6e9c Split Ferret engine strings for tokenization on any sequence of whitespace
Summary:
Ref T12819. Currently, strings are split only on spaces, but newlines (and, if they exist, tabs) should also split strings.

Without this, we can fail to get the proper term boundary tokens for words which begin at the start of a line or end at the end of a line.

Test Plan: Reindexed a document with "xyz\nabc", saw `"yz "` and `" ab"` term boundary tokens generate properly.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18579
2017-09-08 09:39:57 -07:00
epriestley
2218caee0f Reduce the amount of boilerplate that implementing FerretInterface requires
Summary:
See brief discussion in D18554. All the index tables are the same for every application (and, at this point, seem unlikely to change) and we never actually pass these objects around (they're only used internally).

In some other cases (like Transactions) not every application has the same tables (for example, Differential has extra field for inline comments), and/or we pass the objects around (lots of stuff uses `$xactions` directly).

However, in this case, and in Edges, we don't interact with any representation of the database state directly in much of the code, and it doesn't change from application to application.

Just automatically define document, field, and ngram tables for anything which implements `FerretInterface`. This makes the query and index logic a tiny bit messier but lets us delete a ton of boilerplate classes.

Test Plan: Indexed objects, searched for objects. Same results as before with much less code. Ran `bin/storage upgrade`, got a clean bill of health.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18559
2017-09-07 13:23:31 -07:00
epriestley
3ff9d4a4ca Support Ferret engine for searching users
Summary:
Ref T12819. Adds support for indexing user accounts so they appear in global fulltext results.

Also, always rank users ahead of other results.

Test Plan: Indexed users. Searched for a user, got that user.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18552
2017-09-07 13:22:12 -07:00
epriestley
a2a2b3f7f4 Sort global fulltext results by overall relevance
Summary:
Ref T12819. Currently, under the Ferret engine, we query each application's index separately and then aggregate the results.

At the moment, results are aggregated by type first, then by actual rank. For example, all the revisions appear first, then all the tasks.

Instead, surface the internal ranking data from the underlying query and sort by it.

Test Plan: Searched for "A B" with a task named "A B" and a revision named "A". Saw task first. Broadly, saw mixed task and revision order in result sets.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18551
2017-09-07 13:21:58 -07:00
epriestley
8059db894d Use the Ferret engine fulltext document table to drive auxiliary fulltext constraints
Summary:
Ref T12819. I started trying to get individual engines to drive these constraints (e.g., `ManiphestTaskQuery` can do most of the work) but this is a big pain, especially since most engines don't support "any owner" or "no owner", and not everything has an owner, and so on and so on. Going down this path would have meant a huge pile of stub functions everywhere, I think.

Instead, drive these through the main engine using the fulltext document table, which already has everything we need to apply these constraints in a uniform way.

Also tweak some parts of query construction and result ordering.

Test Plan: Searched for documents by author, owner, unowned, any owner, tags, subscribers, fulltext in global search. Got sensible results without any application-specific code.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18550
2017-09-07 13:21:42 -07:00
epriestley
4ea677ba97 Skeleton support for running global fulltext queries via the Ferret engine
Summary:
Ref T12819. Provides a Ferret-engine-based fulltext engine to ultimately replace the InnoDB fulltext engine.

This is still pretty basic (hard-coded and buggy) but technically sort of works.

To activate this, you must explicitly configure it, so it isn't visible to users yet.

Test Plan: Searched for objects with global fulltext search, got a mixture of matching revisions and tasks back.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18548
2017-09-06 13:15:36 -07:00
epriestley
551c62b91a Support Ferret engine queries in ApplicationSearch via extension instead of hard-code
Summary: Ref T12819. Uses an extension rather than hard-coding support into Maniphest.

Test Plan: Saw "Query" field appear in Differential, which also implements the interface and has support. Used field in both applications.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18547
2017-09-06 13:15:10 -07:00
epriestley
faca1deea5 Remove the fulltext "reconstructDocument()" method
Summary: Ref T12819. This was originally intended for debugging, but never actually used and not clearly useful. There are no callers and it probably does not work. Just get rid of it.

Test Plan: Grepped for callers; none exist.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18544
2017-09-06 11:48:35 -07:00
Chad Little
fc893658b8 Update menu item names for Applications -> Favorites
Summary: Adds a `MenuName` method to applications that `ProfileMenuItem` uses instead of the application name if set. This improves the home/menu/new user experience at little cost. Also renamed the label from Applications to Favorites, since this menu gets altered to provide more than just applications. This also allows instances to set back to Maniphest if they so choose. Overall I think this direction resolves 95% of my concerns, with maybe a small potential downside which I don't really anticipate. We already name Dashboard panels by their object, and that hasn't really caused confusion. I think these links are similar. I click 'Tasks' and get presented a list of my tasks from Maniphest.

Test Plan: Review each of the name changes as a default new install and a modified install.

Reviewers: epriestley, amckinley

Reviewed By: epriestley

Spies: Korvin

Differential Revision: https://secure.phabricator.com/D18524
2017-09-05 19:05:03 -07:00
epriestley
20aad35e60 Move Ferret engine "title:..." field definitions to the engine itself
Summary: Ref T12819. Move these out of the core engine into the Ferret engine. In the future different applications can define different functions, like "summary:..." or whatever. This may get more formalization when I possibly do "author:" and such some time down the road.

Test Plan: Searched for "title:...". Searched for "dog:...", got a useful error.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18536
2017-09-05 11:57:51 -07:00
epriestley
46abc11114 Reduce the number of magic strings in the Ferret implementation
Summary:
Ref T12819. Push more of the magic `' '` stuff into the engine and simplify calls to ngram construction.

Also fixes a bug where a task with title "apple banana" and description "cherry doughnut" could match query "banana cherry" by separating separate term segments with newlines instead of spaces.

Test Plan:
  - Indexed some objects.
  - Searched (term, substring, quoted terms).
  - Viewed index in database.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18534
2017-09-05 11:57:35 -07:00
epriestley
4a7593f47f Consolidate more Ferret engine code into FerretEngine
Summary: Ref T12819. Earlier I separated some ngram code into an "ngram engine" hoping to share it across the simple Ngrams stuff and the full Ferret stuff, but they actually use slightly different rules. Just pull more of this stuff into FerretEngine to reduce the number of moving pieces and the amount of code duplication.

Test Plan: Searched for terms, rebuilt indexes.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18533
2017-09-05 11:57:18 -07:00
epriestley
577d498033 Create a virtual "core" field in the Ferret engine for "title and body together"
Summary: See PHI46. The `core:` function means "find results in either the title or body, but not other auxiliary fields like comments".

Test Plan: Searched for text present in the title (yes), body (yes), and comments (no) with the `core:...` prefix.

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D18514
2017-09-01 09:40:56 -07:00
epriestley
f4f73e0a7e Separate fulltext engine extensions into "enrich" and "index" phases
Summary:
Ref T12819. Some of the extensions "enrich" the document (adding more fields or relationships), while others "index" it (insert it into some kind of index for later searching).

Currently, these are all muddled under a single "index" phase. However, the Ferret extension cares about fields and relationships which other extensions may add.

Split this into two phases: "enrich" adds fields and relationships so other extensions can read them later if they want. "Index" happens after the document is built and has all the fields and relationships.

The specific problem this solves is that comments may not have been added to the document when the Ferret extension runs. By moving them to the "enrich" phase, the Ferret engine will be able to see and index comments.

Test Plan: Ran `bin/search index ...`, grepped for `indexFulltextDocument`.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18513
2017-09-01 09:40:11 -07:00
epriestley
df9c24e750 Provide some "term vs substring" support for the Ferret engine
Summary:
Ref T12819. Distinguishes between "term" queries and "substring" queries, and tries to match them correctly most of the time. For example:

  - `example` matches "example", obviously.
  - `~amp` matches "example", but `amp` does not.
  - `examples` matches "example" through stemming.
  - `"examples"` does not match "example" (quoted text does not stem).
  - `"an examp"` does not match "an example" (quoted text is still term text).
  - `~"an examp"` matches "an example" (quoted, substring-operator text uses substring search).

Test Plan: Ran searches similar to the above, they seemed to do what they should.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18500
2017-08-30 11:30:04 -07:00
epriestley
0e2e525bb4 Add a "terms" corpus to Ferret fields
Summary:
Ref T12819. Ferret currently does substring search, but this is not the default mode users expect: when you search for the "RICO" act, you do not expect to find documents containing "apRICOt" even though "RICO" is a substring.

To support term search, index the corpus as a list of terms with puncutation removed and whitespace normalized so the engine can match against it.

Test Plan:
Ran `storage upgrade`, ran `search index`, saw sensible database results:

```
   rawCorpus: This is the task description.

Hark! Whom'st'dve eaten this "food" shall surely ~perish~?? #blessed
normalCorpus: thi the task descript hark whom dve eaten food shall sure perish bless
  termCorpus:  This is the task description Hark Whom'st'dve eaten this food shall surely perish blessed
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18498
2017-08-30 11:29:14 -07:00
epriestley
77ef38f9a8 Aggregate corpus data in Ferret field rows
Summary:
Ref T12819. This addresses two issues:

  - One practical issue is that right now, if you search for "dog cat", and they appear in different fields (for example, "dog" appears ONLY in the title, while "cat" appears ONLY in a comment) we won't find the document. This is somewhat rare -- usually, if "dog" appears in the title, it's also repeated in the description -- but I think clearly a bug. To attack this, start automatically creating a virtual "ALL" field with the full document text which we'll use as the primary thing we match against.
  - For fields which may occur more than once -- today, only comments -- aggregate them all into one big "all of the text" row instead of writing one row per comment. This partly addresses the first point ("dog" in one comment and "cat" in a different comment won't be found) and partly makes some of the query gymnastics easier.

Test Plan:
Ran `bin/storage upgrade`, ran `bin/search index <Txxx>`, saw sensible corpus values in the database:

```
mysql> select * from maniphest_task_ffield\G
*************************** 1. row ***************************
          id: 3
  documentID: 1981
    fieldKey: full
   rawCorpus: This is the task title
This is the task description.
normalCorpus: thi the task titl
thi the task descript
*************************** 2. row ***************************
          id: 4
  documentID: 1981
    fieldKey: titl
   rawCorpus: This is the task title
normalCorpus: thi the task titl
*************************** 3. row ***************************
          id: 5
  documentID: 1981
    fieldKey: body
   rawCorpus: This is the task description.
normalCorpus: thi the task descript
3 rows in set (0.00 sec)
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18497
2017-08-30 11:28:30 -07:00
epriestley
4005a465f7 Make Ferret indexing more robust (UTF8, exception handling)
Summary:
Ref T12819. Two minor improvements from live data:

  - Tokenize in a UTF8-aware way.
  - When one document fails to index, kill the transaction explicitly (rather than leaving it hanging) so we don't cause other failures later.

Test Plan: Created some UTF8 documents locally, indexed them, got clean results.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18487
2017-08-28 15:49:57 -07:00
epriestley
f97157e7ed Build a prototype fulltext engine ("Ferret") using only basic MySQL primitives
Summary:
Ref T12819. I gave this stuff a sweet code name because all the terms related to "fulltext" and "search" already mean 5 different things. It, uh, ferrets out documents for you?

I'm building this to work a lot like the existing ngram index, which seems to work pretty well. If this sticks, it will auto-resolve the join issue (in T12443) by letting us do the entire thing locally in a JOIN and thus dodge a lot of mess.

This index gets built alongside other indexes, but only shows up in the UI if you have prototypes enabled. If you do, it appears under the existing fulltext field in Maniphest. No existing functionality is affected or disrupted.

NOTE: The query engine half of this is still EXTREMELY primitive, and this probably performs worse than the existing field for now. If this doesn't show obvious signs of being awful on `secure` I'll improve that in followup changes.

Test Plan:
Indexed my tasks, ran some simple queries, got the results I wanted, even for queries "ko", "k", "v0.1".

{F5147746}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819, T12443

Differential Revision: https://secure.phabricator.com/D18484
2017-08-28 14:52:59 -07:00
epriestley
47da632a22 Separate saved queries in applications into "personal" and "global" queries
Summary:
Ref T12956. UI changes:

  - Administrators get a new `[X] Save as global query` option when saving a query.
  - "Edit Queries..." is split into "Personal" and "Global" sections. For administrators, each section can be edited. For non-admins, only the top section can be edited, but any query can be pinned.

A couple notes:

  - This doesn't support "pin for everyone by default". New users just get the first query from the bottom set. That seems reasonable for now.
  - Reordering is currently a little buggy (it works if you've reordered before, but not if you're reordering for the first time), but I need to migrate before I can fix / test that properly. So that'll get cleaned up in the next change or two.

Test Plan:
  - As an admin and non-admin, viewed, edited, disabled, saved-as-personal and saved-as-global various queries.

{F5098581}

{F5098582}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12956

Differential Revision: https://secure.phabricator.com/D18426
2017-08-24 15:24:34 -07:00
epriestley
58b889c5b0 Make the default ApplicationSearch query explicit, not just the first item in the list
Summary:
Ref T12956. Currently, when you visit `/maniphest/` (or any other ApplicationSearch application) we execute the first query in the list by default.

In T12956, I plan to make changes so that personal queries are always first, then global/builtin queries. Without changing the "default query" rule, this will make it harder to have, for example, some custom queries in Differential but still run a global query like "Active" by default. To make this work, you'd have to save a personal copy of the "Active" query, then put it at the top.

This feels a bit cumbersome and this rule is kind of implicit and a little weird anyway. To make this work a little better as we make changes here, add an explicit pinning action, like the one we have in Project ProfileMenus.

You can now explicitly choose a query to make default.

Test Plan:
  - Browsed without pinning anything, saw normal behavior.
  - Pinned queries, viewed `/maniphest/`, saw a non-initial query selected by default.
  - Pinned a query, deleted it, nothing exploded.

{F5098484}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12956

Differential Revision: https://secure.phabricator.com/D18422
2017-08-24 15:21:00 -07:00
Chad Little
748725a47d Don't select disabled menu items as default
Summary: Fixes T12969. If you disable "Home" but leave it at the top, we still load it.

Test Plan: Disabled "Home". Move Dashboard into first position, see correct home layout.

Reviewers: epriestley

Reviewed By: epriestley

Spies: Korvin

Maniphest Tasks: T12969

Differential Revision: https://secure.phabricator.com/D18455
2017-08-23 09:40:30 -07:00
epriestley
8c3243ef68 Lightly modernize NamedQueryQuery
Summary: Ref T12956. No real behavioral changes here, just slightly more modern code.

Test Plan: Reviewed named queries in Maniphest and "Edit Queries...".

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12956

Differential Revision: https://secure.phabricator.com/D18420
2017-08-14 09:07:11 -07:00
epriestley
cfb86dddd2 Warn users that compound terms separated by apostrophes don't work in the MySQL FULLTEXT index either
Summary: Ref T12928. Like `v0.1`, terms in the form `yo's` (sequences of two or fewer characters separated by apostrophes) do not get indexed.

Test Plan: {F5078192}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12928

Differential Revision: https://secure.phabricator.com/D18324
2017-08-02 16:06:08 -07:00
Chad Little
ea0db5aa9d Clean up dropdown carets
Summary: Adds dropdown carets to buttons more universally that are actually dropdowns.

Test Plan: Differential, Application Search, Diffusion. Mobile and Desktop.

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D18292
2017-07-28 15:11:25 -07:00