1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-18 19:40:55 +01:00
Commit graph

628 commits

Author SHA1 Message Date
epriestley
8059db894d Use the Ferret engine fulltext document table to drive auxiliary fulltext constraints
Summary:
Ref T12819. I started trying to get individual engines to drive these constraints (e.g., `ManiphestTaskQuery` can do most of the work) but this is a big pain, especially since most engines don't support "any owner" or "no owner", and not everything has an owner, and so on and so on. Going down this path would have meant a huge pile of stub functions everywhere, I think.

Instead, drive these through the main engine using the fulltext document table, which already has everything we need to apply these constraints in a uniform way.

Also tweak some parts of query construction and result ordering.

Test Plan: Searched for documents by author, owner, unowned, any owner, tags, subscribers, fulltext in global search. Got sensible results without any application-specific code.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18550
2017-09-07 13:21:42 -07:00
epriestley
4ea677ba97 Skeleton support for running global fulltext queries via the Ferret engine
Summary:
Ref T12819. Provides a Ferret-engine-based fulltext engine to ultimately replace the InnoDB fulltext engine.

This is still pretty basic (hard-coded and buggy) but technically sort of works.

To activate this, you must explicitly configure it, so it isn't visible to users yet.

Test Plan: Searched for objects with global fulltext search, got a mixture of matching revisions and tasks back.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18548
2017-09-06 13:15:36 -07:00
epriestley
551c62b91a Support Ferret engine queries in ApplicationSearch via extension instead of hard-code
Summary: Ref T12819. Uses an extension rather than hard-coding support into Maniphest.

Test Plan: Saw "Query" field appear in Differential, which also implements the interface and has support. Used field in both applications.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18547
2017-09-06 13:15:10 -07:00
epriestley
faca1deea5 Remove the fulltext "reconstructDocument()" method
Summary: Ref T12819. This was originally intended for debugging, but never actually used and not clearly useful. There are no callers and it probably does not work. Just get rid of it.

Test Plan: Grepped for callers; none exist.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18544
2017-09-06 11:48:35 -07:00
Chad Little
fc893658b8 Update menu item names for Applications -> Favorites
Summary: Adds a `MenuName` method to applications that `ProfileMenuItem` uses instead of the application name if set. This improves the home/menu/new user experience at little cost. Also renamed the label from Applications to Favorites, since this menu gets altered to provide more than just applications. This also allows instances to set back to Maniphest if they so choose. Overall I think this direction resolves 95% of my concerns, with maybe a small potential downside which I don't really anticipate. We already name Dashboard panels by their object, and that hasn't really caused confusion. I think these links are similar. I click 'Tasks' and get presented a list of my tasks from Maniphest.

Test Plan: Review each of the name changes as a default new install and a modified install.

Reviewers: epriestley, amckinley

Reviewed By: epriestley

Spies: Korvin

Differential Revision: https://secure.phabricator.com/D18524
2017-09-05 19:05:03 -07:00
epriestley
20aad35e60 Move Ferret engine "title:..." field definitions to the engine itself
Summary: Ref T12819. Move these out of the core engine into the Ferret engine. In the future different applications can define different functions, like "summary:..." or whatever. This may get more formalization when I possibly do "author:" and such some time down the road.

Test Plan: Searched for "title:...". Searched for "dog:...", got a useful error.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18536
2017-09-05 11:57:51 -07:00
epriestley
46abc11114 Reduce the number of magic strings in the Ferret implementation
Summary:
Ref T12819. Push more of the magic `' '` stuff into the engine and simplify calls to ngram construction.

Also fixes a bug where a task with title "apple banana" and description "cherry doughnut" could match query "banana cherry" by separating separate term segments with newlines instead of spaces.

Test Plan:
  - Indexed some objects.
  - Searched (term, substring, quoted terms).
  - Viewed index in database.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18534
2017-09-05 11:57:35 -07:00
epriestley
4a7593f47f Consolidate more Ferret engine code into FerretEngine
Summary: Ref T12819. Earlier I separated some ngram code into an "ngram engine" hoping to share it across the simple Ngrams stuff and the full Ferret stuff, but they actually use slightly different rules. Just pull more of this stuff into FerretEngine to reduce the number of moving pieces and the amount of code duplication.

Test Plan: Searched for terms, rebuilt indexes.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18533
2017-09-05 11:57:18 -07:00
epriestley
577d498033 Create a virtual "core" field in the Ferret engine for "title and body together"
Summary: See PHI46. The `core:` function means "find results in either the title or body, but not other auxiliary fields like comments".

Test Plan: Searched for text present in the title (yes), body (yes), and comments (no) with the `core:...` prefix.

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D18514
2017-09-01 09:40:56 -07:00
epriestley
f4f73e0a7e Separate fulltext engine extensions into "enrich" and "index" phases
Summary:
Ref T12819. Some of the extensions "enrich" the document (adding more fields or relationships), while others "index" it (insert it into some kind of index for later searching).

Currently, these are all muddled under a single "index" phase. However, the Ferret extension cares about fields and relationships which other extensions may add.

Split this into two phases: "enrich" adds fields and relationships so other extensions can read them later if they want. "Index" happens after the document is built and has all the fields and relationships.

The specific problem this solves is that comments may not have been added to the document when the Ferret extension runs. By moving them to the "enrich" phase, the Ferret engine will be able to see and index comments.

Test Plan: Ran `bin/search index ...`, grepped for `indexFulltextDocument`.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18513
2017-09-01 09:40:11 -07:00
epriestley
df9c24e750 Provide some "term vs substring" support for the Ferret engine
Summary:
Ref T12819. Distinguishes between "term" queries and "substring" queries, and tries to match them correctly most of the time. For example:

  - `example` matches "example", obviously.
  - `~amp` matches "example", but `amp` does not.
  - `examples` matches "example" through stemming.
  - `"examples"` does not match "example" (quoted text does not stem).
  - `"an examp"` does not match "an example" (quoted text is still term text).
  - `~"an examp"` matches "an example" (quoted, substring-operator text uses substring search).

Test Plan: Ran searches similar to the above, they seemed to do what they should.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18500
2017-08-30 11:30:04 -07:00
epriestley
0e2e525bb4 Add a "terms" corpus to Ferret fields
Summary:
Ref T12819. Ferret currently does substring search, but this is not the default mode users expect: when you search for the "RICO" act, you do not expect to find documents containing "apRICOt" even though "RICO" is a substring.

To support term search, index the corpus as a list of terms with puncutation removed and whitespace normalized so the engine can match against it.

Test Plan:
Ran `storage upgrade`, ran `search index`, saw sensible database results:

```
   rawCorpus: This is the task description.

Hark! Whom'st'dve eaten this "food" shall surely ~perish~?? #blessed
normalCorpus: thi the task descript hark whom dve eaten food shall sure perish bless
  termCorpus:  This is the task description Hark Whom'st'dve eaten this food shall surely perish blessed
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18498
2017-08-30 11:29:14 -07:00
epriestley
77ef38f9a8 Aggregate corpus data in Ferret field rows
Summary:
Ref T12819. This addresses two issues:

  - One practical issue is that right now, if you search for "dog cat", and they appear in different fields (for example, "dog" appears ONLY in the title, while "cat" appears ONLY in a comment) we won't find the document. This is somewhat rare -- usually, if "dog" appears in the title, it's also repeated in the description -- but I think clearly a bug. To attack this, start automatically creating a virtual "ALL" field with the full document text which we'll use as the primary thing we match against.
  - For fields which may occur more than once -- today, only comments -- aggregate them all into one big "all of the text" row instead of writing one row per comment. This partly addresses the first point ("dog" in one comment and "cat" in a different comment won't be found) and partly makes some of the query gymnastics easier.

Test Plan:
Ran `bin/storage upgrade`, ran `bin/search index <Txxx>`, saw sensible corpus values in the database:

```
mysql> select * from maniphest_task_ffield\G
*************************** 1. row ***************************
          id: 3
  documentID: 1981
    fieldKey: full
   rawCorpus: This is the task title
This is the task description.
normalCorpus: thi the task titl
thi the task descript
*************************** 2. row ***************************
          id: 4
  documentID: 1981
    fieldKey: titl
   rawCorpus: This is the task title
normalCorpus: thi the task titl
*************************** 3. row ***************************
          id: 5
  documentID: 1981
    fieldKey: body
   rawCorpus: This is the task description.
normalCorpus: thi the task descript
3 rows in set (0.00 sec)
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18497
2017-08-30 11:28:30 -07:00
epriestley
4005a465f7 Make Ferret indexing more robust (UTF8, exception handling)
Summary:
Ref T12819. Two minor improvements from live data:

  - Tokenize in a UTF8-aware way.
  - When one document fails to index, kill the transaction explicitly (rather than leaving it hanging) so we don't cause other failures later.

Test Plan: Created some UTF8 documents locally, indexed them, got clean results.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18487
2017-08-28 15:49:57 -07:00
epriestley
f97157e7ed Build a prototype fulltext engine ("Ferret") using only basic MySQL primitives
Summary:
Ref T12819. I gave this stuff a sweet code name because all the terms related to "fulltext" and "search" already mean 5 different things. It, uh, ferrets out documents for you?

I'm building this to work a lot like the existing ngram index, which seems to work pretty well. If this sticks, it will auto-resolve the join issue (in T12443) by letting us do the entire thing locally in a JOIN and thus dodge a lot of mess.

This index gets built alongside other indexes, but only shows up in the UI if you have prototypes enabled. If you do, it appears under the existing fulltext field in Maniphest. No existing functionality is affected or disrupted.

NOTE: The query engine half of this is still EXTREMELY primitive, and this probably performs worse than the existing field for now. If this doesn't show obvious signs of being awful on `secure` I'll improve that in followup changes.

Test Plan:
Indexed my tasks, ran some simple queries, got the results I wanted, even for queries "ko", "k", "v0.1".

{F5147746}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819, T12443

Differential Revision: https://secure.phabricator.com/D18484
2017-08-28 14:52:59 -07:00
epriestley
47da632a22 Separate saved queries in applications into "personal" and "global" queries
Summary:
Ref T12956. UI changes:

  - Administrators get a new `[X] Save as global query` option when saving a query.
  - "Edit Queries..." is split into "Personal" and "Global" sections. For administrators, each section can be edited. For non-admins, only the top section can be edited, but any query can be pinned.

A couple notes:

  - This doesn't support "pin for everyone by default". New users just get the first query from the bottom set. That seems reasonable for now.
  - Reordering is currently a little buggy (it works if you've reordered before, but not if you're reordering for the first time), but I need to migrate before I can fix / test that properly. So that'll get cleaned up in the next change or two.

Test Plan:
  - As an admin and non-admin, viewed, edited, disabled, saved-as-personal and saved-as-global various queries.

{F5098581}

{F5098582}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12956

Differential Revision: https://secure.phabricator.com/D18426
2017-08-24 15:24:34 -07:00
epriestley
58b889c5b0 Make the default ApplicationSearch query explicit, not just the first item in the list
Summary:
Ref T12956. Currently, when you visit `/maniphest/` (or any other ApplicationSearch application) we execute the first query in the list by default.

In T12956, I plan to make changes so that personal queries are always first, then global/builtin queries. Without changing the "default query" rule, this will make it harder to have, for example, some custom queries in Differential but still run a global query like "Active" by default. To make this work, you'd have to save a personal copy of the "Active" query, then put it at the top.

This feels a bit cumbersome and this rule is kind of implicit and a little weird anyway. To make this work a little better as we make changes here, add an explicit pinning action, like the one we have in Project ProfileMenus.

You can now explicitly choose a query to make default.

Test Plan:
  - Browsed without pinning anything, saw normal behavior.
  - Pinned queries, viewed `/maniphest/`, saw a non-initial query selected by default.
  - Pinned a query, deleted it, nothing exploded.

{F5098484}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12956

Differential Revision: https://secure.phabricator.com/D18422
2017-08-24 15:21:00 -07:00
Chad Little
748725a47d Don't select disabled menu items as default
Summary: Fixes T12969. If you disable "Home" but leave it at the top, we still load it.

Test Plan: Disabled "Home". Move Dashboard into first position, see correct home layout.

Reviewers: epriestley

Reviewed By: epriestley

Spies: Korvin

Maniphest Tasks: T12969

Differential Revision: https://secure.phabricator.com/D18455
2017-08-23 09:40:30 -07:00
epriestley
8c3243ef68 Lightly modernize NamedQueryQuery
Summary: Ref T12956. No real behavioral changes here, just slightly more modern code.

Test Plan: Reviewed named queries in Maniphest and "Edit Queries...".

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12956

Differential Revision: https://secure.phabricator.com/D18420
2017-08-14 09:07:11 -07:00
epriestley
cfb86dddd2 Warn users that compound terms separated by apostrophes don't work in the MySQL FULLTEXT index either
Summary: Ref T12928. Like `v0.1`, terms in the form `yo's` (sequences of two or fewer characters separated by apostrophes) do not get indexed.

Test Plan: {F5078192}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12928

Differential Revision: https://secure.phabricator.com/D18324
2017-08-02 16:06:08 -07:00
Chad Little
ea0db5aa9d Clean up dropdown carets
Summary: Adds dropdown carets to buttons more universally that are actually dropdowns.

Test Plan: Differential, Application Search, Diffusion. Mobile and Desktop.

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D18292
2017-07-28 15:11:25 -07:00
epriestley
018d1b77bf Identify compound short search tokens in the form "xx.yy" as unqueryable in the search UI
Summary:
Ref T12928. The index doesn't work for these, so show the user that there's a problem and drop the terms.

This doesn't fix the problem, but makes the behavior more clear.

Test Plan:
{F5053703}

{F5053704}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12928

Differential Revision: https://secure.phabricator.com/D18254
2017-07-20 14:24:09 -07:00
Aviv Eyal
d1f144b214 Fix Search Application Config
Summary: Fix T12924. Looks like this melted in D17384, and nobody noticed yet.

Test Plan: Visit page, see fancy table.

Reviewers: epriestley, 20after4, #blessed_reviewers

Reviewed By: epriestley, 20after4, #blessed_reviewers

Subscribers: Korvin

Maniphest Tasks: T12924

Differential Revision: https://secure.phabricator.com/D18230
2017-07-18 17:44:56 +00:00
epriestley
b46e2bb4cc Convert cluster/projects config options to newer modular structure
Summary: Ref T12845. Converts the cluster and project config options to the new stuff; this is mostly just shifting boilerplate around.

Test Plan: Edited, deleted, and mangled these options from the web UI and CLI.

Reviewers: chad, amckinley

Reviewed By: amckinley

Maniphest Tasks: T12845

Differential Revision: https://secure.phabricator.com/D18166
2017-06-27 12:35:54 -07:00
epriestley
a198590533 Degrade more gracefully when ProfileMenu dashboards fail to render
Summary:
Ref T12871. This replaces a dead end UI (user totally locked out) with one where the menu is still available, if the default menu item is one which generates a policy exception (e.g., because users can't see the dashboard).

Really, we should do better than this and not select this item as the default item if the viewer can't see it, but there is currently no reliable way to test for "can the viewer see this item?" so this is a more involved change. I'm thinking we get this minor improvement into the release, then pursue a more detailed fix afterward.

Test Plan:
  - Added a dashboard as the top item in the global menu.
  - Changed the dashboard to be visible to only user B.
  - Viewed Home as user A.
  - Before patch: entire page is a policy exception dialog.
  - After patch, things are better:

{F5014179}

Reviewers: chad, amckinley

Reviewed By: amckinley

Maniphest Tasks: T12871

Differential Revision: https://secure.phabricator.com/D18152
2017-06-23 12:31:36 -07:00
epriestley
f704f905d2 Let PhabricatorSearchCheckboxesField survive saved query data with mismatched types
Summary:
Fixes T12851.

This should fix the error I'm seeing, which is:

* `Argument 1 passed to array_fuse() must be of the type array, boolean given`

There may be a better way to patch this up than overriding the getValue() method,
however.

Test Plan:
- Changed the default "Tags" filter to specify `true` instead of `array('self')`, then viewed that filter in the UI.
- Before patch: fatal.
- After patch: page loads. Note that `true` is not interpreted as `array('self')`, but the page isn't broken, which is a big improvement.

Reviewers: #blessed_reviewers, 20after4, chad, amckinley

Reviewed By: #blessed_reviewers, amckinley

Subscribers: Korvin

Maniphest Tasks: T12851

Differential Revision: https://secure.phabricator.com/D18132
2017-06-23 12:29:47 -07:00
Chad Little
00400ae6f9 Search and Replace calls to setShade
Summary: grep for setShade and update to setColor. Add deprecated warning.

Test Plan: Diffusion, Workboards, Maniphest, Project tags, tokenizer, uiexamples

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin, O14 ATC Monitoring

Differential Revision: https://secure.phabricator.com/D17995
2017-05-22 18:59:53 +00:00
epriestley
f880000eb0 Stem fulltext tokens before filtering them for stopwords
Summary:
Fixes T12596. A query for a token (like "having") which stems to a stopword (like "have") currently survives filtering. Stem it first so it gets caught.

Also, for InnoDB, a custom stopword table can be configured. If it is, read that instead of the default stopword list (I configured it locally, but the default list is reasonable so we never formally recommended installs configure it).

Test Plan:
Queried for words that stem to stopwords, saw them filtered:

{F4915843}

Queried for the original problem query and saw "having" caught with "have" in the stopword list:

{F4915844}

Fiddled with local InnoDB stopword table config and saw the stopword list get loaded correctly.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12596

Differential Revision: https://secure.phabricator.com/D17728
2017-04-19 10:02:21 -07:00
epriestley
6052bc1933 Extend "fulltext" and "ngrams" interfaces from "indexable" interface
Summary: Ref T8788. See D17702. This allows `bin/search index` to index stuff which only implements `Ngrams`, not `Fulltext`.

Test Plan: Kinda poked around `bin/search index` a bit, yell if you hit more issues deeper down the stack?

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T8788

Differential Revision: https://secure.phabricator.com/D17704
2017-04-17 12:59:41 -07:00
Chad Little
5587abf04c Remove recentParticipants from ConpherenceThread
Summary: We no longer display this any more in the UI, so go ahead and remove the callsites and db column.

Test Plan: New Room, with and without participants.

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D17683
2017-04-13 13:55:08 -07:00
Chad Little
2c5ee2a225 Fix Durable Column CSS-Overload
Summary: This moves the count on the Conpherence Menu Item into a phui-list-item-count, and removes the CSS call to the entire Conphrence stack when durable column is open.

Test Plan: Test with and without the chat column, and a menu with a count

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D17677
2017-04-13 11:29:30 -07:00
epriestley
ada9046e31 Fix a fulltext search issue where finding token length and stopwords could fail
Summary:
Ref T12137. If a database is missing the InnoDB or MyISAM table engines, the big combined query to get both will fail.

Instead, try InnoDB first and then MyISAM.

(I have both engines locally so this worked until I deployed it.)

Test Plan: Faked an InnoDB error like `secure`, got a MyISAM result.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12137

Differential Revision: https://secure.phabricator.com/D17673
2017-04-12 19:22:46 -07:00
epriestley
3245e74f16 Show users how fulltext search queries are parsed and executed; don't query stopwords or short tokens
Summary:
Depends on D17670. Fixes T12137. Fixes T12003. Ref T2632.

This shows users a readout of which terms were actually searched for.

This also drops those terms from the query we submit to the backend, dodging the weird behaviors / search engine bugs in T12137.

This might need some design tweaking.

Test Plan: {F4899825}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12137, T12003, T2632

Differential Revision: https://secure.phabricator.com/D17672
2017-04-12 19:07:54 -07:00
epriestley
cb49acc2ca Update Phabricator to use intermediate tokens from the query compiler
Summary:
Depends on D17669. Ref T12137. Ref T12003. Ref T2632. Ref T7860.

Converts Phabricator to the new parse + compile workflow with intermediate tokens.

Also fixes a bug where searches for `cat"` or similar (unmatched quotes) wouldn't produce a nice exception.

Test Plan:
  - Fulltext searched.
  - Fulltext searched in Conpherence.
  - Fulltext searched with bad syntax.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12137, T12003, T7860, T2632

Differential Revision: https://secure.phabricator.com/D17670
2017-04-12 19:07:33 -07:00
epriestley
4bf968148c Fix pagination of fulltext search results
Summary:
Fixes T8285. Fulltext search relies on an underlying engine which can not realistically use cursor paging. This is unusual and creates some oddness.

Tweak a few numbers -- and how offsets are handled -- to separate the filtered offset and unfiltered offset.

Test Plan:
  - Set page size to 2.
  - Ran a query.
  - Paged forward and backward through results sensibly, seeing the full result set.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T8285

Differential Revision: https://secure.phabricator.com/D17667
2017-04-12 17:57:46 -07:00
Chad Little
6bf595b951 Check is viewer is a participant before showing count
Summary: In Conpherence ProfileMenuItem we show an unread count if you're a participant, but all message count if you're not. Just remove that.

Test Plan: Log out of room in Conpherence, leave messages on second account, check menu item on both accounts.

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D17664
2017-04-12 13:27:07 -07:00
Chad Little
75303567b3 Add a Conpherence Profile Menu Item
Summary: Builds a Conpherence Profile Menu Item, complete with counts for the unreads. This allows pinning to home as well as swapping out thread list in Conpherence for pinning eventually.

Test Plan: Add a menu item, chat in room, log into other account, see room count. Room count disappears after viewing.

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D17662
2017-04-12 13:07:44 -07:00
epriestley
7e6f37fffb Rename "ElasticSearch" filenames to "Elasticsearch" (2/2)
Sometimes git does some odd magic on case-insensitive filesystems, try to
trick it.

Auditors: chad
2017-04-02 14:59:36 -07:00
epriestley
a9e2732a5c Spell "Elasticsearch" correctly, not "ElasticSearch"
Summary: Ref T12450. These are like 95% my fault, but Elastic appears to spell the name "Elasticsearch" consistently in their branding.

Test Plan: `grep ElasticSearch`

Reviewers: chad, 20after4

Maniphest Tasks: T12450

Differential Revision: https://secure.phabricator.com/D17601
2017-04-02 14:58:59 -07:00
epriestley
0f144d29e9 When "cluster.search" changes, don't trust the old index versions
Summary:
Ref T12450. We track a "document version" for updating search indexes, so that if a document is rapidly updated many times in a row we can skip most of the work.

However, this version doesn't consider "cluster.search" configuration, so if you add a new service (like a new ElasticSearch host) we still think that every document is up-to-date. When you run `bin/search index` to populate the index (without `--force`), we just do nothing.

This isn't necessarily very obvious. D17597 makes it more clear, by printing "everything was skipped and nothing happened" at the end.

Here, fix the issue by considering the content of "cluster.search" when computing fulltext document versions: if you change `cluster.search`, we throw away the version index and reindex everything.

This is slightly more work than we need to do, but changes to "cluster.search" are rare and this is much easier than trying to individually track which versions of which documents are in which services, which probably isn't very useful anyway.

Test Plan:
  - Ran `bin/search index --type project`, saw everything get skipped.
  - Changed `cluster.search`.
  - Ran `search index` again, saw everything get updated.
  - Ran a third time without changing `cluster.search`, everything was properly skipped.

Reviewers: chad, 20after4

Reviewed By: 20after4

Maniphest Tasks: T12450

Differential Revision: https://secure.phabricator.com/D17598
2017-04-02 13:45:48 -07:00
epriestley
bd93978200 Count and report skipped documents from "bin/search index"
Summary:
Ref T12450. There's currently a bad behavior where inserting a document into one search service marks it as up to date everywhere.

This isn't nearly as obvious as it should be because `bin/search index` doesn't make it terribly clear when a document was skipped because the index version was already up to date.

When running `bin/seach index` without `--force` or `--background`, keep track of updated vs not-updated documents and print out some guidance. In other configurations, try to provide more help too.

Test Plan: {F4452134}

Reviewers: chad, 20after4

Reviewed By: 20after4

Maniphest Tasks: T12450

Differential Revision: https://secure.phabricator.com/D17597
2017-04-02 13:45:30 -07:00
epriestley
6d81675032 Remove "url" from Elasticsearch index
Summary:
Ref T12450. This was added a very very long time ago (D2298).

I don't want to put this in the upstream index anymore because I don't want to encourage third parties to develop software which reads the index directly. Reading the index directly is a big skeleton key which bypasses policy checks.

This was added before much of the policy model existed, when that wasn't as much of a concern. On a tecnhnical note, this also doesn't update when `phabricator.base-uri` changes.

This can be written as a search index extension if an install relies on it for some bizarre reason, although none should and I'm unaware of any actual use cases in the wild for it, even at Facebook.

Test Plan: Indexed some random stuff into ElasticSearch.

Reviewers: chad, 20after4

Reviewed By: chad

Maniphest Tasks: T12450

Differential Revision: https://secure.phabricator.com/D17600
2017-04-02 13:26:45 -07:00
epriestley
64234535e3 Remove FIELD_KEYWORDS, index project slugs as body content
Summary:
D17384 added a "keywords" field but only partially implemented it.

  - Remove this field.
  - Index project slugs as part of the document body instead.

Test Plan:
  - Ran `bin/search index PHID-PROJ-... --force`.
  - Found project by searching for a unique slug.

Reviewers: chad, 20after4

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D17596
2017-04-02 09:36:32 -07:00
Mukunda Modell
cb1d904654 Make sure writes go to the right cluster
Summary:
Two little issues

1. there was an extra call to getHostForWrite,
2. The engine instance was shared between multiple service definitions so it
was overwriting the list of writable hosts from one service with hosts from another.

Test Plan:
tested in wikimedia production with multiple services defined like this:

```language=json
 [
        {
          "hosts": [
            {
              "host": "search.svc.codfw.wmnet",
              "protocol": "https",
              "roles": {
                "read": true,
                "write": true
              },
              "version": 5
            }
          ],
          "path": "/phabricator",
          "port": 9243,
          "type": "elasticsearch"
        },
        {
          "hosts": [
            {
              "host": "search.svc.eqiad.wmnet",
              "protocol": "https",
              "roles": {
                "read": true,
                "write": true
              },
              "version": 5
            }
          ],
          "path": "/phabricator",
          "port": 9243,
          "type": "elasticsearch"
        }
      ]
```

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley

Differential Revision: https://secure.phabricator.com/D17581
2017-03-30 18:08:05 +00:00
Mukunda Modell
67a1c40476 Set content-type to application/json
Summary:
Elasticsearch really wants a raw json body and it fails to accept
the request as of es version 5.3

Test Plan:
Tested with elasticsearch 5.2 and 5.3.

Before this change 5.2 worked but 5.3 failed with
`HTTP/406 "Content-Type header [application/x-www-form-urlencoded] is not supported"` [1]

After this change, both worked.

[1] https://phabricator.wikimedia.org/P5158

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D17580
2017-03-30 18:07:47 +00:00
Mukunda Modell
654f0f6043 Make messages translatable and more sensible.
Summary:
These exception messages & comments didn't quite match reality.
Fixed and added pht() around a couple of them.

Test Plan: I didn't test this :P

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D17578
2017-03-28 23:17:35 +00:00
epriestley
5f939dcce0 Re-run config validation from bin/search
Summary:
Ref T12450. Normally, we validate config when:

  - You restart the webserver.
  - You edit it with `bin/config set ...`.
  - You edit it with the web UI.

However, you can also change config by editing `local.json`, `some_env.conf.php`, a `SiteConfig` class, etc. In these cases, you may miss config warnings.

Explicitly re-run search config checks from `bin/search`, similar to the additional database checks we run from `bin/storage`, to try to produce a better error message if the user has made a configuration error.

Test Plan:
```
$ ./bin/search init
Usage Exception: Setting "cluster.search" is misconfigured: Invalid search engine type: elastic. Valid types are: elasticsearch, mysql.
```

Reviewers: chad, 20after4

Reviewed By: 20after4

Maniphest Tasks: T12450

Differential Revision: https://secure.phabricator.com/D17574
2017-03-28 14:53:26 -07:00
epriestley
c22693ff29 Remove PhabricatorSearchEngineTestCase
Summary:
Ref T12450. This is now pointless and just asserts that `cluster.search` has a default value.

We might restore a fancier version of this eventually, but get rid of this for now.

Test Plan: Scruitinized the test case.

Reviewers: chad, 20after4

Reviewed By: 20after4

Maniphest Tasks: T12450

Differential Revision: https://secure.phabricator.com/D17573
2017-03-28 13:57:55 -07:00
epriestley
e7c76d92d5 Make bin/search init messaging a little more consistent
Summary:
Ref T12450. This mostly just smooths out the text a little to improve consistency. Also:

  - Use `isWritable()`.
  - Make the "skipping because not writable" message more clear and tailored.
  - Try not to use the word "index" too much to avoid confusion with `bin/search index` -- instead, talk about "initialize a service".

Test Plan: Ran `bin/search init` with a couple of different (writable / not writable) configs, saw slightly clearer messaging.

Reviewers: chad, 20after4

Reviewed By: 20after4

Maniphest Tasks: T12450

Differential Revision: https://secure.phabricator.com/D17572
2017-03-28 13:57:37 -07:00
Mukunda Modell
699228c73b Address some New Search Configuration Errata
Summary:
  [ ] Write an "Upgrading: ..." guidance task with narrow instructions for installs that are upgrading.
  [ ] Do we need to add an indexing activity (T11932) for installs with ElasticSearch?
  [ ] We should more clearly detail exactly which versions of ElasticSearch are supported (for example, is ElasticSearch <2 no longer supported)? From T9893 it seems like we may //only// have supported ElasticSearch <2 before, so are the two regions of support totally nonoverlapping and all ElasticSearch users will need to upgrade?
  [ ] Documentation should provide stronger guidance toward MySQL and away from Elastic for the vast majority of installs, because we've historically seen users choosing Elastic when they aren't actually trying to solve any specific problem.
  [ ] When users search for fulltext results in Maniphest and hit too many documents, the current behavior is approximately silent failure (see T12443). D17384 has also lowered the ceiling for ElasticSearch, although previous changes lowered it for MySQL search. We should not fail silently, and ideally should build toward T12003.
  [ ] D17384 added a new "keywords" field, but MySQL does not search it (I think?). The behavior should be as consistent across MySQL and Elastic as we can make it. Likely cleaner is giving "Project" objects a body, with "slugs" and "description" separated by newlines?
  [ ] `PhabricatorSearchEngineTestCase` is now pointless and only detects local misconfigurations.
  [ ] It would be nice to build a practical test suite instead, where we put specific documents into the index and then search for them. The upstream test could run against MySQL, and some `bin/search test` could run against a configured engine like ElasticSearch. This would make it easier to make sure that behavior was as uniform as possible across engine implementations.
  [ ] Does every assigned task now match "user" in ElasticSearch?
  [x] `PhabricatorElasticFulltextStorageEngine` has a `json_encode()` which should be `phutil_json_encode()`.
  [ ] `PhabricatorSearchService` throws an untranslated exception.
  [ ] When a search cluster is down, we probably don't degrade with much grace (unhandled exception)?
  [ ] I haven't run bin/search init, but bin/search index doesn't warn me that I may want to. This might be worth adding. The UI does warn me.
  [ ] bin/search init warns me that the index is "incorrect". It might be more clear to distinguish between "missing" and "incorrect", since it's more comforting to users to see "everything is as we expect, doing normal first-time setup now" than "something is wrong, fixing it".
  [ ] CLI message "Initializing search service "ElasticSearch"" does not end with a period, which is inconsistent with other UI messages.
  [ ] It might be nice to let bin/search commands like init and index select a specific service (or even service + host) to act on, as bin/storage --ref ... now does. You can generally get the result you want by fiddling with config.
  [ ] When a service isn't writable, bin/search init reports "Search cluster has no hosts for role "write".". This is accurate but does not provide guidance: it might be more useful to the user to explain "This service is not writable, so we're skipping index check for it.".
  [x] Even with write off for MySQL, bin/search index --type task --trace still updates MySQL, I think? I may be misreading the trace output. But this behavior doesn't make sense if it is the actual behavior, and it seems like reindexAbstractDocument() uses "all services", not "writable services", and the MySQL engine doesn't make sure it's writable before indexing.
  [x] Searching or user fails to find task Grant users tokens when a mention is created, suggesting that stemming is not working.
  [x] Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.
  [x] Searching for maniphest fails to find task maniphest.query elephant, suggesting that tokenization in ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).
  [x] The "index incorrect" warning UI uses inconsistent title case.
  [x] The "index incorrect" warning UI could format the command to be run more cleanly (with addCommand(), I think).

refs T12450

Test Plan:
* Stared blankly at the code.
* Disabled 'write' role on mysql fulltext service.
* Edited a task, ran search indexer, verified that the mysql index wasn't being updated.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin

Maniphest Tasks: T12450

Differential Revision: https://secure.phabricator.com/D17564
2017-03-28 20:19:38 +00:00