1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2025-01-18 18:51:12 +01:00
Commit graph

147 commits

Author SHA1 Message Date
epriestley
9502312b60 Remove "final" from "private" methods in Phabricator
Summary:
Ref T13588. This has never been meaningful, but a "final private" method is specifically forbidden in PHP8.

Remove meaningless "final" from these methods, per new lint checks.

Test Plan: Ran `arc lint --everything` to identify affected methods, then `... | xargs -n1 arc lint --apply-patches`.

Maniphest Tasks: T13588

Differential Revision: https://secure.phabricator.com/D21540
2021-02-03 14:13:29 -08:00
epriestley
671986592b Add a missing "GROUP BY" to MailQuery when querying for multiple recipients
Summary:
See <https://discourse.phabricator-community.org/t/mail-details-view-broken/4315>. The change in D21400 detects a missing "GROUP BY" in some variations of this query.

Specifically, we may join multiple recipient rows (since mail may have multiple recipients) and then fail to group the results.

Fix this by adding the "GROUP BY". Additionally, remove the special-cased behavior when no authors or recipients are specified -- it's complicated and not entirely correct (e.g., may produce a "no object" instead of a policy error when querying by ID), and likely predates overheating.

Test Plan:
  - Disabled `metamta.one-mail-per-recipient` in Config.
  - Generated a message to 2+ recipients.
  - Viewed the message detail; queried for the message by specifying 2+ recipients.
  - Viewed the unfiltered list of messages, saw the query overheat.

Differential Revision: https://secure.phabricator.com/D21486
2020-10-30 13:02:22 -07:00
epriestley
2748f83e12 Modularize Ferret fulltext functions
Summary: Ref T13511. Currently, Ferret fulltext field functions (like "title:") are hard-coded. Modularize them so extensions may define new ones.

Test Plan: Added a new custom field which emits data for the indexer, searched for "animal-noises:moo", "animal-noises:-", etc., in global search and application search.

Maniphest Tasks: T13511

Differential Revision: https://secure.phabricator.com/D21131
2020-04-16 13:41:13 -07:00
epriestley
9bdf477f2f Combine the two different ngram-splitting algorithms into a single engine
Summary:
Ref T13501. Depends on D21127. With the "prefix" behavior removed in D21127, we now have two virtually identical copies of the same code.

The newer one in Ferret is better: it slices utf8 correctly and is slightly more efficient on large inputs. Pull it out and make all callers call into it.

Test Plan:
  - Grepped for all affected symbols.
  - Ran `bin/search index --force ...` to reindex various objects (tasks, files).
  - Searched for things in the UI.

Maniphest Tasks: T13501

Differential Revision: https://secure.phabricator.com/D21128
2020-04-16 09:45:00 -07:00
epriestley
fb3f423279 Remove broken and unfixable "prefix" ngram behavior
Summary:
Ref T13501. The older ngram code has some "prefix" behavior that tries to handle cases where a user issues a very short (one or two character) query.

This code doesn't work, presumably never worked, and can not be made to work (or, at least, I don't see a way, and am fairly sure one does not exist).

If the user searches for "xy", we can find trigrams in the form "xy*" using the index, but not in the form "*xy". The code makes a misguided effort to look for " xy", but this will only find "xy" in words that begin with "xy", like "xylophone".

For example, searching Files for "om" does not currently find "random.txt".

Remove this behavior. Without engaging the trigram index, these queries fall back to an unidexed "LIKE" table scan, but that's about the best we can do.

Test Plan: Searched for "om", hit "random.txt".

Maniphest Tasks: T13501

Differential Revision: https://secure.phabricator.com/D21127
2020-04-16 09:44:37 -07:00
epriestley
0511b2a012 Implement the "present" and "absent" operators in the Ferret execution engine
Summary:
Ref T13509. Now that the compiler can parse these queries, actually implement them.

These are fairly easy to implement:

  - For present, just "JOIN". If it works, the field is present.
  - For absent, we "LEFT JOIN" and then "WHERE any_column IS NULL".

Test Plan: Searched for various documents with and without fields present, got sensible results in Maniphest. For example, "body:-" finds tasks with no body, "body:- duck" finds tasks with no body and "duck" elsewhere in the content, and so on.

Maniphest Tasks: T13509

Differential Revision: https://secure.phabricator.com/D21110
2020-04-14 10:55:30 -07:00
epriestley
8fa8d0e648 Make Ferret query functions sticky only if their values are not quoted
Summary:
Ref T13509. Currently, functions are "sticky", but this stickness is in the query execution layer.

Instead:

  - move stickiness to the query compiler; and
  - make it so that functions are not sticky if their arguments are quoted.

For example:

  - `title:x y` previously meant `title:x title:y` (and still does). The "title:" is sticky.
  - `title:"x" y` previously meant `title:x title:y`. It now means `title:x all:y`. The "title:" is not sticky because the argument is quoted.

Test Plan: Added unit tests, ran unit tests.

Maniphest Tasks: T13509

Differential Revision: https://secure.phabricator.com/D21108
2020-04-14 10:47:51 -07:00
epriestley
0872051bfa Make AuthProvider, ExternalAccount, and ExternalAccountIdentifier all Destructible
Summary: Depends on D21014. Ref T13493. Make these objects all use destructible interfaces and destroy sub-objects appropriately.

Test Plan:
  - Used `bin/remove destroy --trace ...` to destroy a provider, a user, and an external account.
  - Observed destruction of sub-objects, including external account identifiers.

Maniphest Tasks: T13493

Differential Revision: https://secure.phabricator.com/D21015
2020-02-22 17:46:29 -08:00
epriestley
f1b054a20f Correct the interaction between overheating and offset-based paging
Summary:
Ref T13386. If you issue `differential.query` with a large offset (like 3000), it can overheat regardless of policy filtering and fail with a nonsensical error message.

This is because the overheating limit is based only on the query limit, not on the offset.

For example, querying for "limit = 100" will never examine more than 1,100 rows, so a query with "limit = 100, offset = 3000" will always fail (provided there are at least that many revisions).

Not all numbers work like you might expect them to becuase there's also a 1024-row fetch window, but basically small limits plus big offsets always fail.

Test Plan: Artificially reduced the internal window size from 1024 to 5, then ran `differential.query` with `offset=50` and `limit=3`. Before: overheated with weird error message. After: clean result.

Maniphest Tasks: T13386

Differential Revision: https://secure.phabricator.com/D20728
2019-08-21 20:27:35 -07:00
epriestley
cb4add3116 In Ferret, allow documents with no title to match query terms by using LEFT JOIN on the "title" ranking field
Summary:
Fixes T13345. See D20650. Currently, `PhabricatorCursorPagedPolicyAwareQuery` does a JOIN against the "title" field so it can apply additional ranking/ordering conditions to the query.

This means that documents with no title (which don't have this field) are always excluded from the result set.

We'd prefer to include them, just not give them any bonus ranking/relevance boost. Use a LEFT JOIN so they get included.

Test Plan:
  - Applied D20650 (diff 1), made it use raw `getTitle()` as the document title, indexed a paste with no title.
  - Searched for a term in the paste body.
  - Before change: no results.
  - After change: found result.

{F6601159}

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13345

Differential Revision: https://secure.phabricator.com/D20660
2019-07-18 10:37:36 -07:00
epriestley
642113708a Build a rough transaction-level view of Feed
Summary:
Ref T13294. An install is interested in a way to easily answer audit-focused questions like "what edits were made to any Herald rule in Q1 2019?".

We can answer this kind of question with a more granular version of feed that focuses on being exhaustive rather than being human-readable.

This starts a rough version of it and deals with the two major tricky pieces: transactions are in a lot of different tables; and paging across them is not trivial.

To solve "lots of tables", we just query every table. There's a little bit of sleight-of-hand to get this working, but nothing too awful.

To solve "paging is hard", we order by "<dateCreated, phid>". The "phid" part of this order doesn't have much meaning, but it lets us put every transaction in a single, stable, global order and identify a place in that ordering given only one transaction PHID.

Test Plan: {F6463076}

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13294

Differential Revision: https://secure.phabricator.com/D20531
2019-05-21 12:28:00 -07:00
epriestley
00b1c4190c Correct some straggling Ferret/Cursor interactions
Summary:
See PHI1182. Ref T13266. The recent fixes didn't quite cover the case where you have a query, but order by something other than relevance, and page forward.

Refine the tests around building/selecting these columns and paging values a little bit to be more specific about what they care about.

Test Plan:
Executed queries, then went to "Next Page", for:

  - query text, non-relevance order.
  - query text, relevance order.
  - no query text, non-relevance order.
  - no query text, relevance order.

Also, made an API call similar to the one in PHI1182.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13266

Differential Revision: https://secure.phabricator.com/D20354
2019-04-01 14:53:33 -07:00
epriestley
7f90570636 When paging by Ferret "rank", page using "HAVING rank > ...", not "WHERE rank > ..."
Summary:
Ref T13091. The Ferret "rank" column is a function of the query text and looks something like `SELECT ..., 2 + 2 AS rank, ...`.

You can't apply conditions to this kind of dynamic column with a WHERE clause: you get a slightly unhelpful error like "column rank unknown in where clause". You must use HAVING:

```
mysql> SELECT 2 + 2 AS x WHERE x = 4;
ERROR 1054 (42S22): Unknown column 'x' in 'where clause'
mysql> SELECT 2 + 2 AS x HAVING x = 4;
+---+
| x |
+---+
| 4 |
+---+
1 row in set (0.00 sec)
```

Add a flag to paging column definitions to let them specify that they must be applied with HAVING, then apply the whole paging clause with HAVING if any column requires HAVING.

Test Plan:
  - In Maniphest, ran a fulltext search matching more than 100 results, ordered by "Relevance", then clicked "Next Page".
  - Before patch: query with `... WHERE rank > 123 OR ...` caused MySQL error because `rank` is not a WHERE-able column.
  - After patch: query builds as `... HAVING rank > 123 OR ...`, pages properly, no MySQL error.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13091

Differential Revision: https://secure.phabricator.com/D20298
2019-03-25 10:52:36 -07:00
epriestley
b90e02bec2 Select Ferret fulltext columns in results so fulltext queries work under UNION
Summary:
Ref T13091. In Differential, if you provide a query and "Sort by: Relevance", we build a query like this:

```
((SELECT revision.* FROM ... ORDER BY rank) UNION ALL (SELECT revision.* FROM ... ORDER BY rank)) ORDER BY rank
```

The internal "ORDER BY rank" is technically redundant (probably?), but doesn't hurt anything, and makes construction easier.

The problem is that the outer "ORDER BY rank" at the end, which attempts to order the results of the two parts of the UNION, can't actually order them, since `rank` wasn't selected.

(The column isn't actually "rank", which //is// selected -- it's the document modified/created subcolumns, which are not.)

To fix this, actually select the fulltext columns into the result set.

Test Plan:
  - Ran a non-empty fulltext query in Differential with "Bucket: Required Action" selected so the UNION construction fired.
  - Ran normal queries in Maniphest and global search.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13091

Differential Revision: https://secure.phabricator.com/D20297
2019-03-19 13:22:42 -07:00
epriestley
a8ba8217d8 Skip Ferret fulltext columns in "ORDER BY" if there's no fulltext query
Summary:
Ref T13091. If you "Order By: Relevance" but don't actually specify a query, we currently raise a bare exception.

This operation is sort of silly/pointless, but it seems like it's probably best to just return the results for the other constraints in the fallback order (usually, by ID). Alternatively, we could raise a non-bare exception here ("You need to provide a fulltext query to order by relevance.")

Test Plan: Queried tasks by relevance with no actual query text.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13091

Differential Revision: https://secure.phabricator.com/D20296
2019-03-19 13:20:34 -07:00
epriestley
3940c8e1f4 Make the UI when you use an invalid cursor ("?after=19874189471232892") a little nicer
Summary:
Ref T13259. Currently, visiting a page that executes a query with an invalid cursor raises a bare exception that escapes to top level.

Catch this a little sooner and tailor the page a bit.

Test Plan: Visited `/maniphest/?after=335234234223`, saw a nicer exception page.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13259

Differential Revision: https://secure.phabricator.com/D20295
2019-03-19 13:04:07 -07:00
epriestley
18b444e427 When queries overheat, raise an exception
Summary:
Ref T13259. Currently, queries set a flag and return a partial result set when they overheat. This is mostly okay:

  - It's very unusual for queries to overheat if they don't have a real viewer.
  - Overheating is rare in general.
  - In most cases where queries can overheat, the context is a SearchEngine UI, which handles this properly.

In T13259, we hit a case where a query with an omnipotent viewer can overheat: if you have more than 1,000 consecutive commits in the database with invalid `repositoryID` values, we'll overheat and bail out. This is pretty bad, since we don't process everything.

Change this beahvior:

  - Throw by default, so this stuff doesn't slip through the cracks.
  - Handle the SearchEngine case explicitly ("it's okay to overheat, we'll handle it").
  - Make `QueryIterator` disable overheating behavior: if we're iterating over all objects, we want to hit the whole table even if most of it is garbage.

There are some cases where this might cause new exception behavior that we don't necessarily want. For example, in Owners, each package shows "recent commits in this package". If you can't see the first 1,000 recent commits, you'd previously get a slow page with no results. Now you'll probably get an exception.

If these crop up, I think the best approach for now is to deal with them on a case-by-case basis and see how far we get. In the "Owners" case, it might be good to query by repositories you can see first, then query by commits in the package in those repositories. That should give us a better outcome than any generic behavior we could implement.

Test Plan:
  - Added 100000 to all repositoryID values for commits on my local install.
  - Before making changes, ran `bin/repository rebuild-identities --all --trace`. Saw the script process 1,000 rows and exit silently.
  - Applied the first part ("throw by default") and ran `bin/repository rebuild-identities`. Saw the script process 1,000 rows, then raise an exception.
  - Applied the second part ("disable for queryiterator") and ran the script again. Saw the script process all 15,000 rows without issues (although none are valid and none actually load).
  - Viewed Diffusion, saw appropriate NUX / "overheated" UIs.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13259

Differential Revision: https://secure.phabricator.com/D20294
2019-03-19 13:02:59 -07:00
epriestley
8449c1793a Convert complex query subclasses to use internal cursors
Summary:
Depends on D20292. Ref T13259. This converts the rest of the `getPagingValueMap()` callsites to operate on internal cursors instead.

These are pretty one-off for the most part, so I'll annotate them inline.

Test Plan:
  - Grouped tasks by project, sorted by title, paged through them, saw consistent outcomes.
  - Queried edges with "edge.search", paged through them using the "after" cursor.
  - Poked around the other stuff without catching any brokenness.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13259

Differential Revision: https://secure.phabricator.com/D20293
2019-03-19 13:02:16 -07:00
epriestley
d4847c3eeb Convert simple query subclasses to use internal cursors
Summary:
Depends on D20291. Ref T13259. Move all the simple cases (where paging depends only on the partial object and does not depend on keys) to a simple wrapper.

This leaves a smaller set of more complex cases where we care about external data or which keys were requested that I'll convert in followups.

Test Plan: Poked at things, but a lot of stuff is still broken until everything is converted.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13259

Differential Revision: https://secure.phabricator.com/D20292
2019-03-19 13:00:27 -07:00
epriestley
1b0ef43910 Separate internal and external Query Cursors more cleanly, to fix pagination against broken objects
Summary:
Ref T13259.

(NOTE) This is "infrastructure/guts" only and breaks some stuff in Query subclasses. I'll fix that stuff in a followup, it's just going to be a larger diff that's mostly mechanical.

When a user clicks "Next Page" on a tasks view and gets `?after=100`, we want to show them the next 100 //visible// tasks. It's possible that tasks 1-100 are visible, but tasks 101-788 are not, and the next visible task is 789.

We load task ID `100` first, to make sure they can actually see it: you aren't allowed to page based on objects you can't see. If we let you, you could use "order=title&after=100", plus creative retitling of tasks, to discover the title of task 100: create tasks named "A", "B", etc., and see which one is returned first "after" task 100. If it's "D", you know task 100 must start with "C".

Assume the user can see task 100. We run a query like `id > 100` to get the next 100 tasks.

However, it's possible that few (or none) of these tasks can be seen. If the next visible task is 789, none of the tasks in the next page of results will survive policy filtering.

So, for queries after the initial query, we need to be able to page based on tasks that the user can not see: we want to be able to issue `id > 100`, then `id > 200`, and so on, until we overheat or find a page of results (if 789-889 are visible, we'll make it there before overheating).

Currently, we do this in a not-so-great way:

  - We pass the external cursor (`100`) directly to the subquery.
  - We query for that object using `getPagingViewer()`, which is a piece of magic that returns the real viewer on the first page and the omnipotent viewer on the 2nd..nth page. This is very sketchy.
  - The subquery builds paging values based on that object (`array('id' => 100)`).
  - We turn the last result from the subquery back into an external cursor (`200`) and save it for the next time.

Note that the last step happens BEFORE policy (and other) filtering.

The problems with this are:

  - The phantom-schrodinger's-omnipotent-viewer thing isn't explicity bad, but it's sketchy and generally not good. It feels like it could easily lead to a mistake or bug eventually.
  - We issue an extra query each time we page results, to convert the external cursor back into a map (`100`, `200`, `300`, etc).
  - In T13259, there's a new problem: this only works if the object is filtered out for policy reasons and the omnipotent viewer can still see it. It doesn't work if the object is filtered for some other reason.

To expand on the third point: in T13259, we hit a case where 100+ consecutive objects are broken (they point to a nonexistent `repositoryID`). These objects get filtered unconditionally. It doesn't matter if the viewer is omnipotent or not.

In that case: we set the next external cursor from the raw results (e.g., `200`). Then we try to load it (using the omnipotent viewer) to turn it into a map of values for paging. This fails because the object isn't loadable, even as the omnipotent viewer.

---

To fix this stuff, the new approach steps back a little bit. Primarily, I'm separating "external cursors" from "internal cursors".

An "External Cursor" is a string that we can pass in `?after=X` URIs. It generally identifies an object which the user can see.

An "Internal Cursor" is a raw result from `loadPage()`, i.e. before policy filtering. Usually, (but not always) this is a `LiskDAO` object that doesn't have anything attached yet and hasn't been policy filtered.

We now do this, broadly:

  - Convert the external cursor to an internal cursor.
  - Execute the query using internal cursors.
  - If necessary, convert the last visible result back into an external cursor at the very end.

This fixes all the problems:

  - Sketchy Omnipotent Viewer: We no longer ever use an omnipotent viewer. (We pick cursors out of the result set earlier, instead.)
  - Too Many Queries: We only issue one query at the beginning, when going from "external" to "internal". This query is generally unavoidable since we need to make sure the viewer can see the object and that it's a real / legitimate object. We no longer have to query an extra time for each page.
  - Total Failure on Invalid Objects: we now page directly with objects out of `loadPage()`, before any filtering, so we can page over invisible or invalid objects without issues.

This change switches us over to internal/external cursors, and makes simple cases (ID-based ordering) work correctly. It doesn't work for complex cases yet since subclasses don't know how to get paging values out of an internal cursor yet. I'll update those in a followup.

Test Plan: For now, poked around a bit. Some stuff is broken, but normal ID-based lists load correctly and page properly. See next diff for a more detailed test plan.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13259

Differential Revision: https://secure.phabricator.com/D20291
2019-03-19 12:59:58 -07:00
epriestley
27ea775fda Fix a log warning when searching for ranges on custom "Date" fields
Summary:
See <https://discourse.phabricator-community.org/t/traceback-rendering-task-query-in-dashboard/2450/>.

It looks like this blames to D19126, which added some more complex constraint logic but overlooked "range" constraints, which are handled separately.

Test Plan:
  - Added a custom "date" field to Maniphest with `"search": true`.
  - Executed a range query against the field.

Then:

  - Before: Warnings about undefined indexes in the log.
  - After: No such warnings.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: jbrownEP

Differential Revision: https://secure.phabricator.com/D20225
2019-02-28 19:49:18 -08:00
epriestley
03ac59a877 Don't put "spacePHID IN (...)" constraints in queries which will raise policy exceptions
Summary:
See T13240. Ref T13242. When we're issuing a query that will raise policy exceptions (i.e., give the user a "You Shall Not Pass" dialog if they can not see objects it loads), don't do space filtering in MySQL: when objects are filtered out in MySQL, we can't distinguish between "bad/invalid ID/object" and "policy filter", so we can't raise a policy exception.

This leads to cases where viewing an object shows "You Shall Not Pass" if you can't see it for any non-Spaces reason, but "404" if the reason is Spaces.

There's no product reason for this, it's just that `spacePHID IN (...)` is important for non-policy-raising queries (like a list of tasks) to reduce how much application filtering we need to do.

Test Plan:
Before:

```
$ git pull
phabricator-ssh-exec: No repository "spellbook" exists!
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
```

After:

```
$ git pull
phabricator-ssh-exec: [You Shall Not Pass: Unknown Object (Repository)] This object is in a space you do not have permission to access.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
```

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13242

Differential Revision: https://secure.phabricator.com/D20042
2019-01-28 09:03:32 -08:00
epriestley
933462b487 Continue cleaning up queries in the wake of changes to "%Q"
Summary: Depends on D19810. Ref T13217. Ref T13216. I mostly used `grep implode | grep OR` and `grep implode | grep AND` to find these -- not totally exhaustive but should be a big chunk of the callsites that are missing `%LO` / `%LA`.

Test Plan:
These are tricky to test exhaustively, but I made an attempt to hit most of them:

- Browsed Almanac interfaces.
- Created/browsed Calendar events.
- Enabled/disabled/showed the lock log.
- Browsed repositories.
- Loaded Facts UI.
- Poked at Multimeter.
- Used typeahead for users and projects.
- Browsed Phriction.
- Ran various fulltext searches.

Not sure these are reachable:

- All the lint stuff might be dead/unreachable/nonfunctional?

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: yelirekim

Maniphest Tasks: T13217, T13216

Differential Revision: https://secure.phabricator.com/D19814
2018-11-16 12:49:44 -08:00
epriestley
86fd204148 Fix all query warnings in "arc unit --everything"
Summary:
Ref T13216. Ref T13217. Depends on D19800. This fixes all of the remaining query warnings that pop up when you run "arc unit --everything".

There's likely still quite a bit of stuff lurking around, but hopefully this covers a big set of the most common queries.

Test Plan: Ran `arc unit --everything`. Before change: lots of query warnings. After change: no query warnings.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13217, T13216

Differential Revision: https://secure.phabricator.com/D19801
2018-11-15 03:51:25 -08:00
epriestley
2f10d4adeb Continue making application fixes to Phabricator for changes to %Q semantics
Summary: Depends on D19789. Ref T13217. Continue updating things to use the new %Q-flavored conversions instead of smushing a bunch of strings together.

Test Plan: Browsed around, far fewer errors. These changes are largely mechanical in nature.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13217

Differential Revision: https://secure.phabricator.com/D19790
2018-11-15 03:50:02 -08:00
epriestley
98690ee326 Update many Phabricator queries for new %Q query semantics
Summary: Depends on D19785. Ref T13217. This converts many of the most common clause construction pathways to the new %Q / %LQ / %LO / %LA / %LJ semantics.

Test Plan: Browsed around a bunch, saw fewer warnings and no obvious behavioral errors. The transformations here are generally mechanical (although I did them by hand).

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: hach-que

Maniphest Tasks: T13217

Differential Revision: https://secure.phabricator.com/D19789
2018-11-15 03:48:10 -08:00
epriestley
64b52b9952 Make SELECT construction in PolicyAwareQuery safer
Summary: Depends on D19784. Ref T13217. Reduce uses of unsafe `%Q` in SELECT construction.

Test Plan: This reduces the number of safety warnings when loading Phabricator home from ~900 to ~800.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13217

Differential Revision: https://secure.phabricator.com/D19785
2018-11-14 15:32:09 -08:00
epriestley
71d4fa41c9 Support the Ferret "=" (exact match) operator in the actual query engine
Summary:
Ref PHI778. In D18492, I added support for parsing this operator, but did not actually implement it in the query engine.

Implementation is fairly straightforward. This supports querying for objects by exact title with `title:="exact title"`. This is probably a bad idea, but sometimes maybe useful anyway.

Test Plan: Queried for `title:="xxx"`, found only exact matches.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: ahoffer2

Differential Revision: https://secure.phabricator.com/D19529
2018-07-23 12:44:00 -07:00
epriestley
d0591f3680 Support some QueryConstraint operations against generic ApplicationSearch query logic
Summary: Ref T13090. Currently, it isn't possible to query custom fields for complex constraints. Lay the groundwork to implement some of the easy ones (none(), any()) for Datasource/PHID fields.

Test Plan: Hard-coded some constraints and queried with them; see next change for more substantial testing.

Maniphest Tasks: T13090

Differential Revision: https://secure.phabricator.com/D19126
2018-02-22 12:49:49 -08:00
epriestley
1d213dc1fa Clean up virtual "_ft_rank" column for query construction of Ferret objects
Summary:
Ref T12974. Ferret object queries SELECT a virtual "_ft_rank" column for relevance ordering.

Currently, they always SELECT this column. That's fine and doesn't hurt anything, but makes developing and debugging things kind of a pain since every query has this `, blah blah _ft_rank` junk.

Instead, construct this column only if we're actually going to use it.

Mostly, this cleans up DarkConsole / query logs a bit.

Test Plan:
Viewed normal query results on various pages, viewed global search results, ordered Maniphest tasks by normal stuff and by "Relevance".

Viewed DarkConsole, saw no more "_ft_rank" junk on normal pages.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T12974

Differential Revision: https://secure.phabricator.com/D18728
2017-10-23 16:18:04 -07:00
epriestley
157f47cd14 Rewrite CommitQuery to use UNION for performance
Summary:
Ref T12680. See PHI167. See that task for discussion.

Rewrite `DiffusionCommitQuery` to work more like `DifferentialRevisionQuery`, and use a UNION to find "all revisions you need to audit OR respond to".

I tried to get this working a little more cleanly than RevisionQuery does, and can probably simplify that now.

Test Plan: Poked at the UI locally without hitting any apparent issues, but my local data is pretty garbage at this point. I'll take a look at how the query plans work on `secure`.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T12680

Differential Revision: https://secure.phabricator.com/D18722
2017-10-23 10:32:24 -07:00
Dmitri Iouchtchenko
9bd6a37055 Fix spelling
Summary: Noticed a couple of typos in the docs, and then things got out of hand.

Test Plan:
  - Stared at the words until my eyes watered and the letters began to swim on the screen.
  - Consulted a dictionary.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Differential Revision: https://secure.phabricator.com/D18693
2017-10-09 10:48:04 -07:00
epriestley
1de130c9f5 Allow the Ferret engine to remove "common" ngrams from the index
Summary:
Ref T13000. This adds support for tracking "common" ngrams, which occur in too many documents to be useful as part of the ngram index.

If an ngram is listed in the "common" table, it won't be written when indexing documents, or queried for when searching for them.

In this change, nothing actually writes to the "common" table. I'll start writing to the table in a followup change.

Specifically, I plan to do this:

  - A new GC process updates the "common" table periodically, by writing ngrams which appear in more than X% of documents to it, for some value of X, if there are at least a minimum number of documents (maybe like 4,000).
  - A new GC process deletes ngrams that have been added to the common table from the existing indexes.

Hopefully, this will pare down the ngrams index to something reasonable over time without requiring any manual tuning.

Test Plan:
  - Ran some queries and indexes.
  - Manually inserted ngrams `xxx` and `yyy` into the ngrams table, searched and indexed, saw them ignored as viable ngrams for search/index.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13000

Differential Revision: https://secure.phabricator.com/D18672
2017-10-03 13:27:42 -07:00
epriestley
94ab0c9afb Spell "Relevance" correctly
Summary: Despite how I (and everyone else?) pronounce it, it is spelled with an "a". See PHI38.

Test Plan: Googled both spellings.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D18622
2017-09-18 09:36:55 -07:00
epriestley
fdc0d8c2f6 Fix an issue with selecting the right stemmed ngrams with Ferret engine queries
Summary:
Ref T12819. In D18581, I corrected one bug (ngram selection for terms) but introduced a minor new bug. We now pass `' query '` (term corpus with boundary spaces) to the stemmer, but it bails out on this since English words don't start with spaces.

Trim these extra boundary spaces off before invoking the stemmer.

The practical effect of this is that searching for non-stem variations of a word ("detection") now finds stemmed variations again ("detect"). Prior to fixing this bug, the stem could find longer variations but not the other way around.

Test Plan: Searched for "detection", found results matching "detect" after patch (and saw same results for "detect" and "detection").

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18593
2017-09-12 12:13:42 -07:00
epriestley
39b74572e6 Return fulltext tokens from the Ferret fulltext engine
Summary:
Ref T12819. These render the little "Searched For: X, Y, U V" hint about how something was parsed.

(This might get a "substring" color or "title only" color or something in the future.)

Test Plan: {F5178807}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18589
2017-09-11 18:04:56 -07:00
epriestley
c662dda0f1 When selecting Ferret ngrams, select term ngrams (not raw ngrams) for term search
Summary:
Ref T12819. For queries like `v0.2`, we would incorrectly search for ngrams including `0.2`, but this is only a substring ngram: the term corpus splits this into `v0` and `2`, so `0.2` is not in the ngrams table.

When executing term queries, search for term ngrams instead. This makes "v0.2" work properly again.

Test Plan: Searched for "v0.2", found a task with "v0.2" in the title.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18581
2017-09-08 09:47:58 -07:00
epriestley
2218caee0f Reduce the amount of boilerplate that implementing FerretInterface requires
Summary:
See brief discussion in D18554. All the index tables are the same for every application (and, at this point, seem unlikely to change) and we never actually pass these objects around (they're only used internally).

In some other cases (like Transactions) not every application has the same tables (for example, Differential has extra field for inline comments), and/or we pass the objects around (lots of stuff uses `$xactions` directly).

However, in this case, and in Edges, we don't interact with any representation of the database state directly in much of the code, and it doesn't change from application to application.

Just automatically define document, field, and ngram tables for anything which implements `FerretInterface`. This makes the query and index logic a tiny bit messier but lets us delete a ton of boilerplate classes.

Test Plan: Indexed objects, searched for objects. Same results as before with much less code. Ran `bin/storage upgrade`, got a clean bill of health.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18559
2017-09-07 13:23:31 -07:00
epriestley
a2a2b3f7f4 Sort global fulltext results by overall relevance
Summary:
Ref T12819. Currently, under the Ferret engine, we query each application's index separately and then aggregate the results.

At the moment, results are aggregated by type first, then by actual rank. For example, all the revisions appear first, then all the tasks.

Instead, surface the internal ranking data from the underlying query and sort by it.

Test Plan: Searched for "A B" with a task named "A B" and a revision named "A". Saw task first. Broadly, saw mixed task and revision order in result sets.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18551
2017-09-07 13:21:58 -07:00
epriestley
8059db894d Use the Ferret engine fulltext document table to drive auxiliary fulltext constraints
Summary:
Ref T12819. I started trying to get individual engines to drive these constraints (e.g., `ManiphestTaskQuery` can do most of the work) but this is a big pain, especially since most engines don't support "any owner" or "no owner", and not everything has an owner, and so on and so on. Going down this path would have meant a huge pile of stub functions everywhere, I think.

Instead, drive these through the main engine using the fulltext document table, which already has everything we need to apply these constraints in a uniform way.

Also tweak some parts of query construction and result ordering.

Test Plan: Searched for documents by author, owner, unowned, any owner, tags, subscribers, fulltext in global search. Got sensible results without any application-specific code.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18550
2017-09-07 13:21:42 -07:00
epriestley
395a2ed6d1 Add an "only()" edge logic constraint, meaning "only the other constraints, exactly"
Summary:
See PHI57. For example, a query for "ios, only()" finds tags tasked with iOS, exactly, and no other tags.

I called this "only()" instead of "exact()" because we use the term/function "Exact" elsewhere with a different meaning, e.g. in Differential.

Test Plan:
Basic query for a tag:

{F5168857}

Same query with "only", finds tasks tagged with only that tag:

{F5168858}

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D18543
2017-09-06 12:16:06 -07:00
epriestley
64b7778f32 Add support for relevance-ranking Ferret engine results
Summary: Ref T12819. "Relevance" here just means "how many of your search terms are present in the title?" but that's about the best we can do anyway.

Test Plan: Indexed tasks "A B", "A Z", "Z B", and "Z Z" (all with "A B" in comments). Searched for "A B". Got results ranked in the listed order, with "A B" as the most relevant hit for query "A B".

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18539
2017-09-05 16:45:20 -07:00
epriestley
20aad35e60 Move Ferret engine "title:..." field definitions to the engine itself
Summary: Ref T12819. Move these out of the core engine into the Ferret engine. In the future different applications can define different functions, like "summary:..." or whatever. This may get more formalization when I possibly do "author:" and such some time down the road.

Test Plan: Searched for "title:...". Searched for "dog:...", got a useful error.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18536
2017-09-05 11:57:51 -07:00
epriestley
46abc11114 Reduce the number of magic strings in the Ferret implementation
Summary:
Ref T12819. Push more of the magic `' '` stuff into the engine and simplify calls to ngram construction.

Also fixes a bug where a task with title "apple banana" and description "cherry doughnut" could match query "banana cherry" by separating separate term segments with newlines instead of spaces.

Test Plan:
  - Indexed some objects.
  - Searched (term, substring, quoted terms).
  - Viewed index in database.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18534
2017-09-05 11:57:35 -07:00
epriestley
4a7593f47f Consolidate more Ferret engine code into FerretEngine
Summary: Ref T12819. Earlier I separated some ngram code into an "ngram engine" hoping to share it across the simple Ngrams stuff and the full Ferret stuff, but they actually use slightly different rules. Just pull more of this stuff into FerretEngine to reduce the number of moving pieces and the amount of code duplication.

Test Plan: Searched for terms, rebuilt indexes.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18533
2017-09-05 11:57:18 -07:00
epriestley
577d498033 Create a virtual "core" field in the Ferret engine for "title and body together"
Summary: See PHI46. The `core:` function means "find results in either the title or body, but not other auxiliary fields like comments".

Test Plan: Searched for text present in the title (yes), body (yes), and comments (no) with the `core:...` prefix.

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D18514
2017-09-01 09:40:56 -07:00
epriestley
3b43a70773 Add "title:..." support to the Ferret engine
Summary:
Ref T12819. Adds (hacky, hard-coded) field support (for now, only for "title").

I've written this so `title:quick ferret` is the same as `title:quick title:ferret`. I think this is what users probably mean.

You can do the other thing as `ferret title:quick`, or `title:quick all:ferret`.

Test Plan: Searched for `title:x`, `title:"x"`, `title:~"x"`, etc. Searched for "garbage:y", got an exception since that's not a recognized function. Searched for `title:x y`, saw both do title search.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18503
2017-08-30 11:30:42 -07:00
epriestley
048aa36c23 Support "-term" in Ferret engine queries
Summary:
Ref T12819. Supports negating search terms, e.g. "apple -honeycrisp".

When negating a term, we're a little more strict about what can match (that is, what can //prevent// a document from being returned) since it's easy for a user to type "apple -honeycrisp -honey -crisp -crispies -olcrispers -honeyyums" to keep refining their search, but hard/impossible to split apart an overboard term.

Test Plan:
  - Ran `apple -smith`, `apple -"granny smith"`, etc.
  - Verified `phone -tact` does not exclude `phone contact`.
  - (In theory, `phone -~tact` would, but the parser currently doesn't support this, and I'm not champing at the bit to add support.)

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18502
2017-08-30 11:30:24 -07:00
epriestley
df9c24e750 Provide some "term vs substring" support for the Ferret engine
Summary:
Ref T12819. Distinguishes between "term" queries and "substring" queries, and tries to match them correctly most of the time. For example:

  - `example` matches "example", obviously.
  - `~amp` matches "example", but `amp` does not.
  - `examples` matches "example" through stemming.
  - `"examples"` does not match "example" (quoted text does not stem).
  - `"an examp"` does not match "an example" (quoted text is still term text).
  - `~"an examp"` matches "an example" (quoted, substring-operator text uses substring search).

Test Plan: Ran searches similar to the above, they seemed to do what they should.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18500
2017-08-30 11:30:04 -07:00
epriestley
e5a495f435 Parse raw Ferret queries into tokens before processing them
Summary:
Ref T12819. Depends on D18492. Instead of passing a raw query into the Query layer, parse it first.

This allows the query layer to figure out which parts should be substring vs term match, and would allow the SearchEngine layer to do `author:...` eventually by picking it out before sending it to the Ferret engine.

Test Plan: Ran some Ferret queries. They work like before, except that nonsense like `-+"quack"` raises an exception now.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18499
2017-08-30 11:29:46 -07:00