Summary:
Ref T12470. Provides an "integrity" utility which runs in these modes:
- Verify: check that hashes match.
- Compute: backfill missing hashes.
- Strip: remove hashes. Useful for upgrading across a hash change.
- Corrupt: intentionally corrupt hashes. Useful for debugging.
- Overwrite: force hash recomputation.
Users normally shouldn't need to run any of this stuff, but this provides a reasonable toolkit for managing integrity hashes.
I'll recommend existing installs use `bin/files integrity --compute all` in the upgrade guidance to backfill hashes for existing files.
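A rough sketch of how the five modes relate to each other, in Python rather than the actual PHP. The record shape (`data`, `hash`), the function names, and the use of plain SHA-256 are assumptions for illustration only, not the real storage format or workflow code:

```python
import hashlib


def integrity_hash(data: bytes) -> str:
    # Hypothetical hash format: plain hex SHA-256 of the file data.
    return hashlib.sha256(data).hexdigest()


def run_mode(record: dict, mode: str) -> str:
    """Apply one mode to a record shaped like {"data": bytes, "hash": str or None}."""
    expect = integrity_hash(record["data"])
    current = record.get("hash")

    if mode == "verify":
        if current is None:
            return "no hash"
        return "ok" if current == expect else "mismatch"
    if mode == "compute":
        # Backfill only: never touch a hash which already exists.
        if current is not None:
            return "skipped"
        record["hash"] = expect
        return "computed"
    if mode == "strip":
        record["hash"] = None
        return "stripped"
    if mode == "corrupt":
        # Store a value which can never be a valid hex digest.
        record["hash"] = "<corrupted>"
        return "corrupted"
    if mode == "overwrite":
        # Force recomputation, but report a no-op if nothing changes.
        if current == expect:
            return "no-op"
        record["hash"] = expect
        return "overwritten"
    raise ValueError("unknown mode: " + mode)
```

Running `corrupt` then `verify` on a record reports a mismatch, and `overwrite` twice in a row reports a no-op the second time, which matches the cases in the test plan.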
Test Plan:
- Ran the script in many modes against various files, saw expected operation, including:
- Verified a file, corrupted it, saw it fail.
- Verified a file, stripped it, saw it have no hash.
- Stripped a file, computed it, got a clean verify.
- Stripped a file, overwrote it, got a clean verify.
- Corrupted a file, overwrote it, got a clean verify.
- Overwrote a file, overwrote again, got a no-op.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12470
Differential Revision: https://secure.phabricator.com/D17629
Summary:
Ref T12298. The PullLocal daemon has had hibernation code for a little while, but it never actually activated because we don't sleep for more than 15 seconds in any case.
Add a maximum sleep instead and use that to control the longest sleep we'll do for hibernation purposes.
Also, when a repository or repository URI is edited, write a NEEDS_UPDATE event into the message table to make sure the daemons de-hibernate.
Test Plan: Used `bin/phd debug pull`, saw the daemon actually hibernate instead of just sleeping for 15 seconds.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12298
Differential Revision: https://secure.phabricator.com/D17635
Summary:
Ref T12272. I wrote this correctly, then broke it by adding the simplification which treats "accept the defaults" as "accept everything".
This simplification lets us render "epriestley accepted this revision." instead of "epriestley accepted this revision on behalf of: long, list, of, every, default, reviewer, they, have, authority, over." so it's a good thing, but make it only affect the reviewers it's supposed to affect.
Test Plan:
- Did an accept with a force-accept available but unchecked.
- Before patch: incorrectly accepted all possible reviewers.
- After patch: accepted only checked reviewers.
- Also checked the force-accept box, accepted, got a proper force-accept.
Reviewers: chad, lvital
Reviewed By: lvital
Maniphest Tasks: T12272
Differential Revision: https://secure.phabricator.com/D17634
Summary: Allow API callers to retrieve reviewer information via a new "reviewers" attachment.
Test Plan: {F4675784}
Reviewers: chad, lvital
Reviewed By: lvital
Subscribers: lvital
Differential Revision: https://secure.phabricator.com/D17633
Summary: Fixes T12508. Files don't have an `editPolicy`, and we started actually checking that the keys are real things in D17616.
Test Plan:
- Before patch: created a paste, got an "editPolicy" exception.
- After patch: created a paste that worked properly.
Reviewers: avivey, chad
Reviewed By: avivey
Maniphest Tasks: T12508
Differential Revision: https://secure.phabricator.com/D17628
Summary: Ref T12219. Chrome can send requests with a "Range: bytes=0-" header, which just means "the whole file", but we don't respond correctly because of a `null` vs `0` issue.
Test Plan: Sent a raw `bytes=0-` request, saw a proper response.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12219
Differential Revision: https://secure.phabricator.com/D17627
Summary:
Ref T12219. We currently only support Range requests like "bytes=123-456", but "bytes=123-", meaning "until end of file", is valid, and Chrome can send these requests.
I suspect this is the issue with T12219.
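For reference, a minimal Python sketch of handling both forms, including the `bytes=0-` case. The function names are made up for illustration and this is not the actual PHP code; the only thing taken from the HTTP spec is that the end offset is inclusive:

```python
import re
from typing import Optional, Tuple


def parse_range(header: str) -> Tuple[Optional[int], Optional[int]]:
    """Parse "bytes=A-B" or "bytes=A-" into (begin, inclusive end or None)."""
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", header.strip())
    if match is None:
        return None, None
    begin = int(match.group(1))
    # An empty end means "until end of file".
    end = int(match.group(2)) if match.group(2) else None
    return begin, end


def slice_for_range(data: bytes, header: str) -> bytes:
    begin, end = parse_range(header)
    # Compare against None explicitly: a begin offset of 0 is falsy but valid.
    if begin is None:
        return data
    if end is None:
        return data[begin:]
    # The end offset in a Range header is inclusive.
    return data[begin:end + 1]
```

With ten bytes of data, `bytes=3-5` yields three bytes, `bytes=3-` yields the last seven, and `bytes=0-` yields the whole file.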
Test Plan: Used `nc local.phacility.com 80` to pipe raw requests, saw both "bytes=123-456" and "bytes=123-" requests satisfied correctly.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12219
Differential Revision: https://secure.phabricator.com/D17626
Summary:
Ref T12470. This helps defuse attacks where an adversary can directly take control of whatever storage engine files are being stored in and change data there. These attacks would require a significant level of access.
Such attackers could potentially attack ranges of AES-256-CBC encrypted files by using Phabricator as a decryption oracle if they were also able to compromise a Phabricator account with read access to the files.
By storing a hash of the data (and, in the case of AES-256-CBC files, the IV) when we write files, and verifying it before we decrypt or read them, we can detect and prevent this kind of tampering.
This also helps detect mundane corruption and integrity issues.
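A minimal sketch of the idea in Python, assuming the hash lives somewhere the attacker can't reach (for example, the database rather than the storage engine). The function names, the length-prefixing, and the choice of SHA-256 are assumptions, not the actual format:

```python
import hashlib
import hmac


class IntegrityError(Exception):
    pass


def compute_integrity(data: bytes, iv: bytes = b"") -> str:
    digest = hashlib.sha256()
    # Length-prefix the IV so different (iv, data) splits can't collide.
    digest.update(len(iv).to_bytes(4, "big"))
    digest.update(iv)
    digest.update(data)
    return digest.hexdigest()


def read_verified(stored: bytes, iv: bytes, expected: str) -> bytes:
    """Return the stored bytes only if they still match the recorded hash."""
    actual = compute_integrity(stored, iv)
    if not hmac.compare_digest(actual, expected):
        raise IntegrityError("stored data or IV does not match integrity hash")
    # Only after this check passes should the bytes be handed to decryption.
    return stored
```

Changing either the stored bytes or the IV after the hash was recorded makes `read_verified()` raise instead of returning data.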
Test Plan:
- Added unit tests.
- Uploaded new files, saw them get integrity hashes.
- Manually corrupted file data, saw it fail. Used `bin/files cat --salvage` to read it anyway.
- Tampered with IVs, saw integrity failures.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12470
Differential Revision: https://secure.phabricator.com/D17625
Summary: Ref T8266. Although we compute this correctly above, we ignored it when actually setting the header. Use the computed value to set the "Content-Length" header. This is consistent with the spec/documentation.
Test Plan: Before, some audio (like `rain.mp3`) was pretty spotty about loading in Safari. It now loads consistently for me locally.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T8266
Differential Revision: https://secure.phabricator.com/D17624
Summary:
Fixes T12079. Currently, when a file is encrypted and a request has "Content-Range", we apply the range first, //then// decrypt the result. This doesn't work since you can't start decrypting something from somewhere in the middle (at least, not with our cipher selection).
Instead: decrypt the result, //then// apply the range.
Test Plan: Added failing unit tests, made them pass
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12079
Differential Revision: https://secure.phabricator.com/D17623
The root issue here is actually just that I cherry-picked stable locally
but did not push it. However, this is a minor issue I also caught while
double-checking things.
Auditors: chad
Summary:
Ref T12464. This defuses any possible SHA1-collision attacks by using SHA256, for which there is no known collision.
(SHA256 hashes are larger -- 256 bits -- so expand the storage column to 64 bytes to hold them.)
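Roughly, the behavior looks like the Python sketch below. The `Storage` class and its fields are hypothetical stand-ins, not the real `File` code; the point is that a missing algorithm degrades to "store another copy" rather than failing:

```python
import hashlib
from typing import Dict, List, Optional


def content_hash(data: bytes, algorithm: str = "sha256") -> Optional[str]:
    try:
        return hashlib.new(algorithm, data).hexdigest()
    except ValueError:
        # Algorithm unavailable: carry on without a content hash.
        return None


class Storage:
    def __init__(self, algorithm: str = "sha256") -> None:
        self.algorithm = algorithm
        self.blobs: List[bytes] = []       # underlying copies of data
        self.by_hash: Dict[str, int] = {}  # content hash -> blob index

    def store(self, data: bytes) -> int:
        """Return the index of the blob backing a newly created file."""
        digest = content_hash(data, self.algorithm)
        if digest is not None and digest in self.by_hash:
            # Same content already stored: share the underlying data.
            return self.by_hash[digest]
        self.blobs.append(data)
        index = len(self.blobs) - 1
        if digest is not None:
            self.by_hash[digest] = index
        return index
```

A hex SHA-256 digest is 64 characters long, which is where the 64-byte column width comes from.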
Test Plan:
- Uploaded the same file twice, saw the two files generate the same SHA256 content hash and use the same underlying data.
- Tried with a fake hash algorithm ("quackxyz") to make sure the failure mode worked/degraded correctly if we don't have SHA256 for some reason. Got two valid files with two copies of the same data, as expected.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12464
Differential Revision: https://secure.phabricator.com/D17620
Summary:
Ref T12464. We currently use SHA1 to detect when two files have the same content so we don't have to store two copies of the data.
Now that a SHA1 collision is known, this is theoretically dangerous. T12464 describes the shape of a possible attack.
Before replacing this with something more robust, shore things up so things work correctly if we don't hash at all. This mechanism is entirely optional; it only helps us store less data if some files are duplicates.
(This mechanism is also less important now than it once was, before we added temporary files.)
Test Plan: Uploaded multiple identical files, saw the uploads work and the files store separate copies of the same data.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12464
Differential Revision: https://secure.phabricator.com/D17619
Summary:
Ref T12464. This is a very old method which let you create a file on the server by referring to data which already existed in another file.
Basically, long ago, `arc` could say "Do you already have a file with hash X?" and just skip some work if the server did.
`arc` has not called this method since D13017, in May 2015.
Since it's easy to do so, just make this method pretend that it never has the file. Very old clients will continue to work, since they would expect this response in the common case and continue by uploading data.
Test Plan:
- Grepped for `uploadhash` in Phabricator and Arcanist.
- Called the method with the console, verified it returned `null`.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12464
Differential Revision: https://secure.phabricator.com/D17618
Summary:
Ref T12464. This is a very old method which can return an existing file instead of creating a new one, if there's some existing file with the same content.
In the best case this is a bad idea. This being somewhat reasonable predates policies, temporary files, etc. Modern methods like `newFromFileData()` do this right: they share underlying data in storage, but not the actual `File` records.
Specifically, this is the case where we get into trouble:
- I upload a private file with content "X".
- You somehow generate a file with the same content by, say, viewing a raw diff in Differential.
- If the diff had the same content, you get my file, but you don't have permission to see it or whatever so everything breaks and is terrible.
Just get rid of this.
Test Plan:
- Generated an SSH key.
- Viewed a raw diff in Differential.
- (Did not test Phragment.)
Reviewers: chad
Reviewed By: chad
Subscribers: hach-que
Maniphest Tasks: T12464
Differential Revision: https://secure.phabricator.com/D17617
Summary:
Ref T11357. When creating a file, callers can currently specify a `ttl`. However, it isn't unambiguous what you're supposed to pass, and some callers get it wrong.
For example, to mean "this file expires in 60 minutes", you might pass either of these:
- `time() + phutil_units('60 minutes in seconds')`
- `phutil_units('60 minutes in seconds')`
The former means "60 minutes from now". The latter means "1 AM, January 1, 1970". In practice, because the GC normally runs only once every four hours (at least, until recently), and all the bad TTLs are cases where files are normally accessed immediately, these 1970 TTLs didn't cause any real problems.
Split `ttl` into `ttl.relative` and `ttl.absolute`, and make sure the values are sane. Then correct all callers, and simplify out the `time()` calls where possible to make switching to `PhabricatorTime` easier.
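A sketch of the split in Python, to show why two separate parameters remove the ambiguity. The function name and the specific sanity checks are assumptions; the real checks may differ:

```python
import time
from typing import Optional


def resolve_expiry(
    ttl_relative: Optional[int] = None,
    ttl_absolute: Optional[int] = None,
    now: Optional[int] = None,
) -> Optional[int]:
    """Return an absolute expiration epoch, or None for a permanent file."""
    if now is None:
        now = int(time.time())
    if ttl_relative is not None and ttl_absolute is not None:
        raise ValueError("specify ttl.relative or ttl.absolute, not both")
    if ttl_relative is not None:
        if ttl_relative < 0:
            raise ValueError("ttl.relative must not be negative")
        return now + ttl_relative
    if ttl_absolute is not None:
        # Catches the "1 AM, January 1, 1970" mistake described above.
        if ttl_absolute < now:
            raise ValueError("ttl.absolute is in the past")
        return ttl_absolute
    return None
```

Passing `3600` as a relative TTL gives "an hour from now", while passing the same `3600` as an absolute TTL is rejected instead of silently meaning 1970.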
Test Plan:
- Generated an SSH keypair.
- Viewed a changeset.
- Viewed a raw diff.
- Viewed a commit's file data.
- Viewed a temporary file's details, saw expiration date and relative time.
- Ran unit tests.
- (Didn't really test Phragment.)
Reviewers: chad
Reviewed By: chad
Subscribers: hach-que
Maniphest Tasks: T11357
Differential Revision: https://secure.phabricator.com/D17616
Summary:
Ref T11357. In D17611, I added `file.search`, which includes a `"dataURI"`. Partly, this is building toward resolving T8348.
However, in some cases you can't GET this URI because of a security measure:
- You have not configured `security.alternate-file-domain`.
- The file isn't web-viewable.
- (The request isn't an LFS request.)
The goal of this security mechanism is just to protect against session hijacking, so it's also safe to disable it if the viewer didn't present any credentials (since that means there's nothing to hijack). Add that exception, and reorganize the code a little bit.
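My reading of the resulting rule, as a Python sketch. The function and parameter names are hypothetical, and this assumes the listed conditions must all hold for the GET to be blocked:

```python
def may_serve_on_get(
    has_alternate_domain: bool,
    is_web_viewable: bool,
    is_lfs_request: bool,
    has_credentials: bool,
) -> bool:
    """Decide whether a plain GET may return file data directly."""
    if has_alternate_domain:
        # Content is served from a separate domain, so no session to hijack.
        return True
    if is_web_viewable or is_lfs_request:
        return True
    # New exception: with no credentials presented, there's nothing to hijack.
    return not has_credentials
```

A browser with a session fetching a binary file is still refused, while a CLI request with no session gets the download.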
Test Plan:
- From the browser (with a session), tried to GET a binary data file. Got redirected.
- Got a download with POST.
- From the CLI (without a session), tried to GET a binary data file. Got a download.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11357
Differential Revision: https://secure.phabricator.com/D17613
Summary: Ref T11357. Implements a modern `file.search` for files, and freezes `file.info`.
Test Plan: Ran `file.search` from the Conduit console.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11357
Differential Revision: https://secure.phabricator.com/D17612
Summary:
Ref T11357. This moves editing and commenting (but not creation) to EditEngine.
Since only the name is really editable, this is pretty straightforward.
Test Plan: Renamed files; commented on files.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11357
Differential Revision: https://secure.phabricator.com/D17611
Summary: Ref T11357. A lot of file creation doesn't go through transactions, so we only actually have one real transaction type: editing a file name.
Test Plan:
Created and edited files.
{F4559287}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11357
Differential Revision: https://secure.phabricator.com/D17610
Summary:
Fixes T12502. This transaction probably should not be getting picked for feed rendering, but it currently does get selected in some cases.
This should probably be revisited eventually (e.g., when Maniphest moves to ModularTransactions) but just fix the brokenness for now.
Test Plan:
- Created a task in a space.
- Viewed feed.
- Saw the story render with readable text.
{F4555747}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12502
Differential Revision: https://secure.phabricator.com/D17609
Summary:
Fixes T12496. Sticky accept was accidentally impacted by the "void" changes in D17566.
Instead, don't always downgrade all accepts/rejects: on update, we only want to downgrade accepts.
Test Plan:
- With sticky accept off, updated an accepted revision: new state is "needs review".
- With sticky accept on, updated an accepted revision: new state is "accepted" (sticky accept working correctly).
- Did "reject" + "request review" to make sure that still works, worked fine.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12496
Differential Revision: https://secure.phabricator.com/D17605
Summary:
Fixes T12461. This returns the field as a dictionary with a `"raw"` value, so we could eventually do this if we want without breaking the API:
```
{
"type": "remarkup",
"raw": "**raw**",
"html": "<strong>raw</strong>",
"text": "raw"
}
```
Test Plan: Called `maniphest.search`, reviewed output.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12461
Differential Revision: https://secure.phabricator.com/D17603
Summary: Ref T12450. These are like 95% my fault, but Elastic appears to spell the name "Elasticsearch" consistently in their branding.
Test Plan: `grep ElasticSearch`
Reviewers: chad, 20after4
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17601
Summary:
Ref T12450. Currently, if a write fails, we stop and don't try to write to other index services. There's no technical reason not to keep trying writes, it makes some testing easier, and it would improve behavior in a scenario where engines are configured as "primary" and "backup" and the primary service is having some issues.
Also, make "no writable services are configured" acceptable, rather than an error. This state is probably goofy but if we want to detect it I think it should probably be a config-validation issue, not a write-time check. I also think it's not totally unreasonable to want to just turn off all writes for a while (maybe to reduce load while you're doing a background update).
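The intended write loop is roughly the Python sketch below. `Service` and `write_document` are made-up names for illustration, not the real classes:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Service:
    name: str
    writable: bool
    write: Callable[[dict], None]


def write_document(
    services: List[Service], document: dict
) -> List[Tuple[str, Exception]]:
    """Write to every writable service; collect failures instead of stopping."""
    failures: List[Tuple[str, Exception]] = []
    for service in services:
        if not service.writable:
            continue
        try:
            service.write(document)
        except Exception as ex:
            # Remember the failure, but keep trying the remaining services.
            failures.append((service.name, ex))
    # With no writable services this is simply an empty list, not an error.
    return failures
```

The caller can decide what to do with the collected failures after every service has had a chance to accept the write.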
Test Plan:
- Configured a bad ElasticSearch engine and a good MySQL engine.
- Ran `bin/search index ... --force`.
- Saw MySQL get updated even though ElasticSearch failed.
Reviewers: chad, 20after4
Reviewed By: 20after4
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17599
Summary:
Ref T12450. We track a "document version" for updating search indexes, so that if a document is rapidly updated many times in a row we can skip most of the work.
However, this version doesn't consider "cluster.search" configuration, so if you add a new service (like a new ElasticSearch host) we still think that every document is up-to-date. When you run `bin/search index` to populate the index (without `--force`), we just do nothing.
This isn't necessarily very obvious. D17597 makes it more clear, by printing "everything was skipped and nothing happened" at the end.
Here, fix the issue by considering the content of "cluster.search" when computing fulltext document versions: if you change `cluster.search`, we throw away the version index and reindex everything.
This is slightly more work than we need to do, but changes to "cluster.search" are rare and this is much easier than trying to individually track which versions of which documents are in which services, which probably isn't very useful anyway.
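As a sketch, folding the config into the version looks something like this in Python. The names and the exact version format are assumptions; the real implementation is PHP and may combine these differently:

```python
import hashlib
import json
from typing import Dict


def fulltext_version(document_version: str, cluster_search: list) -> str:
    # Serialize with sorted keys so equivalent configs hash identically.
    config_json = json.dumps(cluster_search, sort_keys=True)
    config_digest = hashlib.sha256(config_json.encode("utf-8")).hexdigest()
    return "{}:{}".format(document_version, config_digest[:12])


def should_skip(
    indexed: Dict[str, str],
    phid: str,
    document_version: str,
    cluster_search: list,
) -> bool:
    """True if the recorded version already reflects this document and config."""
    return indexed.get(phid) == fulltext_version(document_version, cluster_search)
```

Any edit to the config changes every computed version, so nothing matches the recorded versions and every document is reindexed on the next run.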
Test Plan:
- Ran `bin/search index --type project`, saw everything get skipped.
- Changed `cluster.search`.
- Ran `search index` again, saw everything get updated.
- Ran a third time without changing `cluster.search`, everything was properly skipped.
Reviewers: chad, 20after4
Reviewed By: 20after4
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17598
Summary:
Ref T12450. There's currently a bad behavior where inserting a document into one search service marks it as up to date everywhere.
This isn't nearly as obvious as it should be because `bin/search index` doesn't make it terribly clear when a document was skipped because the index version was already up to date.
When running `bin/search index` without `--force` or `--background`, keep track of updated vs not-updated documents and print out some guidance. In other configurations, try to provide more help too.
Test Plan: {F4452134}
Reviewers: chad, 20after4
Reviewed By: 20after4
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17597
Summary:
Ref T12450. This was added a very very long time ago (D2298).
I don't want to put this in the upstream index anymore because I don't want to encourage third parties to develop software which reads the index directly. Reading the index directly is a big skeleton key which bypasses policy checks.
This was added before much of the policy model existed, when that wasn't as much of a concern. On a technical note, this also doesn't update when `phabricator.base-uri` changes.
This can be written as a search index extension if an install relies on it for some bizarre reason, although none should and I'm unaware of any actual use cases in the wild for it, even at Facebook.
Test Plan: Indexed some random stuff into ElasticSearch.
Reviewers: chad, 20after4
Reviewed By: chad
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17600
Summary:
Ref T12450. General adjustments:
- Try to make "Cluster: Search" more about "stuff in common + types" instead of pretty much all being Elastic-specific, so we can add Solr or whatever later.
- Provide guidance about rebuilding indexes after making a change.
- Simplify the basic examples, then provide a more advanced example at the end.
- Really try to avoid suggesting anyone configure Elasticsearch ever for any reason.
Test Plan: Read documents, previewed in remarkup.
Reviewers: chad, 20after4
Reviewed By: 20after4
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17602
Summary:
D17384 added a "keywords" field but only partially implemented it.
- Remove this field.
- Index project slugs as part of the document body instead.
Test Plan:
- Ran `bin/search index PHID-PROJ-... --force`.
- Found project by searching for a unique slug.
Reviewers: chad, 20after4
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D17596
Summary:
If you have `maniphest.custom-field-definitions` set to include "required" fields, a bunch of tests which create tasks can fail.
To avoid this, reset this config while running tests.
This mechanism should probably be more general (e.g., reset all config by default, only whitelist some config) but just fix this for now since it's a one-liner and doesn't make eventual cleanup any harder.
Test Plan: Ran `arc unit`, hitting tests that create tasks.
Reviewers: chad, 20after4
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D17595
Summary: When building a tokenizer-based edit control for a custom field (e.g. a datasource type), preserve a field validation error whilst building edit controls.
Test Plan:
- Create custom datasource field, set it to required
- Observe that 'Required' does not appear next to control
- Apply patch
- Observe 'Required' appears next to control
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D17592
Summary: Fixes T11630. Not sure what the max-width fixes, but I don't see anything off on various mobile, desktop.
Test Plan: Enable filetree in differential, drag navigation all over, see normal width calculations.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T11630
Differential Revision: https://secure.phabricator.com/D17591
Summary: Minor, uses 'user-circle' for account, and merchant logo for merchants in lists.
Test Plan: View the landing page, see updated logos and icons.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D17586
Summary: Move individual controller files into corresponding folders. Makes it easier to locate sections and expand without clutter. Also made "chargelist" part of account since it's tied to having an account specifically.
Test Plan: Visit charges, merchants, subscription, accounts, and other pages. No errors from file move.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D17587
Summary:
Fixes T12479. If you end a line with a character like ":" in a context which can trigger autocomplete (e.g., `.:`), then try to make a newline, we swallow the keystroke.
Instead, allow the keystroke through if the user hasn't typed anything else yet.
Test Plan:
- Autocompleted emoji and users normally.
- In an empty textarea, typed `.:<return>`, got a newline instead of a swallowed keystroke.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12479
Differential Revision: https://secure.phabricator.com/D17583
Summary:
Two little issues
1. There was an extra call to `getHostForWrite`.
2. The engine instance was shared between multiple service definitions, so it was overwriting the list of writable hosts from one service with hosts from another.
Test Plan:
tested in wikimedia production with multiple services defined like this:
```language=json
[
{
"hosts": [
{
"host": "search.svc.codfw.wmnet",
"protocol": "https",
"roles": {
"read": true,
"write": true
},
"version": 5
}
],
"path": "/phabricator",
"port": 9243,
"type": "elasticsearch"
},
{
"hosts": [
{
"host": "search.svc.eqiad.wmnet",
"protocol": "https",
"roles": {
"read": true,
"write": true
},
"version": 5
}
],
"path": "/phabricator",
"port": 9243,
"type": "elasticsearch"
}
]
```
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D17581
Summary:
Elasticsearch really wants a raw json body and it fails to accept
the request as of es version 5.3
Test Plan:
Tested with elasticsearch 5.2 and 5.3.
Before this change 5.2 worked but 5.3 failed with
`HTTP/406 "Content-Type header [application/x-www-form-urlencoded] is not supported"` [1]
After this change, both worked.
[1] https://phabricator.wikimedia.org/P5158
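To illustrate the difference, here is a Python sketch which builds (but does not send) a request with a raw JSON body. The URL and query are placeholders, and this is not the actual PHP client code:

```python
import json
import urllib.request


def build_search_request(url: str, query: dict) -> urllib.request.Request:
    """Build a POST whose body is raw JSON with a matching Content-Type."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Without this header, urllib would label the body as
        # application/x-www-form-urlencoded when sending, which is the
        # content type the 406 error above complains about.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint; nothing is sent over the network here.
request = build_search_request(
    "http://localhost:9200/phabricator/_search",
    {"query": {"match_all": {}}},
)
```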
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D17580
Summary:
These exception messages & comments didn't quite match reality.
Fixed and added pht() around a couple of them.
Test Plan: I didn't test this :P
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D17578
Summary:
Fixes T12460. Also ":)", ":(", ":/", and oldschool ":-)" variants.
Not included are variants with actual letters (`:D`, `:O`, `:P`) and obscure variants (`:^)`, `:*)`).
Test Plan: Typed `:3` (no emoji summoned). Typed `:dog3` (emoji summoned). Typed `@3` (user autocomplete summoned).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12460
Differential Revision: https://secure.phabricator.com/D17577
Summary:
Ref T12450. The way that config repair and setup issues interact is kind of complicated, and if `cluster.search` is invalid we may end up using `cluster.search` before we repair it.
I poked at things for a bit but wasn't confident I could get it to consistently repair before we use it without doing a big messy change.
The only thing that really matters is whether "type" is valid or not, so just put a slightly softer/more-tailored check in for that.
Test Plan:
- With `"type": "elastic"`, loaded setup issues.
- Before patch: hard fatal.
- After patch: softer fatal with more useful messaging.
{F4321048}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17576
Summary:
Ref T12450. Normally, we validate config when:
- You restart the webserver.
- You edit it with `bin/config set ...`.
- You edit it with the web UI.
However, you can also change config by editing `local.json`, `some_env.conf.php`, a `SiteConfig` class, etc. In these cases, you may miss config warnings.
Explicitly re-run search config checks from `bin/search`, similar to the additional database checks we run from `bin/storage`, to try to produce a better error message if the user has made a configuration error.
Test Plan:
```
$ ./bin/search init
Usage Exception: Setting "cluster.search" is misconfigured: Invalid search engine type: elastic. Valid types are: elasticsearch, mysql.
```
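The check amounts to something like this Python sketch. The function name and exception class are hypothetical, and the list of valid types is just the one shown in the output above:

```python
VALID_TYPES = ("elasticsearch", "mysql")


class UsageError(Exception):
    pass


def check_cluster_search(services: list) -> None:
    """Raise a readable error if any service has an unknown engine type."""
    for service in services:
        engine_type = service.get("type")
        if engine_type not in VALID_TYPES:
            raise UsageError(
                'Setting "cluster.search" is misconfigured: '
                "Invalid search engine type: {}. Valid types are: {}.".format(
                    engine_type, ", ".join(VALID_TYPES)
                )
            )
```

Running this from the CLI entry point, before doing any real work, catches config edited directly in a file, which never went through the normal validation paths.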
Reviewers: chad, 20after4
Reviewed By: 20after4
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17574
Summary:
Ref T12450. Minor cleanup:
- setRoles() has no callers.
- getRoles() has no callers (these two methods are leftovers from an earlier iteration of the change).
- The `hasRole()` logic doesn't work since nothing calls `setRoles()`.
- `hasRole()` has only `isReadable()/isWritable()` as callers.
- The `isReadable()/isWritable()` logic doesn't work since `hasRole()` doesn't work.
Instead, just check if there are any readable/writable hosts. `Host` already inherits its config from `Service` so this gets the same answer without any fuss.
Also add some read/write constants to make grepping this stuff a little easier.
Test Plan:
- Grepped for all removed symbols, saw only newer-generation calls in `Host`.
- See next diff for use of `isWritable()`.
Reviewers: chad, 20after4
Reviewed By: 20after4
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17571
Summary:
Ref T12450. This is now pointless and just asserts that `cluster.search` has a default value.
We might restore a fancier version of this eventually, but get rid of this for now.
Test Plan: Scrutinized the test case.
Reviewers: chad, 20after4
Reviewed By: 20after4
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17573
Summary:
Ref T12450. This mostly just smooths out the text a little to improve consistency. Also:
- Use `isWritable()`.
- Make the "skipping because not writable" message more clear and tailored.
- Try not to use the word "index" too much to avoid confusion with `bin/search index` -- instead, talk about "initialize a service".
Test Plan: Ran `bin/search init` with a couple of different (writable / not writable) configs, saw slightly clearer messaging.
Reviewers: chad, 20after4
Reviewed By: 20after4
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17572
Summary:
[ ] Write an "Upgrading: ..." guidance task with narrow instructions for installs that are upgrading.
[ ] Do we need to add an indexing activity (T11932) for installs with ElasticSearch?
[ ] We should more clearly detail exactly which versions of ElasticSearch are supported (for example, is ElasticSearch <2 no longer supported)? From T9893 it seems like we may //only// have supported ElasticSearch <2 before, so are the two regions of support totally nonoverlapping and all ElasticSearch users will need to upgrade?
[ ] Documentation should provide stronger guidance toward MySQL and away from Elastic for the vast majority of installs, because we've historically seen users choosing Elastic when they aren't actually trying to solve any specific problem.
[ ] When users search for fulltext results in Maniphest and hit too many documents, the current behavior is approximately silent failure (see T12443). D17384 has also lowered the ceiling for ElasticSearch, although previous changes lowered it for MySQL search. We should not fail silently, and ideally should build toward T12003.
[ ] D17384 added a new "keywords" field, but MySQL does not search it (I think?). The behavior should be as consistent across MySQL and Elastic as we can make it. Likely cleaner is giving "Project" objects a body, with "slugs" and "description" separated by newlines?
[ ] `PhabricatorSearchEngineTestCase` is now pointless and only detects local misconfigurations.
[ ] It would be nice to build a practical test suite instead, where we put specific documents into the index and then search for them. The upstream test could run against MySQL, and some `bin/search test` could run against a configured engine like ElasticSearch. This would make it easier to make sure that behavior was as uniform as possible across engine implementations.
[ ] Does every assigned task now match "user" in ElasticSearch?
[x] `PhabricatorElasticFulltextStorageEngine` has a `json_encode()` which should be `phutil_json_encode()`.
[ ] `PhabricatorSearchService` throws an untranslated exception.
[ ] When a search cluster is down, we probably don't degrade with much grace (unhandled exception)?
[ ] I haven't run bin/search init, but bin/search index doesn't warn me that I may want to. This might be worth adding. The UI does warn me.
[ ] bin/search init warns me that the index is "incorrect". It might be more clear to distinguish between "missing" and "incorrect", since it's more comforting to users to see "everything is as we expect, doing normal first-time setup now" than "something is wrong, fixing it".
[ ] CLI message "Initializing search service "ElasticSearch"" does not end with a period, which is inconsistent with other UI messages.
[ ] It might be nice to let bin/search commands like init and index select a specific service (or even service + host) to act on, as bin/storage --ref ... now does. You can generally get the result you want by fiddling with config.
[ ] When a service isn't writable, bin/search init reports "Search cluster has no hosts for role "write".". This is accurate but does not provide guidance: it might be more useful to the user to explain "This service is not writable, so we're skipping index check for it.".
[x] Even with write off for MySQL, bin/search index --type task --trace still updates MySQL, I think? I may be misreading the trace output. But this behavior doesn't make sense if it is the actual behavior, and it seems like reindexAbstractDocument() uses "all services", not "writable services", and the MySQL engine doesn't make sure it's writable before indexing.
[x] Searching for user fails to find task Grant users tokens when a mention is created, suggesting that stemming is not working.
[x] Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.
[x] Searching for maniphest fails to find task maniphest.query elephant, suggesting that tokenization in ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).
[x] The "index incorrect" warning UI uses inconsistent title case.
[x] The "index incorrect" warning UI could format the command to be run more cleanly (with addCommand(), I think).
refs T12450
Test Plan:
* Stared blankly at the code.
* Disabled 'write' role on mysql fulltext service.
* Edited a task, ran search indexer, verified that the mysql index wasn't being updated.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17564