Summary:
Ref T13249. See <https://discourse.phabricator-community.org/t/configuring-the-number-of-taskmaster-daemons/2394/>.
Today, when a configuration value is "locked", we prevent //writes// to the database. However, we still perform reads. When you upgrade, we generally don't want a bunch of your configuration to change by surprise.
Some day, I'd like to stop reading locked configuration from the database. This would defuse an escalation where an attacker finds a way to write to locked configuration despite safeguards, e.g. through SQL injection or policy bypass. Today, they could write to `cluster.mailers` or similar and substantially escalate access. A better behavior would be to ignore database values for `cluster.mailers` and other locked config, so that these impermissible writes have no effect.
Doing this today would break a lot of installs, but we can warn them about it now and then make the change at a later date.
Test Plan:
- Forced a `phd.taskmasters` config value into the database.
- Saw setup warning.
- Used `bin/config delete --database phd.taskmasters` to clear the warning.
- Reviewed documentation changes.
- Reviewed `phd.taskmasters` documentation adjustment.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13249
Differential Revision: https://secure.phabricator.com/D20159
Summary: Depends on D20114. This is on the way out, so make that explicitly clear.
Test Plan: Read setup issue and configuration option.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D20115
Summary: Depends on D20069. Ref T13232. This is a very, very weak dependency and we can reasonably polyfill it.
Test Plan: Grepped for `iconv` in libphutil, arcanist, and Phabricator.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13232
Differential Revision: https://secure.phabricator.com/D20070
Summary:
Ref T4369. Ref T12297. Ref T13242. Ref PHI1010. I want to take a quick look at `transaction.search` and see if there's anything quick and obvious we can do to improve performance.
On `secure`, the `__profile__` flag does not survive POST like it's supposed to: when you profile a page and then submit a form on the page, the result is supposed to be profiled. The intent is to make it easier to profile Conduit calls.
I believe this is because we're hooking the profiler, then rebuilding POST data a little later -- so `$_POST['__profile__']` isn't set yet when the profiler checks.
Move the POST rebuild a little earlier to fix this.
Also, remove the very ancient "aphront.default-application-configuration-class". I believe this was only used by Facebook to do CIDR checks against corpnet or something like that. It is poorly named and long-obsolete now, and `AphrontSite` does everything we might reasonably have wanted it to do.
Test Plan: Poked around locally without any issues. Will check if this fixes the issue on `secure`.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13242, T12297, T4369
Differential Revision: https://secure.phabricator.com/D20046
Summary: Ref T13238. Warn users about these horrible options and encourage them to defuse them.
Test Plan: Hit both warnings, fixed the issues, issues went away.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13238
Differential Revision: https://secure.phabricator.com/D19999
Summary:
In ~2012, the first of these options was added because someone who hates dogs and works at Asana also hated `[Differential]` in the subject line. The use case there was actually //removing// the text, not changing it, but I made the prefix editable since it seemed like slightly less of a one-off.
These options are among the dumbest and most useless config options we have and very rarely used, see T11760. A very small number of instances have configured one of these options.
Newer applications stopped providing these options and no one has complained.
You can get the same effect with `translation.override`. Although I'm not sure we'll keep that around forever, it's a reasonable replacement today. I'll call out an example in the changelog to help installs that want to preserve this option.
If we did want to provide this, it should just be in {nav Applications > Settings} for each application, but I think it's wildly-low-value and "hack via translations" or "local patch" are entirely reasonable if you really want to change these strings.
Test Plan: Grepped for `subject-prefix`.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D19993
Summary:
Fixes T7477. Fixes T13066. Currently, inbound mail is processed by the first receiver that matches any "To:" address. "Cc" addresses are ignored.
**To, CC, and Multiple Receivers**
Some users would like to be able to "Cc" addresses like `bugs@` instead of having to "To" the address, which makes perfect sense. That's the driving use case behind T7477.
Since users can To/Cc multiple "create object" or "update object" addresses, I also wanted to make the behavior more general. For example, if you email `bugs@` and also `paste@`, your mail might reasonably make both a Task and a Paste. Is this useful? I'm not sure. But it seems like it's pretty clearly the best match for user intent, and the least-surprising behavior we can have. There's also no good rule for picking which address "wins" when two or more match -- we ended up with "address order", which is pretty arbitrary since "To" and "Cc" are not really ordered fields.
One part of this change is removing `phabricator.allow-email-users`. In practice, this option only controlled whether users were allowed to send mail to "Application Email" addresses with a configured default author, and it's unlikely that we'll expand it since I think the future of external/grey users is Nuance, not richer interaction with Maniphest/Differential/etc. Since this option only made "Default Author" work and "Default Author" is optional, we can simplify behavior by making the rule work like this:
- If an address specifies a default author, it allows public email.
- If an address does not, it doesn't.
That's basically how it worked already, except that you could intentionally "break" the behavior by not configuring `phabricator.allow-email-users`. This is a backwards compatility change with possible security implications (it might allow email in that was previously blocked by configuration) that I'll call out in the changelog, but I suspect that no installs are really impacted and this new behavior is generally more intuitive.
A somewhat related change here is that each receiver is allowed to react to each individual email address, instead of firing once. This allows you to configure `bugs-a@` and `bugs-b@` and CC them both and get two tasks. Useful? Maybe not, but seems like the best execution of intent.
**Sender vs Author**
Adjacently, T13066 described an improvement to error handling behavior here: we did not distinguish between "sender" (the user matching the email "From" address) and "actor" (the user we're actually acting as in the application). These are different when you're some internet rando and send to `bugs@`, which has a default author. Then the "sender" is `null` and the "author" is `@bugs-robot` or whatever (some user account you've configured).
This refines "Sender" vs "Author". This is mostly a purity/correctness change, but it means that we won't send random email error messages to `@bugs-robot`.
Since receivers are now allowed to process mail with no "sender" if they have some default "actor" they would rather use instead, it's not an error to send from an invalid address unless nothing processes the mail.
**Other**
This removes the "abundant receivers" error since this is no longer an error.
This always sets "external user" mail recipients to be unverified. As far as I can tell, there's no pathway by which we send them email anyway (before or after this change), although it's possible I'm missing something somewhere.
Test Plan:
I did most of this with `bin/mail receive-test`. I rigged the workflow slightly for some of it since it doesn't support multiple addresses or explicit "CC" and adding either would be a bit tricky.
These could also be tested with `scripts/mail/mail_handler.php`, but I don't currently have the MIME parser extension installed locally after a recent upgrade to Mojave and suspect T13232 makes it tricky to install.
- Ran unit tests, which provide significant coverage of this flow.
- Sent mail to multiple Maniphest application emails, got multiple tasks.
- Sent mail to a Maniphest and a Paste application email, got a task and a paste.
- Sent mail to a task.
- Saw original email recorded on tasks. This is a behavior particular to tasks.
- Sent mail to a paste.
- Sent mail to a mock.
- Sent mail to a Phame blog post.
- Sent mail to a Legalpad document.
- Sent mail to a Conpherence thread.
- Sent mail to a poll.
- This isn't every type of supported object but it's enough of them that I'm pretty confident I didn't break the whole flow.
- Sent mail to an object I could not view (got an error).
- As a non-user, sent mail to several "create an object..." addresses.
- Addresses with a default user worked (e.g., created a task).
- Addresses without a default user did not work.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13066, T7477
Differential Revision: https://secure.phabricator.com/D19952
Summary:
D19940 removed this file entirely, which has led to at least one user who was unsure how to proceed now that `cluster.mailers` is required for outbound mail: https://discourse.phabricator-community.org/t/invalid-argument-supplied-for-foreach-phabricatormetamtamail-php/2287
This isn't //always// a setup issue for installs that don't care about sending mail, but this at least this gives a sporting chance to users who don't follow the changelogs.
Also, I'm not sure if there's a way to use `pht()` to generate links; right now the phurl is just in plain text.
Test Plan: Removed `cluster.mailers` config; observed expected setup issue.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D19964
Summary:
Ref T7477. This option was added in D842 in 2011, to support a specific narrow use case at Quora with community moderators using some kind of weird Gmail config.
I don't recall it ever coming up since then, and a survey of a subset of hosted instances (see T11760) reveals that no instances are using this option today. Presumably, even Quora has completed the onboarding discussed in D842, if they still use Phabricator. This option generally does not seem very useful outside of very unusual/narrow cases like the one Quora had.
This would be relatively easy to restore as a local patch if installs //do// need it, but I suspect this has no use cases anywhere.
Test Plan: Grepped for option, blame-delved to figure out why we added it in the first place, surveyed instances for usage.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T7477
Differential Revision: https://secure.phabricator.com/D19949
Summary:
Ref T12509.
- Remove the "phabricator.csrf-key" configuration option in favor of automatically generating an HMAC key.
- Upgrade two hasher callsites (one in CSRF itself, one in providing a CSRF secret for logged-out users) to SHA256.
- Extract the CSRF logic from `PhabricatorUser` to a standalone engine.
I was originally going to do this as two changes (extract logic, then upgrade hashes) but the logic had a couple of very silly pieces to it that made faithful extraction a little silly.
For example, it computed `time_block = (epoch + (offset * cycle_frequency)) / cycle_frequency` instead of `time_block = (epoch / cycle_frequency) + offset`. These are equivalent but the former was kind of silly.
It also computed `substr(hmac(substr(hmac(secret)).salt))` instead of `substr(hmac(secret.salt))`. These have the same overall effect but the former is, again, kind of silly (and a little bit materially worse, in this case).
This will cause a one-time compatibility break: pages loaded before the upgrade won't be able to submit contained forms after the upgrade, unless they're open for long enough for the Javascript to refresh the CSRF token (an hour, I think?). I'll note this in the changelog.
Test Plan:
- As a logged-in user, submitted forms normally (worked).
- As a logged-in user, submitted forms with a bad CSRF value (error, as expected).
- As a logged-out user, hit the success and error cases.
- Visually inspected tokens for correct format.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T12509
Differential Revision: https://secure.phabricator.com/D19946
Summary:
Ref T12509.
- Upgrade an old SHA1 to SHA256.
- Replace an old manually configurable HMAC key with an automatically generated one.
This is generally both simpler (less configuration) and more secure (you now get a unique value automatically).
This causes a one-time compatibility break that invalidates old "Reply-To" addresses. I'll note this in the changelog.
If you leaked a bunch of addresses, you could force a change here by mucking around with `phabricator_auth.auth_hmackey`, but AFAIK no one has ever used this value to react to any sort of security issue.
(I'll note the possibility that we might want to provide/document this "manually force HMAC keys to regenerate" stuff some day in T6994.)
Test Plan: Grepped for removed config. I'll vet this pathway more heavily in upcoming changes.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T12509
Differential Revision: https://secure.phabricator.com/D19945
Summary:
Ref T920. This simplifies mail configuration.
The "metamta.domain" option is only used to generate Thread-ID values, and we just need something that looks like a bit like a domain in order to make GMail happy. Just use the install domain. In most cases, this is almost certainly the configured value anyway. In some cases, this may cause a one-time threading break for existing threads; I'll call this out in the changelog.
The "metamta.placeholder-to-recipient" is used to put some null value in "To:" when a mail only has CCs. This is so that if you write a local client mail rule like "when I'm in CC, burn the message in a fire" it works even if all the "to" addresses have elected not to receive the mail. Instead: just send it to an unlikely address at our own domain.
I'll add some additional handling for the possiblity that we may receive this email ourselves in the next change, but it overlaps with T7477.
Test Plan: Grepped for these configuration values.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T920
Differential Revision: https://secure.phabricator.com/D19942
Summary:
Ref T12509. This upgrades a `weakDigest()` callsite to SHA256-HMAC and removes three config options:
- `celerity.resource-hash`: Now hard-coded, since the use case for ever adjusting it was very weak.
- `celerity.enable-deflate`: Intended to make cache inspection easier, but we haven't needed to inspect caches in ~forever.
- `celerity.minify`: Intended to make debugging minification easier, but we haven't needed to debug this in ~forever.
In the latter two cases, the options were purely developer-focused, and it's easy to go add an `&& false` somewhere in the code if we need to disable these features to debug something, but the relevant parts of the code basically work properly and never need debugging. These options were excessively paranoid, based on the static resource enviroment at Facebook being far more perilous.
The first case theoretically had end-user utility for fixing stuck content caches. In modern Phabricator, it's not intuitive that you'd go adjust a Config option to fix this. I don't recall any users ever actually running into problems here, though.
(An earlier version of this change did more magic with `celerity.resource-hash`, but this ended up with a more substantial simplification.)
Test Plan: Grepped for removed configuration options.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T12509
Differential Revision: https://secure.phabricator.com/D19941
Summary:
Ref T920. About a year ago (in 2018 Week 6, see D19003) we moved from individually configured mailers to `cluster.mailers`, primarily to support fallback across multiple mail providers.
Since this has been stable for quite a while, drop support for the older options.
Test Plan: Grepped for all removed options.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T920
Differential Revision: https://secure.phabricator.com/D19940
Summary:
Depends on D19579. Fixes T10003. These have been deprecated with a setup warning about their impending removal for about two and a half years.
Ref T13164. See PHI642. My overall goal here is to simplify how we handle transactions which have special policy behaviors. In particular, I'm hoping to replace `ApplicationTransactionEditor->requireCapabilities()` with a new, more clear policy check.
A problem with `requireCapabilities()` is that it doesn't actually enforce any policies in almost all cases: the default is "nothing", not CAN_EDIT. So it ends up looking like it's the right place to specialize policy checks, but it usually isn't.
For "Disable", I need to be able to weaken the check selectively (you can disable users if you have the permission, even if you can't edit them otherwise). We have a handful of other edits which work like this (notably, leaving and joining projects) but they're very rare.
Test Plan: Grepped for all removed classes. Edited a Maniphest task.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13164, T10003
Differential Revision: https://secure.phabricator.com/D19581
Summary: Ref T12164. Defines a new manual activity that suggests rebuilding repository identities before Phabricator begins to rely on them.
Test Plan:
- Ran migration, observed expected setup issue: {F5788217}
- Ran `bin/config done identities` and observed setup issue get marked as done.
- Ran `/bin/storage upgrade --apply phabricator:20170912.ferret.01.activity.php` to make sure I didn't break the reindex migration; observed reindex setup issue appear as expected.
- Ran `./bin/config done reindex` and observed reindex issue cleared as expected.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T12164
Differential Revision: https://secure.phabricator.com/D19497
Summary:
Fixes T12994. We need `MYSQLI_ASYNC` to implement client-side query timeouts, and we need MySQLi + MySQL Native Driver to get `MYSQLI_ASYNC`.
Recommend users install MySQLi and MySQL Native Driver if they don't have them. These are generally the defaults and best practice anyway, but Ubuntu makes it easy to use the older stuff.
All the cases we're currently aware of stem from `apt-get install php5-mysql` (which explicitly selects the non-native driver) so issue particular guidance about `php5-mysqlnd`.
Test Plan:
- Faked both issues locally, reviewed the text.
- Will deploy to `secure`, which currently has the non-native driver.
Maniphest Tasks: T12994
Differential Revision: https://secure.phabricator.com/D19216
Summary: Ref T12677. Skip these checks if we're doing the new stuff. Also, allow priority to be unspecified.
Test Plan: Will deploy.
Maniphest Tasks: T12677
Differential Revision: https://secure.phabricator.com/D19043
Summary: Help for GD SetupChcek was missing the "how to install extension" content.
Test Plan: Uninstalled gd, validated extension installation instructions were present in Setup Issue.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: Korvin, epriestley
Differential Revision: https://secure.phabricator.com/D18922
Summary: Noticed a couple of typos in the docs, and then things got out of hand.
Test Plan:
- Stared at the words until my eyes watered and the letters began to swim on the screen.
- Consulted a dictionary.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D18693
Summary:
Ref T12966. See that task for a description and reproduction steps.
If you put Phabricator in a master/replica configuration and then restart it, we may fatal here if the master is unreachable. Instead, we should survive setup checks.
Test Plan: Put Phabricator in a master/replica configuration, explicitly disabled the master by misconfiguring the port, restarted Phabricator. Before: fatal; after: login screen in read-only mode.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12966
Differential Revision: https://secure.phabricator.com/D18442
Summary:
Ref T12965. See that task for discussion, and PHI36 for context.
This sweeps the fatal under the rug by skipping it, letting things move forward for now.
Test Plan: Followed instructions in T12965, got a read-only recovery after restart instead of a fatal.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12965
Differential Revision: https://secure.phabricator.com/D18440
Summary:
Fixes T12942.
- Adds binary version and path information to {nav Config > Version Information}.
- Replaces old code all over the place with new consolidated code.
Test Plan:
{F5073531}
Also faked some cases of missing binaries, bad versions, etc.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12942
Differential Revision: https://secure.phabricator.com/D18306
Summary: Ref T12509. This encourages code to move away from HMAC+SHA1 by making the method name more obviously undesirable.
Test Plan: `grep`, browsed around.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12509
Differential Revision: https://secure.phabricator.com/D17632
Summary: Ref T12450. These are like 95% my fault, but Elastic appears to spell the name "Elasticsearch" consistently in their branding.
Test Plan: `grep ElasticSearch`
Reviewers: chad, 20after4
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17601
Summary:
[ ] Write an "Upgrading: ..." guidance task with narrow instructions for installs that are upgrading.
[ ] Do we need to add an indexing activity (T11932) for installs with ElasticSearch?
[ ] We should more clearly detail exactly which versions of ElasticSearch are supported (for example, is ElasticSearch <2 no longer supported)? From T9893 it seems like we may //only// have supported ElasticSearch <2 before, so are the two regions of support totally nonoverlapping and all ElasticSearch users will need to upgrade?
[ ] Documentation should provide stronger guidance toward MySQL and away from Elastic for the vast majority of installs, because we've historically seen users choosing Elastic when they aren't actually trying to solve any specific problem.
[ ] When users search for fulltext results in Maniphest and hit too many documents, the current behavior is approximately silent failure (see T12443). D17384 has also lowered the ceiling for ElasticSearch, although previous changes lowered it for MySQL search. We should not fail silently, and ideally should build toward T12003.
[ ] D17384 added a new "keywords" field, but MySQL does not search it (I think?). The behavior should be as consistent across MySQL and Elastic as we can make it. Likely cleaner is giving "Project" objects a body, with "slugs" and "description" separated by newlines?
[ ] `PhabricatorSearchEngineTestCase` is now pointless and only detects local misconfigurations.
[ ] It would be nice to build a practical test suite instead, where we put specific documents into the index and then search for them. The upstream test could run against MySQL, and some `bin/search test` could run against a configured engine like ElasticSearch. This would make it easier to make sure that behavior was as uniform as possible across engine implementations.
[ ] Does every assigned task now match "user" in ElasticSearch?
[x] `PhabricatorElasticFulltextStorageEngine` has a `json_encode()` which should be `phutil_json_encode()`.
[ ] `PhabricatorSearchService` throws an untranslated exception.
[ ] When a search cluster is down, we probably don't degrade with much grace (unhandled exception)?
[ ] I haven't run bin/search init, but bin/search index doesn't warn me that I may want to. This might be worth adding. The UI does warn me.
[ ] bin/search init warns me that the index is "incorrect". It might be more clear to distinguish between "missing" and "incorrect", since it's more comforting to users to see "everything is as we expect, doing normal first-time setup now" than "something is wrong, fixing it".
[ ] CLI message "Initializing search service "ElasticSearch"" does not end with a period, which is inconsistent with other UI messages.
[ ] It might be nice to let bin/search commands like init and index select a specific service (or even service + host) to act on, as bin/storage --ref ... now does. You can generally get the result you want by fiddling with config.
[ ] When a service isn't writable, bin/search init reports "Search cluster has no hosts for role "write".". This is accurate but does not provide guidance: it might be more useful to the user to explain "This service is not writable, so we're skipping index check for it.".
[x] Even with write off for MySQL, bin/search index --type task --trace still updates MySQL, I think? I may be misreading the trace output. But this behavior doesn't make sense if it is the actual behavior, and it seems like reindexAbstractDocument() uses "all services", not "writable services", and the MySQL engine doesn't make sure it's writable before indexing.
[x] Searching or user fails to find task Grant users tokens when a mention is created, suggesting that stemming is not working.
[x] Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.
[x] Searching for maniphest fails to find task maniphest.query elephant, suggesting that tokenization in ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).
[x] The "index incorrect" warning UI uses inconsistent title case.
[x] The "index incorrect" warning UI could format the command to be run more cleanly (with addCommand(), I think).
refs T12450
Test Plan:
* Stared blankly at the code.
* Disabled 'write' role on mysql fulltext service.
* Edited a task, ran search indexer, verified that the mysql index wasn't being updated.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Maniphest Tasks: T12450
Differential Revision: https://secure.phabricator.com/D17564
Summary:
The goal is to make fulltext search back-ends more extensible, configurable and robust.
When this is finished it will be possible to have multiple search storage back-ends and
potentially multiple instances of each.
Individual instances can be configured with roles such as 'read', 'write' which control
which hosts will receive writes to the index and which hosts will respond to queries.
These two roles make it possible to have any combination of:
* read-only
* write-only
* read-write
* disabled
This 'roles' mechanism is extensible to add new roles should that be needed in the future.
In addition to supporting multiple elasticsearch and mysql search instances, this refactors
the connection health monitoring infrastructure from PhabricatorDatabaseHealthRecord and
utilizes the same system for monitoring the health of elasticsearch nodes. This will
allow Wikimedia's phabricator to be redundant across data centers (mysql already is,
elasticsearch should be as well).
The real-world use-case I have in mind here is writing to two indexes (two elasticsearch clusters
in different data centers) but reading from only one. Then toggling the 'read' property when
we want to migrate to the other data center (and when we migrate from elasticsearch 2.x to 5.x)
Hopefully this is useful in the upstream as well.
Remaining TODO:
* test cases
* documentation
Test Plan:
(WARNING) This will most likely require the elasticsearch index to be deleted and re-created due to schema changes.
Tested with elasticsearch versions 2.4 and 5.2 using the following config:
```lang=json
"cluster.search": [
{
"type": "elasticsearch",
"hosts": [
{
"host": "localhost",
"roles": { "read": true, "write": true }
}
],
"port": 9200,
"protocol": "http",
"path": "/phabricator",
"version": 5
},
{
"type": "mysql",
"roles": { "write": true }
}
]
Also deployed the same changes to Wikimedia's production Phabricator instance without any issues whatsoever.
```
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin, epriestley
Tags: #elasticsearch, #clusters, #wikimedia
Differential Revision: https://secure.phabricator.com/D17384
Summary:
Fixes T12306. Currently, we warn about daemons not running even if they're in normal "alive" states, particularly "waiting to restart after a failure".
This check was made more strict in D12088, back when we tried to version check running daemons. Since we implemented auto-restart-after-config-change we don't do this anymore, so it should be fine to make this more lax again.
Test Plan:
- Faked an exception for all tasks.
- Before patch: reloading the daemon setup error sometimes raised a false positive ("waiting" daemon detected as dead).
- After patch: daemon setup error no longer triggers.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12306
Differential Revision: https://secure.phabricator.com/D17397
Summary: Ref T9640. On 7.0 we had signal handling issues so we can never support it, but async signals should resolve them on 7.1 or newer.
Test Plan: On PHP 7.1, got through the setup warning.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9640
Differential Revision: https://secure.phabricator.com/D17197
Summary: Ref T9640. This option was removed in PHP7, so there's no reason to warn about it.
Test Plan: No longer saw a setup warning on PHP7.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9640
Differential Revision: https://secure.phabricator.com/D17196
Summary:
- Fixes T11995. This got moved but I missed renaming this callsite.
- Fixes T11993. If you have valid credentials, but haven't run `storage upgrade` yet, we can hit this exception during setup. Just ignore it instead.
Test Plan:
- Saved global settings, no more fatal.
- Changed `storage-namespace` to junk, loaded web UI with valid database credentials.
{F2106358}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11993, T11995
Differential Revision: https://secure.phabricator.com/D17024
Summary: Fixes T11544. Attempt to detect if we're on a tiny, burstable-CPU AWS instance and complain.
Test Plan:
- Completely faked this locally.
- Hit the URI on an EC2 instance to check that it's correct (got back "m3.large", since that was the instance class).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11544
Differential Revision: https://secure.phabricator.com/D17014
Summary:
Ref T11553. With some regularity, users make various configuration mistakes which we can detect by making a request to ourselves.
I use a magical header to make this request because we want to test everything else (parameters, path).
- Fixes T4854, probably. Tries to detect mod_pagespeed by looking for a header. This is a documentation-based "fix", I didn't actually install mod_pagespeed or formally test this.
- Fixes T6866. We now test for parameters (e.g., user somehow lost "QSA").
- Ref T6709. We now test that stuff is decoded exactly once (e.g., user somehow lost "B").
- Fixes T4921. We now test that Authorization survives the request.
- Fixes T2226. Adds a setup check to determine whether gzip is enabled on the web server, and attempts to enable it at the PHP level.
- Fixes `<space space newline newline space><?php` in `preamble.php`.
Test Plan: Tested all of these setup warnings, although mostly by faking them.
Reviewers: joshuaspence, chad
Reviewed By: chad
Subscribers: Korvin
Maniphest Tasks: T4854, T4921, T6709, T6866, T11553, T2226
Differential Revision: https://secure.phabricator.com/D12622
Summary:
Ref T11922. After updating to HEAD of `master`, you need to manually rebuild the index. We don't do this during `bin/storage upgrade` because it can take a very long time (`secure.phabricator.com` took roughly an hour) and can happen while Phabricator is running.
However, if we don't warn users about this they'll just get a broken index unless they go read the changelog (or file an issue, then we tell them to go read the changelog).
This adds a very simple table for notes to administrators so we can write a "you need to go rebuild the index" note, then adds one.
Administrators clear the note by completing the activity and running `bin/config done reindex`. This isn't automatic because there are various strategies you can use to approach the issue, which I'll discuss in greater detail in the linked documentation.
Also, fix an issue where `bin/storage upgrade --apply <patch>` could try to re-mark an already-applied patch as applied.
Test Plan:
- Ran storage ugrades.
- Got instructions to rebuild search index.
- Cleared instructions with `bin/config done reindex`.
Reviewers: chad
Reviewed By: chad
Subscribers: avivey
Maniphest Tasks: T11922
Differential Revision: https://secure.phabricator.com/D16965
Summary:
Ref T11741. This makes everything work if we switch to InnoDB, but never actually switches yet.
Since the default minimum word length (3) and stopword list (36 common English words) in InnoDB are generally pretty reasonable, I just didn't add any setup advice for them. I figure we're better off with simpler setup until we identify some real problem that the builtin stopwords create.
Test Plan: Swapped the `false` to `true`, ran `storage adjust`, got InnoDB fulltext indexes, searched for stuff, got default "AND" behavior.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11741
Differential Revision: https://secure.phabricator.com/D16942
Summary:
Ref T11741. Fixes T10642. Parse and compile user queries with a consistent ruleset, then submit queries to the backend using whatever ruleset MySQL is configured with.
This means that `ft_boolean_syntax` no longer needs to be configured (we'll just do the right thing in all cases).
This should improve behavior with RDS immediately (T10642), and allow us to improve behavior with InnoDB in the future (T11741).
Test Plan:
- Ran various queries in the UI, saw the expected results.
- Ran bad queries, got useful errors.
- Searched threads in Conpherence.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10642, T11741
Differential Revision: https://secure.phabricator.com/D16939
Summary:
Ref T11044. This was old Facebook cruft for reading configuration from SMC (and maybe doing some other questionable things). See D183.
(See also D175 for discussion of this from 2011.)
In modern Phabricator, you can subclass `SiteConfig` to provide dynamic configuration, and we do so in the Phacility cluster. This lets you change any config, and change in response to requests (e.g., for instancing) and is generally more powerful than this mechanism was.
This configuration provider theoretically let you roll your own replication or partitioning, but in practice I believe no one ever did, and no one ever could have anyway without more support in the upstream (for migrations, read-after-write, etc).
Test Plan:
- Grepped for removed option.
- Browsed around with clustering off.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11044
Differential Revision: https://secure.phabricator.com/D16911
Summary:
Ref T11044. One popular tool in a modern operations environment is Puppet. The primary purpose of this tool is to randomly revert hosts to older or different configurations.
Introducing an element of chaotic unpredictability into operations trains staff to be on high alert at all times, rather than lulled into complacency by predictability or consistency.
When Puppet reverts a Phabricator host's configuration to an older version, we might start writing data to a lot of crazy places where it shouldn't go. This will create a big sticky mess that is virtually impossible to undo, mostly because we'll get two files with ID 123 or two tasks with ID 456 or whatever else and good luck with that.
Instead, after changing the partition layout, require `bin/storage partition` to be run. This writes a copy of the config everywhere.
Then, when we start serving web requests, make sure every database has the exact same config. This will foil Puppet by refusing to run requests on hosts it has reverted.
Test Plan:
- Changed partition configuration.
- Ran Phabricator.
- FOILED!
- Ran `bin/storage partition` to sync config.
- Things worked again.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11044
Differential Revision: https://secure.phabricator.com/D16910
Summary:
Ref T11044. Fixes T10931. This option has essentially never been useful for anything, and we've picked the best implementation for a long time (MySQLi if available, MySQL if not).
I am not aware of any reason to ever set this manually. If someone comes up with some bizarre but legitimate use case that I haven't thought of, we can modularize it.
Test Plan: Browsed around. Grepped for `mysql.implementation`.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10931, T11044
Differential Revision: https://secure.phabricator.com/D16909
Summary:
Fixes T10759. Fixes T11817. This runs all the general sanity/configuration checks on all the active servers.
None of these warnings are very important, and this doesn't change any logical stuff.
Depends on D16904.
Test Plan: Painstakingly triggered each warning, verified that they rendered correctly and that messages told me which host was affected.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10759, T11817
Differential Revision: https://secure.phabricator.com/D16905
Summary:
Ref T10759. Check master/replica status during startup.
After D16903, this also means that we check this status after a database comes back online after being unreachable.
If a master is replicating, fatal (since this can do a million kinds of bad things).
If a replica is not replicating, warn (this just means the replica is behind so some data is at risk).
Also: if your masters were actually configured properly (mine weren't until this change detected it), we would throw away patches as we applied them, so they would only apply to the //first// master. Instead, properly apply all migration patches to all masters.
Test Plan:
- Started Phabricator with a replicating master, got a fatal.
- Stopped replication on a replica, got a warning.
- With two non-replicating masters, upgraded storage.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10759
Differential Revision: https://secure.phabricator.com/D16904
Summary:
Ref T10759. We may "discover" the presence of a fatal setup error later, after starting Phabricator.
This can happen in a few ways, but most are unlikely. The one I'm immediately concerned about is:
- Phabricator starts up during a disaster with some databases unreachable.
- We start with warnings (unreachable databases are generally not fatal, since it's OK for some subset of hosts to be down in replicated/partitioned setups).
- The unreachable databases later recover and become accessible again.
- When we run checks against them, we discover that they are misconfigured.
Currently, "fatal" setup issues are not truly fatal if we're "in flight" -- we've survived setup checks at least once in the past. This is bad in the scenario above.
Especially with partitioning, it could lead to mangled data in a disaster scenario where operations staff makes a small configuration mistake while trying to get things running again.
Instead, if we "discover" a fatal error while already "in flight", reset the whole setup process as though the webserver had just restarted. Don't serve requests again until we can make it through setup without hitting fatals.
Test Plan:
- Started Phabricator with multiple masters, one of which was down and broken.
- Got a warning about the bad master.
- Revived the master.
- Before: Phabricator detects the fatal, but keeps serving requests.
- After: Phabricator detects the fatal, resets the webserver, and stops serving requests until the fatal is resolved.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10759
Differential Revision: https://secure.phabricator.com/D16903
Summary:
Ref T10759. Currently, these checks run only against configured masters. Instead, check every host.
These checks also sort of cheat through restart during a recovery, when some hosts will be unreachable: they test for "disaster" by seeing if no masters are reachable, and just skip all the checks in that case.
This is bad for at least two reasons:
- After recent changes, it is possible that //some// masters are dead but it's still OK to start. For example, "slowvote" may have no master, but everything else is reachable. We can safely run without slowvote.
- It's possible to start during a disaster and miss important setup checks completely, since we skip them, get a clean bill of health, and never re-test them.
Instead:
- Test each host individually.
- Fundamental problems (lack of InnoDB, bad schema) are fatal on any host.
- If we can't connect, raise it as a //warning// to make sure we check it later. If you start during a disaster, we still want to make sure that schemata are up to date if you later recover a host.
In particular, I'm going to add these checks soon:
- Fatal if a "master" is replicating.
- Fatal if a "replica" is not replicating.
- Fatal if a database partition config differs from web partition config.
- When we let a database off with a warning because it's down, and later upgrade it to a fatal because we discover it is broken after it comes up again, fatal everything. Currently, we keep running if we "discover" the presence of new fatals after surviving setup checks for the first time.
Test Plan:
- Configured with multiple masters, intentionally broke one (simulating a disaster where one master is lost), saw Phabricator still startup.
- Tested individual setup checks by intentionally breaking them.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10759
Differential Revision: https://secure.phabricator.com/D16902
Summary:
Ref T11044. This moves toward partitioned application databases:
- You can define multiple masters.
- Convert all the easily-convertible code to become multi-master aware.
This doesn't convert most of `bin/storage` or "Config > Database (Stuff)" yet, as both are quite involved. They still work for now, but only operate on the first master instead of all masters.
Test Plan: Configured multiple masters, browsed around, ran `bin/storage` commands, ran `bin/storage --host ...`.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11044
Differential Revision: https://secure.phabricator.com/D16115
Summary:
Ref T11672. At low loads, this causes us to use more connections, which is pushing some installs over the default limits.
Rather than trying to walk users through changing `max_connections`, `open_files_limit`, `fs.file-max`, `ulimit`, etc., just put things back for now. After T11044 we should have headroom to use persistent connections within the default limits on all reasonable systems..
Test Plan: Loaded Phabricator, poked around.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11672
Differential Revision: https://secure.phabricator.com/D16591
Summary:
Fixes T11683. Likely as a result of the persitent connections change, more users are seeing MySQL connection limit errors.
The persistent connections change means we use //fewer// connections at the high end, but I'm guessing PHP is keeping some more connections around in the pool, so while high-traffic hosts use fewer connections, low-traffic hosts now use more.
Raise an explicit setup warning about this. Users should be adjusting it anyway, there's no value to leaving it at extremely low default and connections are baiscally free until you run out of outbound ports.
Test Plan: {F1844630}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11683
Differential Revision: https://secure.phabricator.com/D16586