Summary:
Fixes T11746. The opcache docs are on a different page, so point there if we're raising opcache issues.
(It's possible for a setup issue to say "configure X, or configure Y", where X is opcache and Y is non-opcache, so we may want to render both links.)
Test Plan: {F1867109}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11746
Differential Revision: https://secure.phabricator.com/D16685
Summary: Creates a background that renders inside the Quicksand frame, through sorcery.
Test Plan: Turn on Quicksand, visit lots of pages. See correct background colors. This probably blows something up I'm not testing.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D16642
Summary:
Ref T11672. At low loads, this causes us to use more connections, which is pushing some installs over the default limits.
Rather than trying to walk users through changing `max_connections`, `open_files_limit`, `fs.file-max`, `ulimit`, etc., just put things back for now. After T11044 we should have headroom to use persistent connections within the default limits on all reasonable systems..
Test Plan: Loaded Phabricator, poked around.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11672
Differential Revision: https://secure.phabricator.com/D16591
Summary:
Fixes T11683. Likely as a result of the persitent connections change, more users are seeing MySQL connection limit errors.
The persistent connections change means we use //fewer// connections at the high end, but I'm guessing PHP is keeping some more connections around in the pool, so while high-traffic hosts use fewer connections, low-traffic hosts now use more.
Raise an explicit setup warning about this. Users should be adjusting it anyway, there's no value to leaving it at extremely low default and connections are baiscally free until you run out of outbound ports.
Test Plan: {F1844630}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11683
Differential Revision: https://secure.phabricator.com/D16586
Summary:
The commit which added checks for the old homepage options (now in
Dashboard) in rP9d9a47e9cf, added them to the auth section, where they
would present:
This option has been migrated to the "Auth" application. Your old
configuration is still in effect, but now stored in "Auth" instead of
configuration. Going forward, you can manage authentication from the
web UI.
Remove them from the moved-to-Auth list, and coalesce the multiple
definitions of the help text into one.
Test Plan:
- set maniphest.priorities.unbreak-now to something
- observe the setup issue reported
- hope it tells you the right thing
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin, epriestley, chad
Differential Revision: https://secure.phabricator.com/D16576
Summary:
Fixes T11627.
Beyond being complex, I have no real reason to believe these checks even work (and they don't test repositories, file storage, logfiles, etc).
Test Plan:
Faked the error:
{F1813433}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11627
Differential Revision: https://secure.phabricator.com/D16544
Summary:
Ref T11613. In D16503/T11598 I refined the setup flow to improve messaging for early-stage setup issues, but failed to fully untangle things.
We sometimes still try to access a cache which uses configuration before we build configuration, which causes an error.
Instead, store "are we in flight / has setup ever worked?" in a separate cache which doesn't use the cache namespace. This stops us from trying to read config before building config.
Test Plan:
Hit bad extension error with a fake extension, got a proper setup help page:
{F1812803}
Solved the error, reloaded, broke things again, got a "friendly" page:
{F1812805}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11613
Differential Revision: https://secure.phabricator.com/D16542
Summary:
Ref T11589. When we hit a fatal setup issue (essentially always a connection failure) //after// we've already survived them on at least one request, we can be pretty sure a server went down and that the problem is not a setup/configuration issue.
In this case, show a friendlier error page instead of the fairly detailed technical one.
Test Plan:
- Broke MySQL config.
- Restarted Apache.
- Got the "admin/setup" error page:
{F1803268}
- Fixed the MySQL config.
- Loaded any page, to put us "in flight".
- Broke MySQL config.
- Loaded any page.
- Got the friendly "in flight" error page:
{F1803271}
If you want to design this better, easiest way to get to it is:
- Set `mysql.port` to `9999` in `conf/local/local.json`.
- Reload any page while already running (don't restart).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11589
Differential Revision: https://secure.phabricator.com/D16503
Summary:
Ref T11589. Previously, when we failed to load database configuration we just continued anyway, in order to get to setup checks so we could raise a better error.
There was a small chance that this could lead to pages running in a broken state, where ONLY that connection failed and everything else worked. This was accidentally fixed by narrowing the exceptions we continue on in D16489.
However, this "fix" meant that users no longer got helpful setup instructions. Instead:
- Keep throwing these exceptions: it's bad to continue if we've failed to connect to the database.
- However, catch them and turn them into setup errors.
- Share all the setup code so these errors and setup check errors work the same way.
Test Plan:
- Intentionally broke `mysql.host` and `mysql.pass`.
- Loaded pages.
- Got good setup errors.
- Hit normal setup errors too.
- Put everything back.
- Swapped into cluster mode.
- Intentionally broke cluster mode, saw failover to readonly.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11589
Differential Revision: https://secure.phabricator.com/D16501
Summary:
Ref T11589. This runs:
- preflight checks (critical checks: PHP version stuff, extensions);
- configuration;
- normal checks.
The PHP checks are split into critical ("bad version") and noncritical ("sub-optimal config").
I tidied up the extension checks slightly, we realistically depend on `cURL` nowadays.
Test Plan:
- Faked a preflight failure.
- Hit preflight check.
- Got expected error screen.
- Loaded normal pages.
- Hit a normal setup check.
- Used DarkConsole "Startup" tab to verify that preflight checks take <1ms to run (we run them on every page without caching, at least for now, but they only do trivial checks like PHP versions).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11589
Differential Revision: https://secure.phabricator.com/D16500
Summary:
Ref T11589. Currently, initialization order is a bit tangled: we load configuration from the database, then later test if we can connect to the database.
Instead, I'm going to do: preflight checks ("PHP Version OK?", "Extensions installed?"), then configuration, then normal setup checks.
To prepare for this, flag core checks as "preflight" and add a setup panel to visually confirm that I didn't miss anything.
Test Plan: {F1803210}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11589
Differential Revision: https://secure.phabricator.com/D16499
Summary:
Fixes T11590. Currently, we incorrectly consider cluster repository versions that are (or were) on devices which are no longer part of the active cluster service when building this status screen.
Instead, ignore them. This is just a display bug; the actual `ClusterEngine` already had similar logic.
Test Plan:
- Added a bad leader record to `repository_workingcopyversion`.
- Before patch, got a bad "Partial (1w)" sync:
{F1802292}
- After patch, got a good "Sycnchronized":
{F1802293}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11590
Differential Revision: https://secure.phabricator.com/D16492
Summary: Ref T11132, swaps in new UI for welcome page using guide modules
Test Plan: Test instance and non instance guides. Test each setting. Unclear on how to test people / Phacility. Just change the URL link?
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T11132
Differential Revision: https://secure.phabricator.com/D16482
Summary: Ref T11132. Adds a text panel to feed if no stories are present and the user is an admin. Seems ok-ish for 15 minutes. Happy to take content suggestions.
Test Plan: Make a new install, see panel. Log in as new user, don't see panel.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T11132
Differential Revision: https://secure.phabricator.com/D16479
Summary: This adds status icons, locked, hidden, editable, customized, to the list of options in config. Makes it easier to read and assertain state.
Test Plan:
View a hidden, customized, editable, and locked.
{F1796320}
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D16475
Summary: Ref T11559. This makes managing large numbers of repositories slightly easier.
Test Plan: {F1796119}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11559
Differential Revision: https://secure.phabricator.com/D16472
Summary: Ref T11132, significantly cleans up the Config app, new layout, icons, spacing, etc. Some minor todos around re-designing "issues", mobile support, and maybe another pass at actual Group pages.
Test Plan: Visit and test every page in the config app, set new items, resolve setup issues, etc.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam, Korvin
Maniphest Tasks: T11132
Differential Revision: https://secure.phabricator.com/D16468
Summary: Ref T11132. This gets rid of the red bar for admins and instead shows a new menu item next to notifications/chat if there are unresolved configuration issues. Menu goes away if there are no issues. May move this later into the bell icon, but think think might be the right place to start especially for NUX and updates. Maybe limit the number of items?
Test Plan:
Tested with some, lots, and no config issues.
{F1790156}
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T11132
Differential Revision: https://secure.phabricator.com/D16461
Summary:
Fixes T9235. When the stars align, PHP 5.6 or newer emits a deprecation warning on startup about "always_populate_raw_post_data" which occurs too early for us to intercept and can break responses by adding garbage to the output.
These settings appear to be sufficient:
```
always_populate_raw_post_data = 1
display_errors = 1
display_startup_errors = 1
error_reporting = -1
```
Then make a request with an unusual content type:
```
$ curl -X POST -H "Content-Type: application/json" -d "{foo: bar}" http://phabricator.example.com/
```
This triggers the warning:
```
<br />
<b>Deprecated</b>: Automatically populating $HTTP_RAW_POST_DATA is deprecated and will be removed in a future version. To avoid this warning set 'always_populate_raw_post_data' to '-1' in php.ini and use the php://input stream instead. in <b>Unknown</b> on line <b>0</b><br />
<br />
...
```
To avoid this, just instruct administrators to set this value to "-1", which completely disables the feature and silences the warning.
Test Plan:
- Reproduced this issue by following the instructions above.
- Triggered the setup issue locally and read all the captivating prose:
{F1786911}
- Made the configuration change it directed me to, saw the setup issue resolve.
Reviewers: jcox
Reviewed By: jcox
Maniphest Tasks: T9235
Differential Revision: https://secure.phabricator.com/D16454
Summary: Fixes T8850. Previously, if a user's preamble script mangled `$_SERVER['REMOTE_ADDR']` or somehow set it to `null`, the user would get errors when performing certain actions. Now those errors shouldn't occur, and instead the user will be warned that there is a setup issue related to their preamble script.
Test Plan: Create a preamble script that contains `$_SERVER['REMOTE_ADDR'] = null;` then navigate to /config/issue/. There should be a warning there about `REMOTE_ADDR` not being available.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin, yelirekim, epriestley
Maniphest Tasks: T8850
Differential Revision: https://secure.phabricator.com/D16450
Summary: Switches over to new property UI boxes, splits core and apps into separate pages. Move Versions into "All Settings". I think there is some docs I likely need to update here as well.
Test Plan: Click on each item in the sidebar, see new headers.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D16429
Summary: Fixes T11501. Let's you pass in a full PHUIIconView or just the icon name to give ObjectListItem a large icon.
Test Plan: Alamanac, Applications, Drydock, Settings, Search Typeahead, Config page...
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T11501
Differential Revision: https://secure.phabricator.com/D16421
Summary:
When I wrote this the first time, only hosted repositories could be clustered.
This check wasn't removed when I allowed observed repositories to be clustered in D15986.
Test Plan:
Reloaded {nav Config > Repository Servers} page, saw more stuff locally.
Reviewed the cardinal digits between 1 and 17, inclusive.
Reviewers: chad, avivey
Reviewed By: avivey
Differential Revision: https://secure.phabricator.com/D16392
Summary:
Fixes T11453. Currently, commit message summaries are limited to 80 bytes. This may only be 20-40 characters for CJK languages or langauges with Cyrillic script.
Increase storage size to 255, then truncate to the shorter of 255 bytes or 80 glyphs. This preserves the same behavior for latin languages, but is less tight for Russian, etc.
Some minor additional changes:
- Provide a way to ask "how much data fits in this column?" so we don't have to duplicate column lengths across summary checks or UI errors like "title too long".
- Remove the `text80` datatype, since no other columns use it and we have no use cases (or likely use cases) for it.
Test Plan:
- Made a commit with a Cyrillic title, saw reasonable summarization in UI:
{F1757522}
- Added and ran unit tests.
- Grepped for removed `SUMMARY_MAX_LENGTH` constant.
- Grepped for removed `text80` data type.
Reviewers: avivey, chad
Reviewed By: avivey
Subscribers: avivey
Maniphest Tasks: T11453
Differential Revision: https://secure.phabricator.com/D16385
Summary: Converts final call site to PHUIDocumentViewPro.
Test Plan: grep for PHUIDocumentView, view new Welcome Page
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D16379
Summary: Fixes T11437. Provides a normal form for configuring this, instead of weird "look up the PHID and adjust things in the database" stuff.
Test Plan:
{F1753651}
{F1753652}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11437
Differential Revision: https://secure.phabricator.com/D16377
Summary:
This updates the eye logo and removes the formal wordmark "Phabricator" as an image. Instead we'll use the new updated eye logo and plain text for "Phabricator", both of which are more friendly and less industrial.
Installs that already use the `header-logo` customization setting will need to rebuild their logo to 80px x 80px. They will then also get to use plain text to whitebox their install as they see fit.
Test Plan:
Tested new logo at desktop, tablet, and mobile sizes. Set a random instance name, saw new wordmark. Created a really long wordmark of MMMMMMMMMMMM, saw text cut off so UI doesn't break. May need some additional tweaking, but I think we covered the most edge cases here.
{F1751791, size=full}
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: edibiase, bjshively, yelirekim, Korvin
Maniphest Tasks: T4214, T11096
Differential Revision: https://secure.phabricator.com/D16373
Summary: Ref T9640. Fixes T9888. Decline to support PHP 7 until the async signal handling issue in T11270 is resolved.
Test Plan: Faked local version, got helpful error message.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9640, T9888
Differential Revision: https://secure.phabricator.com/D16231
Summary:
Ref T11140. This makes encryption actually work:
- Provide a new configuation option, `keyring`, for specifying encryption keys.
- One key may be marked as `default`. This activates AES256 encryption for Files.
- Add `bin/files generate-key`. This is helps when generating valid encryption keys.
- Add `bin/files encode`. This changes the storage encoding of a file, and helps test encodings and migrate existing data.
- Add `bin/files cycle`. This re-encodes the block key with a new master key, if your master key leaks or you're just paraonid.
- Document all these options and behaviors.
Test Plan:
- Configured a bad `keyring`, hit a bunch of different errors.
- Used `bin/files generate-key` to try to generate bad keys, got appropriate errors ("raw doesn't support keys", etc).
- Used `bin/files generate-key` to generate an AES256 key.
- Put the new AES256 key into the `keyring`, without `default`.
- Uploaded a new file, verified it still uploaded as raw data (no `default` key yet).
- Used `bin/files encode` to change a file to ROT13 and back to raw. Verified old data got deleted and new data got stored properly.
- Used `bin/files encode --key ...` to explicitly convert a file to AES256 with my non-default key.
- Forced a re-encode of an AES256 file, verified the old data was deleted and a new key and IV were generated.
- Used `bin/files cycle` to try to cycle raw/rot13 files, got errors.
- Used `bin/files cycle` to cycle AES256 files. Verified metadata changed but file data did not. Verified file data was still decryptable with metadata.
- Ran `bin/files cycle --all`.
- Ran `encode` and `cycle` on chunked files, saw commands fail properly. These commands operate on the underlying data blocks, not the chunk metadata.
- Set key to `default`, uploaded a file, saw it stored as AES256.
- Read documentation.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11140
Differential Revision: https://secure.phabricator.com/D16127
Summary: Ref T11098. There is no reason to maintain these as separate values now that they can be configured in global settings.
Test Plan:
- Hit and read setup issue.
- Fiddled with settings.
- I'll vet this more throughly in the next diff since I need to fix an issue with global defaults in mail and can explicitly test this at the same time.
Reviewers: chad
Reviewed By: chad
Subscribers: eadler
Maniphest Tasks: T11098
Differential Revision: https://secure.phabricator.com/D16117
Summary: Ref T11120. If this works, I'll just remove this option completely.
Test Plan: ¯\_(ツ)_/¯
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11120
Differential Revision: https://secure.phabricator.com/D16095
Summary:
Ref T4103. This doesn't get everything, but takes care of most of the easy stuff.
The tricky-ish bit here is that I need to move timezones, pronouns and translations to proper settings. I expect to pursue that next.
Test Plan:
- Grepped for `loadPreferences` to identify callsites.
- Changed start-of-week setting, loaded Calendar, saw correct start.
- Visited welcome page, read "Adjust Settings" point.
- Loaded Conpherence -- I changed behavior here slightly (switching threads drops the title glyph) but it wasn't consistent to start with and this seems like a good thing to push to the next version of Conpherence.
- Enabled Filetree, toggled in Differential.
- Disabled Filetree, no longer visible in Differential.
- Changed "Unified Diffs" preference to "Small Screens" vs "Always".
- Toggled filetree in Diffusion.
- Edited a task, saw sensible projects in policy dropdown.
- Viewed user profile, uncollapsed/collapsed side nav, reloaded page, sticky'd.
- Toggled "monospaced textareas", used a comment box, got appropriate fonts.
- Toggled durable column.
- Disabled title glyphs.
- Changed monospaced font to 18px/36px impact.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4103
Differential Revision: https://secure.phabricator.com/D16004
Summary:
Fixes T11043. This page was still reading the old information directly instead of going through the cluster-aware stuff.
Have it ask the cluster-aware stuff for information instead.
Test Plan:
- Nuked MySQL on localhost.
- Configured cluster hosts.
- Loaded "Database Status" page -- worked after patch.
- Grepped for any remaining `mysql.configuration-provider` stragglers, came up empty.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11043
Differential Revision: https://secure.phabricator.com/D15982
Summary:
Ref T10939. I'm not //totally// opposed to the existence of this element, but I think it's the kind of thing that would never make it upstream today. I think this should just be a T418 custom sort of thing in the long run, not a mainline upstream feature.
Overall, I think this thing is nearly useless and just adds visual clutter. My dashboard is about 100% red. This also sort of teaches users that it's fine to let revisions sit for a couple of days, which isn't what I'd like the UI to teach. Finally, removing it helps the UI feel a little less cluttered after the visually busy changes in D15926.
Test Plan: Grepped for removed config. Viewed revision list.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10939
Differential Revision: https://secure.phabricator.com/D15927
Summary: Ref T10694. This setting no longer has any effect: we always show a limited amount of context now.
Test Plan: `grep`
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10694
Differential Revision: https://secure.phabricator.com/D15886
Summary:
Fixes T10876. Currently, we can end up with a setup warning banner sticking on each web device, since the state is stored in local cache.
Instead:
- When we actually run the setup checks, save the current state in the database.
- Before we show a cached banner, make sure the database still says the checks are a problem.
This could lead to some inconsistencies if setup checks legitimately pass on some hosts but not on others. For example, if you have `git` installed on one machine but not on another, we may raise a setup warning ("No Git Binary!") about it on one host only.
For now, assume users have their operational environments in some sort of reasonable shape and can install the same stuff everywhere. In the future, we could split the issues into "global" and "per-host" issues if we run into problems with this.
Test Plan:
This is somewhat tricky to test locally since you really need multiple webservers to test it properly, but I:
- Created some setup issues, saw banner.
- Ignored/cleared them, saw banner go away.
- Verified database cache writes were occurring properly.
Then I sort of faked it like this:
- Created a setup issue.
- Manually set the database cache value to `[]` ("no issues").
- Reloaded page.
- No more banner.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10876
Differential Revision: https://secure.phabricator.com/D15802
Summary: Ref T4292. This adds a new high-level overview panel.
Test Plan: {F1238854}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15772
Summary:
Ref T10832. In practice, `git --version` is not a useful test for this issue:
- Vendors like Debian have backported the patch into custom versions like `0.0.0.1-debian-lots-of-patches.3232`.
- Vendors like Ubuntu distribute multiple different versions which report the same string from `git --version`, some of which are patched and some of which are not.
In other cases, we can perform an empirical test for the vulnerability. Here, we can not, because we can't write a 2GB path in a reasonable amount of time.
Since vendors (other than Apple) //generally// seem to be on top of this and any warning we try to raise based on `git --version` will frequently be incorrect, don't raise this warning.
I'll note this in the changelog instead.
Test Plan: Looked at setup issues, no more warning for vulnerable git version.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10832
Differential Revision: https://secure.phabricator.com/D15756
Summary: Ref T10832. Raise a setup warning for out-of-date versions of `git`.
Test Plan: {F1224632}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10832
Differential Revision: https://secure.phabricator.com/D15745
Summary:
Fixes T9385. This was accidentally mangled a bit a long time ago by D12797, which was a 1,000-file change which got almost everything right.
Simplify the message and fix all the `%s` conversions and how they map to parameters.
Test Plan: {F1220400}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9385
Differential Revision: https://secure.phabricator.com/D15722
Summary:
Fixes T10697. This finishes bringing the rest of the config up to cluster power levels.
Phabricator is now given an arbitrarily long list of notification servers.
Each Aphlict server is given an arbitrarily long list of ports to run services on.
Users are free to make them meet in the middle by proxying whatever they want to whatever else they want.
This should also accommodate clustering fairly easily in the future.
Also rewrote the status UI and changed a million other things. 🐗
Test Plan:
{F1217864}
{F1217865}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10697
Differential Revision: https://secure.phabricator.com/D15703
Summary: Ref T10697. Mostly straightforward. Also allow the server to have multiple logs and log options in the future (e.g., different verbosities or separate admin/client logs or whatever). No specific plans for this, but the default log is pretty noisy today.
Test Plan: Set up a couple of logs, started server, saw it log to them.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10697
Differential Revision: https://secure.phabricator.com/D15702
Summary: Ref T10697. This isn't everything but starts generalizing options and moving us toward a cluster-ready state of affairs.
Test Plan: Started server in various configurations, hit most (all?) of the error cases with bad configs, sent test notifications.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10697
Differential Revision: https://secure.phabricator.com/D15701
Summary:
Ref T10784. Currently, if you terminate SSL at a load balancer (very common) and use HTTP beyond that, you have to fiddle with this setting in your premable or a `SiteConfig`.
On the balance I think this makes stuff much harder to configure without any real security benefit, so don't apply this option to intracluster requests.
Also document a lot of stuff.
Test Plan: Poked around locally but this is hard to test outside of a production cluster, I'll vet it more thoroughly on `secure`.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10784
Differential Revision: https://secure.phabricator.com/D15696
Summary:
Ref T4292. This adds some very basic cluster/device data to the new management view. Nothing interesting yet.
Also deal with disabled bindings a little more cleanly.
Test Plan: {F1214619}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15685
Summary:
Ref T4571. Write more of the missing documentation sections and clarify a few things.
Since the "replicating master" check needs a special permission, imposes a performance penalty, is probably very difficult to misconfigure, and likely not a big deal anyway, just drop the idea of trying to automatically detect + prevent it. We still show if it's an issue on the status page, provided we have permission to check.
When you don't have any cluster databases configured, never stop trying to connect to the default master database. We might want to do this eventually as load reduction, but just don't muddy the waters too much for now while things stabilize.
Test Plan:
- Tested functionality in cluster, non-cluster, and degraded-cluster modes.
- Used status console to monitor a health check cycle.
- Read docs.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15679
Summary:
Ref T4571. When a database goes down briefly, we fall back to replicas.
However, this fallback is slow (not good for users) and keeps sending a lot of traffic to the master (might be bad if the root cause is load-related).
Keep track of recent connections and fully degrade into "severed" mode if we see a sequence of failures over a reasonable period of time. In this mode, we send much less traffic to the master (faster for users; less load for the database).
We do send a little bit of traffic still, and if the master recovers we'll recover back into normal mode seeing several connections in a row succeed.
This is similar to what most load balancers do when pulling web servers in and out of pools.
For now, the specific numbers are:
- We do at most one health check every 3 seconds.
- If 5 checks in a row fail or succeed, we sever or un-sever the database (so it takes about 15 seconds to switch modes).
- If the database is currently marked unhealthy, we reduce timeouts and retries when connecting to it.
Test Plan:
- Configured a bad `master`.
- Browsed around for a bit, initially saw "unrechable master" errors.
- After about 15 seconds, saw "major interruption" errors instead.
- Fixed the config for `master`.
- Browsed around for a while longer.
- After about 15 seconds, things recovered.
- Used "Cluster Databases" console to keep an eye on health checks: it now shows how many recent health checks were good:
{F1213397}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15677