1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-23 22:10:55 +01:00
Commit graph

12336 commits

Author SHA1 Message Date
epriestley
cd8491ae93 Fix "daemons running as wrong user" setup issue
Summary:
Fixes T9385. This was accidentally mangled a bit a long time ago by D12797, which was a 1,000-file change which got almost everything right.

Simplify the message and fix all the `%s` conversions and how they map to parameters.

Test Plan: {F1220400}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9385

Differential Revision: https://secure.phabricator.com/D15722
2016-04-15 10:03:15 -07:00
epriestley
20bad9a4ba Reset umask to 022 for all Phabricator processes
Summary:
Fixes T7475. If you do something like:

  $ umask 123
  $ ./bin/phd start

...the daemons might inherit the weird umask, do a `git fetch` with the weird umask, and end up creating files with weird permissions in repositories.

Instead, just normalize the umask to 022 in all cases. This is overwhelmingly the most common setting, and the one we assume things are configured with.

(When we want to force permissions to a certain setting, we do so explicitly.)

Test Plan:
  - Added `var_dump(umask())` to observe umask.
  - Ran `bin/phd`, saw proper umask (`18`, which is decimal of `022` octal).
  - Set `umask 123`, then ran `bin/phd`, saw it correct properly again.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T7475

Differential Revision: https://secure.phabricator.com/D15721
2016-04-15 10:03:01 -07:00
Povilas Balzaravicius Pawka
f05c3e41b9 Fixed localcommits include on getDiffDict
Summary: Ref T10808

Test Plan: Call `differential.querydiffs` method and expect 'local:commits' property be added to the result.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin

Maniphest Tasks: T10808

Differential Revision: https://secure.phabricator.com/D15710
2016-04-15 18:06:32 +03:00
lkassianik
899856a1d4 Links on badge card should be accessible
Summary: Ref T10710

Test Plan: Open user profile with badge, flip badge card, open awarder link.

Reviewers: epriestley, chad, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: chad, Korvin

Maniphest Tasks: T10710

Differential Revision: https://secure.phabricator.com/D15699
2016-04-15 07:55:31 -07:00
epriestley
686c02d54a Add a "memory.hint" parameter to Aphlict
Summary:
Ref T10696. By default, `node` uses 1.5GB, which is enormous overkill for this service and can crowd out other services if it's running next to things like a database on the same host.

Provide a configuration option to adjust it via `--max-old-space-size` and default to 256MB. It only seems to need about 30M locally, so this should be plenty of headroom.

Test Plan:
Ran `bin/aphlict debug`, things seemed OK.

It takes a long time (days?) to grow to 1.5GB so I can't easily test this locally without a lot of work, but I'll keep an eye on it in production.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10696

Differential Revision: https://secure.phabricator.com/D15720
2016-04-15 07:14:45 -07:00
epriestley
0534002894 Add coverage tooltips in Diffusion file browse mode
Summary: Fixes T10816. The way these work is a little unusual since these chunks of file-rendering code are unusuall performance-sensitive, so the Differential version doesn't adapt directly to Diffusion. Both can possibly be unified at some point in the future, although they do slightly different things.

Test Plan: {F1220170}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10816

Differential Revision: https://secure.phabricator.com/D15719
2016-04-15 06:59:38 -07:00
epriestley
fad9e043c3 Fix bad cluster.database in documentation
Summary: This is mistaken.

Test Plan: Careful inspection, `grep` for other mistakes.

Reviewers: chad, eadler

Reviewed By: eadler

Differential Revision: https://secure.phabricator.com/D15717
2016-04-14 15:00:03 -07:00
epriestley
7852ec1619 Use --master-data, not --dump-slave, in bin/storage dump
Summary: These flags do slightly different things, I actually want --master-data here. My test databases are setup half-weird and work with either statement, which is why I missed this.

Test Plan: Ran a dump against master, got the right CHANGE MASTER statement with no warnings.

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D15716
2016-04-14 14:56:21 -07:00
epriestley
bbb321395a Support Aphlict clustering
Summary:
Ref T6915. This allows multiple notification servers to talk to each other:

  - Every server has a list of every other server, including itself.
  - Every server generates a unique fingerprint at startup, like "XjeHuPKPBKHUmXkB".
  - Every time a server gets a message, it marks it with its personal fingerprint, then sends it to every other server.
  - Servers do not retransmit messages that they've already seen (already marked with their fingerprint).
  - Servers learn other servers' fingerprints after they send them a message, and stop sending them messages they've already seen.

This is pretty crude, and the first message to a cluster will transmit N^2 times, but N is going to be like 3 or 4 in even the most extreme cases for a very long time.

The fingerprinting stops cycles, and stops servers from sending themselves copies of messages.

We don't need to do anything more sophisticated than this because it's fine if some notifications get lost when a server dies. Clients will reconnect after a short period of time and life will continue.

Test Plan:
  - Wrote two server configs.
  - Started two servers.
  - Told Phabricator about all four services.
  - Loaded Chrome and Safari.
  - Saw them connect to different servers.
  - Sent messages in one, got notifications in the other (magic!).
  - Saw the fingerprinting stuff work on the console, no infinite retransmission of messages, etc.

(This pretty much just worked when I ran it the first time so I probably missed something?)

{F1218835}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T6915

Differential Revision: https://secure.phabricator.com/D15711
2016-04-14 13:26:30 -07:00
epriestley
5a0b7398ca Give bin/storage some replica-aware options
Summary:
Fixes T10758.

  - Adds a "--host" flag. If you specify this, we read your cluster config. This lets you dump from a replica.
  - Adds a "--for-replica" flag to `storage dump`. This makes `mysqldump` include a `CHANGE MASTER ...` statement in the output, which is useful when setting up a replica for the first time.

Test Plan:
  - Dumped master and replica cluster databases.
  - Dumped non-cluster databases.
  - Ran various other commands (help, status, etc).

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10758

Differential Revision: https://secure.phabricator.com/D15714
2016-04-14 13:23:35 -07:00
epriestley
1b2b84ce1f Use monospaced font in Passphrase "Reveal Secret" dialog
Summary: Fixes T10812. Make it easier to disambiguate great passwords like `iI|l1oO()thenumber1nospellitout`.

Test Plan: {F1219074}

Reviewers: chad, yelirekim

Reviewed By: yelirekim

Maniphest Tasks: T10812

Differential Revision: https://secure.phabricator.com/D15715
2016-04-14 13:09:52 -07:00
Chad Little
7b16f5d807 Convert Project Pages to new UI
Summary: Updating the subproject and member pages in Projects to new UI

Test Plan: Visit a subproject parent page, visit members pages

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D15687
2016-04-14 18:26:43 +00:00
Chad Little
eae82c51f5 Minor quality of life updates to Phurl
Summary: Default to "All" (maybe "Active" in the future). Adds more info to results.

Test Plan: visit /phurl/, see additional information about URL

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D15713
2016-04-14 18:25:22 +00:00
epriestley
07fc8f17cc Support "ssl.chain" in Aphlict configuration
Summary: Fixes T10806. Although browsers don't seem to care about this, it's more correct to support it, and the new test console uses normal `cURL` and does care.

Test Plan:
  - Hit the error case for providing a chain but no key/cert.
  - Used `openssl s_client -connect localhost:22280` to connect to local Aphlict servers.
  - With SSL but no chain, saw `openssl` fail to verify the remote.
  - With SSL and a chain, saw `openssl` verify the identify of the remote.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10806

Differential Revision: https://secure.phabricator.com/D15709
2016-04-14 10:41:21 -07:00
epriestley
383ae7621f Fix a "websokket" typo
Summary: Typo fix from D15703 that I overlooked.

Test Plan: Careful inspection.

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D15708
2016-04-14 10:40:54 -07:00
epriestley
ca6da4c2be When proxying the SVN protocol, don't mutate URIs in protocol frames if we're an intracluster proxy
Summary:
Ref T10809. Currently, both the proxy and target may mutate URIs (rewriting "svn+ssh://x/diffusion/Y/" to a path on disk).

I believe this previously worked by fate/chance/luck since both URI variants contain the repository information, but the algorithms were tightened up recently with callsign removal.

Stop rewriting them if we're the intracluster proxy -- they only need to be rewritten on the target host.

Test Plan:
  - Checked out a proxied SVN repository, with and without a callsign.
  - Checked out an unproxied SVN repository, with and without a callsign.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10809

Differential Revision: https://secure.phabricator.com/D15712
2016-04-14 10:15:23 -07:00
epriestley
d4bf2a147b Make paths and Aphlict instance names less ambiguous
Summary:
Fixes T10783 (what little of it remains). Ref T10697.

Aphlict currently uses request paths for two different things:

  - multi-tenant instancing in the Phacility cluster (each instance gets its own namespace within an Aphlict server);
  - some users configure nginx and apache to do proxying or SSL termination based on the path.

Currently, these can collide.

Put a "~" before the instance name to make it unambiguous. At some point we can possibly just use a GET parameter, but I think there was some reason I didn't do that originally and this sequence of changes is disruptive enough already.

Test Plan: Saw local Aphlict unambiguously recognize "local.phacility.com" as instance "local", with a "~"-style URI.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10697, T10783

Differential Revision: https://secure.phabricator.com/D15705
2016-04-14 04:57:21 -07:00
epriestley
2930733ac9 Complete modernization of Aphlict configuration
Summary:
Fixes T10697. This finishes bringing the rest of the config up to cluster power levels.

Phabricator is now given an arbitrarily long list of notification servers.

Each Aphlict server is given an arbitrarily long list of ports to run services on.

Users are free to make them meet in the middle by proxying whatever they want to whatever else they want.

This should also accommodate clustering fairly easily in the future.

Also rewrote the status UI and changed a million other things. 🐗

Test Plan:
{F1217864}

{F1217865}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10697

Differential Revision: https://secure.phabricator.com/D15703
2016-04-14 04:57:00 -07:00
epriestley
c6b0925954 Move Aphlict logging and PID configuration options to config file
Summary: Ref T10697. Mostly straightforward. Also allow the server to have multiple logs and log options in the future (e.g., different verbosities or separate admin/client logs or whatever). No specific plans for this, but the default log is pretty noisy today.

Test Plan: Set up a couple of logs, started server, saw it log to them.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10697

Differential Revision: https://secure.phabricator.com/D15702
2016-04-14 04:55:19 -07:00
epriestley
c84dee522b Move server-related Aphlict options to a configuration file
Summary: Ref T10697. This isn't everything but starts generalizing options and moving us toward a cluster-ready state of affairs.

Test Plan: Started server in various configurations, hit most (all?) of the error cases with bad configs, sent test notifications.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10697

Differential Revision: https://secure.phabricator.com/D15701
2016-04-14 04:54:42 -07:00
epriestley
e32ce529d7 Begin generalizing Aphlict server to prepare for clustering/sensible config file
Summary:
Ref T10697. Currently, `aphlict` takes a ton of command line flags to configure exactly one admin server and exactly one client server.

I want to replace this with a config file. Additionally, I plan to support:

  - arbitrary numbers of listening client ports;
  - arbitrary numbers of listening admin ports;
  - SSL on any port.

For now, just transform the arguments to look like they're a config file. In the future, I'll load from a config file instead.

This greater generality will allow you to do stuff like run separate HTTP and HTTPS admin ports if you really want. I don't think there's a ton of use for this, but it tends to make the code cleaner anyway and there may be some weird cross-datacneter cases for it. Certainly, we undershot with the initial design and lots of users want to terminate SSL in nginx and run only HTTP on this server.

(Some sort-of-plausible use cases are running separate HTTP and HTTPS client servers, if your Phabricator install supports both, or running multiple HTTPS servers with different certificates if you have a bizarre VPN.)

Test Plan: Started Aphlict, connected to it, sent myself test notifications, viewed status page, reviewed logfile.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10697

Differential Revision: https://secure.phabricator.com/D15700
2016-04-14 04:54:20 -07:00
epriestley
a2588d62e7 Minor bin/aphlict cleanup
Summary: Ref T10697. This just improves a couple of minor `bin/aphlict` things: make argument parsing more explicit/consistent, consolidate a little bit of duplicated code.

Test Plan: Ran all `bin/aphlict` commands.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10697

Differential Revision: https://secure.phabricator.com/D15698
2016-04-14 04:53:55 -07:00
June Rhodes
7150aa8e19 Use Conduit in PhabricatorRepositoryGitCommitChangeParserWorker
Summary:
Ref T2783.  This allows this worker to run on a machine different to the one that stores the repository, by routing the execution of Git over Conduit calls.

This API method is super gross, but fixing it isn't straightforward and it runs into other complicated considerations. We can fix it later; for now, just define it as "internal" to limit how much mess this creates.

"Internal" methods do not appear on the console.

Test Plan: Ran `bin/repository reparse --change <commit> --trace` on several commits, saw daemons make a Conduit call instead of running a `git` command.

Reviewers: hach-que, chad

Reviewed By: chad

Subscribers: joshuaspence, Korvin, epriestley

Maniphest Tasks: T2783

Differential Revision: https://secure.phabricator.com/D11874
2016-04-14 04:53:03 -07:00
epriestley
0379cc10ac Fixes T10805. When clustering is not configured, this check should just
return.

Auditors: chad
2016-04-14 04:32:20 -07:00
Chad Little
91479e2832 Fix header in Badges
Summary: Use normal casing

Test Plan: Read

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D15707
2016-04-13 20:44:43 -07:00
epriestley
c0428b4d6d Make Git prompt for passwords when the user provides a username but not a password
Summary: Fixes T10797. This seems to fix things on my local system.

Test Plan:
  - Cloned with a username, got prompted for a password.
  - Cloned with a username + password.
  - Cloned with a username + bad password (error).

Reviewers: chad

Reviewed By: chad

Subscribers: Grimeh

Maniphest Tasks: T10797

Differential Revision: https://secure.phabricator.com/D15706
2016-04-13 18:59:45 -07:00
Andy Reitz
3876d6b439 Fix some typos in the new cluster docs
Summary:
While reading the new cluster docs, I noticed a few minor typos, and one
section that seemed to be incomplete and redundant, so I just removed it.

Test Plan: none.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: chad, Korvin, jshirley

Differential Revision: https://secure.phabricator.com/D15704
2016-04-13 16:30:40 -07:00
epriestley
66366137ff Don't apply security.require-https to intracluster requests
Summary:
Ref T10784. Currently, if you terminate SSL at a load balancer (very common) and use HTTP beyond that, you have to fiddle with this setting in your premable or a `SiteConfig`.

On the balance I think this makes stuff much harder to configure without any real security benefit, so don't apply this option to intracluster requests.

Also document a lot of stuff.

Test Plan: Poked around locally but this is hard to test outside of a production cluster, I'll vet it more thoroughly on `secure`.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10784

Differential Revision: https://secure.phabricator.com/D15696
2016-04-13 12:51:41 -07:00
epriestley
99be132ea2 Allow public users to make intracluster API requests
Summary:
Ref T10784. On `secure`, logged-out users currently can't browse repositories when cluster/service mode is enabled because they aren't permitted to make intracluster requests.

We don't allow totally public external requests (they're hard to rate limit and users might write bots that polled `feed.query` or whatever which we'd have no way to easily disable) but it's fine to allow intracluster public requests.

Test Plan: Browsed a clustered repository while logged out locally.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10784

Differential Revision: https://secure.phabricator.com/D15695
2016-04-13 12:51:16 -07:00
Chad Little
abf37aa979 Fix Passphrase Credential dialog
Summary: Fixes T10772, not sure why this fails, but reverting the code back to old dialog call works.

Test Plan:
  - Try to add a new credential when importing a repository.
  - Also created a new credential normally, via Passphrase.
  - Also edited a credential.

Reviewers: chad

Reviewed By: chad

Subscribers: Korvin

Maniphest Tasks: T10772

Differential Revision: https://secure.phabricator.com/D15691
2016-04-12 20:09:55 -07:00
epriestley
afb0f7c7af Clean up some old cluster-ish documentation
Summary:
Ref T10751. We currently have a placeholder Almanac document, and a fairly-bad-advice section in Daemons.

Pull these into the modern cluster documentation.

Test Plan: 17 phabricator PHDs

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10751

Differential Revision: https://secure.phabricator.com/D15689
2016-04-12 19:14:08 -07:00
epriestley
33060d1652 Ignore post-write repository synchronization if no devices are configured
Summary: Fixes T10789. If we aren't configured with a device, we never grabbed a lock in the first place, and should not expect one to be held.

Test Plan: Pushed non-cluster-configured Git SSH repository.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10789

Differential Revision: https://secure.phabricator.com/D15692
2016-04-12 18:31:11 -07:00
epriestley
110223c1a7 Fix pretty drawings
Summary: Changes elsewhere which support spaces before "|" when defining a table so that tables quote properly also accidentally changed these beautiful drawings into remarkup tables.

Test Plan: (( o.O ))

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D15690
2016-04-12 10:55:03 -07:00
epriestley
4244cad990 Move toward multi-master replicated repositories
Summary:
Ref T4292. This mostly implements the locking/versioning logic for multi-master repositories. It is only active on Git SSH pathways, and doesn't actually do anything useful yet: it just does bookkeeping so far.

When we read (e.g., `git fetch`) the logic goes like this:

- Get the read lock (unique to device + repository).
  - Read all the versions of the repository on every other device.
  - If any node has a newer version:
    - Fetch the newer version.
    - Increment our version to be the same as the version we fetched.
- Release the read lock.
- Actually do the fetch.

This makes sure that any time you do a read, you always read the most recently acknowledged write. You may have to wait for an internal fetch to happen (this isn't actually implemented yet) but the operation will always work like you expect it to.

When we write (e.g., `git push`) the logic goes like this:

- Get the write lock (unique to the repository).
  - Do all the read steps so we're up to date.
  - Mark a write pending.
    - Do the actual write.
  - Bump our version and mark our write finished.
- Release the write lock.

This allows you to write to any replica. Again, you might have to wait for a fetch first, but everything will work like you expect.

There's one notable failure mode here: if the network connection between the repository node and the database fails during the write, the write lock might be released even though a write is ongoing.

The "isWriting" column protects against that, by staying locked if we lose our connection to the database. This will currently "freeze" the repository (prevent any new writes) until an administrator can sort things out, since it'd dangerous to continue doing writes (we may lose data).

(Since we won't actually acknowledge the write, I think, we could probably smooth this out a bit and make it self-healing //most// of the time: basically, have the broken node rewind itself by updating from another good node. But that's a little more complex.)

Test Plan:
  - Pushed changes to a cluster-mode repository.
  - Viewed web interface, saw "writing" flag and version changes.
  - Pulled changes.
  - Faked various failures, got sensible states.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4292

Differential Revision: https://secure.phabricator.com/D15688
2016-04-12 08:57:57 -07:00
epriestley
58eef68b7c Rough cut of repository cluster status panel
Summary:
Ref T4292. This adds some very basic cluster/device data to the new management view. Nothing interesting yet.

Also deal with disabled bindings a little more cleanly.

Test Plan: {F1214619}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4292

Differential Revision: https://secure.phabricator.com/D15685
2016-04-12 05:38:10 -07:00
epriestley
8a153c1fe9 Rough cut at new "pro" Diffusion edit UI skeleton
Summary:
Ref T4292. This puts a very rough skeleton in place for the new "Manage Repository" UI, somewhat similar to the "Settings" UI.

Right now, it has one panel with no content, and is not reachable from the UI.

Test Plan: {F1214525}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4292

Differential Revision: https://secure.phabricator.com/D15683
2016-04-12 05:37:54 -07:00
epriestley
0216fac30a Make PullLocal smart about which repositories it should pull
Summary:
Ref T10756. When repositories are properly configured for the cluster (which is hard to set up today), be smart about which repositories are expected to exist on the current host, and only pull them.

This generally allows daemons to pretty much do the right thing no matter how many copies are running, although there may still be some lock contention issues that need to be sorted out.

Test Plan: {F1214483}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10756

Differential Revision: https://secure.phabricator.com/D15682
2016-04-12 05:37:30 -07:00
Chad Little
3f9e8e675b Tidy up project panel errors
Summary: Makes the fallback a little cleaner on long titles. Wow we have a lot of error states here.

Test Plan: New Milestone with no tasks.

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D15686
2016-04-11 23:13:18 +00:00
Chad Little
75ba6058b9 Update to FontAwesome 4.6.0
Summary: Updates to the latest icon packages

Test Plan: View UIExamples, find new Icons

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D15684
2016-04-11 11:34:59 -07:00
lkassianik
85d2fda082 First stab at a badges typeahead
Summary: Ref T10702

Test Plan: Open a user profile, attempt to award an archived or previously awarded badge, badges dialog should provide a typeahead, and the suggestions should offer details about whether a badge is archived or already awarded.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin

Maniphest Tasks: T10702

Differential Revision: https://secure.phabricator.com/D15665
2016-04-11 10:33:07 -07:00
Chad Little
6b40cfaa60 Fix spelling error
Summary: Ran into this, correct spelling.

Test Plan: read

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D15681
2016-04-11 09:26:08 -07:00
epriestley
ac35246d0d Never sever non-cluster database; write more read-only documentation
Summary:
Ref T4571. Write more of the missing documentation sections and clarify a few things.

Since the "replicating master" check needs a special permission, imposes a performance penalty, is probably very difficult to misconfigure, and likely not a big deal anyway, just drop the idea of trying to automatically detect + prevent it. We still show if it's an issue on the status page, provided we have permission to check.

When you don't have any cluster databases configured, never stop trying to connect to the default master database. We might want to do this eventually as load reduction, but just don't muddy the waters too much for now while things stabilize.

Test Plan:
  - Tested functionality in cluster, non-cluster, and degraded-cluster modes.
  - Used status console to monitor a health check cycle.
  - Read docs.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15679
2016-04-11 08:44:11 -07:00
epriestley
ebff07d019 Automatically sever databases after prolonged unreachability
Summary:
Ref T4571. When a database goes down briefly, we fall back to replicas.

However, this fallback is slow (not good for users) and keeps sending a lot of traffic to the master (might be bad if the root cause is load-related).

Keep track of recent connections and fully degrade into "severed" mode if we see a sequence of failures over a reasonable period of time. In this mode, we send much less traffic to the master (faster for users; less load for the database).

We do send a little bit of traffic still, and if the master recovers we'll recover back into normal mode seeing several connections in a row succeed.

This is similar to what most load balancers do when pulling web servers in and out of pools.

For now, the specific numbers are:

  - We do at most one health check every 3 seconds.
  - If 5 checks in a row fail or succeed, we sever or un-sever the database (so it takes about 15 seconds to switch modes).
  - If the database is currently marked unhealthy, we reduce timeouts and retries when connecting to it.

Test Plan:
  - Configured a bad `master`.
  - Browsed around for a bit, initially saw "unrechable master" errors.
  - After about 15 seconds, saw "major interruption" errors instead.
  - Fixed the config for `master`.
  - Browsed around for a while longer.
  - After about 15 seconds, things recovered.
  - Used "Cluster Databases" console to keep an eye on health checks: it now shows how many recent health checks were good:

{F1213397}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15677
2016-04-11 08:43:52 -07:00
epriestley
5cf09f567a Fix an issue with date parsing when viewer timezone differs from server timezone
Summary:
The way `DateTime` works with epochs is weird, I goofed this by having my server/viewer timezone the same and not noticing.

Also fix an issue where you do `?epoch=...` and then manually fiddle with the control: the control should win.

Test Plan:
  - Set viewer and server timezone to different vlaues.
  - Created a countdown using `?epoch=...`.
  - Created a countdown using `?epoch=...` and fiddling with date controls.
  - Created and edited a countdown using date/time control.
  - Poked around Calendar to make sure I didn't ruin anything this time (browsed, created event, edited event).

Reviewers: lpriestley, chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D15680
2016-04-11 07:47:37 -07:00
epriestley
146fb646f9 Automatically degrade to read-only mode when unable to connect to the master
Summary:
Ref T4571. If we fail to connect to the master, automatically try to degrade into a temporary read-only mode ("UNREACHABLE") for the remainder of the request, if possible.

If the request was something like "load the homepage", that'll work fine. If it was something like "submit a comment", there's nothing we can do and we just have to fail.

Detecting this condition imposes a performance penalty: every request checks the connection and gives the database a long time to respond, since we don't want to drop writes unless we have to. So the degraded mode works, but it's really slow, and may perpetuate the problem if the root issue is load-related.

This lays the groundwork for improving this case by degrading futher into a "SEVERED" mode which will persist across requests. In the future, if several requests in a short period of time fail, we'll sever the database host and refuse to try to connect to it for a little while, connecting directly to replicas instead (basically, we're "health checking" the master, like a load balancer would health check a web application server). This will give us a better (much faster) degraded mode in a major service disruption, and reduce load on the master if the root cause is load-related, giving it a better chance of recovering on its own.

Test Plan:
  - Disabled master in config by changing the host/username, got degraded automatically to UNREACAHBLE mode immediately.
  - Faked full SEVERED mode, requests hit replicas and put me in the mode properly.
  - Made stuff work, hit some good pages.
  - Hit some non-cluster pages.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15674
2016-04-10 12:20:13 -07:00
epriestley
e0a8cac703 When no master database is configured, automatically degrade to read-only mode
Summary: Ref T4571. If `cluster.databases` is configured but only has replicas, implicitly drop to read-only mode and send writes to a replica.

Test Plan:
  - Disabled the `master`, saw Phabricator automatically degrade into read-only mode against replicas.
  - (Also tested: explicit read-only mode, non-cluster mode, properly configured cluster mode).

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15672
2016-04-10 12:19:55 -07:00
epriestley
071741c61d When Phabricator is in read-only mode, explain why
Summary:
Ref T4571. Allows users to click the "read-only mode" notification to get more information about why an install is in read-only mode.

Installs can be in this mode for several reasons (explicit administrative action, no masters defined, no masters reachable), and it's useful to be able to tell the difference.

Test Plan: {F1212930}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15671
2016-04-10 12:19:18 -07:00
epriestley
c178f29cdb Use new first-class MySQL timeout support in Phabricator
Summary: Fixes T6710. After D15669, we support a proper timeout parameter, so we don't need this hack anymore.

Test Plan: See D15669: forced a MySQL connector, set a low timeout, set a bad database, saw fast failures.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T6710

Differential Revision: https://secure.phabricator.com/D15670
2016-04-10 12:19:00 -07:00
epriestley
6a4a9bb2d2 When cluster.databases is configured, read the master connection from it
Summary:
Ref T4571. Ref T10759. Ref T10758. This isn't complete, but gets most of the job done:

  - When `cluster.databases` is set up, most things ignore `mysql.host` now.
  - You can `bin/storage upgrade` and stuff works.
  - You can browse around in the web UI and stuff works.

There's still a lot of weird tricky stuff to navigate, and this has real no advantages over configuring a single server yet (no automatic failover, etc).

Test Plan:
  - Configured `cluster.databases` to point at my `t1.micro` hosts in EC2 (master + replica).
  - Ran `bin/storage upgrade`, got a new install setup on them properly.
  - Survived setup warnings, browsed around.
  - Switched back to local config, ran `bin/storage upgrade`, browsed around, went through setup checks.
  - Intentionally broke config (bad hosts, no masters) and things seemed to react reasonably well.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571, T10758, T10759

Differential Revision: https://secure.phabricator.com/D15668
2016-04-10 12:18:42 -07:00
epriestley
0439645d5b Add a "Database Cluster Status" console in Config
Summary: Ref T4571. The configuration option still doesn't do anything, but add a status panel for basic setup monitoring.

Test Plan:
Here's what a good version looks like:

{F1212291}

Also faked most of the errors it can detect and got helpful diagnostic messages like this:

{F1212292}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15667
2016-04-09 20:34:13 -07:00