Summary:
Ref T6915. This allows multiple notification servers to talk to each other:
- Every server has a list of every other server, including itself.
- Every server generates a unique fingerprint at startup, like "XjeHuPKPBKHUmXkB".
- Every time a server gets a message, it marks it with its personal fingerprint, then sends it to every other server.
- Servers do not retransmit messages that they've already seen (already marked with their fingerprint).
- Servers learn other servers' fingerprints after they send them a message, and stop sending them messages they've already seen.
This is pretty crude, and the first message to a cluster will transmit N^2 times, but N is going to be like 3 or 4 in even the most extreme cases for a very long time.
The fingerprinting stops cycles, and stops servers from sending themselves copies of messages.
We don't need to do anything more sophisticated than this because it's fine if some notifications get lost when a server dies. Clients will reconnect after a short period of time and life will continue.
Test Plan:
- Wrote two server configs.
- Started two servers.
- Told Phabricator about all four services.
- Loaded Chrome and Safari.
- Saw them connect to different servers.
- Sent messages in one, got notifications in the other (magic!).
- Saw the fingerprinting stuff work on the console, no infinite retransmission of messages, etc.
(This pretty much just worked when I ran it the first time so I probably missed something?)
{F1218835}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T6915
Differential Revision: https://secure.phabricator.com/D15711
Summary:
Fixes T10758.
- Adds a "--host" flag. If you specify this, we read your cluster config. This lets you dump from a replica.
- Adds a "--for-replica" flag to `storage dump`. This makes `mysqldump` include a `CHANGE MASTER ...` statement in the output, which is useful when setting up a replica for the first time.
Test Plan:
- Dumped master and replica cluster databases.
- Dumped non-cluster databases.
- Ran various other commands (help, status, etc).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10758
Differential Revision: https://secure.phabricator.com/D15714
Summary: Fixes T10812. Make it easier to disambiguate great passwords like `iI|l1oO()thenumber1nospellitout`.
Test Plan: {F1219074}
Reviewers: chad, yelirekim
Reviewed By: yelirekim
Maniphest Tasks: T10812
Differential Revision: https://secure.phabricator.com/D15715
Summary: Updating the subproject and member pages in Projects to new UI
Test Plan: Visit a subproject parent page, visit members pages
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D15687
Summary: Default to "All" (maybe "Active" in the future). Adds more info to results.
Test Plan: visit /phurl/, see additional information about URL
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D15713
Summary: Fixes T10806. Although browsers don't seem to care about this, it's more correct to support it, and the new test console uses normal `cURL` and does care.
Test Plan:
- Hit the error case for providing a chain but no key/cert.
- Used `openssl s_client -connect localhost:22280` to connect to local Aphlict servers.
- With SSL but no chain, saw `openssl` fail to verify the remote.
- With SSL and a chain, saw `openssl` verify the identify of the remote.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10806
Differential Revision: https://secure.phabricator.com/D15709
Summary: Typo fix from D15703 that I overlooked.
Test Plan: Careful inspection.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D15708
Summary:
Ref T10809. Currently, both the proxy and target may mutate URIs (rewriting "svn+ssh://x/diffusion/Y/" to a path on disk).
I believe this previously worked by fate/chance/luck since both URI variants contain the repository information, but the algorithms were tightened up recently with callsign removal.
Stop rewriting them if we're the intracluster proxy -- they only need to be rewritten on the target host.
Test Plan:
- Checked out a proxied SVN repository, with and without a callsign.
- Checked out an unproxied SVN repository, with and without a callsign.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10809
Differential Revision: https://secure.phabricator.com/D15712
Summary:
Fixes T10783 (what little of it remains). Ref T10697.
Aphlict currently uses request paths for two different things:
- multi-tenant instancing in the Phacility cluster (each instance gets its own namespace within an Aphlict server);
- some users configure nginx and apache to do proxying or SSL termination based on the path.
Currently, these can collide.
Put a "~" before the instance name to make it unambiguous. At some point we can possibly just use a GET parameter, but I think there was some reason I didn't do that originally and this sequence of changes is disruptive enough already.
Test Plan: Saw local Aphlict unambiguously recognize "local.phacility.com" as instance "local", with a "~"-style URI.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10697, T10783
Differential Revision: https://secure.phabricator.com/D15705
Summary:
Fixes T10697. This finishes bringing the rest of the config up to cluster power levels.
Phabricator is now given an arbitrarily long list of notification servers.
Each Aphlict server is given an arbitrarily long list of ports to run services on.
Users are free to make them meet in the middle by proxying whatever they want to whatever else they want.
This should also accommodate clustering fairly easily in the future.
Also rewrote the status UI and changed a million other things. 🐗
Test Plan:
{F1217864}
{F1217865}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10697
Differential Revision: https://secure.phabricator.com/D15703
Summary: Ref T10697. Mostly straightforward. Also allow the server to have multiple logs and log options in the future (e.g., different verbosities or separate admin/client logs or whatever). No specific plans for this, but the default log is pretty noisy today.
Test Plan: Set up a couple of logs, started server, saw it log to them.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10697
Differential Revision: https://secure.phabricator.com/D15702
Summary: Ref T10697. This isn't everything but starts generalizing options and moving us toward a cluster-ready state of affairs.
Test Plan: Started server in various configurations, hit most (all?) of the error cases with bad configs, sent test notifications.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10697
Differential Revision: https://secure.phabricator.com/D15701
Summary: Ref T10697. This just improves a couple of minor `bin/aphlict` things: make argument parsing more explicit/consistent, consolidate a little bit of duplicated code.
Test Plan: Ran all `bin/aphlict` commands.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10697
Differential Revision: https://secure.phabricator.com/D15698
Summary:
Ref T2783. This allows this worker to run on a machine different to the one that stores the repository, by routing the execution of Git over Conduit calls.
This API method is super gross, but fixing it isn't straightforward and it runs into other complicated considerations. We can fix it later; for now, just define it as "internal" to limit how much mess this creates.
"Internal" methods do not appear on the console.
Test Plan: Ran `bin/repository reparse --change <commit> --trace` on several commits, saw daemons make a Conduit call instead of running a `git` command.
Reviewers: hach-que, chad
Reviewed By: chad
Subscribers: joshuaspence, Korvin, epriestley
Maniphest Tasks: T2783
Differential Revision: https://secure.phabricator.com/D11874
Summary: Fixes T10797. This seems to fix things on my local system.
Test Plan:
- Cloned with a username, got prompted for a password.
- Cloned with a username + password.
- Cloned with a username + bad password (error).
Reviewers: chad
Reviewed By: chad
Subscribers: Grimeh
Maniphest Tasks: T10797
Differential Revision: https://secure.phabricator.com/D15706
Summary:
While reading the new cluster docs, I noticed a few minor typos, and one
section that seemed to be incomplete and redundant, so I just removed it.
Test Plan: none.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: chad, Korvin, jshirley
Differential Revision: https://secure.phabricator.com/D15704
Summary:
Ref T10784. Currently, if you terminate SSL at a load balancer (very common) and use HTTP beyond that, you have to fiddle with this setting in your premable or a `SiteConfig`.
On the balance I think this makes stuff much harder to configure without any real security benefit, so don't apply this option to intracluster requests.
Also document a lot of stuff.
Test Plan: Poked around locally but this is hard to test outside of a production cluster, I'll vet it more thoroughly on `secure`.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10784
Differential Revision: https://secure.phabricator.com/D15696
Summary:
Ref T10784. On `secure`, logged-out users currently can't browse repositories when cluster/service mode is enabled because they aren't permitted to make intracluster requests.
We don't allow totally public external requests (they're hard to rate limit and users might write bots that polled `feed.query` or whatever which we'd have no way to easily disable) but it's fine to allow intracluster public requests.
Test Plan: Browsed a clustered repository while logged out locally.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10784
Differential Revision: https://secure.phabricator.com/D15695
Summary: Fixes T10772, not sure why this fails, but reverting the code back to old dialog call works.
Test Plan:
- Try to add a new credential when importing a repository.
- Also created a new credential normally, via Passphrase.
- Also edited a credential.
Reviewers: chad
Reviewed By: chad
Subscribers: Korvin
Maniphest Tasks: T10772
Differential Revision: https://secure.phabricator.com/D15691
Summary:
Ref T10751. We currently have a placeholder Almanac document, and a fairly-bad-advice section in Daemons.
Pull these into the modern cluster documentation.
Test Plan: 17 phabricator PHDs
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10751
Differential Revision: https://secure.phabricator.com/D15689
Summary: Fixes T10789. If we aren't configured with a device, we never grabbed a lock in the first place, and should not expect one to be held.
Test Plan: Pushed non-cluster-configured Git SSH repository.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10789
Differential Revision: https://secure.phabricator.com/D15692
Summary: Changes elsewhere which support spaces before "|" when defining a table so that tables quote properly also accidentally changed these beautiful drawings into remarkup tables.
Test Plan: (( o.O ))
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D15690
Summary:
Ref T4292. This mostly implements the locking/versioning logic for multi-master repositories. It is only active on Git SSH pathways, and doesn't actually do anything useful yet: it just does bookkeeping so far.
When we read (e.g., `git fetch`) the logic goes like this:
- Get the read lock (unique to device + repository).
- Read all the versions of the repository on every other device.
- If any node has a newer version:
- Fetch the newer version.
- Increment our version to be the same as the version we fetched.
- Release the read lock.
- Actually do the fetch.
This makes sure that any time you do a read, you always read the most recently acknowledged write. You may have to wait for an internal fetch to happen (this isn't actually implemented yet) but the operation will always work like you expect it to.
When we write (e.g., `git push`) the logic goes like this:
- Get the write lock (unique to the repository).
- Do all the read steps so we're up to date.
- Mark a write pending.
- Do the actual write.
- Bump our version and mark our write finished.
- Release the write lock.
This allows you to write to any replica. Again, you might have to wait for a fetch first, but everything will work like you expect.
There's one notable failure mode here: if the network connection between the repository node and the database fails during the write, the write lock might be released even though a write is ongoing.
The "isWriting" column protects against that, by staying locked if we lose our connection to the database. This will currently "freeze" the repository (prevent any new writes) until an administrator can sort things out, since it'd dangerous to continue doing writes (we may lose data).
(Since we won't actually acknowledge the write, I think, we could probably smooth this out a bit and make it self-healing //most// of the time: basically, have the broken node rewind itself by updating from another good node. But that's a little more complex.)
Test Plan:
- Pushed changes to a cluster-mode repository.
- Viewed web interface, saw "writing" flag and version changes.
- Pulled changes.
- Faked various failures, got sensible states.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15688
Summary:
Ref T4292. This adds some very basic cluster/device data to the new management view. Nothing interesting yet.
Also deal with disabled bindings a little more cleanly.
Test Plan: {F1214619}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15685
Summary:
Ref T4292. This puts a very rough skeleton in place for the new "Manage Repository" UI, somewhat similar to the "Settings" UI.
Right now, it has one panel with no content, and is not reachable from the UI.
Test Plan: {F1214525}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15683
Summary:
Ref T10756. When repositories are properly configured for the cluster (which is hard to set up today), be smart about which repositories are expected to exist on the current host, and only pull them.
This generally allows daemons to pretty much do the right thing no matter how many copies are running, although there may still be some lock contention issues that need to be sorted out.
Test Plan: {F1214483}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10756
Differential Revision: https://secure.phabricator.com/D15682
Summary: Ref T10702
Test Plan: Open a user profile, attempt to award an archived or previously awarded badge, badges dialog should provide a typeahead, and the suggestions should offer details about whether a badge is archived or already awarded.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Maniphest Tasks: T10702
Differential Revision: https://secure.phabricator.com/D15665
Summary:
Ref T4571. Write more of the missing documentation sections and clarify a few things.
Since the "replicating master" check needs a special permission, imposes a performance penalty, is probably very difficult to misconfigure, and likely not a big deal anyway, just drop the idea of trying to automatically detect + prevent it. We still show if it's an issue on the status page, provided we have permission to check.
When you don't have any cluster databases configured, never stop trying to connect to the default master database. We might want to do this eventually as load reduction, but just don't muddy the waters too much for now while things stabilize.
Test Plan:
- Tested functionality in cluster, non-cluster, and degraded-cluster modes.
- Used status console to monitor a health check cycle.
- Read docs.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15679
Summary:
Ref T4571. When a database goes down briefly, we fall back to replicas.
However, this fallback is slow (not good for users) and keeps sending a lot of traffic to the master (might be bad if the root cause is load-related).
Keep track of recent connections and fully degrade into "severed" mode if we see a sequence of failures over a reasonable period of time. In this mode, we send much less traffic to the master (faster for users; less load for the database).
We do send a little bit of traffic still, and if the master recovers we'll recover back into normal mode seeing several connections in a row succeed.
This is similar to what most load balancers do when pulling web servers in and out of pools.
For now, the specific numbers are:
- We do at most one health check every 3 seconds.
- If 5 checks in a row fail or succeed, we sever or un-sever the database (so it takes about 15 seconds to switch modes).
- If the database is currently marked unhealthy, we reduce timeouts and retries when connecting to it.
Test Plan:
- Configured a bad `master`.
- Browsed around for a bit, initially saw "unrechable master" errors.
- After about 15 seconds, saw "major interruption" errors instead.
- Fixed the config for `master`.
- Browsed around for a while longer.
- After about 15 seconds, things recovered.
- Used "Cluster Databases" console to keep an eye on health checks: it now shows how many recent health checks were good:
{F1213397}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15677
Summary:
The way `DateTime` works with epochs is weird, I goofed this by having my server/viewer timezone the same and not noticing.
Also fix an issue where you do `?epoch=...` and then manually fiddle with the control: the control should win.
Test Plan:
- Set viewer and server timezone to different vlaues.
- Created a countdown using `?epoch=...`.
- Created a countdown using `?epoch=...` and fiddling with date controls.
- Created and edited a countdown using date/time control.
- Poked around Calendar to make sure I didn't ruin anything this time (browsed, created event, edited event).
Reviewers: lpriestley, chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D15680
Summary:
Ref T4571. If we fail to connect to the master, automatically try to degrade into a temporary read-only mode ("UNREACHABLE") for the remainder of the request, if possible.
If the request was something like "load the homepage", that'll work fine. If it was something like "submit a comment", there's nothing we can do and we just have to fail.
Detecting this condition imposes a performance penalty: every request checks the connection and gives the database a long time to respond, since we don't want to drop writes unless we have to. So the degraded mode works, but it's really slow, and may perpetuate the problem if the root issue is load-related.
This lays the groundwork for improving this case by degrading futher into a "SEVERED" mode which will persist across requests. In the future, if several requests in a short period of time fail, we'll sever the database host and refuse to try to connect to it for a little while, connecting directly to replicas instead (basically, we're "health checking" the master, like a load balancer would health check a web application server). This will give us a better (much faster) degraded mode in a major service disruption, and reduce load on the master if the root cause is load-related, giving it a better chance of recovering on its own.
Test Plan:
- Disabled master in config by changing the host/username, got degraded automatically to UNREACAHBLE mode immediately.
- Faked full SEVERED mode, requests hit replicas and put me in the mode properly.
- Made stuff work, hit some good pages.
- Hit some non-cluster pages.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15674
Summary: Ref T4571. If `cluster.databases` is configured but only has replicas, implicitly drop to read-only mode and send writes to a replica.
Test Plan:
- Disabled the `master`, saw Phabricator automatically degrade into read-only mode against replicas.
- (Also tested: explicit read-only mode, non-cluster mode, properly configured cluster mode).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15672
Summary:
Ref T4571. Allows users to click the "read-only mode" notification to get more information about why an install is in read-only mode.
Installs can be in this mode for several reasons (explicit administrative action, no masters defined, no masters reachable), and it's useful to be able to tell the difference.
Test Plan: {F1212930}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15671
Summary: Fixes T6710. After D15669, we support a proper timeout parameter, so we don't need this hack anymore.
Test Plan: See D15669: forced a MySQL connector, set a low timeout, set a bad database, saw fast failures.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T6710
Differential Revision: https://secure.phabricator.com/D15670
Summary:
Ref T4571. Ref T10759. Ref T10758. This isn't complete, but gets most of the job done:
- When `cluster.databases` is set up, most things ignore `mysql.host` now.
- You can `bin/storage upgrade` and stuff works.
- You can browse around in the web UI and stuff works.
There's still a lot of weird tricky stuff to navigate, and this has real no advantages over configuring a single server yet (no automatic failover, etc).
Test Plan:
- Configured `cluster.databases` to point at my `t1.micro` hosts in EC2 (master + replica).
- Ran `bin/storage upgrade`, got a new install setup on them properly.
- Survived setup warnings, browsed around.
- Switched back to local config, ran `bin/storage upgrade`, browsed around, went through setup checks.
- Intentionally broke config (bad hosts, no masters) and things seemed to react reasonably well.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571, T10758, T10759
Differential Revision: https://secure.phabricator.com/D15668
Summary: Ref T4571. The configuration option still doesn't do anything, but add a status panel for basic setup monitoring.
Test Plan:
Here's what a good version looks like:
{F1212291}
Also faked most of the errors it can detect and got helpful diagnostic messages like this:
{F1212292}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15667
Summary:
Ref T4571. This adds a new option which allows you to upgrade your one-host configuration to a multi-host configuration by configuring it.
Doing this currently does nothing. I wrote a lot of words about what it is //supposed// to do in the future, though.
Test Plan:
- Tried to configure the option in all the possible bad ways, got errors.
- Read documentation.
Reviewers: chad
Reviewed By: chad
Subscribers: eadler
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15663
Summary:
Ref T4571. There will be a very long path beyond this, but add a basic read-only mode. You can explicitly enable this to put Phabricator in a sort of "maintenance" mode today if you're swapping databases or something.
In the long term, we'll automatically degrade into this mode if the master database is down.
Test Plan:
- Enabled read-only mode.
- Browsed around.
- Didn't immediately see anything that was totally 100% broken.
Most stuff is 80-90% broken right now. For example:
- Stuff like submitting comments doesn't work, and gives you a confusing, unhelpful error.
- None of the UI really knows that it's read-only. EditEngine stuff should all hide itself and say "you can't add new comments while an install is in read-only mode", for example, but currently does not.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4571
Differential Revision: https://secure.phabricator.com/D15662
Summary: Recurring events will fatal a Calendar with this not set. `newDateTime` requires a date and time to be called property. I think this is correct fix? Fixes T10766
Test Plan: Build a recurring event, pull up /calendar/, see recurring events as expected. Previously, fatal.
Reviewers: lpriestley, epriestley
Reviewed By: epriestley
Subscribers: CodeMouse92, Korvin
Maniphest Tasks: T10766
Differential Revision: https://secure.phabricator.com/D15666
Summary: Testing out a new 'nav' layout in Settings / Config. Spent a few days here and couldn't find much better overall.
Test Plan: View each page in Settings and in Config. Save some config options. Test mobile, desktop, tablet.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D15659
Summary:
I think this fixes the Mercurial + HTTP cluster issue. PHP adds `HTTP_` but we were not stripping it, so we would convert an `X-Whatever-Zebra` header into an `Http-X-Whatever-Zebra` header.
I don't think this behavior has changed? So maybe it just never worked? Git is more popular than Mercurial and SSH is easier to configure than HTTP, so it's plausible. I'll keep a careful eye on this when it deploys.
Test Plan:
- Set up local service-based Mercurial repository.
- Tried to clone, got similar error to cluster.
- Applied patch, clean clone.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D15660
Summary: Fixes T5813, while I'm in here...
Test Plan: Sorted stuff by end date.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T5813
Differential Revision: https://secure.phabricator.com/D15657
Summary: Fixes T10684. Fixes T10520. This primarily implements a date/epoch field, and then does a bunch of standard plumbing.
Test Plan:
- Created countdowns.
- Edited countdowns.
- Used HTTP prefilling.
- Created a countdown ending on "Christmas Morning", etc.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10520, T10684
Differential Revision: https://secure.phabricator.com/D15655
Summary: Closes T10690
Test Plan: Open Badges application, go to Advanced Search, search for a badge by its name and see result.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: Korvin
Maniphest Tasks: T10690
Differential Revision: https://secure.phabricator.com/D15656