1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-11-26 16:52:41 +01:00

Make "bin/repository thaw" workflow more clear when devices are disabled

Summary:
Ref T13216. See PHI943. If autoscale lightning strikes all your servers at once and destroys them, the path to recovery can be unclear. You're "supposed" to:

  - demote all the devices;
  - disable the bindings;
  - bind the new servers;
  - put whatever working copies you can scrape up back on disk;
  - promote one of the new servers.

However, the documentation is a bit misleading (it was sort of written with "you lost one or two devices" in mind, not "you lost every device") and demote-before-disable is unnecessary and slightly risky if servers come back online. There's also a missing guardrail before the promote step which lets you accidentally skip the demotion step and end up in a confusing state. Instead:

  - Add a guard rail: when you try to promote a new server, warn if inactive devices still have versions and tell the user to demote them.
  - Allow demotion of inactive devices: the order "disable, demote" is safer and more intuitive than "demote, disable" and there's no reason to require the unintuitive order.
  - Make the "cluster already has leaders" message more clear.
  - Make the documentation more clear.

Test Plan:
  - Bound a repository to two devices.
  - Wrote to A to make it a leader, then disabled it (simulating a lightning strike).
  - Tried to promote B. Got a new, useful error ("demote A first").
  - Demoted A (before: error about demoting inactive devices; now: works fine).
  - Promoted B. This worked.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19793
This commit is contained in:
epriestley 2018-11-07 10:08:51 -08:00
parent c32fa06266
commit 2a7ac8e388
2 changed files with 74 additions and 40 deletions

View file

@ -120,33 +120,71 @@ final class PhabricatorRepositoryManagementThawWorkflow
$repository->getDisplayName())); $repository->getDisplayName()));
} }
if ($promote) {
// You can only promote active devices. (You may demote active or
// inactive devices.)
$bindings = $service->getActiveBindings(); $bindings = $service->getActiveBindings();
$bindings = mpull($bindings, null, 'getDevicePHID'); $bindings = mpull($bindings, null, 'getDevicePHID');
if (empty($bindings[$device->getPHID()])) { if (empty($bindings[$device->getPHID()])) {
throw new PhutilArgumentUsageException( throw new PhutilArgumentUsageException(
pht( pht(
'Repository "%s" has no active binding to device "%s". Only '. 'Repository "%s" has no active binding to device "%s". Only '.
'actively bound devices can be promoted or demoted.', 'actively bound devices can be promoted.',
$repository->getDisplayName(), $repository->getDisplayName(),
$device->getName())); $device->getName()));
} }
$versions = PhabricatorRepositoryWorkingCopyVersion::loadVersions( $versions = PhabricatorRepositoryWorkingCopyVersion::loadVersions(
$repository->getPHID()); $repository->getPHID());
$versions = mpull($versions, null, 'getDevicePHID'); $versions = mpull($versions, null, 'getDevicePHID');
$versions = array_select_keys($versions, array_keys($bindings));
if ($versions && $promote) { // Before we promote, make sure there are no outstanding versions on
throw new PhutilArgumentUsageException( // devices with inactive bindings. If there are, you need to demote
pht( // these first.
'Unable to promote "%s" for repository "%s": the leaders for '. $inactive = array();
'this cluster are not ambiguous.', foreach ($versions as $device_phid => $version) {
$device->getName(), if (isset($bindings[$device_phid])) {
$repository->getDisplayName())); continue;
}
$inactive[$device_phid] = $version;
}
if ($inactive) {
$handles = $viewer->loadHandles(array_keys($inactive));
$handle_list = iterator_to_array($handles);
$handle_list = mpull($handle_list, 'getName');
$handle_list = implode(', ', $handle_list);
throw new PhutilArgumentUsageException(
pht(
'Repository "%s" has versions on inactive devices. Demote '.
'(or reactivate) these devices before promoting a new '.
'leader: %s.',
$repository->getDisplayName(),
$handle_list));
}
// Now, make sure there are no outstanding versions on devices with
// active bindings. These also need to be demoted (or promoting is a
// mistake or already happened).
$active = array_select_keys($versions, array_keys($bindings));
if ($active) {
$handles = $viewer->loadHandles(array_keys($active));
$handle_list = iterator_to_array($handles);
$handle_list = mpull($handle_list, 'getName');
$handle_list = implode(', ', $handle_list);
throw new PhutilArgumentUsageException(
pht(
'Unable to promote "%s" for repository "%s" because this '.
'cluster already has one or more unambiguous leaders: %s.',
$device->getName(),
$repository->getDisplayName(),
$handle_list));
} }
if ($promote) {
PhabricatorRepositoryWorkingCopyVersion::updateVersion( PhabricatorRepositoryWorkingCopyVersion::updateVersion(
$repository->getPHID(), $repository->getPHID(),
$device->getPHID(), $device->getPHID(),

View file

@ -414,28 +414,24 @@ present on the leaders but not present on the followers by examining the
push logs. push logs.
If you are comfortable discarding these changes, you can instruct Phabricator If you are comfortable discarding these changes, you can instruct Phabricator
that it can forget about the leaders in two ways: disable the service bindings that it can forget about the leaders by doing this:
to all of the leader devices so they are no longer part of the cluster, or use
`bin/repository thaw` to `--demote` the leaders explicitly.
If you do this, **you will lose data**. Either action will discard any changes - Disable the service bindings to all of the leader devices so they are no
on the affected leaders which have not replicated to other devices in the longer part of the cluster.
cluster. - Then, use `bin/repository thaw` to `--demote` the leaders explicitly.
To remove a device from the cluster, disable all of the bindings to it To demote a device, run this command:
in Almanac, using the web UI.
{icon exclamation-triangle, color="red"} Any data which is only present on
the disabled device will be lost.
To demote a device without removing it from the cluster, run this command:
``` ```
phabricator/ $ ./bin/repository thaw rXYZ --demote repo002.corp.net phabricator/ $ ./bin/repository thaw rXYZ --demote repo002.corp.net
``` ```
{icon exclamation-triangle, color="red"} Any data which is only present on {icon exclamation-triangle, color="red"} Any data which is only present on
**this** device will be lost. the demoted device will be lost.
If you do this, **you will lose unreplicated data**. You will discard any
changes on the affected leaders which have not replicated to other devices
in the cluster.
Ambiguous Leaders Ambiguous Leaders