phorge-phorge/src/docs/user/cluster/cluster_repositories.diviner

@title Cluster: Repositories
@group cluster

Configuring Phabricator to use multiple repository hosts.

Overview
========

If you use Git, you can deploy Phabricator with multiple repository hosts,
configured so that each host is readable and writable. The advantages of doing
this are:

  - you can completely survive the loss of repository hosts;
  - reads and writes can scale across multiple machines; and
  - read and write performance across multiple geographic regions may improve.

This configuration is complex, and many installs do not need to pursue it.

This configuration is not currently supported with Subversion or Mercurial.


How Reads and Writes Work
=========================

Phabricator repository replicas are multi-master: every node is readable and
writable, and a cluster of nodes can (almost always) survive the loss of any
arbitrary subset of nodes so long as at least one node is still alive.

Phabricator maintains an internal version for each repository, and increments
it when the repository is mutated.

Before responding to a read, replicas make sure their version of the repository
is up to date (no node in the cluster has a newer version of the repository).
If it isn't, they block the read until they can complete a fetch.

Before responding to a write, replicas obtain a global lock, perform the same
version check and fetch if necessary, then allow the write to continue.

Additionally, repositories passively check other nodes for updates and
replicate changes in the background. After you push a change to a repository,
it will usually spread passively to all other repository nodes within a few
minutes.

Even if passive replication is slow, the active replication makes acknowledged
changes sequential to all observers: after a write is acknowledged, all
subsequent reads are guaranteed to see it. The system does not permit stale
reads, and you do not need to wait for a replication delay to see a consistent
view of the repository no matter which node you ask.


HTTP vs HTTPS
=============

Intracluster requests (from the daemons to repository servers, or from
webservers to repository servers) are permitted to use HTTP, even if you have
set `security.require-https` in your configuration.

It is common to terminate SSL at a load balancer and use plain HTTP beyond
that, and the `security.require-https` feature is primarily focused on making
client browser behavior more convenient for users, so it does not apply to
intracluster traffic.

Using HTTP within the cluster leaves you vulnerable to attackers who can
observe traffic within a datacenter, or observe traffic between datacenters.
This is normally very difficult, but within reach for state-level adversaries
like the NSA.

If you are concerned about these attackers, you can terminate HTTPS on
repository hosts and bind to them with the "https" protocol. Just be aware that
the `security.require-https` setting won't prevent you from making
configuration mistakes, as it doesn't cover intracluster traffic.

Other mitigations are possible, but securing a network against the NSA and
similar agents of other rogue nations is beyond the scope of this document.


Repository Hosts
================

Repository hosts must run a complete, fully configured copy of Phabricator,
including a webserver. They must also run a properly configured `sshd`.

If you are converting existing hosts into cluster hosts, you may need to
revisit @{article:Diffusion User Guide: Repository Hosting} and make sure
the system user accounts have all the necessary `sudo` permissions. In
particular, cluster devices need `sudo` access to `ssh` so they can read
device keys.

Generally, these hosts will run the same set of services and configuration that
web hosts run. If you prefer, you can overlay these services and put web and
repository services on the same hosts. See @{article:Clustering Introduction}
for some guidance on overlaying services.

When a user requests information about a repository that can only be satisfied
by examining a repository working copy, the webserver receiving the request
will make an HTTP service call to a repository server which hosts the
repository to retrieve the data it needs. It will use the result of this query
to respond to the user.


Setting up a Cluster Services
=============================

To set up clustering, first register the devices that you want to use as part
of the cluster with Almanac. For details, see @{article:Cluster: Devices}.

NOTE: Once you create a service, new repositories will immediately allocate
on it. You may want to disable repository creation during initial setup.

Once the hosts are registered as devices, you can create a new service in
Almanac:

  - First, register at least one device according to the device clustering
    instructions.
  - Create a new service of type **Phabricator Cluster: Repository** in
    Almanac.
  - Bind this service to all the interfaces on the device or devices.
  - For each binding, add a `protocol` key with one of these values:
    `ssh`, `http`, `https`.

For example, a service might look like this:

  - Service: `repos001.mycompany.net`
    - Binding: `repo001.mycompany.net:80`, `protocol=http`
    - Binding: `repo001.mycompany.net:2222`, `protocol=ssh`

The service itself has a `closed` property. You can set this to `true` to
disable new repository allocations on this service (for example, if it is
reaching capacity).


Migrating to Clustered Services
===============================

To convert existing repositories on an install into cluster repositories, you
will generally perform these steps:

  - Register the existing host as a cluster device.
  - Configure a single host repository service using //only// that host.

This puts you in a transitional state where repositories on the host can work
as either on-host repositories or cluster repositories. You can move forward
from here slowly and make sure services still work, with a quick path back to
safety if you run into trouble.

To move forward, migrate one repository to the service and make sure things
work correctly. If you run into issues, you can back out by migrating the
repository off the service.

To migrate a repository onto a cluster service, use this command:

```
$ ./bin/repository clusterize <repository> --service <service>
```

To migrate a repository back off a service, use this command:

```
$ ./bin/repository clusterize <repository> --remove-service
```

This command only changes how Phabricator connects to the repository; it does
not move any data or make any complex structural changes.

When Phabricator needs information about a non-clustered repository, it just
runs a command like `git log` directly on disk. When Phabricator needs
information about a clustered repository, it instead makes a service call to
another server, asking that server to run `git log` instead.

In a single-host cluster the server will make this service call to itself, so
nothing will really change. But this //is// an effective test for most
possible configuration mistakes.

If your canary repository works well, you can migrate the rest of your
repositories when ready (you can use `bin/repository list` to quickly get a
list of all repository monograms).

Once all repositories are migrated, you've reached a stable state and can
remain here as long as you want. This state is sufficient to convert daemons,
SSH, and web services into clustered versions and spread them across multiple
machines if those goals are more interesting.

Obviously, your single-device "cluster" will not be able to survive the loss of
the single repository host, but you can take as long as you want to expand the
cluster and add redundancy.

After creating a service, you do not need to `clusterize` new repositories:
they will automatically allocate onto an open service.

When you're ready to expand the cluster, continue below.


Expanding a Cluster
===================

To expand an existing cluster, follow these general steps:

  - Register new devices in Almanac.
  - Add bindings to the new devices to the repository service, also in Almanac.
  - Start the daemons on the new devices.

For instructions on configuring and registering devices, see
@{article:Cluster: Devices}.

As soon as you add active bindings to a service, Phabricator will begin
synchronizing repositories and sending traffic to the new device. You do not
need to copy any repository data to the device: Phabricator will automatically
synchronize it.

If you have a large amount of repository data, you may want to help this
process along by copying the repository directory from an existing cluster
device before bringing the new host online. This is optional, but can reduce
the amount of time required to fully synchronize the cluster.

You do not need to synchronize the most up-to-date data or stop writes during
this process. For example, loading the most recent backup snapshot onto the new
device will substantially reduce the amount of data that needs to be
synchronized.


Contracting a Cluster
=====================

If you want to remove working devices from a cluster (for example, to take
hosts down for maintenance), first do this for each device:

  - Change the `writable` property on the bindings to "Prevent Writes".
  - Wait a few moments until the cluster synchronizes (see
    "Monitoring Services" below).

This will ensure that the device you're about to remove is not the only cluster
leader, even if the cluster is receiving a high write volume. You can skip this
step if the device isn't working property to start with.

Once you've stopped writes and waited for synchronization (or if the hosts are
not working in the first place) do this for each device:

  - Disable the bindings from the service to the device in Almanac.

If you are removing a device because it failed abruptly (or removing several
devices at once; or you skip the "Prevent Writes" step), it is possible that
some repositories will have lost all their leaders. See "Loss of Leaders" below
to understand and resolve this.

If you want to put the hosts back in service later:

  - Enable the bindings again.
  - Change `writable` back to "Allow Writes".

This will restore the cluster to the original state.


Monitoring Services
===================

You can get an overview of repository cluster status from the
{nav Config > Repository Servers} screen. This table shows a high-level
overview of all active repository services.

**Repos**: The number of repositories hosted on this service.

**Sync**: Synchronization status of repositories on this service. This is an
at-a-glance view of service health, and can show these values:

  - **Synchronized**: All nodes are fully synchronized and have the latest
    version of all repositories.
  - **Partial**: All repositories either have at least two leaders, or have
    a very recent write which is not expected to have propagated yet.
  - **Unsynchronized**: At least one repository has changes which are
    only available on one node and were not pushed very recently. Data may
    be at risk.
  - **No Repositories**: This service has no repositories.
  - **Ambiguous Leader**: At least one repository has an ambiguous leader.

If this screen identifies problems, you can drill down into repository details
to get more information about them. See the next section for details.


Monitoring Repositories
=======================

You can get a more detailed view the current status of a specific repository on
cluster devices in {nav Diffusion > (Repository) > Manage Repository > Cluster
Configuration}.

This screen shows all the configured devices which are hosting the repository
and the available version on that device.

**Version**: When a repository is mutated by a push, Phabricator increases
an internal version number for the repository. This column shows which version
is on disk on the corresponding device.

After a change is pushed, the device which received the change will have a
larger version number than the other devices. The change should be passively
replicated to the remaining devices after a brief period of time, although this
can take a while if the change was large or the network connection between
devices is slow or unreliable.

You can click the version number to see the corresponding push logs for that
change. The logs contain details about what was changed, and can help you
identify if replication is slow because a change is large or for some other
reason.

**Writing**: This shows that the device is currently holding a write lock. This
normally means that it is actively receiving a push, but can also mean that
there was a write interruption. See "Write Interruptions" below for details.

**Last Writer**: This column identifies the user who most recently pushed a
change to this device. If the write lock is currently held, this user is
the user whose change is holding the lock.

**Last Write At**: When the most recent write started. If the write lock is
currently held, this shows when the lock was acquired.


Cluster Failure Modes
=====================

There are three major cluster failure modes:

  - **Write Interruptions**: A write started but did not complete, leaving
    the disk state and cluster state out of sync.
  - **Loss of Leaders**: None of the devices with the most up-to-date data
    are reachable.
  - **Ambiguous Leaders**: The internal state of the repository is unclear.

Phabricator can detect these issues, and responds by freezing the repository
(usually preventing all reads and writes) until the issue is resolved. These
conditions are normally rare and very little data is at risk, but Phabricator
errs on the side of caution and requires decisions which may result in data
loss to be confirmed by a human.

The next sections cover these failure modes and appropriate responses in
more detail. In general, you will respond to these issues by assessing the
situation and then possibly choosing to discard some data.


Write Interruptions
===================

A repository cluster can be put into an inconsistent state by an interruption
in a brief window during and immediately after a write. This looks like this:

  - A change is pushed to a server.
  - The server acquires a write lock and begins writing the change.
  - During or immediately after the write, lightning strikes the server
    and destroys it.

Phabricator can not commit changes to a working copy (stored on disk) and to
the global state (stored in a database) atomically, so there is necessarily a
narrow window between committing these two different states when some tragedy
can befall a server, leaving the global and local views of the repository state
possibly divergent.

In these cases, Phabricator fails into a frozen state where further writes
are not permitted until the failure is investigated and resolved. When a
repository is frozen in this way it remains readable.

You can use the monitoring console to review the state of a frozen repository
with a held write lock. The **Writing** column will show which device is
holding the lock, and whoever is named in the **Last Writer** column may be
able to help you figure out what happened by providing more information about
what they were doing and what they observed.

Because the push was not acknowledged, it is normally safe to resolve this
issue by demoting the device. Demoting the device will undo any changes
committed by the push, and they will be lost forever.

However, the user should have received an error anyway, and should not expect
their push to have worked. Still, data is technically at risk and you may want
to investigate further and try to understand the issue in more detail before
continuing.

There is no way to explicitly keep the write, but if it was committed to disk
you can recover it manually from the working copy on the device (for example,
by using `git format-patch`) and then push it again after recovering.

If you demote the device, the in-process write will be thrown away, even if it
was complete on disk. To demote the device and release the write lock, run this
command:

```
phabricator/ $ ./bin/repository thaw <repository> --demote <device>
```

{icon exclamation-triangle, color="yellow"} Any committed but unacknowledged
data on the device will be lost.


Loss of Leaders
===============

A more straightforward failure condition is the loss of all servers in a
cluster which have the most up-to-date copy of a repository. This looks like
this:

  - There is a cluster setup with two devices, X and Y.
  - A new change is pushed to server X.
  - Before the change can propagate to server Y, lightning strikes server X
    and destroys it.

Here, all of the "leader" devices with the most up-to-date copy of the
repository have been lost. Phabricator will freeze the repository refuse to
serve requests because it can not serve reads consistently and can not accept
new writes without data loss.

The most straightforward way to resolve this issue is to restore any leader to
service. The change will be able to replicate to other devices once a leader
comes back online.

If you are unable to restore a leader or unsure that you can restore one
quickly, you can use the monitoring console to review which changes are
present on the leaders but not present on the followers by examining the
push logs.

If you are comfortable discarding these changes, you can instruct Phabricator
that it can forget about the leaders by doing this:

  - Disable the service bindings to all of the leader devices so they are no
    longer part of the cluster.
  - Then, use `bin/repository thaw` to `--demote` the leaders explicitly.

To demote a device, run this command:

```
phabricator/ $ ./bin/repository thaw rXYZ --demote repo002.corp.net
```

{icon exclamation-triangle, color="red"} Any data which is only present on
the demoted device will be lost.

If you do this, **you will lose unreplicated data**. You will discard any
changes on the affected leaders which have not replicated to other devices
in the cluster.


Ambiguous Leaders
=================

Repository clusters can also freeze if the leader devices are ambiguous. This
can happen if you replace an entire cluster with new devices suddenly, or make
a mistake with the `--demote` flag. This may arise from some kind of operator
error, like these:

  - Someone accidentally uses `bin/repository thaw ... --demote` to demote
    every device in a cluster.
  - Someone accidentally deletes all the version information for a repository
    from the database by making a mistake with a `DELETE` or `UPDATE` query.
  - Someone accidentally disables all of the devices in a cluster, then adds
    entirely new ones before repositories can propagate.

If you are moving repositories into cluster services, you can also reach this
state if you use `clusterize` to associate a repository with a service that is
bound to multiple active devices. In this case, Phabricator will not know which
device or devices have up-to-date information.

When Phabricator can not tell which device in a cluster is a leader, it freezes
the cluster because it is possible that some devices have less data and others
have more, and if it chooses a leader arbitrarily it may destroy some data
which you would prefer to retain.

To resolve this, you need to tell Phabricator which device has the most
up-to-date data and promote that device to become a leader. If you know all
devices have the same data, you are free to promote any device.

If you promote a device, **you may lose data** if you promote the wrong device
and some other device really had more up-to-date data. If you want to double
check, you can examine the working copies on disk before promoting by
connecting to the machines and using commands like `git log` to inspect state.

Once you have identified a device which has data you're happy with, use
`bin/repository thaw` to `--promote` the device. The data on the chosen
device will become authoritative:

```
phabricator/ $ ./bin/repository thaw rXYZ --promote repo002.corp.net
```

{icon exclamation-triangle, color="red"} Any data which is only present on
**other** devices will be lost.


Backups
======

Even if you configure clustering, you should still consider retaining separate
backup snapshots. Replicas protect you from data loss if you lose a host, but
they do not let you rewind time to recover from data mutation mistakes.

If something issues a `--force` push that destroys branch heads, the mutation
will propagate to the replicas.

You may be able to manually restore the branches by using tools like the
Phabricator push log or the Git reflog so it is less important to retain
repository snapshots than database snapshots, but it is still possible for
data to be lost permanently, especially if you don't notice the problem for
some time.

Retaining separate backup snapshots will improve your ability to recover more
data more easily in a wider range of disaster situations.


Next Steps
==========

Continue by:

  - returning to @{article:Clustering Introduction}.