mirror of https://we.phorge.it/source/phorge.git synced 2025-01-31 08:58:20 +01:00

Add slightly more cluster repository documentation

Summary: Ref T10751. There are still some missing support tools here, but explain some of this a little better.

Test Plan: Read documentation.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10751

Differential Revision: https://secure.phabricator.com/D15764
epriestley 2016-04-19 20:15:39 -07:00
parent bab3690b54
commit 48b015a3fa


@ -19,19 +19,19 @@ advantages of doing this are:
This configuration is complex, and many installs do not need to pursue it.
This configuration is not currently supported with Subversion.
This configuration is not currently supported with Subversion or Mercurial.
Repository Hosts
================
Repository hosts must run a complete, fully configured copy of Phabricator,
including a webserver. If you make repositories available over SSH, they must
also run a properly configured `sshd`.
including a webserver. They must also run a properly configured `sshd`.
Generally, these hosts will run the same set of services and configuration that
web hosts run. If you prefer, you can overlay these services and put web and
repository services on the same hosts.
repository services on the same hosts. See @{article:Clustering Introduction}
for some guidance on overlaying services.
When a user requests information about a repository that can only be satisfied
by examining a repository working copy, the webserver receiving the request
@ -57,6 +57,17 @@ If it isn't, they block the read until they can complete a fetch.
Before responding to a write, replicas obtain a global lock, perform the same
version check and fetch if necessary, then allow the write to continue.
Additionally, repositories passively check other nodes for updates and
replicate changes in the background. After you push a change to a repository,
it will usually spread passively to all other repository nodes within a few
minutes.
Even if passive replication is slow, the active replication makes acknowledged
changes sequential to all observers: after a write is acknowledged, all
subsequent reads are guaranteed to see it. The system does not permit stale
reads, and you do not need to wait for a replication delay to see a consistent
view of the repository no matter which node you ask.
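The read and write behavior described above can be illustrated with a small model. This is a minimal sketch, not Phabricator's implementation: the `Cluster` and `Node` classes, their fields, and the in-process lock are all hypothetical stand-ins for the global state in the database and the working copies on disk.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for the global state stored in the database."""
    version: int = 0  # newest acknowledged version
    write_lock: threading.Lock = field(default_factory=threading.Lock)


@dataclass
class Node:
    """Hypothetical repository node holding a working copy on disk."""
    name: str
    cluster: Cluster
    version: int = 0  # version currently on disk

    def _fetch_if_stale(self) -> None:
        # Version check: catch up before continuing if this node is behind.
        if self.version < self.cluster.version:
            self.version = self.cluster.version  # stands in for a real fetch

    def read(self) -> int:
        # Reads block until the node has the newest acknowledged version.
        self._fetch_if_stale()
        return self.version

    def write(self) -> int:
        # Writes take the global lock, synchronize, then bump the version.
        with self.cluster.write_lock:
            self._fetch_if_stale()
            self.version += 1
            self.cluster.version = self.version
        return self.version


cluster = Cluster()
x = Node("X", cluster)
y = Node("Y", cluster)
x.write()        # X acknowledges a push
seen = y.read()  # Y catches up before answering, so the read is not stale
```

In this model, passive replication is not needed for correctness: it would only reduce how often a read or write has to wait for a fetch.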
HTTP vs HTTPS
=============
@ -84,6 +95,81 @@ Other mitigations are possible, but securing a network against the NSA and
similar agents of other rogue nations is beyond the scope of this document.
Monitoring Replication
======================
You can review the current status of a repository on cluster nodes in
{nav Diffusion > (Repository) > Manage Repository > Cluster Configuration}.
This screen shows all the configured devices which are hosting the repository
and the available version.
**Version**: When a repository is mutated by a push, Phabricator increases
an internal version number for the repository. This column shows which version
is on disk on the corresponding node.
After a change is pushed, the node which received the change will have a larger
version number than the other nodes. The change should be passively replicated
to the remaining nodes after a brief period of time, although this can take
a while if the change was large or the network connection between nodes is
slow or unreliable.
You can click the version number to see the corresponding push logs for that
change. The logs contain details about what was changed, and can help you
identify if replication is slow because a change is large or for some other
reason.
**Writing**: This shows that the node is currently holding a write lock. This
normally means that it is actively receiving a push, but can also mean that
there was a write interruption. See "Write Interruptions" below for details.
Write Interruptions
===================
A repository cluster can be put into an inconsistent state by an interruption
in a brief window immediately after a write.
Phabricator cannot commit changes to a working copy (stored on disk) and to
the global state (stored in a database) atomically, so there is a narrow window
between committing these two different states when some tragedy (like a
lightning strike) can befall a server, leaving the global and local views of
the repository state divergent.
In these cases, Phabricator fails into a "frozen" state where further writes
are not permitted until the failure is investigated and resolved.
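The failure window can be sketched as two separate commits with a marker held across both. This is only an illustration under assumptions: the `Repository` class, the `writing` marker, and the `crash_between_commits` flag are hypothetical, and the real locking mechanism is not described here.

```python
class FrozenRepository(Exception):
    """Raised when a previous write never finished cleanly."""


class Repository:
    """Hypothetical model of one repository's local and global state."""

    def __init__(self) -> None:
        self.global_version = 0  # stored in the database
        self.disk_version = 0    # stored in the working copy
        self.writing = False     # marker held while a write is in progress

    def push(self, crash_between_commits: bool = False) -> None:
        if self.writing:
            # A leftover marker means an earlier write was interrupted.
            raise FrozenRepository("investigate before accepting writes")
        self.writing = True
        self.disk_version += 1  # first commit: working copy
        if crash_between_commits:
            return  # simulated interruption; the marker is never cleared
        self.global_version = self.disk_version  # second commit: database
        self.writing = False


repo = Repository()
repo.push()                            # normal write
repo.push(crash_between_commits=True)  # interrupted write
try:
    repo.push()
    frozen = False
except FrozenRepository:
    frozen = True
```

After the interrupted push, the disk and global versions disagree and the marker is still set, so the model refuses further writes rather than guessing which view is correct.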
TODO: Complete the support tooling and provide recovery instructions.
Loss of Leaders
===============
A more straightforward failure condition is the loss of all servers in a
cluster which have the most up-to-date copy of a repository. For example:
- There is a cluster setup with two nodes, X and Y.
- A new change is pushed to server X.
- Before the change can propagate to server Y, lightning strikes server X
and destroys it.
Here, all of the "leader" nodes with the most up-to-date copy of the repository
have been lost. Phabricator will refuse to serve this repository because it
cannot serve it consistently, and cannot accept writes without data loss.
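The leader rule can be expressed compactly. The sketch below is hypothetical: `find_leaders` and `can_serve` are illustrative names, and the version numbers are made up to mirror the X and Y example.

```python
from __future__ import annotations


def find_leaders(versions: dict[str, int]) -> set[str]:
    """Return the nodes holding the newest known version."""
    newest = max(versions.values())
    return {node for node, version in versions.items() if version == newest}


def can_serve(versions: dict[str, int], online: set[str]) -> bool:
    """A repository is servable only while at least one leader is online."""
    return bool(find_leaders(versions) & online)


# X received a push that has not replicated to Y yet, then X was lost.
versions = {"X": 8, "Y": 7}
leaders = find_leaders(versions)
servable = can_serve(versions, online={"Y"})
```

Here `servable` is false because the only leader, X, is offline. Serving from Y would expose a stale view, and accepting a write on Y would discard the change that only X holds.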
The most straightforward way to resolve this issue is to restore any leader to
service. The change will be able to replicate to other nodes once a leader
comes back online.
If you are unable to restore a leader or unsure that you can restore one
quickly, you can use the monitoring console to review which changes are
present on the leaders but not present on the followers by examining the
push logs.
TODO: Complete the support tooling and provide recovery instructions.
Backups
=======