mirror of
https://we.phorge.it/source/phorge.git
synced 2025-01-31 08:58:20 +01:00
Add slightly more cluster repository documentation
Summary: Ref T10751. There are still some missing support tools here, but explain some of this a little better.

Test Plan: Read documentation.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10751

Differential Revision: https://secure.phabricator.com/D15764
parent bab3690b54
commit 48b015a3fa
1 changed files with 90 additions and 4 deletions
@@ -19,19 +19,19 @@ advantages of doing this are:

This configuration is complex, and many installs do not need to pursue it.

-This configuration is not currently supported with Subversion.
+This configuration is not currently supported with Subversion or Mercurial.


Repository Hosts
================

Repository hosts must run a complete, fully configured copy of Phabricator,
-including a webserver. If you make repositories available over SSH, they must
-also run a properly configured `sshd`.
+including a webserver. They must also run a properly configured `sshd`.

Generally, these hosts will run the same set of services and configuration that
web hosts run. If you prefer, you can overlay these services and put web and
-repository services on the same hosts.
+repository services on the same hosts. See @{article:Clustering Introduction}
+for some guidance on overlaying services.

When a user requests information about a repository that can only be satisfied
by examining a repository working copy, the webserver receiving the request
@@ -57,6 +57,17 @@ If it isn't, they block the read until they can complete a fetch.
Before responding to a write, replicas obtain a global lock, perform the same
version check and fetch if necessary, then allow the write to continue.
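
The read and write paths described above can be sketched as a small in-memory model. This is a hypothetical illustration, not Phabricator's implementation: the `Cluster` and `Node` classes, the integer `global_version` standing in for the version recorded in the database, and the `threading.Lock` standing in for the global write lock are all assumptions made for the example.

```python
import threading


class Node:
    """One repository host in a hypothetical cluster model."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.version = 0   # version of the working copy on this node's disk
        self.commits = []  # stand-in for the repository contents

    def synchronize(self):
        """Fetch from an up-to-date peer if this node is behind the global version."""
        target = self.cluster.global_version
        if self.version >= target:
            return
        for peer in self.cluster.nodes.values():
            if peer.version == target:
                self.commits = list(peer.commits)  # the "fetch"
                self.version = peer.version
                return
        raise RuntimeError("no up-to-date node is reachable")

    def read(self):
        # Reads block until the local working copy has caught up.
        self.synchronize()
        return list(self.commits)

    def write(self, commit):
        # Writes take the global lock, catch up, then apply and publish the new version.
        with self.cluster.write_lock:
            self.synchronize()
            self.commits.append(commit)
            self.version += 1
            self.cluster.global_version = self.version


class Cluster:
    """Holds the global version (the database's view) and every node."""

    def __init__(self, names):
        self.global_version = 0
        self.write_lock = threading.Lock()  # stands in for the global write lock
        self.nodes = {name: Node(name, self) for name in names}


cluster = Cluster(["X", "Y"])
cluster.nodes["X"].write("c1")    # pushed to X only
print(cluster.nodes["Y"].read())  # Y fetches from X first, so it sees c1
```

Taking the lock before the version check is what keeps two nodes from both accepting a write on top of the same stale version in this model.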

Additionally, repositories passively check other nodes for updates and
replicate changes in the background. After you push a change to a repository,
it will usually spread passively to all other repository nodes within a few
minutes.

Even if passive replication is slow, the active replication makes acknowledged
changes appear sequential to all observers: after a write is acknowledged, all
subsequent reads are guaranteed to see it. The system does not permit stale
reads, and you do not need to wait for a replication delay to see a consistent
view of the repository no matter which node you ask.
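
A minimal sketch of how passive replication and the active read path fit together, assuming a hypothetical `Replica` record per node and a known global version. In this model the background pass only reduces how often reads have to wait; consistency comes from the read path refusing to answer until its copy has caught up.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical per-node state: working copy version and contents."""
    version: int = 0
    commits: list = field(default_factory=list)


def fetch_from_leader(replica, replicas, global_version):
    """Copy state from any replica already at the global version; False if none is."""
    for peer in replicas.values():
        if peer.version == global_version:
            replica.commits = list(peer.commits)
            replica.version = peer.version
            return True
    return False


def passive_round(replicas, global_version):
    """One background pass that brings lagging replicas up to date."""
    updated = []
    for name, replica in replicas.items():
        if replica.version < global_version:
            if fetch_from_leader(replica, replicas, global_version):
                updated.append(name)
    return updated


def active_read(replica, replicas, global_version):
    """Read path: catch up first, so the caller never sees a stale copy."""
    if replica.version < global_version:
        if not fetch_from_leader(replica, replicas, global_version):
            raise RuntimeError("cannot serve a consistent read")
    return list(replica.commits)


replicas = {"X": Replica(1, ["c1"]), "Y": Replica(), "Z": Replica()}
# Passive replication has not run yet, but a read on Y still returns c1.
print(active_read(replicas["Y"], replicas, global_version=1))
# The next background pass only has Z left to update.
print(passive_round(replicas, global_version=1))
```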


HTTP vs HTTPS
=============
@@ -84,6 +95,81 @@ Other mitigations are possible, but securing a network against the NSA and
similar agents of other rogue nations is beyond the scope of this document.


Monitoring Replication
======================

You can review the current status of a repository on cluster nodes in
{nav Diffusion > (Repository) > Manage Repository > Cluster Configuration}.

This screen shows all the configured devices which are hosting the repository
and the version available on each device.

**Version**: When a repository is mutated by a push, Phabricator increases
an internal version number for the repository. This column shows which version
is on disk on the corresponding node.

After a change is pushed, the node which received the change will have a larger
version number than the other nodes. The change should be passively replicated
to the remaining nodes after a brief period of time, although this can take
a while if the change was large or the network connection between nodes is
slow or unreliable.
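
As a rough illustration of reading the **Version** column, the hypothetical helper below summarizes a table of device versions. It assumes versions are plain integers and that the device with the largest value holds the newest change; device names such as `repo001` are made up for the example.

```python
def replication_report(versions):
    """Summarize a {device: version} table; newest-version devices are up to date."""
    newest = max(versions.values())
    lines = []
    for device, version in sorted(versions.items()):
        if version == newest:
            lines.append(f"{device}: version {version} (up to date)")
        else:
            # A difference in version numbers, not necessarily a count of pushes.
            lines.append(f"{device}: version {version} (behind by {newest - version})")
    return lines


# Hypothetical devices: repo002 has not yet replicated the latest push.
for line in replication_report({"repo001": 7, "repo002": 6, "repo003": 7}):
    print(line)
```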

You can click the version number to see the corresponding push logs for that
change. The logs contain details about what was changed, and can help you
identify whether replication is slow because a change is large or for some other
reason.

**Writing**: This shows that the node is currently holding a write lock. This
normally means that it is actively receiving a push, but can also mean that
there was a write interruption. See "Write Interruptions" below for details.


Write Interruptions
===================

A repository cluster can be put into an inconsistent state by an interruption
in a brief window immediately after a write.

Phabricator cannot commit changes to a working copy (stored on disk) and to
the global state (stored in a database) atomically, so there is a narrow window
between committing these two different states when some tragedy (like a
lightning strike) can befall a server, leaving the global and local views of
the repository state divergent.

In these cases, Phabricator fails into a "frozen" state where further writes
are not permitted until the failure is investigated and resolved.
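
The non-atomic window can be illustrated with a toy model. It assumes a hypothetical `Repository` object that tracks the version on disk, the version in the database, and a "writing" marker like the one shown in the **Writing** column. Crashing between the two steps leaves the marker set and the two versions divergent, and the model then refuses further writes, loosely mirroring the "frozen" behavior described above.

```python
class FrozenError(Exception):
    """Raised when writes are refused after an interrupted write."""


class Repository:
    """Hypothetical model of one repository's global and local state."""

    def __init__(self):
        self.global_version = 0  # stands in for the version stored in the database
        self.disk_version = 0    # stands in for the working copy on disk
        self.writing = False     # stands in for the "Writing" marker / held write lock

    def push(self, crash_between_steps=False):
        if self.writing:
            # A previous write never finished, so the two views may disagree.
            raise FrozenError("frozen: an earlier write was interrupted")
        self.writing = True
        self.disk_version += 1  # step 1: commit to the working copy on disk
        if crash_between_steps:
            return  # simulated lightning strike: step 2 never runs
        self.global_version = self.disk_version  # step 2: commit the global state
        self.writing = False


repo = Repository()
repo.push()
repo.push(crash_between_steps=True)
print(repo.disk_version, repo.global_version, repo.writing)  # divergent, still "writing"
try:
    repo.push()
except FrozenError as error:
    print(error)
```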

TODO: Complete the support tooling and provide recovery instructions.


Loss of Leaders
===============

A more straightforward failure condition is the loss of all servers in a
cluster which have the most up-to-date copy of a repository. For example:

- There is a cluster setup with two nodes, X and Y.
- A new change is pushed to server X.
- Before the change can propagate to server Y, lightning strikes server X
  and destroys it.

Here, all of the "leader" nodes with the most up-to-date copy of the repository
have been lost. Phabricator will refuse to serve this repository because it
cannot serve it consistently, and cannot accept writes without data loss.
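
The leader rule in this scenario can be sketched as two small functions. The sketch assumes a hypothetical table mapping each device to its last recorded version, plus a set of devices that are currently reachable: leaders are the devices with the newest version, and the repository is only safe to serve if at least one leader is reachable.

```python
def leaders(versions):
    """Devices whose recorded version is the newest one known to the cluster."""
    newest = max(versions.values())
    return {device for device, version in versions.items() if version == newest}


def can_serve(versions, reachable):
    """Serving is only consistent if at least one leader is reachable."""
    return bool(leaders(versions) & set(reachable))


# The scenario above: X took the push and was destroyed before Y caught up.
versions = {"X": 5, "Y": 4}
print(leaders(versions))                     # {'X'}
print(can_serve(versions, reachable={"Y"}))  # False
```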

The most straightforward way to resolve this issue is to restore any leader to
service. The change will be able to replicate to other nodes once a leader
comes back online.

If you are unable to restore a leader or unsure whether you can restore one
quickly, you can use the monitoring console to review which changes are
present on the leaders but not present on the followers by examining the
push logs.
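
Comparing leaders and followers through the push logs amounts to selecting the pushes that are newer than a follower's version. The sketch below assumes, hypothetically, that each push log entry records the repository version it produced; `PushLogEntry` and its fields are invented for illustration and are not Phabricator's schema.

```python
from typing import NamedTuple


class PushLogEntry(NamedTuple):
    """Hypothetical push log row; not Phabricator's actual schema."""
    version: int  # assumed: the repository version this push produced
    pusher: str
    ref: str


def missing_on_follower(push_log, follower_version):
    """Pushes recorded on the leaders that a follower at follower_version lacks."""
    newer = [entry for entry in push_log if entry.version > follower_version]
    return sorted(newer, key=lambda entry: entry.version)


log = [
    PushLogEntry(5, "bob", "refs/heads/feature"),
    PushLogEntry(4, "alice", "refs/heads/master"),
]
for entry in missing_on_follower(log, follower_version=4):
    print(entry)
```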

TODO: Complete the support tooling and provide recovery instructions.


Backups
=======