mirror of https://we.phorge.it/source/phorge.git synced 2024-11-19 13:22:42 +01:00

Fill in missing cluster database documentation

Summary:
Ref T10751. Provide some guidance on replicas and promotion.

I'm not trying to walk administrators through the gritty details of this. It's not too complex, they should understand it, and the MySQL documentation is pretty thorough.

Test Plan: Read documentation.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10751

Differential Revision: https://secure.phabricator.com/D15763
epriestley 2016-04-19 19:31:47 -07:00
parent 1344dda756
commit bab3690b54


@@ -6,31 +6,76 @@ Configuring Phabricator to use multiple database hosts.
Overview
========
WARNING: This feature is a very early prototype; the features this document
describes are mostly speculative fantasy.
You can deploy Phabricator with multiple database hosts, configured as a master
and a set of replicas. The advantages of doing this are:
- faster recovery from disasters by promoting a replica;
- graceful degradation if the master fails;
- reduced load on the master; and
- graceful degradation if the master fails; and
- some tools to help monitor and manage replica health.
This configuration is complex, and many installs do not need to pursue it.
Phabricator can not currently be configured into a multi-master mode, nor can
it be configured to automatically promote a replica to become the new master.
If you lose the master, Phabricator can degrade automatically into read-only
mode and remain available, but can not fully recover without operational
intervention unless the master recovers on its own.
Phabricator will not currently send read traffic to replicas unless the master
has failed, so configuring a replica will not currently spread any load away
from the master. Future versions of Phabricator are expected to be able to
distribute some read traffic to replicas.
Phabricator can not currently be configured into a multi-master mode, nor can
it be configured to automatically promote a replica to become the new master.
There are no current plans to support multi-master mode or autonomous failover,
although this may change in the future.
Setting up MySQL Replication
============================
TODO: Write this section.
To begin, set up a replica database server and configure MySQL replication.
If you aren't sure how to do this, refer to the MySQL manual for instructions.
The MySQL documentation is comprehensive and walks through the steps and
options in good detail. You should understand MySQL replication before
deploying it in production: Phabricator layers on top of it, and does not
attempt to abstract it away.
Some useful notes for configuring replication for Phabricator:
**Binlog Format**: Phabricator issues some queries which MySQL will detect as
unsafe if you use the `STATEMENT` binlog format (the default before MySQL
5.7.7). Instead, use `MIXED` (recommended) or `ROW` as the `binlog_format`.
**Grant `REPLICATION CLIENT` Privilege**: If you give the user that Phabricator
will use to connect to the replica database server the `REPLICATION CLIENT`
privilege, Phabricator's status console can give you more information about
replica health and state.
**Copying Data to Replicas**: Phabricator currently uses a mixture of MyISAM
and InnoDB tables, and MySQL uses different consistency mechanisms for the
different storage engines. This can make it difficult to guarantee that a dump
is wholly consistent and suitable for loading into a replica.
An approach you may want to consider to limit downtime but still produce a
consistent dump is to leave Phabricator running but configured in read-only
mode while dumping:
- Stop all the daemons.
- Set `cluster.read-only` to `true` and deploy the new configuration. The
  web UI should now show that Phabricator is in "Read Only" mode.
- Dump the database. You can do this with `bin/storage dump --for-replica`,
  which adds the `--master-data` flag to the underlying command and includes
  a `CHANGE MASTER ...` statement in the dump.
- Once the dump finishes, turn `cluster.read-only` off again to restore
  service. Continue loading the dump into the replica normally.
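The read-only dump procedure above can be sketched as a small script. This is a minimal sketch, not an official tool. It assumes that it runs from the Phabricator root directory, that daemons are stopped with `bin/phd stop`, and that configuration is changed with `bin/config set`. The output path `replica.sql` and the helper names `dump_plan` and `run_plan` are hypothetical. If you deploy configuration some other way, substitute your own steps.

```python
import subprocess

# Hypothetical output path for the dump; adjust for your install.
DUMP_PATH = "replica.sql"


def dump_plan(dump_path=DUMP_PATH):
    """Return the ordered steps as (argv, stdout_path) pairs.

    stdout_path is None unless the command's output should be written
    to a file.
    """
    return [
        # Stop all the daemons.
        (["bin/phd", "stop"], None),
        # Enter read-only mode. Assumes config is managed with bin/config.
        (["bin/config", "set", "cluster.read-only", "true"], None),
        # Dump with replication coordinates included.
        (["bin/storage", "dump", "--for-replica"], dump_path),
        # Leave read-only mode to restore service.
        (["bin/config", "set", "cluster.read-only", "false"], None),
    ]


def run_plan(plan, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for argv, stdout_path in plan:
        suffix = " > " + stdout_path if stdout_path else ""
        print(" ".join(argv) + suffix)
        if dry_run:
            continue
        if stdout_path:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)
```

Calling `run_plan(dump_plan())` only prints the steps. With `dry_run=False` the sketch stops at the first failing command, so a failed dump leaves the install in read-only mode until you turn `cluster.read-only` off yourself. Restarting the daemons afterward is left to the operator.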
**Log Expiration**: You can configure MySQL to automatically clean up old
binary logs on startup and whenever the logs are flushed with the
`expire_logs_days` option. If you do not configure this and do not explicitly
purge old logs with `PURGE BINARY LOGS`, the binary logs on disk will grow
without bound and relatively quickly.
Once you have a working replica, continue below to tell Phabricator about it.
Configuring Replicas
@@ -207,7 +252,38 @@ the new master. See the next section, "Promoting a Replica", for details.
Promoting a Replica
===================
TODO: Write this section.
If you lose access to the master database, Phabricator will degrade into
read-only mode. This is described in greater detail below.
The easiest way to get out of read-only mode is to restore the master database.
If the database recovers on its own or operations staff can revive it,
Phabricator will return to full working order after a few moments.
If you can't restore the master or are unsure you will be able to restore the
master quickly, you can promote a replica to become the new master instead.
Before doing this, you should first assess how far behind the master the
replica was when the link died. Any data which was not replicated will either
be lost or become very difficult to recover after you promote a replica.
For example, if a task `T1234` had been created on the master but had not yet
replicated and you promote the replica, a new `T1234` may be created on the
replica after promotion. Even if you can recover the master later, merging
the data will be difficult because each database may have conflicting changes
which can not be merged easily.
If there was a significant replication delay at the time of the failure, you
may want to try harder or spend more time attempting to recover the master
before choosing to promote.
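One input to that assessment is the replica's own view of replication, as reported by `SHOW SLAVE STATUS`. The sketch below assumes you have already fetched that row into a dictionary with a MySQL client of your choice. The function name `applied_everything_received` is hypothetical and the example values are made up. It only tells you whether the replica has applied every event it received. Events which the master wrote but never sent can not be detected from the replica alone.

```python
def applied_everything_received(status):
    """Check a SHOW SLAVE STATUS row (as a dict) for unapplied events.

    Returns True when the SQL thread has executed up to the last binary
    log position that the I/O thread read from the master.
    """
    same_file = status["Relay_Master_Log_File"] == status["Master_Log_File"]
    same_pos = int(status["Exec_Master_Log_Pos"]) == int(status["Read_Master_Log_Pos"])
    return same_file and same_pos


# Example row with made-up values, for illustration only.
example = {
    "Master_Log_File": "mysql-bin.000042",
    "Read_Master_Log_Pos": "1907",
    "Relay_Master_Log_File": "mysql-bin.000042",
    "Exec_Master_Log_Pos": "1907",
}
print(applied_everything_received(example))
```

If this check fails, the replica is still working through its relay log, and promoting it immediately would discard events it has already received.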
If you decide to promote, disable replication on the replica and mark it as
the `master` in `cluster.databases`. Remove the original master from the
configuration and deploy the change to all surviving hosts.
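The configuration change can be sketched as a transformation of the `cluster.databases` list. This sketch assumes each entry is a dictionary with `host` and `role` keys. The hostnames are invented, and `promote` is a hypothetical helper, not part of Phabricator. It does not disable replication in MySQL, which you still need to do on the replica itself.

```python
import json


def promote(databases, new_master_host):
    """Return a new cluster.databases list with one replica promoted.

    The old master entry is dropped and the chosen replica's role is
    changed to "master". Other replicas are kept unchanged.
    """
    hosts = [db["host"] for db in databases if db["role"] == "replica"]
    if new_master_host not in hosts:
        raise ValueError("not a configured replica: " + new_master_host)

    promoted = []
    for db in databases:
        if db["role"] == "master":
            continue  # Remove the original master.
        entry = dict(db)  # Copy so the input list is not modified.
        if entry["host"] == new_master_host:
            entry["role"] = "master"
        promoted.append(entry)
    return promoted


# Invented hostnames, for illustration only.
before = [
    {"host": "db001.example.com", "role": "master"},
    {"host": "db002.example.com", "role": "replica"},
]
print(json.dumps(promote(before, "db002.example.com"), indent=2))
```

Building a new list rather than editing in place keeps the old configuration available for comparison or rollback. Any replicas that remain in the list would still need to be pointed at the new master in MySQL.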
Once write service is restored, you should provision, deploy, and configure a
new replica by following the steps you took the first time around. You are
critically vulnerable to a second disruption until you have restored the
redundancy.
Unreachable Masters