Clean up some old cluster-ish documentation

Summary: Ref T10751. We currently have a placeholder Almanac document, and a fairly-bad-advice section in Daemons. Pull these into the modern cluster documentation.

Test Plan: 17 phabricator PHDs

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10751

Differential Revision: https://secure.phabricator.com/D15689
2025-01-11 15:21:03 +01:00 · 2016-04-12 10:46:19 -07:00 · 2016-04-12 10:46:19 -07:00 · afb0f7c7af
commit afb0f7c7af
parent 33060d1652
7 changed files with 234 additions and 69 deletions
--- a/src/applications/almanac/application/PhabricatorAlmanacApplication.php
+++ b/src/applications/almanac/application/PhabricatorAlmanacApplication.php
@@ -83,8 +83,7 @@ final class PhabricatorAlmanacApplication extends PhabricatorApplication {
      phutil_tag(
        'a',
        array(
-          'href' => PhabricatorEnv::getDoclink(
-            'User Guide: Phabricator Clusters'),
+          'href' => PhabricatorEnv::getDoclink('Clustering Introduction'),
          'target' => '_blank',
        ),
        pht('Learn More')));
--- a/src/applications/almanac/controller/AlmanacController.php
+++ b/src/applications/almanac/controller/AlmanacController.php
@@ -178,7 +178,7 @@ abstract class AlmanacController
      'a',
      array(
        'href' => PhabricatorEnv::getDoclink(
-          'User Guide: Phabricator Clusters'),
+          'Clustering Introduction'),
        'target' => '_blank',
      ),
      pht('Learn More'));
--- a/src/docs/user/cluster/cluster.diviner
+++ b/src/docs/user/cluster/cluster.diviner
@@ -26,6 +26,9 @@ operations personnel who need this high degree of flexibility.
 The remainder of this document summarizes how to add redundancy to each
 service and where your efforts are likely to have the greatest impact.

+For additional guidance on setting up a cluster, see "Overlaying Services"
+and "Cluster Recipes" at the bottom of this document.
+

 Cluster: Databases
 =================
@@ -44,7 +47,8 @@ For details, see @{article:Cluster: Databases}.
 Cluster: Repositories
 =====================

-Configuring multiple repository hosts is complex.
+Configuring multiple repository hosts is complex, but is required before you
+can add multiple daemon or web hosts.

 Repository replicas are important for availability if you host repositories
 on Phabricator, but less important if you host repositories elsewhere
@@ -55,3 +59,123 @@ naturally somewhat resistant to data loss: every clone of a repository includes
 the entire history.

 For details, see @{article:Cluster: Repositories}.
+
+
+Cluster: Daemons
+================
+
+Configuring multiple daemon hosts is straightforward, but you must configure
+repositories first.
+
+With daemons running on multiple hosts, you can transparently survive the loss
+of any subset of hosts without an interruption to daemon services, as long as
+at least one host remains alive. Daemons are stateless, so spreading daemons
+across multiple hosts provides no resistance to data loss.
+
+For details, see @{article:Cluster: Daemons}.
+
+
+Cluster: Web Servers
+====================
+
+Configuring multiple web hosts is straightforward, but you must configure
+repositories first.
+
+With multiple web hosts, you can transparently survive the loss of any subset
+of hosts as long as at least one host remains alive. Web hosts are stateless,
+so putting multiple hosts in service provides no resistance to data loss.
+
+For details, see @{article:Cluster: Web Servers}.
+
+
+Overlaying Services
+===================
+
+Although hosts can run a single dedicated service type, certain groups of
+services work well together. Phabricator clusters usually do not need to be
+very large, so deploying a small number of hosts with multiple services is a
+good place to start.
+
+In planning a cluster, consider these blended host types:
+
+**Everything**: Run HTTP, SSH, MySQL, repositories and daemons on a single
+host. This is the starting point for single-node setups, and usually also the
+best configuration when adding the second node.
+
+**Everything Except Databases**: Run HTTP, SSH, repositories and daemons on one
+host, and MySQL on a different host. MySQL uses many of the same resources that
+other services use. It's also simpler to separate than other services, and
+tends to benefit the most from dedicated hardware.
+
+**Just Databases**: Run only MySQL on these hosts. Database nodes tend to
+benefit the most from dedicated hardware, so MySQL is usually the first
+service worth moving onto hosts of its own.
+
+**Repositories and Daemons**: Run repositories and daemons on the same host.
+Repository hosts //must// run daemons, and it normally makes sense to
+completely overlay repositories and daemons. These services tend to use
+different resources (repositories are heavier on I/O and lighter on CPU/RAM;
+daemons are heavier on CPU/RAM and lighter on I/O).
+
+Repositories and daemons are also both less latency sensitive than other
+service types, so there's a wider margin of error for underprovisioning them
+before performance is noticeably affected.
+
+These nodes tend to use system resources in a balanced way. Individual nodes
+in this class do not need to be particularly powerful.
+
+**Frontend Servers**: Run HTTP and SSH on the same host. These are easy to set
+up, stateless, and you can scale the pool up or down easily to meet demand.
+Routing both types of ingress traffic through the same initial tier can
+simplify load balancing.
+
+These nodes tend to need relatively little RAM.
+
+
+Cluster Recipes
+===============
+
+This section provides some guidance on reasonable ways to scale up a cluster.
+
+The smallest possible cluster is **two hosts**. Run everything (web, SSH,
+database, repositories, and daemons) on each host. One host will serve as the
+master; the other will serve as a replica.
+
+Ideally, you should physically separate these hosts to reduce the chance that a
+natural disaster or infrastructure disruption could disable or destroy both
+hosts at the same time.
+
+From here, you can choose how you expand the cluster.
+
+To improve **scalability and performance**, separate loaded services onto
+dedicated hosts and then add more hosts of that type to increase capacity. If
+you have a two-node cluster, the best way to improve scalability by adding one
+host is likely to separate the master database onto its own host.
+
+Note that increasing scale may //decrease// availability by leaving you with
+too little capacity after a failure. If you have three hosts handling traffic
+across two datacenters and the datacenter with two of them fails, all of the
+traffic may be sent to the single remaining host. You can hedge against this
+by mirroring new hosts in other datacenters (for example, also separate the
+replica database onto its own host).
+
+After separating databases, separating repository + daemon nodes is likely
+the next step.
+
+To improve **availability**, add another copy of everything you run in one
+datacenter to a new datacenter. For example, if you have a two-node cluster,
+the best way to improve availability is to run everything on a third host in a
+third datacenter. If you have a 6-node cluster with a web node, a database node,
+and a repository + daemon node in each of two datacenters, add 3 more nodes to
+create a copy of each node in a third datacenter.
+
+You can continue adding hosts until you run out of hosts.
+
+
+Next Steps
+==========
+
+Continue by:
+
+  - learning how Phacility configures and operates a large, multi-tenant
+    production cluster in ((cluster)).
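The capacity warning in "Cluster Recipes" comes down to simple arithmetic: when a datacenter fails, its share of traffic lands on the hosts that remain. The PHP sketch below is a hypothetical helper (not part of Phabricator) that assumes load is spread evenly across interchangeable hosts, and reports how much each surviving host's load grows in the worst single-datacenter failure. It compares the three-host layout (two hosts in one datacenter, one in the other) with a mirrored four-host layout; the datacenter names are made up.

```php
<?php

/**
 * Hypothetical helper, not a Phabricator API. Given a map of
 * datacenter name => number of hosts, return the worst-case load
 * multiplier on each surviving host if any single datacenter fails.
 * Simplifying assumption: every host carries an equal share of load.
 */
function worst_case_load_multiplier(array $hosts_per_datacenter) {
  $total = array_sum($hosts_per_datacenter);
  $worst = 0.0;

  foreach ($hosts_per_datacenter as $count) {
    $surviving = $total - $count;
    if ($surviving <= 0) {
      // Losing this datacenter takes every host down.
      return INF;
    }
    // Normal per-host share is 1 / total; after the failure it is
    // 1 / surviving, so each survivor's load grows by total / surviving.
    $worst = max($worst, $total / $surviving);
  }

  return $worst;
}

// Master database separated onto its own host in one datacenter only.
$unbalanced = array('east' => 2, 'west' => 1);
// Mirrored: the replica database is also separated onto its own host.
$mirrored = array('east' => 2, 'west' => 2);

printf("unbalanced: %.1fx\n", worst_case_load_multiplier($unbalanced)); // 3.0x
printf("mirrored: %.1fx\n", worst_case_load_multiplier($mirrored));     // 2.0x
```

In the unbalanced layout, losing the larger datacenter triples the load on the lone survivor; mirroring the new host caps the worst case at double.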
--- a/src/docs/user/cluster/cluster_daemons.diviner
+++ b/src/docs/user/cluster/cluster_daemons.diviner
@@ -0,0 +1,59 @@
+@title Cluster: Daemons
+@group intro
+
+Configuring Phabricator to use multiple daemon hosts.
+
+Overview
+========
+
+WARNING: This feature is a very early prototype; the features this document
+describes are mostly speculative fantasy.
+
+You can run daemons on multiple hosts. The advantages of doing this are:
+
+  - you can completely survive the loss of multiple daemon hosts; and
+  - worker queue throughput may improve.
+
+This configuration is simple, but you must configure repositories first. For
+details, see @{article:Cluster: Repositories}.
+
+Since repository hosts must run daemons anyway, you usually do not need to do
+any additional work and can skip this entirely.
+
+
+Adding Daemon Hosts
+===================
+
+After configuring repositories for clustering, launch daemons on every
+repository host according to the documentation in
+@{article:Cluster: Repositories}. These daemons are necessary: repositories
+will not fetch, update, or synchronize properly without them.
+
+If your repository clustering is redundant (you have at least two repository
+hosts), these daemons are also likely to be sufficient in most cases. If you
+want to launch additional hosts anyway (for example, to increase queue capacity
+for unusual workloads), see "Dedicated Daemon Hosts" below.
+
+
+Dedicated Daemon Hosts
+======================
+
+You can launch additional daemon hosts without any special configuration.
+Daemon hosts must be able to reach other hosts on the network, but do not need
+to run any services (like HTTP or SSH). Simply deploy the Phabricator software
+and configuration and start the daemons.
+
+Normally, there is little reason to deploy dedicated daemon hosts. They can
+improve queue capacity, but generally do not improve availability or increase
+resistance to data loss on their own. Instead, consider deploying more
+repository hosts: repository hosts run daemons, so this will increase queue
+capacity and also improve repository availability and resistance to data loss.
+
+
+Next Steps
+==========
+
+Continue by:
+
+  - returning to @{article:Clustering Introduction}; or
+  - configuring repositories first with @{article:Cluster: Repositories}.
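The two rules in "Cluster: Daemons" (every repository host must run daemons, and daemon service only survives a host failure if daemons run somewhere else too) can be expressed as a small inventory check. This PHP sketch is hypothetical: the inventory format, service names, and host names are illustrative assumptions, not a Phabricator API or configuration format.

```php
<?php

/**
 * Hypothetical inventory check, not a Phabricator API. Each host name maps
 * to the list of services it runs ('repo', 'daemons', 'http', 'ssh',
 * 'mysql'). Returns a list of problems; an empty list means no problems.
 */
function check_daemon_layout(array $hosts) {
  $problems = array();
  $daemon_hosts = 0;

  foreach ($hosts as $name => $services) {
    $runs_daemons = in_array('daemons', $services, true);
    if ($runs_daemons) {
      $daemon_hosts++;
    }
    // Repositories will not fetch, update, or synchronize without daemons.
    if (in_array('repo', $services, true) && !$runs_daemons) {
      $problems[] = "{$name} hosts repositories but does not run daemons";
    }
  }

  // With fewer than two daemon hosts, losing one host stops daemon services.
  if ($daemon_hosts < 2) {
    $problems[] = "daemons run on {$daemon_hosts} host(s); need at least 2";
  }

  return $problems;
}

$hosts = array(
  'repo001' => array('repo', 'daemons'),
  'repo002' => array('repo', 'daemons'),
  'web001' => array('http', 'ssh'),
);
var_dump(count(check_daemon_layout($hosts))); // int(0): no problems found
```

A redundant repository tier already satisfies both rules, which is why dedicated daemon hosts are rarely needed.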
--- a/src/docs/user/cluster/cluster_webservers.diviner
+++ b/src/docs/user/cluster/cluster_webservers.diviner
@@ -0,0 +1,42 @@
+@title Cluster: Web Servers
+@group intro
+
+Configuring Phabricator to use multiple web servers.
+
+Overview
+========
+
+WARNING: This feature is a very early prototype; the features this document
+describes are mostly speculative fantasy.
+
+You can run Phabricator on multiple web servers. The advantages of doing this
+are:
+
+  - you can completely survive the loss of multiple web hosts; and
+  - performance and capacity may improve.
+
+This configuration is simple, but you must configure repositories first. For
+details, see @{article:Cluster: Repositories}.
+
+
+Adding Web Hosts
+================
+
+After configuring repositories in cluster mode, you can add more web hosts
+at any time: simply deploy the Phabricator software and configuration to a
+host, start the web server, and then add the host to the load balancer pool.
+
+Phabricator web servers are stateless, so you can pull them in and out of
+production freely.
+
+You may also want to run SSH services on these hosts: SSH is very similar to
+HTTP and is also stateless, and it may be simpler to load balance the two
+services together.
+
+
+Next Steps
+==========
+
+Continue by:
+
+  - returning to @{article:Clustering Introduction}.
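Because web hosts are stateless, a load balancer can treat them as interchangeable. The PHP sketch below is a hypothetical round-robin pool (not Phabricator code and not any real load balancer's API; host names are made up) showing that pulling a host out of service simply routes later requests to the hosts that remain.

```php
<?php

/**
 * Hypothetical round-robin pool. Any host in the pool can serve any
 * request, so hosts can be added or removed without moving data.
 */
final class WebHostPool {
  private $hosts = array();
  private $next = 0;

  public function addHost($host) {
    $this->hosts[$host] = $host;
  }

  public function removeHost($host) {
    unset($this->hosts[$host]);
  }

  public function pickHost() {
    if (!$this->hosts) {
      throw new RuntimeException('No web hosts in the pool.');
    }
    // Rotate through whichever hosts are currently in service.
    $hosts = array_values($this->hosts);
    $host = $hosts[$this->next % count($hosts)];
    $this->next++;
    return $host;
  }
}

$pool = new WebHostPool();
$pool->addHost('web001');
$pool->addHost('web002');
echo $pool->pickHost(), "\n"; // web001
echo $pool->pickHost(), "\n"; // web002

// Pull web001 out of service; requests keep flowing to web002.
$pool->removeHost('web001');
echo $pool->pickHost(), "\n"; // web002
```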
--- a/src/docs/user/configuration/cluster.diviner
+++ b/src/docs/user/configuration/cluster.diviner
@@ -1,50 +0,0 @@
-@title User Guide: Phabricator Clusters
-@group config
-
-Guide on scaling Phabricator across multiple machines.
-
-Overview
-========
-
-IMPORTANT: Phabricator clustering is in its infancy and does not work at all
-yet. This document is mostly a placeholder.
-
-IMPORTANT: DO NOT CONFIGURE CLUSTER SERVICES UNLESS YOU HAVE **TWENTY YEARS OF
-EXPERIENCE WITH PHABRICATOR** AND **A MINIMUM OF 17 PHABRICATOR PHDs**. YOU
-WILL BREAK YOUR INSTALL AND BE UNABLE TO REPAIR IT.
-
-See also @{article:Almanac User Guide}.
-
-
-Managing Cluster Configuration
-==============================
-
-Cluster configuration is managed primarily from the **Almanac** application.
-
-To define cluster services and create or edit cluster configuration, you must
-have the **Can Manage Cluster Services** application permission in Almanac. If
-you do not have this permission, all cluster services and all connected devices
-will be locked and not editable.
-
-The **Can Manage Cluster Services** permission is stronger than service and
-device policies, and overrides them. You can never edit a cluster service if
-you don't have this permission, even if the **Can Edit** policy on the service
-itself is very permissive.
-
-
-Locking Cluster Configuration
-=============================
-
-IMPORTANT: Managing cluster services is **dangerous** and **fragile**.
-
-If you make a mistake, you can break your install. Because the install is
-broken, you will be unable to load the web interface in order to repair it.
-
-IMPORTANT: Currently, broken clusters must be repaired by manually fixing them
-in the database. There are no instructions available on how to do this, and no
-tools to help you. Do not configure cluster services.
-
-If an attacker gains access to an account with permission to manage cluster
-services, they can add devices they control as database servers. These servers
-will then receive sensitive data and traffic, and allow the attacker to
-escalate their access and completely compromise an install.
--- a/src/docs/user/configuration/managing_daemons.diviner
+++ b/src/docs/user/configuration/managing_daemons.diviner
@@ -113,25 +113,16 @@ This daemon will daemonize and run normally.
  - See @{article:Diffusion User Guide} for details about tuning the repository
    daemon.

-== Multiple Machines ==

-If you have multiple machines, you should use `phd launch` to tweak which
-daemons launch, and split daemons across machines like this:
+Multiple Hosts
+==============

-  - `PhabricatorRepositoryPullLocalDaemon`: Run one copy on any machine.
-    On each web frontend which is not running a normal copy, run a copy
-    with the `--no-discovery` flag.
-  - `PhabricatorTriggerDaemon`: Run one copy on any machine.
-  - `PhabricatorTaskmasterDaemon`: Run as many copies as you need to keep
-    tasks from backing up. You can run them all on one machine or split them
-    across machines.
+For information about running daemons on multiple hosts, see
+@{article:Cluster: Daemons}.

-A gratuitously wasteful install might have a dedicated daemon machine which
-runs `phd start` with a large pool of taskmasters set in the config, and then
-runs `phd launch PhabricatorRepositoryPullLocalDaemon -- --no-discovery` on each
-web server. This is grossly excessive in normal cases.

-= Next Steps =
+Next Steps
+==========

 Continue by: