From 287e708c4d3ecdec3af77f5c409d0aa9f118ef94 Mon Sep 17 00:00:00 2001
From: epriestley
Date: Sun, 2 Apr 2017 12:55:38 -0700
Subject: [PATCH] Adjust and wordsmith Search documentation

Summary:
Ref T12450. General adjustments:

  - Try to make "Cluster: Search" more about "stuff in common + types" instead of pretty much all being Elastic-specific, so we can add Solr or whatever later.
  - Provide guidance about rebuilding indexes after making a change.
  - Simplify the basic examples, then provide a more advanced example at the end.
  - Really try to avoid suggesting anyone configure Elasticsearch ever for any reason.

Test Plan: Read documents, previewed in remarkup.

Reviewers: chad, 20after4

Reviewed By: 20after4

Maniphest Tasks: T12450

Differential Revision: https://secure.phabricator.com/D17602
---
 src/docs/user/cluster/cluster.diviner        |  26 +--
 src/docs/user/cluster/cluster_search.diviner | 234 +++++++++++++++----
 2 files changed, 191 insertions(+), 69 deletions(-)

diff --git a/src/docs/user/cluster/cluster.diviner b/src/docs/user/cluster/cluster.diviner
index 15eed86eb4..30ad53efb8 100644
--- a/src/docs/user/cluster/cluster.diviner
+++ b/src/docs/user/cluster/cluster.diviner
@@ -47,7 +47,7 @@ will have on availability, resistance to data loss, and scalability.
 | **SSH Servers** | Minimal | Low | No Risk | Low
 | **Web Servers** | Minimal | **High** | No Risk | Moderate
 | **Notifications** | Minimal | Low | No Risk | Low
-| **Fulltext Search** | Moderate | **High** | Minimal Risk | Moderate
+| **Fulltext Search** | Minimal | Low | No Risk | Low
 
 See below for a walkthrough of these services in greater detail.
 
@@ -241,26 +241,14 @@ For details, see @{article:Cluster: Notifications}.
 Cluster: Fulltext Search
 ========================
 
-At a certain scale, you may begin to bump up against the limitations of MySQL's
-built-in fulltext search capabilities. We have seen this with very large
-installations with several million objects in the database and very many
-simultaneous requests. At this point you may consider adding Elasticsearch
-hosts to your cluster to reduce the load on your MySQL hosts.
+Configuring search services is relatively simple and has no pre-requisites.
 
-Elasticsearch has the ability to spread the load across multiple hosts and can
-handle very large indexes by sharding.
+By default, Phabricator uses MySQL as a fulltext search engine, so deploying
+multiple database hosts will effectively also deploy multiple fulltext search
+hosts.
 
-Search does not involve any risk of data lost because it's always possible to
-rebuild the search index from the original database objects. This process can
-be very time consuming, however, especially when the database grows very large.
-
-With multiple Elasticsearch hosts, you can survive the loss of a single host
-with minimal disruption as Phabricator will detect the problem and direct
-queries to one of the remaining hosts.
-
-Phabricator supports writing to multiple indexing servers. This Simplifies
-Elasticsearch upgrades and makes it possible to recover more quickly from
-problems with the search index.
+Search indexes can be completely rebuilt from the database, so there is no
+risk of data loss no matter how fulltext search is configured.
 
 For details, see @{article:Cluster: Search}.
 
diff --git a/src/docs/user/cluster/cluster_search.diviner b/src/docs/user/cluster/cluster_search.diviner
index 662abecbc3..c658f50db4 100644
--- a/src/docs/user/cluster/cluster_search.diviner
+++ b/src/docs/user/cluster/cluster_search.diviner
@@ -4,73 +4,207 @@
 Overview
 ========
 
-You can configure phabricator to connect to one or more fulltext search clusters
-running either Elasticsearch or MySQL. By default and without further
-configuration, Phabricator will use MySQL for fulltext search. This will be
-adequate for the vast majority of users. Installs with a very large number of
-objects or specialized search needs can consider enabling Elasticsearch for
-better scalability and potentially better search results.
+You can configure Phabricator to connect to one or more fulltext search
+services.
+
+By default, Phabricator will use MySQL for fulltext search. This is suitable
+for most installs. However, alternate engines are supported.
+
 
 Configuring Search Services
 ===========================
 
-To configure an Elasticsearch service, use the `cluster.search` configuration
-option. A typical Elasticsearch configuration will probably look similar to
-the following example:
+To configure search services, adjust the `cluster.search` configuration
+option. This option contains a list of one or more fulltext search services,
+like this:
+
+```lang=json
+[
+  {
+    "type": "...",
+    "hosts": [
+      ...
+    ],
+    "roles": {
+      "read": true,
+      "write": true
+    }
+  }
+]
+```
+
+When a user makes a change to a document, Phabricator writes the updated
+document into every configured, writable fulltext service.
+
+When a user issues a query, Phabricator tries configured, readable services
+in order until it is able to execute the query successfully.
+
+These options are supported by all service types:
+
+| Key | Description |
+|---|---|
+| `type` | Constant identifying the service type, like `mysql`.
+| `roles` | Dictionary of role settings, for enabling reads and writes.
+| `hosts` | List of hosts for this service.
+
+Some service types support additional options.
+
+Available Service Types
+=======================
+
+These service types are supported:
+
+| Service | Key | Description |
+|---|---|---|
+| MySQL | `mysql` | Default MySQL fulltext index.
+| Elasticsearch | `elasticsearch` | Use an external Elasticsearch service.
+
+
+Fulltext Service Roles
+======================
+
+These roles are supported:
+
+| Role | Key | Description
+|---|---|---|
+| Read | `read` | Allows the service to be queried when users search.
+| Write | `write` | Allows documents to be published to the service.
+
+
+Specifying Hosts
+================
+
+The `hosts` key should contain a list of dictionaries, each specifying the
+details of a host. A service should normally have one or more hosts.
+
+When an option is set at the service level, it serves as a default for all
+hosts. It may be overridden by changing the value for a particular host.
+
+
+Service Type: MySQL
+==============
+
+The `mysql` service type does not require any configuration, and does not
+need to have hosts specified. This service uses the builtin database to
+index and search documents.
+
+A typical `mysql` service configuration looks like this:
 
 ```lang=json
 {
-  "cluster.search": [
-    {
-      "type": "elasticsearch",
-      "hosts": [
-        {
-          "host": "127.0.0.1",
-          "roles": { "write": true, "read": true }
-        }
-      ],
-      "port": 9200,
-      "protocol": "http",
-      "path": "/phabricator",
-      "version": 5
-    },
-  ],
+  "type": "mysql"
 }
 ```
 
-Supported Options
------------------
-| Key | Type |Comments|
-|`type` | String |Engine type. Currently, 'elasticsearch' or 'mysql'|
-|`protocol`| String |Either 'http' or 'https'|
-|`port`| Int |The TCP port that Elasticsearch is bound to|
-|`path`| String |The path portion of the url for phabricator's index.|
-|`version`| Int |The version of Elasticsearch server. Supports either 2 or 5.|
-|`hosts`| List |A list of one or more Elasticsearch host names / addresses.|
 
-Host Configuration
-------------------
-Each search service must have one or more hosts associated with it. Each host
-entry consists of a `host` key, a dictionary of roles and can optionally
-override any of the options that are valid at the service level (see above).
+Service Type: Elasticsearch
+======================
 
-Currently supported roles are `read` and `write`. These can be individually
-enabled or disabled on a per-host basis. A typical setup might include two
-elasticsearch clusters in two separate datacenters. You can configure one
-cluster for reads and both for writes. When one cluster is down for maintenance
-you can simply swap the read role over to the backup cluster and then proceed
-with maintenance without any service interruption.
+The `elasticsearch` service type supports these options:
+
+| Key | Description |
+|---|---|
+| `protocol` | Either `"http"` (default) or `"https"`.
+| `port` | Elasticsearch TCP port.
+| `version` | Elasticsearch version, either `2` or `5` (default).
+| `path` | Path for the index. Defaults to `/phabricator`. Advanced.
+
+A typical `elasticsearch` service configuration looks like this:
+
+```lang=json
+{
+  "type": "elasticsearch",
+  "hosts": [
+    {
+      "protocol": "http",
+      "host": "127.0.0.1",
+      "port": 9200
+    }
+  ]
+}
+```
 
 Monitoring Search Services
 ==========================
 
-You can monitor fulltext search in {nav Config > Search Servers}. This interface
-shows you a quick overview of services and their health.
+You can monitor fulltext search in {nav Config > Search Servers}. This
+interface shows you a quick overview of services and their health.
 
 The table on this page shows some basic stats for each configured service,
 followed by the configuration and current status of each host.
 
-NOTE: This page runs its diagnostics //from the web server that is serving the
-request//. If you are recovering from a disaster, the view this page shows
-may be partial or misleading, and two requests served by different servers may
-see different views of the cluster.
+
+Rebuilding Indexes
+==================
+
+After adding new search services, you will need to rebuild document indexes
+on them. To do this, first initialize the services:
+
+```
+phabricator/ $ ./bin/search init
+```
+
+This will perform index setup steps and other one-time configuration.
+
+To populate documents in all indexes, run this command:
+
+```
+phabricator/ $ ./bin/search index --force --background --type all
+```
+
+This initiates an exhaustive rebuild of the document indexes. To get a more
+detailed list of indexing options available, run:
+
+```
+phabricator/ $ ./bin/search help index
+```
+
+
+Advanced Example
+================
+
+This is a more advanced example which shows a configuration with multiple
+different services in different roles. In this example:
+
+  - Phabricator is using an Elasticsearch 2 service as its primary fulltext
+    service.
+  - An Elasticsearch 5 service is online, but only receiving writes.
+  - The MySQL service is serving as a backup if Elasticsearch fails.
+
+This particular configuration may not be very useful. It is primarily
+intended to show how to configure many different options.
+
+
+```lang=json
+[
+  {
+    "type": "elasticsearch",
+    "version": 2,
+    "hosts": [
+      {
+        "host": "elastic2.mycompany.com",
+        "port": 9200,
+        "protocol": "http"
+      }
+    ]
+  },
+  {
+    "type": "elasticsearch",
+    "version": 5,
+    "hosts": [
+      {
+        "host": "elastic5.mycompany.com",
+        "port": 9789,
+        "protocol": "https",
+        "roles": {
+          "read": false,
+          "write": true
+        }
+      }
+    ]
+  },
+  {
+    "type": "mysql"
+  }
+]
+```
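
Reviewer note: the semantics the new `cluster_search.diviner` text describes (service-level options act as defaults for hosts, writes go to every writable service, reads try readable services in order) can be sketched roughly as follows. This is an illustrative Python model only, not Phabricator's implementation (Phabricator is written in PHP). The helper names `effective_hosts`, `has_role`, `write_targets`, and `read_order` are hypothetical, and the sketch assumes that omitted roles default to enabled and that a `mysql` service without `hosts` behaves like a single implicit host.

```python
# Illustrative model only; all helper names here are hypothetical and are
# not part of Phabricator's API.

# Assumption: a role that is not mentioned is treated as enabled.
DEFAULT_ROLES = {"read": True, "write": True}


def effective_hosts(service):
    """Return the service's hosts with service-level options applied as defaults."""
    defaults = {k: v for k, v in service.items() if k not in ("type", "hosts")}
    defaults["roles"] = {**DEFAULT_ROLES, **service.get("roles", {})}
    # Assumption: a service with no "hosts" (like `mysql`) has one implicit host.
    hosts = service.get("hosts") or [{}]
    resolved = []
    for host in hosts:
        merged = {**defaults, **host}  # host-level values override defaults
        merged["roles"] = {**defaults["roles"], **host.get("roles", {})}
        resolved.append(merged)
    return resolved


def has_role(service, role):
    """True if at least one host of the service has the given role enabled."""
    return any(h["roles"].get(role, False) for h in effective_hosts(service))


def write_targets(services):
    """Every writable service receives each updated document."""
    return [s for s in services if has_role(s, "write")]


def read_order(services):
    """Readable services in configured order; a query tries them until one works."""
    return [s for s in services if has_role(s, "read")]


# The "Advanced Example" configuration from the patch, as a Python literal.
config = [
    {
        "type": "elasticsearch",
        "version": 2,
        "hosts": [
            {"host": "elastic2.mycompany.com", "port": 9200, "protocol": "http"}
        ],
    },
    {
        "type": "elasticsearch",
        "version": 5,
        "hosts": [
            {
                "host": "elastic5.mycompany.com",
                "port": 9789,
                "protocol": "https",
                "roles": {"read": False, "write": True},
            }
        ],
    },
    {"type": "mysql"},
]

if __name__ == "__main__":
    print([s["type"] for s in write_targets(config)])
    print([(s["type"], s.get("version")) for s in read_order(config)])
```

Under these assumptions all three services are write targets, while only the Elasticsearch 2 service and the MySQL service are consulted for reads, which matches the bullet list in the "Advanced Example" section.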