mirror of https://we.phorge.it/source/phorge.git synced 2025-01-22 20:51:10 +01:00

Adjust and wordsmith Search documentation

Summary:
Ref T12450. General adjustments:

  - Try to make "Cluster: Search" more about "stuff in common + types" instead of pretty much all being Elastic-specific, so we can add Solr or whatever later.
  - Provide guidance about rebuilding indexes after making a change.
  - Simplify the basic examples, then provide a more advanced example at the end.
  - Really try to avoid suggesting anyone configure Elasticsearch ever for any reason.

Test Plan: Read documents, previewed in remarkup.

Reviewers: chad, 20after4

Reviewed By: 20after4

Maniphest Tasks: T12450

Differential Revision: https://secure.phabricator.com/D17602
epriestley 2017-04-02 12:55:38 -07:00
parent 64234535e3
commit 287e708c4d
2 changed files with 191 additions and 69 deletions


@ -47,7 +47,7 @@ will have on availability, resistance to data loss, and scalability.
| **SSH Servers** | Minimal | Low | No Risk | Low
| **Web Servers** | Minimal | **High** | No Risk | Moderate
| **Notifications** | Minimal | Low | No Risk | Low
| **Fulltext Search** | Moderate | **High** | Minimal Risk | Moderate
| **Fulltext Search** | Minimal | Low | No Risk | Low
See below for a walkthrough of these services in greater detail.
@ -241,26 +241,14 @@ For details, see @{article:Cluster: Notifications}.
Cluster: Fulltext Search
========================
At a certain scale, you may begin to bump up against the limitations of MySQL's
built-in fulltext search capabilities. We have seen this with very large
installations with several million objects in the database and very many
simultaneous requests. At this point you may consider adding Elasticsearch
hosts to your cluster to reduce the load on your MySQL hosts.
Configuring search services is relatively simple and has no pre-requisites.
Elasticsearch has the ability to spread the load across multiple hosts and can
handle very large indexes by sharding.
By default, Phabricator uses MySQL as a fulltext search engine, so deploying
multiple database hosts will effectively also deploy multiple fulltext search
hosts.
Search does not involve any risk of data loss because it's always possible to
rebuild the search index from the original database objects. This process can
be very time consuming, however, especially when the database grows very large.
With multiple Elasticsearch hosts, you can survive the loss of a single host
with minimal disruption as Phabricator will detect the problem and direct
queries to one of the remaining hosts.
Phabricator supports writing to multiple indexing servers. This simplifies
Elasticsearch upgrades and makes it possible to recover more quickly from
problems with the search index.
Search indexes can be completely rebuilt from the database, so there is no
risk of data loss no matter how fulltext search is configured.
For details, see @{article:Cluster: Search}.


@ -4,73 +4,207 @@
Overview
========
You can configure phabricator to connect to one or more fulltext search clusters
running either Elasticsearch or MySQL. By default and without further
configuration, Phabricator will use MySQL for fulltext search. This will be
adequate for the vast majority of users. Installs with a very large number of
objects or specialized search needs can consider enabling Elasticsearch for
better scalability and potentially better search results.
You can configure Phabricator to connect to one or more fulltext search
services.
By default, Phabricator will use MySQL for fulltext search. This is suitable
for most installs. However, alternate engines are supported.
Configuring Search Services
===========================
To configure an Elasticsearch service, use the `cluster.search` configuration
option. A typical Elasticsearch configuration will probably look similar to
the following example:
To configure search services, adjust the `cluster.search` configuration
option. This option contains a list of one or more fulltext search services,
like this:
```lang=json
[
{
"type": "...",
"hosts": [
...
],
"roles": {
"read": true,
"write": true
}
}
]
```
When a user makes a change to a document, Phabricator writes the updated
document into every configured, writable fulltext service.
When a user issues a query, Phabricator tries configured, readable services
in order until it is able to execute the query successfully.
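The write and read behavior described above can be sketched roughly as follows. This is a minimal Python illustration, not Phabricator's actual (PHP) implementation: the `Service` class, its `index` and `search` methods, and the `ServiceUnavailable` exception are hypothetical names invented for the example. What happens when a write fails is not covered by this document, so the sketch does not model it.

```python
from dataclasses import dataclass, field


class ServiceUnavailable(Exception):
    """Hypothetical error raised when a service cannot answer a query."""


@dataclass
class Service:
    # Hypothetical stand-in for one entry in `cluster.search`.
    name: str
    read: bool = True
    write: bool = True
    healthy: bool = True
    documents: dict = field(default_factory=dict)

    def index(self, doc_id, text):
        self.documents[doc_id] = text

    def search(self, term):
        if not self.healthy:
            raise ServiceUnavailable(self.name)
        return sorted(d for d, t in self.documents.items() if term in t)


def write_document(services, doc_id, text):
    # Writes fan out to every configured, writable service.
    for service in services:
        if service.write:
            service.index(doc_id, text)


def run_query(services, term):
    # Reads try readable services in configured order until one succeeds.
    for service in services:
        if not service.read:
            continue
        try:
            return service.name, service.search(term)
        except ServiceUnavailable:
            continue
    raise ServiceUnavailable("no readable service could execute the query")
```

With a list such as `[Service("es", healthy=False), Service("es-new", read=False), Service("mysql")]`, a write reaches all three services, while a query skips the write-only service and falls through from the failing first service to `mysql`.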
These options are supported by all service types:
| Key | Description |
|---|---|
| `type` | Constant identifying the service type, like `mysql`.
| `roles` | Dictionary of role settings, for enabling reads and writes.
| `hosts` | List of hosts for this service.
Some service types support additional options.
Available Service Types
=======================
These service types are supported:
| Service | Key | Description |
|---|---|---|
| MySQL | `mysql` | Default MySQL fulltext index.
| Elasticsearch | `elasticsearch` | Use an external Elasticsearch service.
Fulltext Service Roles
======================
These roles are supported:
| Role | Key | Description
|---|---|---|
| Read | `read` | Allows the service to be queried when users search.
| Write | `write` | Allows documents to be published to the service.
Specifying Hosts
================
The `hosts` key should contain a list of dictionaries, each specifying the
details of a host. A service should normally have one or more hosts.
When an option is set at the service level, it serves as a default for all
hosts. It may be overridden by changing the value for a particular host.
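One way to picture this defaulting rule is a simple dictionary merge, sketched below in Python. This is only an illustration under stated assumptions: the function name `effective_host_options` and the `search00x.example.com` host names are made up, and the merge here is shallow, so a host-level `roles` dictionary would replace the service-level one wholesale. Whether Phabricator merges nested keys such as `roles` key by key is not stated in this document.

```python
def effective_host_options(service):
    # Service-level keys (other than the structural ones) act as defaults.
    defaults = {k: v for k, v in service.items() if k not in ("type", "hosts")}
    # Host-level keys override the defaults.
    return [{**defaults, **host} for host in service.get("hosts", [])]


service = {
    "type": "elasticsearch",
    "port": 9200,
    "protocol": "http",
    "hosts": [
        {"host": "search001.example.com"},
        {"host": "search002.example.com", "protocol": "https"},
    ],
}

for options in effective_host_options(service):
    print(options)
```

In this sketch, both hosts inherit `port` from the service, and only the second host overrides `protocol`.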
Service Type: MySQL
==============
The `mysql` service type does not require any configuration, and does not
need to have hosts specified. This service uses the builtin database to
index and search documents.
A typical `mysql` service configuration looks like this:
```lang=json
{
"cluster.search": [
{
"type": "elasticsearch",
"hosts": [
{
"host": "127.0.0.1",
"roles": { "write": true, "read": true }
}
],
"port": 9200,
"protocol": "http",
"path": "/phabricator",
"version": 5
},
],
"type": "mysql"
}
```
Supported Options
-----------------
| Key | Type |Comments|
|`type` | String |Engine type. Currently, 'elasticsearch' or 'mysql'|
|`protocol`| String |Either 'http' or 'https'|
|`port`| Int |The TCP port that Elasticsearch is bound to|
|`path`| String |The path portion of the url for phabricator's index.|
|`version`| Int |The version of Elasticsearch server. Supports either 2 or 5.|
|`hosts`| List |A list of one or more Elasticsearch host names / addresses.|
Host Configuration
------------------
Each search service must have one or more hosts associated with it. Each host
entry consists of a `host` key, a dictionary of roles and can optionally
override any of the options that are valid at the service level (see above).
Service Type: Elasticsearch
======================
Currently supported roles are `read` and `write`. These can be individually
enabled or disabled on a per-host basis. A typical setup might include two
elasticsearch clusters in two separate datacenters. You can configure one
cluster for reads and both for writes. When one cluster is down for maintenance
you can simply swap the read role over to the backup cluster and then proceed
with maintenance without any service interruption.
The `elasticsearch` service type supports these options:
| Key | Description |
|---|---|
| `protocol` | Either `"http"` (default) or `"https"`.
| `port` | Elasticsearch TCP port.
| `version` | Elasticsearch version, either `2` or `5` (default).
| `path` | Path for the index. Defaults to `/phabricator`. Advanced.
A typical `elasticsearch` service configuration looks like this:
```lang=json
{
"type": "elasticsearch",
"hosts": [
{
"protocol": "http",
"host": "127.0.0.1",
"port": 9200
}
]
}
```
Monitoring Search Services
==========================
You can monitor fulltext search in {nav Config > Search Servers}. This interface
shows you a quick overview of services and their health.
You can monitor fulltext search in {nav Config > Search Servers}. This
interface shows you a quick overview of services and their health.
The table on this page shows some basic stats for each configured service,
followed by the configuration and current status of each host.
NOTE: This page runs its diagnostics //from the web server that is serving the
request//. If you are recovering from a disaster, the view this page shows
may be partial or misleading, and two requests served by different servers may
see different views of the cluster.
Rebuilding Indexes
==================
After adding new search services, you will need to rebuild document indexes
on them. To do this, first initialize the services:
```
phabricator/ $ ./bin/search init
```
This will perform index setup steps and other one-time configuration.
To populate documents in all indexes, run this command:
```
phabricator/ $ ./bin/search index --force --background --type all
```
This initiates an exhaustive rebuild of the document indexes. To get a more
detailed list of indexing options available, run:
```
phabricator/ $ ./bin/search help index
```
Advanced Example
================
This is a more advanced example which shows a configuration with multiple
different services in different roles. In this example:
- Phabricator is using an Elasticsearch 2 service as its primary fulltext
service.
- An Elasticsearch 5 service is online, but only receiving writes.
- The MySQL service is serving as a backup if Elasticsearch fails.
This particular configuration may not be very useful. It is primarily
intended to show how to configure many different options.
```lang=json
[
{
"type": "elasticsearch",
"version": 2,
"hosts": [
{
"host": "elastic2.mycompany.com",
"port": 9200,
"protocol": "http"
}
]
},
{
"type": "elasticsearch",
"version": 5,
"hosts": [
{
"host": "elastic5.mycompany.com",
"port": 9789,
"protocol": "https",
"roles": {
"read": false,
"write": true
}
}
]
},
{
"type": "mysql"
}
]
```
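To check which services end up in which role for a configuration like this one, a short script can help. The Python sketch below is not part of Phabricator and makes two assumptions that this document implies but does not state outright: a role that is omitted counts as enabled, and a service with no `hosts` (like `mysql`) behaves as a single implicit host. The helper name `services_with_role` and the `type` plus `version` labels are invented for the example.

```python
import json

CONFIG = json.loads("""
[
  {"type": "elasticsearch", "version": 2,
   "hosts": [{"host": "elastic2.mycompany.com", "port": 9200,
              "protocol": "http"}]},
  {"type": "elasticsearch", "version": 5,
   "hosts": [{"host": "elastic5.mycompany.com", "port": 9789,
              "protocol": "https",
              "roles": {"read": false, "write": true}}]},
  {"type": "mysql"}
]
""")

# Assumption: roles that are not specified are treated as enabled.
DEFAULT_ROLES = {"read": True, "write": True}


def services_with_role(config, role):
    names = []
    for service in config:
        label = "{}{}".format(service["type"], service.get("version", ""))
        # Assumption: a service without hosts acts as one implicit host.
        hosts = service.get("hosts") or [{}]
        for host in hosts:
            roles = {
                **DEFAULT_ROLES,
                **service.get("roles", {}),
                **host.get("roles", {}),
            }
            if roles[role]:
                names.append(label)
                break
    return names


print("read order:", services_with_role(CONFIG, "read"))
print("writes go to:", services_with_role(CONFIG, "write"))
```

Under those assumptions, the read order is the Elasticsearch 2 service followed by MySQL, and all three services receive writes, which matches the description of the example.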