<?php

// 2016-04-10 00:21:29 +02:00

final class PhabricatorDatabaseRef
  extends Phobject {

  const STATUS_OKAY = 'okay';
  const STATUS_FAIL = 'fail';
  const STATUS_AUTH = 'auth';
  const STATUS_REPLICATION_CLIENT = 'replication-client';

  const REPLICATION_OKAY = 'okay';
  const REPLICATION_MASTER_REPLICA = 'master-replica';
  const REPLICATION_REPLICA_NONE = 'replica-none';
  const REPLICATION_SLOW = 'replica-slow';

  /*
   * Automatically degrade to read-only mode when unable to connect to the master
   * Summary:
   * Ref T4571. If we fail to connect to the master, automatically try to degrade into a temporary read-only mode ("UNREACHABLE") for the remainder of the request, if possible.
   * If the request was something like "load the homepage", that'll work fine. If it was something like "submit a comment", there's nothing we can do and we just have to fail.
   * Detecting this condition imposes a performance penalty: every request checks the connection and gives the database a long time to respond, since we don't want to drop writes unless we have to. So the degraded mode works, but it's really slow, and may perpetuate the problem if the root issue is load-related.
   * This lays the groundwork for improving this case by degrading further into a "SEVERED" mode which will persist across requests. In the future, if several requests in a short period of time fail, we'll sever the database host and refuse to try to connect to it for a little while, connecting directly to replicas instead (basically, we're "health checking" the master, like a load balancer would health check a web application server). This will give us a better (much faster) degraded mode in a major service disruption, and reduce load on the master if the root cause is load-related, giving it a better chance of recovering on its own.
   * Test Plan:
   * - Disabled master in config by changing the host/username, got degraded automatically to UNREACHABLE mode immediately.
   * - Faked full SEVERED mode, requests hit replicas and put me in the mode properly.
   * - Made stuff work, hit some good pages.
   * - Hit some non-cluster pages.
   * Reviewers: chad
   * Reviewed By: chad
   * Maniphest Tasks: T4571
   * Differential Revision: https://secure.phabricator.com/D15674
   * 2016-04-10 14:51:34 +02:00
   */
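
  /*
This sketch is not part of this commit. The summary above describes a future
"SEVERED" mode: a master that fails repeatedly within a short period is skipped
for a while, the way a load balancer health-checks a backend. The class below
is a hypothetical, in-memory illustration of that policy.

HostSeverTracker, its threshold/window/cooldown parameters, and the explicit
timestamps are assumptions, not Phabricator APIs. A real version would also
need state shared across requests (for example, a cache), which this sketch
does not model.

```php
<?php

// Hypothetical tracker: sever a host after $threshold failures within
// $window seconds, then treat it as severed for $cooldown seconds.
final class HostSeverTracker {
  private $threshold;
  private $window;
  private $cooldown;
  private $failures = array();      // host => recent failure timestamps
  private $severedUntil = array();  // host => time the severance ends

  public function __construct($threshold, $window, $cooldown) {
    $this->threshold = $threshold;
    $this->window = $window;
    $this->cooldown = $cooldown;
  }

  public function recordFailure($host, $now) {
    $previous = isset($this->failures[$host]) ? $this->failures[$host] : array();

    // Keep only failures that are still inside the sliding window.
    $recent = array();
    foreach ($previous as $time) {
      if ($now - $time < $this->window) {
        $recent[] = $time;
      }
    }
    $recent[] = $now;

    if (count($recent) >= $this->threshold) {
      // Too many recent failures: stop sending traffic to this host for a while.
      $this->severedUntil[$host] = $now + $this->cooldown;
      $recent = array();
    }

    $this->failures[$host] = $recent;
  }

  public function isSevered($host, $now) {
    return isset($this->severedUntil[$host]) && $now < $this->severedUntil[$host];
  }
}

$tracker = new HostSeverTracker(3, 60, 300);
foreach (array(100, 110, 120) as $time) {
  $tracker->recordFailure('db-master', $time);
}
var_dump($tracker->isSevered('db-master', 130));  // bool(true)
var_dump($tracker->isSevered('db-master', 420));  // bool(false)
```

Passing timestamps in explicitly keeps the sketch deterministic. Real code
would read the clock itself.
  */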

  const KEY_REFS = 'cluster.db.refs';

  private $host;
  private $port;
  private $user;
  private $pass;
  private $disabled;
  private $isMaster;

  private $connectionLatency;
  private $connectionStatus;
  private $connectionMessage;

  private $replicaStatus;
  private $replicaMessage;
  private $replicaDelay;

  private $didFailToConnect;

  public function setHost($host) {
    $this->host = $host;
    return $this;
  }

  public function getHost() {
    return $this->host;
  }

  public function setPort($port) {
    $this->port = $port;
    return $this;
  }

  public function getPort() {
    return $this->port;
  }

  public function setUser($user) {
    $this->user = $user;
    return $this;
  }

  public function getUser() {
    return $this->user;
  }

  public function setPass(PhutilOpaqueEnvelope $pass) {
    $this->pass = $pass;
    return $this;
  }

  public function getPass() {
    return $this->pass;
  }

  public function setIsMaster($is_master) {
    $this->isMaster = $is_master;
    return $this;
  }

  public function getIsMaster() {
    return $this->isMaster;
  }

  public function setDisabled($disabled) {
    $this->disabled = $disabled;
    return $this;
  }

  public function getDisabled() {
    return $this->disabled;
  }

  public function setConnectionLatency($connection_latency) {
    $this->connectionLatency = $connection_latency;
    return $this;
  }

  public function getConnectionLatency() {
    return $this->connectionLatency;
  }

  public function setConnectionStatus($connection_status) {
    $this->connectionStatus = $connection_status;
    return $this;
  }

  public function getConnectionStatus() {
    // Connection status is normally populated by queryAll(); reading it from
    // a ref that was never probed is a usage error.
    if ($this->connectionStatus === null) {
      throw new PhutilInvalidStateException('queryAll');
    }

    return $this->connectionStatus;
  }

  public function setConnectionMessage($connection_message) {
    $this->connectionMessage = $connection_message;
    return $this;
  }

  public function getConnectionMessage() {
    return $this->connectionMessage;
  }

  public function setReplicaStatus($replica_status) {
    $this->replicaStatus = $replica_status;
    return $this;
  }

  public function getReplicaStatus() {
    return $this->replicaStatus;
  }

  public function setReplicaMessage($replica_message) {
    $this->replicaMessage = $replica_message;
    return $this;
  }

  public function getReplicaMessage() {
    return $this->replicaMessage;
  }

  public function setReplicaDelay($replica_delay) {
    $this->replicaDelay = $replica_delay;
    return $this;
  }

  public function getReplicaDelay() {
    return $this->replicaDelay;
  }

  public static function getConnectionStatusMap() {
    return array(
      self::STATUS_OKAY => array(
        'icon' => 'fa-exchange',
        'color' => 'green',
        'label' => pht('Okay'),
      ),
      self::STATUS_FAIL => array(
        'icon' => 'fa-times',
        'color' => 'red',
        'label' => pht('Failed'),
      ),
      self::STATUS_AUTH => array(
        'icon' => 'fa-key',
        'color' => 'red',
        'label' => pht('Invalid Credentials'),
      ),
      self::STATUS_REPLICATION_CLIENT => array(
        'icon' => 'fa-eye-slash',
        'color' => 'yellow',
        'label' => pht('Missing Permission'),
      ),
    );
  }
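
  /*
queryAll() maps the outcome of its "SHOW SLAVE STATUS" probe onto the statuses
above. It does this with catch blocks ordered from most to least specific, and
the sketch below isolates that mapping.

The exception classes and classify_probe() are hypothetical stand-ins for the
Aphront query exceptions. The sketch assumes, as the catch order suggests, that
the specific exceptions extend the generic one.

```php
<?php

// Hypothetical stand-ins for the database layer's exception classes.
class QueryFailure extends Exception {}
class AccessDeniedFailure extends QueryFailure {}
class BadCredentialsFailure extends QueryFailure {}

// Run a probe and map its outcome to a connection status string. Because
// both specific classes extend QueryFailure, they must be caught first;
// otherwise the generic handler would report them as 'fail'.
function classify_probe(callable $probe) {
  try {
    $probe();
    return 'okay';
  } catch (AccessDeniedFailure $ex) {
    return 'replication-client';
  } catch (BadCredentialsFailure $ex) {
    return 'auth';
  } catch (QueryFailure $ex) {
    return 'fail';
  }
}

echo classify_probe(function () {
  throw new BadCredentialsFailure('Access denied for user');
}), "\n";  // auth
```

As in queryAll(), an exception outside the QueryFailure hierarchy is not
caught here and propagates to the caller.
  */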

  public static function getReplicaStatusMap() {
    return array(
      self::REPLICATION_OKAY => array(
        'icon' => 'fa-download',
        'color' => 'green',
        'label' => pht('Okay'),
      ),
      self::REPLICATION_MASTER_REPLICA => array(
        'icon' => 'fa-database',
        'color' => 'red',
        'label' => pht('Replicating Master'),
      ),
      self::REPLICATION_REPLICA_NONE => array(
        'icon' => 'fa-download',
        'color' => 'red',
        'label' => pht('Not Replicating'),
      ),
      self::REPLICATION_SLOW => array(
        'icon' => 'fa-hourglass',
        'color' => 'red',
        'label' => pht('Slow Replication'),
      ),
    );
  }

  public static function getLiveRefs() {
    // Build the refs once per request. Later callers get the same objects, so
    // state recorded on them (such as a failed connection attempt) remains
    // visible for the rest of the request.
    $cache = PhabricatorCaches::getRequestCache();

    $refs = $cache->getKey(self::KEY_REFS);
    if (!$refs) {
      $refs = self::newRefs();
      $cache->setKey(self::KEY_REFS, $refs);
    }

    return $refs;
  }
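
  /*
This is a minimal sketch of the build-once-per-request pattern in
getLiveRefs(). It assumes a plain in-memory array in place of
PhabricatorCaches::getRequestCache(); RequestScopedCache and get_live_refs()
are hypothetical names.

It also shows one consequence of the `if (!$refs)` check. An empty result is
falsy, so an install with no `cluster.databases` entries rebuilds the (empty)
list on every call.

```php
<?php

// Hypothetical stand-in for a request-scoped key/value cache.
final class RequestScopedCache {
  private $data = array();

  public function getKey($key) {
    return array_key_exists($key, $this->data) ? $this->data[$key] : null;
  }

  public function setKey($key, $value) {
    $this->data[$key] = $value;
  }
}

// Same shape as getLiveRefs(): build once, then reuse the cached value.
function get_live_refs(RequestScopedCache $cache, callable $build) {
  $refs = $cache->getKey('cluster.db.refs');
  // An empty array is falsy, so "no cluster configured" is rebuilt each time.
  if (!$refs) {
    $refs = $build();
    $cache->setKey('cluster.db.refs', $refs);
  }
  return $refs;
}

$builds = 0;
$build = function () use (&$builds) {
  $builds++;
  return array('db-master', 'db-replica');
};

$cache = new RequestScopedCache();
get_live_refs($cache, $build);
get_live_refs($cache, $build);
echo $builds, "\n";  // 1
```
  */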

  public static function newRefs() {
    $refs = array();

    // Global mysql.* settings provide defaults for any host in
    // cluster.databases that does not override them.
    $default_port = PhabricatorEnv::getEnvConfig('mysql.port');
    $default_port = nonempty($default_port, 3306);

    $default_user = PhabricatorEnv::getEnvConfig('mysql.user');

    $default_pass = PhabricatorEnv::getEnvConfig('mysql.pass');
    $default_pass = new PhutilOpaqueEnvelope($default_pass);

    $config = PhabricatorEnv::getEnvConfig('cluster.databases');
    foreach ($config as $server) {
      $host = $server['host'];
      $port = idx($server, 'port', $default_port);
      $user = idx($server, 'user', $default_user);
      $disabled = idx($server, 'disabled', false);

      // A missing or empty per-host password falls back to mysql.pass.
      $pass = idx($server, 'pass');
      if ($pass) {
        $pass = new PhutilOpaqueEnvelope($pass);
      } else {
        $pass = clone $default_pass;
      }

      $role = $server['role'];

      $ref = id(new self())
        ->setHost($host)
        ->setPort($port)
        ->setUser($user)
        ->setPass($pass)
        ->setDisabled($disabled)
        ->setIsMaster(($role == 'master'));

      $refs[] = $ref;
    }

    return $refs;
  }
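
  /*
The fallback rules in newRefs() can be sketched with plain arrays.
resolve_servers() and the shape of $defaults are hypothetical; $defaults
stands in for mysql.port, mysql.user and mysql.pass. The real method wraps
passwords in PhutilOpaqueEnvelope and returns PhabricatorDatabaseRef objects
instead.

```php
<?php

// Hypothetical sketch of newRefs()' fallback rules using plain arrays.
function resolve_servers(array $servers, array $defaults) {
  // Like nonempty($default_port, 3306): an unset or empty port means 3306.
  $default_port = !empty($defaults['port']) ? $defaults['port'] : 3306;

  $resolved = array();
  foreach ($servers as $server) {
    $resolved[] = array(
      'host' => $server['host'],
      'port' => isset($server['port']) ? $server['port'] : $default_port,
      'user' => isset($server['user']) ? $server['user'] : $defaults['user'],
      // A missing or empty per-host password falls back to the default.
      'pass' => !empty($server['pass']) ? $server['pass'] : $defaults['pass'],
      'disabled' => isset($server['disabled']) ? $server['disabled'] : false,
      'is_master' => ($server['role'] === 'master'),
    );
  }
  return $resolved;
}

$refs = resolve_servers(
  array(
    array('host' => 'db001', 'role' => 'master'),
    array('host' => 'db002', 'role' => 'replica', 'port' => 3307),
  ),
  array('port' => null, 'user' => 'phab', 'pass' => 'secret'));

echo $refs[0]['port'], ' ', $refs[1]['port'], "\n";  // 3306 3307
```
  */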

  public static function queryAll() {
    $refs = self::newRefs();

    foreach ($refs as $ref) {
      if ($ref->getDisabled()) {
        continue;
      }

      /*
       * When `cluster.databases` is configured, read the master connection from it
       * Summary:
       * Ref T4571. Ref T10759. Ref T10758. This isn't complete, but gets most of the job done:
       * - When `cluster.databases` is set up, most things ignore `mysql.host` now.
       * - You can `bin/storage upgrade` and stuff works.
       * - You can browse around in the web UI and stuff works.
       * There's still a lot of weird tricky stuff to navigate, and this has no real advantages over configuring a single server yet (no automatic failover, etc).
       * Test Plan:
       * - Configured `cluster.databases` to point at my `t1.micro` hosts in EC2 (master + replica).
       * - Ran `bin/storage upgrade`, got a new install set up on them properly.
       * - Survived setup warnings, browsed around.
       * - Switched back to local config, ran `bin/storage upgrade`, browsed around, went through setup checks.
       * - Intentionally broke config (bad hosts, no masters) and things seemed to react reasonably well.
       * Reviewers: chad
       * Reviewed By: chad
       * Maniphest Tasks: T4571, T10758, T10759
       * Differential Revision: https://secure.phabricator.com/D15668
       * 2016-04-10 04:46:42 +02:00
       */

      $conn = $ref->newManagementConnection();

      $t_start = microtime(true);
      // Stays false if the probe query throws, so the replication checks
      // below are skipped for hosts we could not query.
      $replica_status = false;
      try {
        $replica_status = queryfx_one($conn, 'SHOW SLAVE STATUS');
        $ref->setConnectionStatus(self::STATUS_OKAY);
      } catch (AphrontAccessDeniedQueryException $ex) {
        $ref->setConnectionStatus(self::STATUS_REPLICATION_CLIENT);
        $ref->setConnectionMessage(
          pht(
            'No permission to run "SHOW SLAVE STATUS". Grant this user '.
            '"REPLICATION CLIENT" permission to allow Phabricator to '.
            'monitor replica health.'));
      } catch (AphrontInvalidCredentialsQueryException $ex) {
        $ref->setConnectionStatus(self::STATUS_AUTH);
        $ref->setConnectionMessage($ex->getMessage());
      } catch (AphrontQueryException $ex) {
        $ref->setConnectionStatus(self::STATUS_FAIL);

        $class = get_class($ex);
        $message = $ex->getMessage();
        $ref->setConnectionMessage(
          pht(
            '%s: %s',
            $class,
            $message));
      }

      $t_end = microtime(true);
      $ref->setConnectionLatency($t_end - $t_start);

      if ($replica_status !== false) {
        $is_replica = (bool)$replica_status;
        if ($ref->getIsMaster() && $is_replica) {
          $ref->setReplicaStatus(self::REPLICATION_MASTER_REPLICA);
          $ref->setReplicaMessage(
            pht(
              'This host has a "master" role, but is replicating data from '.
              'another host ("%s")!',
              idx($replica_status, 'Master_Host')));
        } else if (!$ref->getIsMaster() && !$is_replica) {
          $ref->setReplicaStatus(self::REPLICATION_REPLICA_NONE);
          $ref->setReplicaMessage(
            pht(
              'This host has a "replica" role, but is not replicating data '.
              'from a master (no output from "SHOW SLAVE STATUS").'));
        } else {
          $ref->setReplicaStatus(self::REPLICATION_OKAY);
        }

        if ($is_replica) {
          // MySQL reports NULL for Seconds_Behind_Master when, for example,
          // the replication SQL thread is not running. The (int) cast turns
          // that into 0, so a stopped replica is not flagged as slow here.
          $latency = (int)idx($replica_status, 'Seconds_Behind_Master');
          $ref->setReplicaDelay($latency);
          if ($latency > 30) {
            $ref->setReplicaStatus(self::REPLICATION_SLOW);
            $ref->setReplicaMessage(
              pht(
                'This replica is lagging far behind the master. Data is at '.
                'risk!'));
          }
        }
      }
    }

    return $refs;
  }
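
  /*
The replication checks in queryAll() reduce to a pure function of two inputs:
the configured role and the row returned by "SHOW SLAVE STATUS". This sketch
assumes the probe query succeeded; queryAll() skips these checks when it did
not.

classify_replication() and its $max_lag parameter are hypothetical, though the
default of 30 seconds matches the literal used above. The lag check runs last,
so a lagging host is reported as slow even if it was first classified as a
replicating master.

```php
<?php

// Classify replication health from the host's role and its
// SHOW SLAVE STATUS row (null when the host is not replicating).
// Returns array(status, delay); delay is null for non-replicas.
function classify_replication($is_master, $row, $max_lag = 30) {
  $is_replica = (bool)$row;

  if ($is_master && $is_replica) {
    $status = 'master-replica';
  } else if (!$is_master && !$is_replica) {
    $status = 'replica-none';
  } else {
    $status = 'okay';
  }

  $delay = null;
  if ($is_replica) {
    // A NULL Seconds_Behind_Master becomes 0, as in the original code.
    $lag = isset($row['Seconds_Behind_Master']) ? $row['Seconds_Behind_Master'] : 0;
    $delay = (int)$lag;
    if ($delay > $max_lag) {
      $status = 'replica-slow';
    }
  }

  return array($status, $delay);
}

list($status, $delay) = classify_replication(
  false,
  array('Seconds_Behind_Master' => '45'));
echo $status, ' ', $delay, "\n";  // replica-slow 45
```
  */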

  public function newManagementConnection() {
    // Health checks should fail fast: no retries and a short timeout.
    return $this->newConnection(
      array(
        'retries' => 0,
        'timeout' => 3,
      ));
  }

  public function newApplicationConnection($database) {
    return $this->newConnection(
      array(
        'database' => $database,
      ));
  }

  public function isSevered() {
    return (bool)$this->didFailToConnect;
  }

  public function isReachable(AphrontDatabaseConnection $connection) {
    // After one failed attempt, stop trying this host for the lifetime of
    // this ref (refs from getLiveRefs() are shared for the whole request).
    if ($this->isSevered()) {
      return false;
    }

    try {
      $connection->openConnection();
      $reachable = true;
    } catch (Exception $ex) {
      $reachable = false;
    }

    if (!$reachable) {
      $this->didFailToConnect = true;
    }

    return $reachable;
  }
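
  /*
isReachable() gives up on a host for the lifetime of the ref after a single
failed connection attempt. The sketch below models the connection as a
callable that throws on failure. That is an assumption: the real method calls
openConnection() on an AphrontDatabaseConnection. The sketch counts attempts
to show that a severed host is never retried.

```php
<?php

// Hypothetical model of the "sever on first failure" check.
final class ReachabilityProbe {
  private $didFailToConnect = false;
  private $attempts = 0;

  public function isSevered() {
    return $this->didFailToConnect;
  }

  public function isReachable(callable $open_connection) {
    if ($this->isSevered()) {
      // Do not pay the connection timeout again for this host.
      return false;
    }

    $this->attempts++;
    try {
      $open_connection();
      $reachable = true;
    } catch (Exception $ex) {
      $reachable = false;
    }

    if (!$reachable) {
      $this->didFailToConnect = true;
    }

    return $reachable;
  }

  public function getAttempts() {
    return $this->attempts;
  }
}

$probe = new ReachabilityProbe();
$probe->isReachable(function () { throw new RuntimeException('refused'); });
var_dump($probe->isReachable(function () {}));  // bool(false): already severed
echo $probe->getAttempts(), "\n";  // 1
```
  */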

  public static function getMasterDatabaseRef() {
    $refs = self::getLiveRefs();

    // Without cluster.databases, build a ref from the single-host
    // mysql.configuration-provider instead.
    if (!$refs) {
      $conf = PhabricatorEnv::newObjectFromConfig(
        'mysql.configuration-provider',
        array(null, 'w', null));

      return id(new self())
        ->setHost($conf->getHost())
        ->setPort($conf->getPort())
        ->setUser($conf->getUser())
        ->setPass($conf->getPassword())
        ->setIsMaster(true);
    }

    foreach ($refs as $ref) {
      if ($ref->getDisabled()) {
        continue;
      }
      if ($ref->getIsMaster()) {
        return $ref;
      }
    }

    return null;
  }
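
  /*
The master-selection loop above can be sketched over plain arrays. pick_master()
and the array keys are hypothetical. The sketch does not model the fallback to
mysql.configuration-provider that getMasterDatabaseRef() uses when no cluster
is configured.

```php
<?php

// Return the first enabled host with the master role, or null.
function pick_master(array $refs) {
  foreach ($refs as $ref) {
    if (!empty($ref['disabled'])) {
      continue;
    }
    if (!empty($ref['is_master'])) {
      return $ref;
    }
  }
  return null;
}

$master = pick_master(array(
  array('host' => 'db-old', 'is_master' => true, 'disabled' => true),
  array('host' => 'db-replica', 'is_master' => false),
  array('host' => 'db-new', 'is_master' => true),
));
echo $master['host'], "\n";  // db-new
```

When every master is disabled, the function returns null rather than promoting
a replica, and the caller decides how to handle that.
  */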
|
|
|
|
|
2016-04-10 14:10:06 +02:00
|
|
|
public static function getReplicaDatabaseRef() {
|
2016-04-10 14:51:34 +02:00
    $refs = self::getLiveRefs();
2016-04-10 14:10:06 +02:00

    if (!$refs) {
      return null;
    }

    // TODO: We may have multiple replicas to choose from, and could make
    // more of an effort to pick the "best" one here instead of always
    // picking the first one. Once we've picked one, we should try to use
    // the same replica for the rest of the request, though.

    foreach ($refs as $ref) {
      if ($ref->getDisabled()) {
        continue;
      }
      if ($ref->getIsMaster()) {
        continue;
      }
      return $ref;
    }

    return null;
  }

2016-04-10 04:46:42 +02:00
  private function newConnection(array $options) {
    $spec = $options + array(
      'user' => $this->getUser(),
      'pass' => $this->getPass(),
      'host' => $this->getHost(),
      'port' => $this->getPort(),
      'database' => null,
      'retries' => 3,
      'timeout' => 15,
    );

2016-04-10 00:21:29 +02:00
    return PhabricatorEnv::newObjectFromConfig(
      'mysql.implementation',
      array(
2016-04-10 04:46:42 +02:00
        $spec,
2016-04-10 00:21:29 +02:00
      ));
  }

}