phorge-phorge

mirror of https://we.phorge.it/source/phorge.git synced 2025-02-01 01:18:22 +01:00

epriestley ebff07d019 Automatically sever databases after prolonged unreachability

Summary: Ref T4571. When a database goes down briefly, we fall back to replicas. However, this fallback is slow (not good for users) and keeps sending a lot of traffic to the master (might be bad if the root cause is load-related).

Keep track of recent connections and fully degrade into "severed" mode if we see a sequence of failures over a reasonable period of time. In this mode, we send much less traffic to the master (faster for users; less load for the database).

We do send a little bit of traffic still, and if the master recovers we'll recover back into normal mode after seeing several connections in a row succeed.

This is similar to what most load balancers do when pulling web servers in and out of pools.

For now, the specific numbers are:

  - We do at most one health check every 3 seconds.
  - If 5 checks in a row fail or succeed, we sever or un-sever the database (so it takes about 15 seconds to switch modes).
  - If the database is currently marked unhealthy, we reduce timeouts and retries when connecting to it.

Test Plan:

  - Configured a bad `master`.
  - Browsed around for a bit, initially saw "unreachable master" errors.
  - After about 15 seconds, saw "major interruption" errors instead.
  - Fixed the config for `master`.
  - Browsed around for a while longer.
  - After about 15 seconds, things recovered.
  - Used "Cluster Databases" console to keep an eye on health checks: it now shows how many recent health checks were good: {F1213397}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4571

Differential Revision: https://secure.phabricator.com/D15677

2016-04-11 08:43:52 -07:00
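The sever/un-sever rule described in the commit message can be sketched roughly as follows. This is a minimal illustration in Python, not the PHP code from this commit: the class name `HealthRecord`, its methods, and its parameter names are hypothetical. Only the numbers (one check per 3 seconds, 5 consecutive results to switch modes) come from the summary above.

```python
from collections import deque


class HealthRecord:
    """Hypothetical health tracker for one database host (illustrative only)."""

    def __init__(self, check_interval=3.0, required_checks=5):
        self.check_interval = check_interval    # minimum seconds between checks
        self.required_checks = required_checks  # consecutive results needed to switch
        self.recent = deque(maxlen=required_checks)
        self.last_check = None
        self.severed = False

    def should_check(self, now):
        # Allow at most one health check per check_interval seconds.
        return self.last_check is None or now - self.last_check >= self.check_interval

    def record(self, now, ok):
        # Ignore results that arrive before the next check is due.
        if not self.should_check(now):
            return self.severed
        self.last_check = now
        self.recent.append(ok)
        # Switch modes only when the full window agrees.
        if len(self.recent) == self.required_checks:
            if not any(self.recent):
                self.severed = True
            elif all(self.recent):
                self.severed = False
        return self.severed


health = HealthRecord()
for t in (0, 3, 6, 9, 12):          # five failed checks, 3 seconds apart
    health.record(t, ok=False)
print(health.severed)               # True: the host is now treated as severed
```

With these defaults, five checks spaced 3 seconds apart are needed in either direction, which matches the "about 15 seconds" to switch modes mentioned in the summary. A mixed window (some successes, some failures) leaves the current mode unchanged.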
..
__tests__	Use PhutilClassMapQuery instead of PhutilSymbolLoader	2015-08-14 07:49:01 +10:00
aphront	When proxying cluster HTTP requests, forward only selected headers	2016-04-09 03:39:17 -07:00
applications	Automatically sever databases after prolonged unreachability	2016-04-11 08:43:52 -07:00
docs	Add a "Database Cluster Status" console in Config	2016-04-09 20:34:13 -07:00
extensions	Add `src/extensions/` to Phabricator	2013-08-14 15:38:06 -07:00
infrastructure	Automatically sever databases after prolonged unreachability	2016-04-11 08:43:52 -07:00
view	Fix an issue with date parsing when viewer timezone differs from server timezone	2016-04-11 07:47:37 -07:00
__phutil_library_init__.php	Delete license headers from files	2012-11-05 11:16:51 -08:00
__phutil_library_map__.php	Automatically sever databases after prolonged unreachability	2016-04-11 08:43:52 -07:00