mirror of https://we.phorge.it/source/phorge.git synced 2024-11-28 09:42:41 +01:00
phorge-phorge/src/applications/diffusion
epriestley 892a9a1f07 Make cluster repositories more resistant to freezing
Summary:
Ref T10860. This allows us to recover if the connection to the database is lost during a push.

Previously, if we lost the connection to the master database during a push, we would freeze the repository. This is very safe, but not very operator-friendly, since you have to unfreeze it manually.

We don't need to be quite this aggressive about freezing things. The repository state is still consistent after we've "upgraded" the lock by setting `isWriting = 1`, so we're actually fine even if we lose the global lock.

Instead of freezing the repository immediately, wait in a loop for a few minutes for the master to come back up. If it recovers, we can release the lock and everything will be OK again.

Basically, the changes are:

  - If we can't release the lock at first, keep retrying the release in a loop for a while before giving up.
  - Add a unique lock identifier so we can be certain we're only releasing //our// lock no matter what else is going on.
  - Do the version reads on the same connection holding the lock, so we can be sure we haven't lost the lock before we do that read.
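
As a rough illustration of the first two points, here is a minimal Python sketch of the retry-and-identify pattern. It is not the code from this change: `FakeLockStore`, `ConnectionLost`, and `release_with_retry` are hypothetical names, the store is an in-memory stand-in for the master database, and the 300-second default only mirrors the recovery window shown in the test plan output.

```python
import time
import uuid


class ConnectionLost(Exception):
    """Raised by the hypothetical store when the database is unreachable."""


class FakeLockStore:
    """In-memory stand-in for the master database (hypothetical)."""

    def __init__(self, outages=0):
        self.owner = None       # identifier of the current lock holder
        self.outages = outages  # number of upcoming calls that will fail

    def _check(self):
        # Simulate a temporary outage: fail until the counter runs out.
        if self.outages > 0:
            self.outages -= 1
            raise ConnectionLost("master database unreachable")

    def acquire(self):
        self._check()
        if self.owner is not None:
            raise RuntimeError("lock is already held")
        # A unique identifier lets us prove later that the lock is ours.
        self.owner = uuid.uuid4().hex
        return self.owner

    def release(self, lock_id):
        self._check()
        # Only clear the lock if it still carries our identifier.
        if self.owner != lock_id:
            return False
        self.owner = None
        return True


def release_with_retry(store, lock_id, max_wait=300.0, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Keep trying to release our lock until the store recovers.

    Returns True if our lock was released and False if the lock was not
    ours. Re-raises ConnectionLost once max_wait seconds have elapsed;
    at that point a caller would fall back to freezing the repository.
    """
    deadline = clock() + max_wait
    while True:
        try:
            return store.release(lock_id)
        except ConnectionLost:
            if clock() >= deadline:
                raise
            sleep(interval)
```

Passing `sleep` and `clock` as parameters is a choice made for this sketch so the loop can be exercised without actually waiting; the identifier check in `release` is what keeps a recovered process from clearing a lock that some other writer acquired in the meantime.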

Test Plan:
  - Added a `sleep(10)` after accepting the write but before releasing the lock so I could run `mysqld stop` and force this issue to occur.
  - Pushed like this:

```
$ echo D >> record && git commit -am D && git push
[master 707ecc3] D
 1 file changed, 1 insertion(+)
# Push received by "local001.phacility.net", forwarding to cluster host.
# Waiting up to 120 second(s) for a cluster write lock...
# Acquired write lock immediately.
# Waiting up to 120 second(s) for a cluster read lock on "local001.phacility.net"...
# Acquired read lock immediately.
# Device "local001.phacility.net" is already a cluster leader and does not need to be synchronized.
# Ready to receive on cluster host "local001.phacility.net".
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 254 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
BEGIN SLEEP
```

  - Here, I stopped `mysqld` from the CLI in another terminal window.

```
END SLEEP
# CRITICAL. Failed to release cluster write lock!
# The connection to the master database was lost while receiving the write.
# This process will spend 300 more second(s) attempting to recover, then give up.
```

  - Here, I started `mysqld` again.

```
# RECOVERED. Link to master database was restored.
# Released cluster write lock.
To ssh://local@localvault.phacility.com/diffusion/26/locktopia.git
   2cbf87c..707ecc3  master -> master
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10860

Differential Revision: https://secure.phabricator.com/D15792
2016-04-25 11:37:31 -07:00
application Trivially implement RepositoryEditEngine and API methods 2016-04-17 16:02:13 -07:00
capability Simplify the implementation of PhabricatorPolicyCapability subclasses 2014-07-25 08:25:42 +10:00
conduit Move all cluster locking logic to a separate class 2016-04-25 11:20:29 -07:00
config Move FontIcon calls to Icon 2016-01-28 08:48:45 -08:00
controller Move all cluster locking logic to a separate class 2016-04-25 11:20:29 -07:00
data Parse and display commit authorship date in Git in Diffusion 2016-01-11 09:32:37 -08:00
doorkeeper Partially modernize Doorkeeper/Asana bridge 2014-10-01 07:09:34 -07:00
edge Fix reverting commit language 2015-06-01 09:54:30 +10:00
editor Support more transactions types in RepositoryEditEngine 2016-04-17 16:27:02 -07:00
engine Record which cluster host received a push 2016-04-19 13:06:30 -07:00
engineextension Move PhabricatorHovercard to PHUIHovercard 2016-02-03 16:26:30 +00:00
exception Replace AphrontUsageException with AphrontMalformedRequestException 2015-09-03 10:04:17 -07:00
garbagecollector Support ID-based repository URIs, and canonicalize repository URIs 2016-02-18 09:56:28 -08:00
gitlfs Implement a Git LFS link table and basic batch API 2016-03-17 17:15:20 -07:00
herald Move various other callsites away from callsigns 2016-01-04 06:54:42 -08:00
management Show "Last Writer" and "Last Write At" in the UI, add more documentation 2016-04-20 10:45:03 -07:00
panel Add "Mailing List" users 2015-06-03 18:42:33 -07:00
protocol Make cluster repositories more resistant to freezing 2016-04-25 11:37:31 -07:00
query Extract repository command construction from Repositories 2016-04-19 04:51:48 -07:00
remarkup Stop all object mentions from matching after "@" 2015-09-29 06:43:49 -07:00
request Remove uncalled DiffusionRequest->getCallsign() 2016-02-17 17:17:35 -08:00
response Implement a Git LFS server which supports no operations 2016-03-17 08:08:43 -07:00
ssh Make cluster repositories more resistant to freezing 2016-04-25 11:37:31 -07:00
symbol Extend from Phobject 2015-06-15 18:02:27 +10:00
typeahead Improve type and icon information in typeahead 2016-02-05 12:48:20 -08:00
view Fix an issue with PHID/handle management in push logs 2016-04-20 04:47:10 -07:00
DiffusionLintSaveRunner.php Move repository URIs to a dedicated index 2016-01-13 09:34:31 -08:00