Summary:
Ref T13202. See PHI889. If the lock log is enabled, we can try to offer more details about lock holders.
When we fail to acquire a lock:
- check for recent acquisitions and suggest that this is a bottleneck issue;
- if there are no recent acquisitions, check for the last acquisition and print details about it (what process, how long ago, whether or not we believe it was released).
Test Plan:
- Enabled the lock log.
- Changed the lock wait time to 1 second.
- Added a `sleep(10)` after grabbing the lock.
- In one window, ran a Conduit call or a `git fetch`.
- In another window, ran another operation.
- Got useful/sensible errors for both ssh and web lock holders, for example:
> PhutilProxyException: Failed to acquire read lock after waiting 1 second(s). You may be able to retry later. (This lock was most recently acquired by a process (pid=12609, host=orbital-3.local, sapi=apache2handler, controller=PhabricatorConduitAPIController, method=diffusion.rawdiffquery) 3 second(s) ago. There is no record of this lock being released.)
> PhutilProxyException: Failed to acquire read lock after waiting 1 second(s). You may be able to retry later. (This lock was most recently acquired by a process (pid=65251, host=orbital-3.local, sapi=cli, argv=/Users/epriestley/dev/core/lib/phabricator/bin/ssh-exec --phabricator-ssh-device local.phacility.net --phabricator-ssh-key 2) 2 second(s) ago. There is no record of this lock being released.)
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13202
Differential Revision: https://secure.phabricator.com/D19702
Summary: Depends on D19173. Ref T13096. Adds an optional, disabled-by-default lock log to make it easier to figure out what is acquiring and holding locks.
Test Plan: Ran `bin/lock log --enable`, `--disable`, `--name`, etc. Saw sensible-looking output with log enabled and daemons restarted. Saw no additional output with log disabled and daemons restarted.
Maniphest Tasks: T13096
Differential Revision: https://secure.phabricator.com/D19174
Summary:
Ref T13096. Currently, we do a fair amount of clever digesting and string manipulation to build lock names which are less than 64 characters long while still being reasonably readable.
Instead, do more of this automatically. This will let lock acquisition become simpler and make it more possible to build a useful lock log.
Test Plan: Ran `bin/repository update`, saw a reasonable lock acquire and release.
Maniphest Tasks: T13096
Differential Revision: https://secure.phabricator.com/D19173
Summary:
Ref T10860. This allows us to recover if the connection to the database is lost during a push.
If we lose the connection to the master database during a push, we would previously freeze the repository. This is very safe, but not very operator-friendly since you have to go manually unfreeze it.
We don't need to be quite this aggressive about freezing things. The repository state is still consistent after we've "upgraded" the lock by setting `isWriting = 1`, so we're actually fine even if we lost the global lock.
Instead of just freezing the repository immediately, sit there in a loop waiting for the master to come back up for a few minutes. If it recovers, we can release the lock and everything will be OK again.
Basically, the changes are:
- If we can't release the lock at first, sit in a loop trying really hard to release it for a while.
- Add a unique lock identifier so we can be certain we're only releasing //our// lock no matter what else is going on.
- Do the version reads on the same connection holding the lock, so we can be sure we haven't lost the lock before we do that read.
Test Plan:
- Added a `sleep(10)` after accepting the write but before releasing the lock so I could run `mysqld stop` and force this issue to occur.
- Pushed like this:
```
$ echo D >> record && git commit -am D && git push
[master 707ecc3] D
1 file changed, 1 insertion(+)
# Push received by "local001.phacility.net", forwarding to cluster host.
# Waiting up to 120 second(s) for a cluster write lock...
# Acquired write lock immediately.
# Waiting up to 120 second(s) for a cluster read lock on "local001.phacility.net"...
# Acquired read lock immediately.
# Device "local001.phacility.net" is already a cluster leader and does not need to be synchronized.
# Ready to receive on cluster host "local001.phacility.net".
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 254 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
BEGIN SLEEP
```
- Here, I stopped `mysqld` from the CLI in another terminal window.
```
END SLEEP
# CRITICAL. Failed to release cluster write lock!
# The connection to the master database was lost while receiving the write.
# This process will spend 300 more second(s) attempting to recover, then give up.
```
- Here, I started `mysqld` again.
```
# RECOVERED. Link to master database was restored.
# Released cluster write lock.
To ssh://local@localvault.phacility.com/diffusion/26/locktopia.git
2cbf87c..707ecc3 master -> master
```
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10860
Differential Revision: https://secure.phabricator.com/D15792
Summary: Fixes T9715. Adds a MySQL-based lock to ensure that schema migrations are not applied on multiple hosts simultaneously.
Test Plan: Ran `./bin/storage upgrade` concurrently. One invocation was successful whilst the other hit a `PhutilLockException`.
Reviewers: #blessed_reviewers, epriestley
Subscribers: Korvin
Maniphest Tasks: T9715
Differential Revision: https://secure.phabricator.com/D14463
Summary:
Fixes T6958. Ref T7484.
- When we collide on a lock in `bin/repository update`, explain what that means.
- GlobalLock currently uses a "lock name" which is different from the lock's actual name. Don't do this. There's a small chance this fixes T7484, but I don't have high hopes.
Test Plan: Ran `bin/repository update X` in two windows really fast, got the new message in one of them.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T6958, T7484
Differential Revision: https://secure.phabricator.com/D12574
Summary:
Fixes T7484. There's a bunch of spooky mystery here but the current behavior can probably cause problems in at least some situations.
Also moves a couple callsigns to monograms (see T4245).
Test Plan:
- Faked a short lock length to hit the exception.
- Updated normally.
- Grepped for other use sites, none seemed suspicious or likely to overflow the lock length.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7484
Differential Revision: https://secure.phabricator.com/D12263
Summary:
This commit doesn't change license of any file. It just makes the license implicit (inherited from LICENSE file in the root directory).
We are removing the headers for these reasons:
- It wastes space in editors, less code is visible in editor upon opening a file.
- It brings noise to diff of the first change of any file every year.
- It confuses Git file copy detection when creating small files.
- We don't have an explicit license header in other files (JS, CSS, images, documentation).
- Using license header in every file is not obligatory: http://www.apache.org/dev/apply-license.html#new.
This change is approved by Alma Chao (Lead Open Source and IP Counsel at Facebook).
Test Plan: Verified that the license survived only in LICENSE file and that it didn't modify externals.
Reviewers: epriestley, davidrecordon
Reviewed By: epriestley
CC: aran, Korvin
Maniphest Tasks: T2035
Differential Revision: https://secure.phabricator.com/D3886
Summary:
We currently cache all connections in LiskDAO so we can roll back transactions when fixtured unit tests complete.
Since we establish a new connection wrapper each time we establish a global lock, this cache currently grows without bound.
Instead, pool global lock connections so we never have more than the largest number of locks we've held open at once (in PullLocalDaemon, always 1).
Another way to solve this is probably to add an "onclose" callback to `AphrontDatabaseConnection` so that it can notify any caches that it been closed. However, we currently allow a connection to be later reopened (which seeems reasonable) so we'd need a callback for that too. This is much simpler, and this use case is unusual, so I'd like to wait for more use cases before pursing a more complicated fix.
Test Plan:
Ran this in a loop:
while (true) {
for ($ii = 0; $ii < 100; $ii++) {
$lock = PhabricatorGlobalLock::newLock('derp');
$lock->lock();
$lock->unlock();
}
$this->sleep(1);
}
Previously it leaked ~100KB/sec, now has stable memory usage.
Reviewers: vrana, nh, btrahan
Reviewed By: vrana
CC: aran
Maniphest Tasks: T1636
Differential Revision: https://secure.phabricator.com/D3239
Summary: Add an explicit close() method to connections and call it in GlobalLock.
Test Plan:
Wrote a script like this:
$lock = PhabricatorGlobalLock::newLock('test');
echo "LOCK";
$lock->lock();
sleep(10);
echo "UNLOCK";
$lock->unlock();
sleep(9999);
Using `SHOW FULL PROCESSLIST`, verified the connection closed after 10 seconds with both the "MySQL" and "MySQLi" implementations.
Reviewers: btrahan, vrana
Reviewed By: vrana
CC: aran
Maniphest Tasks: T1470
Differential Revision: https://secure.phabricator.com/D3035
Summary:
Currently, multiple unit tests that acquire global locks will interfere with each other. Namespace the locks so they don't.
(Possibly we should also rename this to PhabricatorStorageNamespaceLock or something since it's not really global any more, but that's kind of unwieldy...)
Test Plan: Acquired locks with --trace and verified they were namespaced properly.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1162
Differential Revision: https://secure.phabricator.com/D2939
Summary: See D2924.
Test Plan: Ran locks with blocking timeouts.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Differential Revision: https://secure.phabricator.com/D2925
Summary: Implementation is a little crazy but this seems to work as advertised.
Test Plan: Acquired locks with "lock.php". Verified they held as long as the process reamined open and released properly on kill -9, ^C, etc.
Reviewers: nh, jungejason, vrana, btrahan, Girish, edward
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1400
Differential Revision: https://secure.phabricator.com/D2864