1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2025-01-14 08:41:07 +01:00
Commit graph

14 commits

Author SHA1 Message Date
epriestley
021c612cb2 When we fail to acquire a repository lock, try to provide a hint about why
Summary:
Ref T13202. See PHI889. If the lock log is enabled, we can try to offer more details about lock holders.

When we fail to acquire a lock:

  - check for recent acquisitions and suggest that this is a bottleneck issue;
  - if there are no recent acquisitions, check for the last acquisition and print details about it (what process, how long ago, whether or not we believe it was released).

Test Plan:
  - Enabled the lock log.
  - Changed the lock wait time to 1 second.
  - Added a `sleep(10)` after grabbing the lock.
  - In one window, ran a Conduit call or a `git fetch`.
  - In another window, ran another operation.
  - Got useful/sensible errors for both ssh and web lock holders, for example:

> PhutilProxyException: Failed to acquire read lock after waiting 1 second(s). You may be able to retry later. (This lock was most recently acquired by a process (pid=12609, host=orbital-3.local, sapi=apache2handler, controller=PhabricatorConduitAPIController, method=diffusion.rawdiffquery) 3 second(s) ago. There is no record of this lock being released.)

> PhutilProxyException: Failed to acquire read lock after waiting 1 second(s). You may be able to retry later. (This lock was most recently acquired by a process (pid=65251, host=orbital-3.local, sapi=cli, argv=/Users/epriestley/dev/core/lib/phabricator/bin/ssh-exec --phabricator-ssh-device local.phacility.net --phabricator-ssh-key 2) 2 second(s) ago. There is no record of this lock being released.)

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13202

Differential Revision: https://secure.phabricator.com/D19702
2018-09-24 15:20:07 -07:00
epriestley
44f0664d2c Add a "lock log" for debugging where locks are being held
Summary: Depends on D19173. Ref T13096. Adds an optional, disabled-by-default lock log to make it easier to figure out what is acquiring and holding locks.

Test Plan: Ran `bin/lock log --enable`, `--disable`, `--name`, etc. Saw sensible-looking output with log enabled and daemons restarted. Saw no additional output with log disabled and daemons restarted.

Maniphest Tasks: T13096

Differential Revision: https://secure.phabricator.com/D19174
2018-03-05 17:55:34 -08:00
epriestley
fd367adbaf Parameterize PhabricatorGlobalLock
Summary:
Ref T13096. Currently, we do a fair amount of clever digesting and string manipulation to build lock names which are less than 64 characters long while still being reasonably readable.

Instead, do more of this automatically. This will let lock acquisition become simpler and make it more possible to build a useful lock log.

Test Plan: Ran `bin/repository update`, saw a reasonable lock acquire and release.

Maniphest Tasks: T13096

Differential Revision: https://secure.phabricator.com/D19173
2018-03-05 15:30:27 -08:00
epriestley
892a9a1f07 Make cluster repositories more resistant to freezing
Summary:
Ref T10860. This allows us to recover if the connection to the database is lost during a push.

If we lose the connection to the master database during a push, we would previously freeze the repository. This is very safe, but not very operator-friendly since you have to go manually unfreeze it.

We don't need to be quite this aggressive about freezing things. The repository state is still consistent after we've "upgraded" the lock by setting `isWriting = 1`, so we're actually fine even if we lost the global lock.

Instead of just freezing the repository immediately, sit there in a loop waiting for the master to come back up for a few minutes. If it recovers, we can release the lock and everything will be OK again.

Basically, the changes are:

  - If we can't release the lock at first, sit in a loop trying really hard to release it for a while.
  - Add a unique lock identifier so we can be certain we're only releasing //our// lock no matter what else is going on.
  - Do the version reads on the same connection holding the lock, so we can be sure we haven't lost the lock before we do that read.

Test Plan:
  - Added a `sleep(10)` after accepting the write but before releasing the lock so I could run `mysqld stop` and force this issue to occur.
  - Pushed like this:

```
$ echo D >> record && git commit -am D && git push
[master 707ecc3] D
 1 file changed, 1 insertion(+)
# Push received by "local001.phacility.net", forwarding to cluster host.
# Waiting up to 120 second(s) for a cluster write lock...
# Acquired write lock immediately.
# Waiting up to 120 second(s) for a cluster read lock on "local001.phacility.net"...
# Acquired read lock immediately.
# Device "local001.phacility.net" is already a cluster leader and does not need to be synchronized.
# Ready to receive on cluster host "local001.phacility.net".
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 254 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
BEGIN SLEEP
```

  - Here, I stopped `mysqld` from the CLI in another terminal window.

```
END SLEEP
# CRITICAL. Failed to release cluster write lock!
# The connection to the master database was lost while receiving the write.
# This process will spend 300 more second(s) attempting to recover, then give up.
```

  - Here, I started `mysqld` again.

```
# RECOVERED. Link to master database was restored.
# Released cluster write lock.
To ssh://local@localvault.phacility.com/diffusion/26/locktopia.git
   2cbf87c..707ecc3  master -> master
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10860

Differential Revision: https://secure.phabricator.com/D15792
2016-04-25 11:37:31 -07:00
Joshua Spence
7164606285 Add a lock to storage upgrade and adjustment
Summary: Fixes T9715. Adds a MySQL-based lock to ensure that schema migrations are not applied on multiple hosts simultaneously.

Test Plan: Ran `./bin/storage upgrade` concurrently. One invocation was successful whilst the other hit a `PhutilLockException`.

Reviewers: #blessed_reviewers, epriestley

Subscribers: Korvin

Maniphest Tasks: T9715

Differential Revision: https://secure.phabricator.com/D14463
2015-12-02 06:18:28 +11:00
epriestley
fad75f939d Improve messaging around repository locks
Summary:
Fixes T6958. Ref T7484.

  - When we collide on a lock in `bin/repository update`, explain what that means.
  - GlobalLock currently uses a "lock name" which is different from the lock's actual name. Don't do this. There's a small chance this fixes T7484, but I don't have high hopes.

Test Plan: Ran `bin/repository update X` in two windows really fast, got the new message in one of them.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Maniphest Tasks: T6958, T7484

Differential Revision: https://secure.phabricator.com/D12574
2015-04-27 10:25:53 -07:00
epriestley
2b3d3cf7e4 Enforce that global locks have keys shorter than 64 characters
Summary:
Fixes T7484. There's a bunch of spooky mystery here but the current behavior can probably cause problems in at least some situations.

Also moves a couple callsigns to monograms (see T4245).

Test Plan:
  - Faked a short lock length to hit the exception.
  - Updated normally.
  - Grepped for other use sites, none seemed suspicious or likely to overflow the lock length.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Maniphest Tasks: T7484

Differential Revision: https://secure.phabricator.com/D12263
2015-04-02 13:42:22 -07:00
Jakub Vrana
324dd3c0d6 Reduce wait_timeout to max. allowed value
Summary: http://dev.mysql.com/doc/mysql/en/server-system-variables.html#sysvar_wait_timeout

Test Plan:
  $ arc unit

Reviewers: epriestley

Reviewed By: epriestley

CC: aran, Korvin

Differential Revision: https://secure.phabricator.com/D5590
2013-04-05 22:42:27 -07:00
vrana
ef85f49adc Delete license headers from files
Summary:
This commit doesn't change license of any file. It just makes the license implicit (inherited from LICENSE file in the root directory).

We are removing the headers for these reasons:

- It wastes space in editors, less code is visible in editor upon opening a file.
- It brings noise to diff of the first change of any file every year.
- It confuses Git file copy detection when creating small files.
- We don't have an explicit license header in other files (JS, CSS, images, documentation).
- Using license header in every file is not obligatory: http://www.apache.org/dev/apply-license.html#new.

This change is approved by Alma Chao (Lead Open Source and IP Counsel at Facebook).

Test Plan: Verified that the license survived only in LICENSE file and that it didn't modify externals.

Reviewers: epriestley, davidrecordon

Reviewed By: epriestley

CC: aran, Korvin

Maniphest Tasks: T2035

Differential Revision: https://secure.phabricator.com/D3886
2012-11-05 11:16:51 -08:00
epriestley
62b06f0f5d Fix a memory leak in PhabricatorGlobalLock
Summary:
We currently cache all connections in LiskDAO so we can roll back transactions when fixtured unit tests complete.

Since we establish a new connection wrapper each time we establish a global lock, this cache currently grows without bound.

Instead, pool global lock connections so we never have more than the largest number of locks we've held open at once (in PullLocalDaemon, always 1).

Another way to solve this is probably to add an "onclose" callback to `AphrontDatabaseConnection` so that it can notify any caches that it been closed. However, we currently allow a connection to be later reopened (which seeems reasonable) so we'd need a callback for that too. This is much simpler, and this use case is unusual, so I'd like to wait for more use cases before pursing a more complicated fix.

Test Plan:
Ran this in a loop:

    while (true) {
      for ($ii = 0; $ii < 100; $ii++) {
        $lock = PhabricatorGlobalLock::newLock('derp');
        $lock->lock();
        $lock->unlock();
      }
      $this->sleep(1);
    }

Previously it leaked ~100KB/sec, now has stable memory usage.

Reviewers: vrana, nh, btrahan

Reviewed By: vrana

CC: aran

Maniphest Tasks: T1636

Differential Revision: https://secure.phabricator.com/D3239
2012-08-10 11:28:43 -07:00
epriestley
f1270315e9 Allow connections to be closed; close connections for global locks
Summary: Add an explicit close() method to connections and call it in GlobalLock.

Test Plan:
Wrote a script like this:

  $lock = PhabricatorGlobalLock::newLock('test');
  echo "LOCK";
  $lock->lock();
  sleep(10);
  echo "UNLOCK";
  $lock->unlock();
  sleep(9999);

Using `SHOW FULL PROCESSLIST`, verified the connection closed after 10 seconds with both the "MySQL" and "MySQLi" implementations.

Reviewers: btrahan, vrana

Reviewed By: vrana

CC: aran

Maniphest Tasks: T1470

Differential Revision: https://secure.phabricator.com/D3035
2012-07-23 19:06:58 -07:00
epriestley
bf8cbf55b1 Namespace GlobalLocks to storage namespaces
Summary:
Currently, multiple unit tests that acquire global locks will interfere with each other. Namespace the locks so they don't.

(Possibly we should also rename this to PhabricatorStorageNamespaceLock or something since it's not really global any more, but that's kind of unwieldy...)

Test Plan: Acquired locks with --trace and verified they were namespaced properly.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T1162

Differential Revision: https://secure.phabricator.com/D2939
2012-07-09 10:39:30 -07:00
epriestley
ce360926b7 Allow PhabricatorGlobalLock to block
Summary: See D2924.

Test Plan: Ran locks with blocking timeouts.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Differential Revision: https://secure.phabricator.com/D2925
2012-07-05 16:03:43 -07:00
epriestley
692e54ee36 Implement a MySQL-backed global lock
Summary: Implementation is a little crazy but this seems to work as advertised.

Test Plan: Acquired locks with "lock.php". Verified they held as long as the process reamined open and released properly on kill -9, ^C, etc.

Reviewers: nh, jungejason, vrana, btrahan, Girish, edward

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T1400

Differential Revision: https://secure.phabricator.com/D2864
2012-06-27 13:59:12 -07:00