phorge-phorge/src/applications/drydock/storage/DrydockSlotLock.php
Implement optimistic "slot locks" in Drydock (2015-09-21 13:45:25 +02:00)

Summary: See discussion in D10304. There's a lot of context there, but the general idea is:

- Blueprints should manage locks in a granular way during the actual allocation/acquisition phase.
- Optimistic "slot locks" might be a pretty good primitive to make that easy to implement and reason about in most cases.

The way these locks work is that you just pick some name for the lock (like the PHID of a resource) and say that it needs to be acquired for the allocation/acquisition to work:

```
...
->needSlotLock("mylock(PHID-XYZQ-...)")
...
```

When you fire off the acquisition or allocation, it fails unless it can acquire the slot with that name. This is really simple (no explicit lock management) and a pretty good fit for most of the locking that blueprints and leases need to do.

If you need limit-based locks (e.g., a maximum of 3 locks), you could acquire a lock like this:

```
mylock(whatever).slot(2)
```

Blueprints generally only contend with themselves, so it's normally OK for them to pick whatever strategy works best for them when naming locks.

This may not work as well if you have a huge number of slots (e.g., 100TB you want to give out in 1MB chunks), or other complex needs for locks (like having to synchronize access to some external resource), but slot locks don't need to be the only mechanism that blueprints use. If they run into a problem that slot locks aren't a good fit for, they can use something else instead. For now, slot locks seem like a good fit for the problems we currently face and most of the problems I anticipate facing.

(The release workflows have other race issues which I'm not addressing here. They work fine if nothing races, but aren't race-safe.)

Test Plan: To create a race where the same binding is allocated as a resource twice:

- Add `sleep(10)` near the beginning of `allocateResource()`, after the free bindings are loaded but before resources are allocated.
- (Comment out slot lock acquisition if you have this patch.)
- Run `bin/drydock lease ...` in two windows, within 10 seconds of one another.

This will reliably double-allocate the binding because both blueprints see a view of the world where the binding is free. To verify the lock works, un-comment it (or apply this patch) and run the same test again. Now, the lock fails in one process and only one resource is allocated.

Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Differential Revision: https://secure.phabricator.com/D14118
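As a rough sketch of how a blueprint might use these conventions (not part of this change): `needSlotLock()` and the `name.slot(N)` pattern come from the summary above, while the `$resource` object, the `$binding_phid` variable, the `binding(...)` key format, and the slot-probing loop are illustrative assumptions.

```
// One lock per physical binding: two concurrent allocations for the same
// binding contend on this key, and at most one of them can acquire it.
$resource->needSlotLock('binding('.$binding_phid.')');

// Limit-style locking ("name.slot(N)"): allow at most three concurrent
// holders by probing numbered slots and requiring the first one that looks
// free. The check is only advisory; enforcement happens when the locks are
// actually acquired, which fails if another process won the race.
foreach (range(0, 2) as $slot) {
  $slot_lock = 'mypool.slot('.$slot.')';
  if (DrydockSlotLock::isLockFree($slot_lock)) {
    $resource->needSlotLock($slot_lock);
    break;
  }
}
```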
<?php
/**
* Simple optimistic locks for Drydock resources and leases.
*
* Most blueprints only need very simple locks: for example, a host blueprint
* might not want to create multiple resources representing the same physical
* machine. These optimistic "slot locks" provide a flexible way to do this
* sort of simple locking.
*
* @task info Getting Lock Information
* @task lock Acquiring and Releasing Locks
*/
final class DrydockSlotLock extends DrydockDAO {
protected $ownerPHID;
protected $lockIndex;
protected $lockKey;
protected function getConfiguration() {
return array(
self::CONFIG_TIMESTAMPS => false,
self::CONFIG_COLUMN_SCHEMA => array(
'lockIndex' => 'bytes12',
'lockKey' => 'text',
),
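// The unique key on lockIndex (the 12-character digest of lockKey) is what
// actually enforces mutual exclusion; see acquireLocks() below.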
self::CONFIG_KEY_SCHEMA => array(
'key_lock' => array(
'columns' => array('lockIndex'),
'unique' => true,
),
'key_owner' => array(
'columns' => array('ownerPHID'),
),
),
) + parent::getConfiguration();
}
/* -( Getting Lock Information )------------------------------------------- */
/**
* Load all locks held by a particular owner.
*
* @param phid Owner PHID.
* @return list<DrydockSlotLock> All held locks.
* @task info
*/
public static function loadLocks($owner_phid) {
return id(new DrydockSlotLock())->loadAllWhere(
'ownerPHID = %s',
$owner_phid);
}
/**
* Test if a lock is currently free.
*
* @param string Lock key to test.
* @return bool True if the lock is currently free.
* @task info
*/
public static function isLockFree($lock) {
return self::areLocksFree(array($lock));
}
/**
* Test if a list of locks are all currently free.
*
* @param list<string> List of lock keys to test.
* @return bool True if all locks are currently free.
* @task info
*/
public static function areLocksFree(array $locks) {
$lock_map = self::loadHeldLocks($locks);
return !$lock_map;
}
/**
* Load named locks.
*
* @param list<string> List of lock keys to load.
* @return list<DrydockSlotLock> List of held locks.
* @task info
*/
public static function loadHeldLocks(array $locks) {
if (!$locks) {
return array();
}
$table = new DrydockSlotLock();
$conn_r = $table->establishConnection('r');
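// Held locks are looked up by the same 12-character digest that
// acquireLocks() stores in lockIndex; the raw lockKey column is not indexed.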
$indexes = array();
foreach ($locks as $lock) {
$indexes[] = PhabricatorHash::digestForIndex($lock);
}
return id(new DrydockSlotLock())->loadAllWhere(
'lockIndex IN (%Ls)',
$indexes);
}
/* -( Acquiring and Releasing Locks )-------------------------------------- */
/**
* Acquire a set of slot locks.
*
* This method either acquires all the locks or throws an exception (usually
* because one or more locks are held).
*
* @param phid Lock owner PHID.
* @param list<string> List of locks to acquire.
* @return void
* @task lock
*/
public static function acquireLocks($owner_phid, array $locks) {
if (!$locks) {
return;
}
$table = new DrydockSlotLock();
$conn_w = $table->establishConnection('w');
$sql = array();
foreach ($locks as $lock) {
$sql[] = qsprintf(
$conn_w,
'(%s, %s, %s)',
$owner_phid,
PhabricatorHash::digestForIndex($lock),
$lock);
}
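// Acquisition is optimistic: try to insert every lock in one statement and
// let the unique key on lockIndex reject the insert if any requested lock
// is already held.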
try {
queryfx(
$conn_w,
'INSERT INTO %T (ownerPHID, lockIndex, lockKey) VALUES %Q',
$table->getTableName(),
implode(', ', $sql));
} catch (AphrontDuplicateKeyQueryException $ex) {
// Try to improve the readability of the exception. We might miss on
// this query if the lock has already been released, but most of the
// time we should be able to figure out which locks are already held.
$held = self::loadHeldLocks($locks);
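// Build a map of lockKey => ownerPHID so the exception can report which
// locks are contended and who holds them.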
$held = mpull($held, 'getOwnerPHID', 'getLockKey');
throw new DrydockSlotLockException($held);
}
}
/**
* Release all locks held by an owner.
*
* @param phid Lock owner PHID.
* @return void
* @task lock
*/
public static function releaseLocks($owner_phid) {
$table = new DrydockSlotLock();
$conn_w = $table->establishConnection('w');
queryfx(
$conn_w,
'DELETE FROM %T WHERE ownerPHID = %s',
$table->getTableName(),
$owner_phid);
}
}