1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-11-29 02:02:41 +01:00
phorge-phorge/src/applications/drydock/blueprint/DrydockAlmanacServiceHostBlueprintImplementation.php

263 lines
7.6 KiB
PHP
Raw Normal View History

Implement a rough AlmanacService blueprint in Drydock Summary: Ref T9253. Broadly, this realigns Allocator behavior to be more consistent and straightforward and amenable to intended future changes. This attempts to make language more consistent: resources are "allocated" and leases are "acquired". This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will need to perform other locking eventually, this does feel like a good fit for most of the locking blueprints need to do. In particular, I've made the blueprint operations on `$resource` and `$lease` objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like: $lease ->setActivateWhenAcquired(true) ->needSlotLock('x') ->needSlotLock('y') ->acquireOnResource($resource); In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward. This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes: - The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode. - The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster. - This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a `sleep(15 * 60)` in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks. Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem). The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier. Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to. This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself: - A bunch of code from that (e.g., interfaces) doesn't exist yet. - I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing. This also removes `bin/drydock create-resource`, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and can not be made correct. NOTE: This technically works but doesn't do anything useful yet. Test Plan: Used `bin/drydock lease --type host` to acquire leases against these blueprints. Reviewers: hach-que, chad Reviewed By: hach-que, chad Subscribers: Mnkras Maniphest Tasks: T9253 Differential Revision: https://secure.phabricator.com/D14117
2015-09-21 13:43:53 +02:00
<?php
final class DrydockAlmanacServiceHostBlueprintImplementation
extends DrydockBlueprintImplementation {
private $services;
private $freeBindings;
public function isEnabled() {
$almanac_app = 'PhabricatorAlmanacApplication';
return PhabricatorApplication::isClassInstalled($almanac_app);
}
public function getBlueprintName() {
return pht('Almanac Hosts');
}
public function getDescription() {
return pht(
'Allows Drydock to lease existing hosts defined in an Almanac service '.
'pool.');
}
public function canAnyBlueprintEverAllocateResourceForLease(
DrydockLease $lease) {
return true;
}
public function canEverAllocateResourceForLease(
DrydockBlueprint $blueprint,
DrydockLease $lease) {
$services = $this->loadServices($blueprint);
$bindings = $this->loadAllBindings($services);
if (!$bindings) {
// If there are no devices bound to the services for this blueprint,
// we can not allocate resources.
return false;
}
return true;
}
public function canAllocateResourceForLease(
DrydockBlueprint $blueprint,
DrydockLease $lease) {
// We will only allocate one resource per unique device bound to the
// services for this blueprint. Make sure we have a free device somewhere.
$free_bindings = $this->loadFreeBindings($blueprint);
if (!$free_bindings) {
return false;
}
return true;
}
public function allocateResource(
DrydockBlueprint $blueprint,
DrydockLease $lease) {
$free_bindings = $this->loadFreeBindings($blueprint);
shuffle($free_bindings);
$exceptions = array();
foreach ($free_bindings as $binding) {
$device = $binding->getDevice();
$device_name = $device->getName();
Implement optimistic "slot locks" in Drydock Summary: See discussion in D10304. There's a lot of context there, but the general idea is: - Blueprints should manage locks in a granular way during the actual allocation/acquisition phase. - Optimistic "slot locks" might a pretty good primitive to make that easy to implement and reason about in most cases. The way these locks work is that you just pick some name for the lock (like the PHID of a resource) and say that it needs to be acquired for the allocation/acquisition to work: ``` ... ->needSlotLock("mylock(PHID-XYZQ-...)") ... ``` When you fire off the acquisition or allocation, it fails unless it could acquire the slot with that name. This is really simple (no explicit lock management) and a pretty good fit for most of the locking that blueprints and leases need to do. If you need to do limit-based locks (e.g., maximum of 3 locks) you could acquire a lock like this: ``` mylock(whatever).slot(2) ``` Blueprints generally only contend with themselves, so it's normally OK for them to pick whatever strategy works best for them in naming locks. This may not work as well if you have a huge number of slots (e.g., 100TB you want to give out in 1MB chunks), or other complex needs for locks (like you have to synchronize access to some external resource), but slot locks don't need to be the only mechanism that blueprints use. If they run into a problem that slot locks aren't a good fit for, they can use something else instead. For now, slot locks seem like a good fit for the problems we currently face and most of the problems I anticipate facing. (The release workflows have other race issues which I'm not addressing here. They work fine if nothing races, but aren't race-safe.) Test Plan: To create a race where the same binding is allocated as a resource twice: - Add `sleep(10)` near the beginning of `allocateResource()`, after the free bindings are loaded but before resources are allocated. - (Comment out slot lock acquisition if you have this patch.) - Run `bin/drydock lease ...` in two windows, within 10 seconds of one another. This will reliably double-allocate the binding because both blueprints see a view of the world where the binding is free. To verify the lock works, un-comment it (or apply this patch) and run the same test again. Now, the lock fails in one process and only one resource is allocated. Reviewers: hach-que, chad Reviewed By: hach-que, chad Differential Revision: https://secure.phabricator.com/D14118
2015-09-21 13:45:25 +02:00
$binding_phid = $binding->getPHID();
Implement a rough AlmanacService blueprint in Drydock Summary: Ref T9253. Broadly, this realigns Allocator behavior to be more consistent and straightforward and amenable to intended future changes. This attempts to make language more consistent: resources are "allocated" and leases are "acquired". This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will need to perform other locking eventually, this does feel like a good fit for most of the locking blueprints need to do. In particular, I've made the blueprint operations on `$resource` and `$lease` objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like: $lease ->setActivateWhenAcquired(true) ->needSlotLock('x') ->needSlotLock('y') ->acquireOnResource($resource); In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward. This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes: - The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode. - The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster. - This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a `sleep(15 * 60)` in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks. Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem). The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier. Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to. This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself: - A bunch of code from that (e.g., interfaces) doesn't exist yet. - I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing. This also removes `bin/drydock create-resource`, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and can not be made correct. NOTE: This technically works but doesn't do anything useful yet. Test Plan: Used `bin/drydock lease --type host` to acquire leases against these blueprints. Reviewers: hach-que, chad Reviewed By: hach-que, chad Subscribers: Mnkras Maniphest Tasks: T9253 Differential Revision: https://secure.phabricator.com/D14117
2015-09-21 13:43:53 +02:00
$resource = $this->newResourceTemplate($blueprint, $device_name)
->setActivateWhenAllocated(true)
->setAttribute('almanacServicePHID', $binding->getServicePHID())
Implement optimistic "slot locks" in Drydock Summary: See discussion in D10304. There's a lot of context there, but the general idea is: - Blueprints should manage locks in a granular way during the actual allocation/acquisition phase. - Optimistic "slot locks" might a pretty good primitive to make that easy to implement and reason about in most cases. The way these locks work is that you just pick some name for the lock (like the PHID of a resource) and say that it needs to be acquired for the allocation/acquisition to work: ``` ... ->needSlotLock("mylock(PHID-XYZQ-...)") ... ``` When you fire off the acquisition or allocation, it fails unless it could acquire the slot with that name. This is really simple (no explicit lock management) and a pretty good fit for most of the locking that blueprints and leases need to do. If you need to do limit-based locks (e.g., maximum of 3 locks) you could acquire a lock like this: ``` mylock(whatever).slot(2) ``` Blueprints generally only contend with themselves, so it's normally OK for them to pick whatever strategy works best for them in naming locks. This may not work as well if you have a huge number of slots (e.g., 100TB you want to give out in 1MB chunks), or other complex needs for locks (like you have to synchronize access to some external resource), but slot locks don't need to be the only mechanism that blueprints use. If they run into a problem that slot locks aren't a good fit for, they can use something else instead. For now, slot locks seem like a good fit for the problems we currently face and most of the problems I anticipate facing. (The release workflows have other race issues which I'm not addressing here. They work fine if nothing races, but aren't race-safe.) Test Plan: To create a race where the same binding is allocated as a resource twice: - Add `sleep(10)` near the beginning of `allocateResource()`, after the free bindings are loaded but before resources are allocated. - (Comment out slot lock acquisition if you have this patch.) - Run `bin/drydock lease ...` in two windows, within 10 seconds of one another. This will reliably double-allocate the binding because both blueprints see a view of the world where the binding is free. To verify the lock works, un-comment it (or apply this patch) and run the same test again. Now, the lock fails in one process and only one resource is allocated. Reviewers: hach-que, chad Reviewed By: hach-que, chad Differential Revision: https://secure.phabricator.com/D14118
2015-09-21 13:45:25 +02:00
->setAttribute('almanacBindingPHID', $binding_phid)
->needSlotLock("almanac.host.binding({$binding_phid})");
Implement a rough AlmanacService blueprint in Drydock Summary: Ref T9253. Broadly, this realigns Allocator behavior to be more consistent and straightforward and amenable to intended future changes. This attempts to make language more consistent: resources are "allocated" and leases are "acquired". This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will need to perform other locking eventually, this does feel like a good fit for most of the locking blueprints need to do. In particular, I've made the blueprint operations on `$resource` and `$lease` objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like: $lease ->setActivateWhenAcquired(true) ->needSlotLock('x') ->needSlotLock('y') ->acquireOnResource($resource); In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward. This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes: - The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode. - The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster. - This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a `sleep(15 * 60)` in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks. Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem). The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier. Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to. This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself: - A bunch of code from that (e.g., interfaces) doesn't exist yet. - I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing. This also removes `bin/drydock create-resource`, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and can not be made correct. NOTE: This technically works but doesn't do anything useful yet. Test Plan: Used `bin/drydock lease --type host` to acquire leases against these blueprints. Reviewers: hach-que, chad Reviewed By: hach-que, chad Subscribers: Mnkras Maniphest Tasks: T9253 Differential Revision: https://secure.phabricator.com/D14117
2015-09-21 13:43:53 +02:00
try {
return $resource->allocateResource(DrydockResourceStatus::STATUS_OPEN);
} catch (Exception $ex) {
$exceptions[] = $ex;
}
}
throw new PhutilAggregateException(
pht('Unable to allocate any binding as a resource.'),
$exceptions);
}
public function canAcquireLeaseOnResource(
DrydockBlueprint $blueprint,
DrydockResource $resource,
DrydockLease $lease) {
Implement optimistic "slot locks" in Drydock Summary: See discussion in D10304. There's a lot of context there, but the general idea is: - Blueprints should manage locks in a granular way during the actual allocation/acquisition phase. - Optimistic "slot locks" might a pretty good primitive to make that easy to implement and reason about in most cases. The way these locks work is that you just pick some name for the lock (like the PHID of a resource) and say that it needs to be acquired for the allocation/acquisition to work: ``` ... ->needSlotLock("mylock(PHID-XYZQ-...)") ... ``` When you fire off the acquisition or allocation, it fails unless it could acquire the slot with that name. This is really simple (no explicit lock management) and a pretty good fit for most of the locking that blueprints and leases need to do. If you need to do limit-based locks (e.g., maximum of 3 locks) you could acquire a lock like this: ``` mylock(whatever).slot(2) ``` Blueprints generally only contend with themselves, so it's normally OK for them to pick whatever strategy works best for them in naming locks. This may not work as well if you have a huge number of slots (e.g., 100TB you want to give out in 1MB chunks), or other complex needs for locks (like you have to synchronize access to some external resource), but slot locks don't need to be the only mechanism that blueprints use. If they run into a problem that slot locks aren't a good fit for, they can use something else instead. For now, slot locks seem like a good fit for the problems we currently face and most of the problems I anticipate facing. (The release workflows have other race issues which I'm not addressing here. They work fine if nothing races, but aren't race-safe.) Test Plan: To create a race where the same binding is allocated as a resource twice: - Add `sleep(10)` near the beginning of `allocateResource()`, after the free bindings are loaded but before resources are allocated. - (Comment out slot lock acquisition if you have this patch.) - Run `bin/drydock lease ...` in two windows, within 10 seconds of one another. This will reliably double-allocate the binding because both blueprints see a view of the world where the binding is free. To verify the lock works, un-comment it (or apply this patch) and run the same test again. Now, the lock fails in one process and only one resource is allocated. Reviewers: hach-que, chad Reviewed By: hach-que, chad Differential Revision: https://secure.phabricator.com/D14118
2015-09-21 13:45:25 +02:00
// TODO: The current rule is one lease per resource, and there's no way to
// make that cheaper here than by just trying to acquire the lease below,
// so don't do any special checks for now. When we eventually permit
// multiple leases per host, we'll need to load leases anyway, so we can
// reject fully leased hosts cheaply here.
Implement a rough AlmanacService blueprint in Drydock Summary: Ref T9253. Broadly, this realigns Allocator behavior to be more consistent and straightforward and amenable to intended future changes. This attempts to make language more consistent: resources are "allocated" and leases are "acquired". This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will need to perform other locking eventually, this does feel like a good fit for most of the locking blueprints need to do. In particular, I've made the blueprint operations on `$resource` and `$lease` objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like: $lease ->setActivateWhenAcquired(true) ->needSlotLock('x') ->needSlotLock('y') ->acquireOnResource($resource); In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward. This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes: - The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode. - The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster. - This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a `sleep(15 * 60)` in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks. Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem). The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier. Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to. This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself: - A bunch of code from that (e.g., interfaces) doesn't exist yet. - I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing. This also removes `bin/drydock create-resource`, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and can not be made correct. NOTE: This technically works but doesn't do anything useful yet. Test Plan: Used `bin/drydock lease --type host` to acquire leases against these blueprints. Reviewers: hach-que, chad Reviewed By: hach-que, chad Subscribers: Mnkras Maniphest Tasks: T9253 Differential Revision: https://secure.phabricator.com/D14117
2015-09-21 13:43:53 +02:00
return true;
}
public function acquireLease(
DrydockBlueprint $blueprint,
DrydockResource $resource,
DrydockLease $lease) {
Implement optimistic "slot locks" in Drydock Summary: See discussion in D10304. There's a lot of context there, but the general idea is: - Blueprints should manage locks in a granular way during the actual allocation/acquisition phase. - Optimistic "slot locks" might a pretty good primitive to make that easy to implement and reason about in most cases. The way these locks work is that you just pick some name for the lock (like the PHID of a resource) and say that it needs to be acquired for the allocation/acquisition to work: ``` ... ->needSlotLock("mylock(PHID-XYZQ-...)") ... ``` When you fire off the acquisition or allocation, it fails unless it could acquire the slot with that name. This is really simple (no explicit lock management) and a pretty good fit for most of the locking that blueprints and leases need to do. If you need to do limit-based locks (e.g., maximum of 3 locks) you could acquire a lock like this: ``` mylock(whatever).slot(2) ``` Blueprints generally only contend with themselves, so it's normally OK for them to pick whatever strategy works best for them in naming locks. This may not work as well if you have a huge number of slots (e.g., 100TB you want to give out in 1MB chunks), or other complex needs for locks (like you have to synchronize access to some external resource), but slot locks don't need to be the only mechanism that blueprints use. If they run into a problem that slot locks aren't a good fit for, they can use something else instead. For now, slot locks seem like a good fit for the problems we currently face and most of the problems I anticipate facing. (The release workflows have other race issues which I'm not addressing here. They work fine if nothing races, but aren't race-safe.) Test Plan: To create a race where the same binding is allocated as a resource twice: - Add `sleep(10)` near the beginning of `allocateResource()`, after the free bindings are loaded but before resources are allocated. - (Comment out slot lock acquisition if you have this patch.) - Run `bin/drydock lease ...` in two windows, within 10 seconds of one another. This will reliably double-allocate the binding because both blueprints see a view of the world where the binding is free. To verify the lock works, un-comment it (or apply this patch) and run the same test again. Now, the lock fails in one process and only one resource is allocated. Reviewers: hach-que, chad Reviewed By: hach-que, chad Differential Revision: https://secure.phabricator.com/D14118
2015-09-21 13:45:25 +02:00
$resource_phid = $resource->getPHID();
Implement a rough AlmanacService blueprint in Drydock Summary: Ref T9253. Broadly, this realigns Allocator behavior to be more consistent and straightforward and amenable to intended future changes. This attempts to make language more consistent: resources are "allocated" and leases are "acquired". This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will need to perform other locking eventually, this does feel like a good fit for most of the locking blueprints need to do. In particular, I've made the blueprint operations on `$resource` and `$lease` objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like: $lease ->setActivateWhenAcquired(true) ->needSlotLock('x') ->needSlotLock('y') ->acquireOnResource($resource); In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward. This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes: - The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode. - The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster. - This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a `sleep(15 * 60)` in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks. Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem). The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier. Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to. This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself: - A bunch of code from that (e.g., interfaces) doesn't exist yet. - I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing. This also removes `bin/drydock create-resource`, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and can not be made correct. NOTE: This technically works but doesn't do anything useful yet. Test Plan: Used `bin/drydock lease --type host` to acquire leases against these blueprints. Reviewers: hach-que, chad Reviewed By: hach-que, chad Subscribers: Mnkras Maniphest Tasks: T9253 Differential Revision: https://secure.phabricator.com/D14117
2015-09-21 13:43:53 +02:00
$lease
->setActivateWhenAcquired(true)
Implement optimistic "slot locks" in Drydock Summary: See discussion in D10304. There's a lot of context there, but the general idea is: - Blueprints should manage locks in a granular way during the actual allocation/acquisition phase. - Optimistic "slot locks" might a pretty good primitive to make that easy to implement and reason about in most cases. The way these locks work is that you just pick some name for the lock (like the PHID of a resource) and say that it needs to be acquired for the allocation/acquisition to work: ``` ... ->needSlotLock("mylock(PHID-XYZQ-...)") ... ``` When you fire off the acquisition or allocation, it fails unless it could acquire the slot with that name. This is really simple (no explicit lock management) and a pretty good fit for most of the locking that blueprints and leases need to do. If you need to do limit-based locks (e.g., maximum of 3 locks) you could acquire a lock like this: ``` mylock(whatever).slot(2) ``` Blueprints generally only contend with themselves, so it's normally OK for them to pick whatever strategy works best for them in naming locks. This may not work as well if you have a huge number of slots (e.g., 100TB you want to give out in 1MB chunks), or other complex needs for locks (like you have to synchronize access to some external resource), but slot locks don't need to be the only mechanism that blueprints use. If they run into a problem that slot locks aren't a good fit for, they can use something else instead. For now, slot locks seem like a good fit for the problems we currently face and most of the problems I anticipate facing. (The release workflows have other race issues which I'm not addressing here. They work fine if nothing races, but aren't race-safe.) Test Plan: To create a race where the same binding is allocated as a resource twice: - Add `sleep(10)` near the beginning of `allocateResource()`, after the free bindings are loaded but before resources are allocated. - (Comment out slot lock acquisition if you have this patch.) - Run `bin/drydock lease ...` in two windows, within 10 seconds of one another. This will reliably double-allocate the binding because both blueprints see a view of the world where the binding is free. To verify the lock works, un-comment it (or apply this patch) and run the same test again. Now, the lock fails in one process and only one resource is allocated. Reviewers: hach-que, chad Reviewed By: hach-que, chad Differential Revision: https://secure.phabricator.com/D14118
2015-09-21 13:45:25 +02:00
->needSlotLock("almanac.host.lease({$resource_phid})")
Implement a rough AlmanacService blueprint in Drydock Summary: Ref T9253. Broadly, this realigns Allocator behavior to be more consistent and straightforward and amenable to intended future changes. This attempts to make language more consistent: resources are "allocated" and leases are "acquired". This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will need to perform other locking eventually, this does feel like a good fit for most of the locking blueprints need to do. In particular, I've made the blueprint operations on `$resource` and `$lease` objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like: $lease ->setActivateWhenAcquired(true) ->needSlotLock('x') ->needSlotLock('y') ->acquireOnResource($resource); In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward. This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes: - The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode. - The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster. - This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a `sleep(15 * 60)` in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks. Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem). The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier. Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to. This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself: - A bunch of code from that (e.g., interfaces) doesn't exist yet. - I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing. This also removes `bin/drydock create-resource`, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and can not be made correct. NOTE: This technically works but doesn't do anything useful yet. Test Plan: Used `bin/drydock lease --type host` to acquire leases against these blueprints. Reviewers: hach-que, chad Reviewed By: hach-que, chad Subscribers: Mnkras Maniphest Tasks: T9253 Differential Revision: https://secure.phabricator.com/D14117
2015-09-21 13:43:53 +02:00
->acquireOnResource($resource);
}
public function getType() {
return 'host';
}
public function getInterface(
DrydockBlueprint $blueprint,
Implement a rough AlmanacService blueprint in Drydock Summary: Ref T9253. Broadly, this realigns Allocator behavior to be more consistent and straightforward and amenable to intended future changes. This attempts to make language more consistent: resources are "allocated" and leases are "acquired". This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will need to perform other locking eventually, this does feel like a good fit for most of the locking blueprints need to do. In particular, I've made the blueprint operations on `$resource` and `$lease` objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like: $lease ->setActivateWhenAcquired(true) ->needSlotLock('x') ->needSlotLock('y') ->acquireOnResource($resource); In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward. This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes: - The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode. - The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster. - This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a `sleep(15 * 60)` in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks. Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem). The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier. Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to. This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself: - A bunch of code from that (e.g., interfaces) doesn't exist yet. - I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing. This also removes `bin/drydock create-resource`, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and can not be made correct. NOTE: This technically works but doesn't do anything useful yet. Test Plan: Used `bin/drydock lease --type host` to acquire leases against these blueprints. Reviewers: hach-que, chad Reviewed By: hach-que, chad Subscribers: Mnkras Maniphest Tasks: T9253 Differential Revision: https://secure.phabricator.com/D14117
2015-09-21 13:43:53 +02:00
DrydockResource $resource,
DrydockLease $lease,
$type) {
$viewer = PhabricatorUser::getOmnipotentUser();
switch ($type) {
case DrydockCommandInterface::INTERFACE_TYPE:
$credential_phid = $blueprint->getFieldValue('credentialPHID');
$binding_phid = $resource->getAttribute('almanacBindingPHID');
$binding = id(new AlmanacBindingQuery())
->setViewer($viewer)
->withPHIDs(array($binding_phid))
->executeOne();
if (!$binding) {
// TODO: This is probably a permanent failure, destroy this resource?
throw new Exception(
pht(
'Unable to load binding "%s" to create command interface.',
$binding_phid));
}
$interface = $binding->getInterface();
return id(new DrydockSSHCommandInterface())
->setConfig('credentialPHID', $credential_phid)
->setConfig('host', $interface->getAddress())
->setConfig('port', $interface->getPort());
}
Implement a rough AlmanacService blueprint in Drydock Summary: Ref T9253. Broadly, this realigns Allocator behavior to be more consistent and straightforward and amenable to intended future changes. This attempts to make language more consistent: resources are "allocated" and leases are "acquired". This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will need to perform other locking eventually, this does feel like a good fit for most of the locking blueprints need to do. In particular, I've made the blueprint operations on `$resource` and `$lease` objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like: $lease ->setActivateWhenAcquired(true) ->needSlotLock('x') ->needSlotLock('y') ->acquireOnResource($resource); In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward. This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes: - The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode. - The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster. - This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a `sleep(15 * 60)` in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks. Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem). The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier. Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to. This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself: - A bunch of code from that (e.g., interfaces) doesn't exist yet. - I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing. This also removes `bin/drydock create-resource`, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and can not be made correct. NOTE: This technically works but doesn't do anything useful yet. Test Plan: Used `bin/drydock lease --type host` to acquire leases against these blueprints. Reviewers: hach-que, chad Reviewed By: hach-que, chad Subscribers: Mnkras Maniphest Tasks: T9253 Differential Revision: https://secure.phabricator.com/D14117
2015-09-21 13:43:53 +02:00
}
public function getFieldSpecifications() {
return array(
'almanacServicePHIDs' => array(
'name' => pht('Almanac Services'),
'type' => 'datasource',
'datasource.class' => 'AlmanacServiceDatasource',
'datasource.parameters' => array(
'serviceClasses' => $this->getAlmanacServiceClasses(),
),
'required' => true,
),
'credentialPHID' => array(
'name' => pht('Credentials'),
'type' => 'credential',
'credential.provides' =>
PassphraseSSHPrivateKeyCredentialType::PROVIDES_TYPE,
'credential.type' =>
PassphraseSSHPrivateKeyTextCredentialType::CREDENTIAL_TYPE,
),
) + parent::getFieldSpecifications();
}
private function loadServices(DrydockBlueprint $blueprint) {
if (!$this->services) {
$service_phids = $blueprint->getFieldValue('almanacServicePHIDs');
if (!$service_phids) {
throw new Exception(
pht(
'This blueprint ("%s") does not define any Almanac Service PHIDs.',
$blueprint->getBlueprintName()));
}
$viewer = PhabricatorUser::getOmnipotentUser();
$services = id(new AlmanacServiceQuery())
->setViewer($viewer)
->withPHIDs($service_phids)
->withServiceClasses($this->getAlmanacServiceClasses())
->needBindings(true)
->execute();
$services = mpull($services, null, 'getPHID');
if (count($services) != count($service_phids)) {
$missing_phids = array_diff($service_phids, array_keys($services));
throw new Exception(
pht(
'Some of the Almanac Services defined by this blueprint '.
'could not be loaded. They may be invalid, no longer exist, '.
'or be of the wrong type: %s.',
implode(', ', $missing_phids)));
}
$this->services = $services;
}
return $this->services;
}
private function loadAllBindings(array $services) {
assert_instances_of($services, 'AlmanacService');
$bindings = array_mergev(mpull($services, 'getBindings'));
return mpull($bindings, null, 'getPHID');
}
private function loadFreeBindings(DrydockBlueprint $blueprint) {
if ($this->freeBindings === null) {
$viewer = PhabricatorUser::getOmnipotentUser();
$pool = id(new DrydockResourceQuery())
->setViewer($viewer)
->withBlueprintPHIDs(array($blueprint->getPHID()))
->withStatuses(
array(
DrydockResourceStatus::STATUS_PENDING,
DrydockResourceStatus::STATUS_OPEN,
DrydockResourceStatus::STATUS_CLOSED,
))
->execute();
$allocated_phids = array();
foreach ($pool as $resource) {
$allocated_phids[] = $resource->getAttribute('almanacDevicePHID');
}
$allocated_phids = array_fuse($allocated_phids);
$services = $this->loadServices($blueprint);
$bindings = $this->loadAllBindings($services);
$free = array();
foreach ($bindings as $binding) {
if (empty($allocated_phids[$binding->getPHID()])) {
$free[] = $binding;
}
}
$this->freeBindings = $free;
}
return $this->freeBindings;
}
private function getAlmanacServiceClasses() {
return array(
'AlmanacDrydockPoolServiceType',
);
}
}