Summary:
Depends on D19078. Ref T13073. Currently, there is a narrow window where we can acquire a resource after a reclaim has started against it.
To prevent this, briefly lock resources before acquiring them and make sure they're still good. If a resource isn't good, throw the lease back in the pool.
Test Plan:
This is tricky. You need:
- Hoax blueprint with limits and a rule where leases of a given "flavor" can only be satisfied by resources of the same flavor.
- Reduce the 3-minute "wait before resources can be released" to 3 seconds.
- Limit Hoaxes to 1.
- Allocate one "cherry" flavored Hoax and release the lease.
- Add a `sleep(15)` to `releaseResource()` in `DrydockResourceUpdateWorker`, after the `canReclaimResource()` check, with a `print`.
Now:
- Run `bin/phd debug task` in two windows.
- Run `bin/drydock lease --type host --attributes flavor=banana` in a third window.
- This will start to reclaim the existing "cherry" resource. Once one of the `phd` windows prints the "RECLAIMING" message run `bin/drydock lease --type host --attributes flavor=cherry` in a fourth window.
- Before patch: the "cherry" lease acquired immediately, then was released and destroyed moments later.
- After patch: the "cherry" lease yields.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13073
Differential Revision: https://secure.phabricator.com/D19080
Summary:
Depends on D19075. Ref T13073. If a lease acquires a resource but finds that the resource builds directly into a dead state (which can happen for any number of reasonable reasons), reset the lease and throw it back in the pool.
This isn't the lease's fault and it hasn't caused any side effects or done anything we can't undo, so we can safely reset it and throw it back in the pool.
Test Plan:
- Created a blueprint which throws from `allocateResource()` so that resources never activate.
- Tried to lease it with `bin/drydock lease ...`.
- Before patch: lease was broken and destroyed after a failed activation.
- After patch: lease was returned to the pool after a failed activation.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13073
Differential Revision: https://secure.phabricator.com/D19076
Summary: Ref T10093. Show better errors when a commit fails because it has already been merged and when a fetch fails because the ref isn't present in the remote.
Test Plan:
{F1160794}
{F1160795}
Reviewers: chad
Reviewed By: chad
Subscribers: michaeljs1990, yelirekim
Maniphest Tasks: T10093
Differential Revision: https://secure.phabricator.com/D15420
Summary: Ref T9252. Add a bit more logging and improve some behaviors.
Test Plan: Restarted a build, got a good result.
Reviewers: chad, hach-que
Reviewed By: hach-que
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14210
Summary:
Ref T9253. For resources and leases that need to do something which takes a lot of time or requires waiting, allow them to allocate/acquire first and then activate later.
When we allocate a resource or acquire a lease, the blueprint can either activate it immediately (if all the work can happen quickly/inline) or activate it later. If the blueprint activates it later, we queue a worker to handle activating it.
Rebuild the "working copy" blueprint to work with this model: it allocates/acquires and activates in a separate step, once it is able to acquire a host.
Test Plan: With some power of imagination, brought up a bunch of working copies with `bin/drydock lease --type working-copy ...`
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14127