1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-11 08:06:13 +01:00
Commit graph

4 commits

Author SHA1 Message Date
epriestley
ec6d69e74d Give Drydock resources a proper expiry mechanism
Summary:
Fixes T6569. This implements an expiry mechanism for Drydock resources which parallels the mechanism for leases.

A few things are missing that we'll probably need in the future:

  - An "EXPIRES" command to update the expiration time. This would let resources be permanent while leased, then expire after, say, 24 hours without any leases.
  - A callback like `shouldActuallyExpireRightNow()` for resources and leases that lets them decide not to expire at the last second.
  - A callback like `didAcquireLease()` for resource blueprints, to parallel `didReleaseLease()`, letting them clear or extend their timer.

However, this stuff would mostly just let us tune behaviors, not really open up new capabilities.

Test Plan: Changed host resources to expire after 60 seconds, leased one, saw it vanish 60 seconds later.

Reviewers: hach-que, chad

Reviewed By: chad

Maniphest Tasks: T6569

Differential Revision: https://secure.phabricator.com/D14176
2015-09-28 09:35:14 -07:00
epriestley
3379904237 Allow Drydock leases to expire after a time limit
Summary: Ref T6569. If a lease is activated with an expiration date, schedule a task to try to clean it up after that time.

Test Plan:
  - Used `bin/drydock lease ... --until ...` to activate a lease in the near future.
  - Waited for a bit.
  - Saw it expire and get destroyed at the scheduled time.

Reviewers: hach-que, chad

Reviewed By: chad

Maniphest Tasks: T6569

Differential Revision: https://secure.phabricator.com/D14148
2015-09-23 13:54:27 -07:00
epriestley
1f311d64c6 Give Drydock resources and leases a real "destroy" lifecycle phase
Summary: Ref T9252. Some leases or resources may need to remove data, tear down VMs, etc., during cleanup. After they are released, queue a "destroy" phase for performing teardown.

Test Plan:
  - Used `bin/drydock lease ...` to create a working copy lease.
  - Used `bin/drydock release-lease` and `bin/drydock release-resource` to release the lease and then the working copy and host.
  - Saw working copy and host get destroyed and cleaned up properly.

Reviewers: hach-que, chad

Reviewed By: chad

Maniphest Tasks: T6569, T9252

Differential Revision: https://secure.phabricator.com/D14144
2015-09-23 11:20:20 -07:00
epriestley
789df89c84 Add a command queue to Drydock to manage lease/resource release
Summary:
Ref T9252. Broadly, Drydock currently races on releasing objects from the "active" state. To reproduce this:

  - Scatter some sleep()s pretty much anywhere in the release code.
  - Release several times from web UI or CLI in quick succession.

Resources or leases will execute some release code twice or otherwise do inconsistent things.

(I didn't chase down a detailed reproduction scenario for this since inspection of the code makes it clear that there are no meaningful locks or mechanisms preventing this.)

Instead, add a Harbormaster-style command queue to resources and leases. When something wants to do a release, it adds a command to the queue and schedules a worker. The workers acquire a lock, then try to consume commands from the queue.

This guarantees that only one process is responsible for writes to active resource/leases.

This is the last major step to giving resources and leases a single writer during all states:

  - Resource, Unsaved: AllocatorWorker
  - Resource, Pending: ResourceWorker (Possible rename to "Allocated?")
  - Resource, Open: This diff, ResourceUpdateWorker. (Likely rename to "Active").
  - Resource, Closed/Broken: Future destruction worker. (Likely rename to "Released" / "Broken"; maybe remove "Broken").
  - Resource, Destroyed: No writes.
  - Lease, Unsaved: Whatever wants the lease.
  - Lease, Pending: AllocatorWorker
  - Lease, Acquired: LeaseWorker
  - Lease, Active: This diff, LeaseUpdateWorker.
  - Lease, Released/Broken: Future destruction worker (Maybe remove "Broken"?)
  - Lease, Expired: No writes. (Likely rename to "Destroyed").

In most phases, we can already guarantee that there is a single writer without doing any extra work. This is more complicated in the "Active" case because the release buttons on the web UI, the release tools on the CLI, the lease requestor itself, the garbage collector, and any other release process cleaning up related objects may try to effect a release. All of these could race one another (and, in many cases, race other processes from other phases because all of these get to act immediately) as this code is currently written. Using a queue here lets us make sure there's only a single writer in this phase.

One thing which is notable is that whatever acquires a lease **can not write to it**! It is never the writer once it queues the lease for activation. It can not write to any resources, either. And, likewise, Blueprints can not write to resources while acquiring or releasing leases.

We may need to provide a mechinism so that blueprints and/or resource/lease holders get to attach some storage to resources/leases for bookkeeping. For example, a blueprint might need to keep some kind of cache on a resource to help it manage state. But I think we can cross that bridge when we come to it, and nothing else would need to write to this storage so it's technically straightforward to introduce such a mechanism if we need one.

Test Plan:
  - Viewed buttons in web UI, checked enabled/disabled states.
  - Clicked the buttons.
  - Saw commands show up in the command queue.
  - Saw some daemon stuff get scheduled.
  - Ran CLI tools, saw commands get consumed and resources/leases release.

Reviewers: hach-que, chad

Reviewed By: chad

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14143
2015-09-23 07:42:08 -07:00