Summary: Ref T13682. Many subclasses of "CursorPagedPolicyAwareQuery" have the same implementation of "loadPage()", and this is a sensible default behavior.
Test Plan: Examined changes to verify that all removed methods have the same behavior.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13682
Differential Revision: https://secure.phabricator.com/D21838
Summary: Ref T13677. This was an accidental change in D21807: when reclaiming a resource, wait for it to be completely destroyed before allowing a lease to reclaim another resource.
Test Plan: Reverts accidental behavior change in D21807.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13677
Differential Revision: https://secure.phabricator.com/D21809
Summary:
Ref T13677. Drydock has a hard-coded and fairly arbitrary limit which prevents a resource pool from growing more than 25% at once.
This is vaguely reasonable for resources which allocate quickly, but suffocating for slower resources. It's also wholly arbitrary, and the "one per lease" limit introduced in D21807 should do a better job of covering the same ground while generally being more reasonable, expected, and predictable.
Test Plan: Ran Drydock allocations without the throttle, saw faster pool growth.
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13677
Differential Revision: https://secure.phabricator.com/D21808
Summary:
Ref T13677. Currently, one lease may cause multiple resources to allocate simultaneously if it starts allocating one, then wakes up from a yield later on and still sees no available resources.
This is never desired -- or, at least, produces desirable behavior only entirely by accident. Normally, it causes an excess of resources to allocate.
This is not a catastrophic problem: the extra resources usually get used sooner or later or cleaned up; and the total amount of badness is limited by overall resource allocation limits.
However, this behavior is also suppressed by an artificial "25% of current pool size" growth limit throttle which I intend to remove. Removing this throttle without fixing the allocator behavior could make this "too many resources" problem worse.
Change the allocator so that a lease that has started allocating a resource won't allocate another resource until the first resource leaves the "pending" state.
This also fixes some general oddness with the allocator and attempts to simplify the structure.
Test Plan:
- Ran 8 taskmasters.
- Destroyed all resources and leases.
- Leased 4 working copies.
- Saw exactly 4 resources build and lease, all simultaneously.
- Destroyed all resources and leases.
- Leased 32 working copies.
- Saw exactly 32 resources build and lease, approximately 8 at a time (limited by taskmasters).
- Destroyed all leases (but not resources).
- Leased 32 working copies, saw them satisfied by existing resources.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13677
Differential Revision: https://secure.phabricator.com/D21807
Summary:
Ref T13677. Track which resources a given lease has begun allocating or reclaiming in a more formal way, and add logging for waiting actions.
The "allocating" mechanism is new. This will replace an existing similar "reclaiming" mechanism in a future change.
Test Plan: See followup changes.
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13677
Differential Revision: https://secure.phabricator.com/D21806
Summary: Ref T13677. These flags increase the convenience of testing in a development environment.
Test Plan: Used new "--all" flags to release all resources and leases.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13677
Differential Revision: https://secure.phabricator.com/D21805
Summary:
Ref T13676. The 3-minute grace period when a resource can not be reclaimed after its leases are released currently doesn't work reliably because the Resource object usually isn't actually updated when a lease is released.
Add an additional check for recently-destroyed leases, and extend the grace period if we find any.
Test Plan:
- See T13676. Ran reproduction sequence there, observed immediate resource reclamation.
- Applied patch.
- Ran sequence again, observed repository B wait 3 minutes to reclaim a repository A resource.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13676
Differential Revision: https://secure.phabricator.com/D21803
Summary:
Ref T13676. Currently, "bin/drydock lease" just creates a lease that permits any blueprint.
To prepare for "use specific blueprint X", unify the logic between this workflow and LeaseUpdateWorker so we select only blueprints which at least have coarse compatibility (e.g., if we're leasing a host, only select enabled blueprints of classes that can allocate hosts).
Test Plan: Used `bin/drydock lease` to try to lease a nonsense type, got sensible error. Leased a host.
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13676
Differential Revision: https://secure.phabricator.com/D21801
Summary: Ref T13676. This makes it easier to create resource pressure without juggling a big pile of terminals.
Test Plan: Used `bin/drydock lease --count 5 ...` to acquire 5 leases.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13676
Differential Revision: https://secure.phabricator.com/D21800
Summary: Ref T13676. When the required "repositories.map" attribute is omitted, `bin/drydock lease` currently fatals in an unhelpful way when trying to lease a working copy.
Test Plan:
Ran `bin/drydock lease --type working-copy` with no attributes, after following steps in T13676.
```
<Allocation Failed> One or more blueprints promised a new resource, but failed when allocating: [PhutilAggregateException] All blueprints failed to allocate a suitable new resource when trying to allocate lease ("PHID-DRYL-orbtwtlinksm3xqpyhmw").
- Exception: Working copy lease is missing required attribute "repositories.map".
Attribute "repositories.map" should be a map of repository specifications.
```
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13676
Differential Revision: https://secure.phabricator.com/D21796
Summary: Ref T13676. Ref T13588. Fix some issues that prevent "bin/phd" and "bin/drydock" from executing under PHP 8.1, broadly because `null` is being passed to `strlen()`.
Test Plan: Ran `bin/phd debug task` and `bin/drydock ...` under PHP 8.1.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13676, T13588
Differential Revision: https://secure.phabricator.com/D21795
Summary: Ref T13641. Make "active bindings" a real query and make callers that only care about active bindings only query for active bindings.
Test Plan:
- Queried for "bindings" and "activeBindings" via Conduit.
- Disabled/enabled devices, saw binding status update in UI.
- Loaded Diffusion cluster layout.
- Grepped for `needBindings()`, `getActiveBindings()`, etc.
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13641
Differential Revision: https://secure.phabricator.com/D21628
Summary: See PHI1885. Repository operations are queryable by state and author, but neither column has a usable key. Add usable keys.
Test Plan: Ran EXPLAIN on a state query. Ran `bin/storage upgrade`. Ran EXPLAIN again, saw query go from a table scan to a `const` key lookup.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D21465
Summary: Depends on D20931. Ref T13362. Move all "Console"-style interfaces to use a consistent layout based on a new "LauncherView" which just centers the content.
Test Plan: Viewed all affected interfaces.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13362
Differential Revision: https://secure.phabricator.com/D20933
Summary: Fixes T13383. Provide a basic "drydock.resource.search". Also allow "drydock.lease.search" to be queried by resource PHID.
Test Plan: Called "drydock.resource.search" and "drydock.lease.search" with various constraints.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13383
Differential Revision: https://secure.phabricator.com/D20723
Summary:
Ref T13289. See <https://discourse.phabricator-community.org/t/fatal-error-in-pagination-in-drydock-resources-host-logs-all-logs/2735>.
`bin/drydock lease` and the web UI for reviewing all object logs when there is more than one page of logs didn't get fully updated to the new cursors.
- Use a cursor pager in `bin/drydock lease`.
- Implement `withIDs()` in `LeaseQuery` so the default paging works properly.
Test Plan:
- Ran `bin/drydock lease`, got a lease with log output along the way.
- Set page size to 2, viewed host logs with multiple pages, paged to page 2.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13289
Differential Revision: https://secure.phabricator.com/D20553
Summary:
Ref T13121. When you connect to a host with SSH, don't already know the host key, and don't have strict host key checking, it prints "Permanently adding host X to known hosts". This is super un-useful.
In a perfect world, we'd probably always have strict host key checking, but this is a significant barrier to configuration/setup and I think not hugely important (MITM attacks against SSH hosts are hard/rare and probably not hugely valuable). I'd imagine a more realistic long term approach is likely optional host key checking.
For now, try using `LogLevel=ERROR` instead of `LogLevel=quiet` to suppress this error. This should be strictly better (since at least some messages we want to see are ERROR or better), although it may not be perfect (there may be other INFO messages we would still like to see).
Test Plan:
- Ran `ssh -o LogLevel=... -o 'StrictHostKeyChecking=no' -o 'UserKnownHostsFile=/dev/null'` with bad credentials, for "ERROR", "quiet", and default ("INFO") log levels.
- With `INFO`, got a warning about adding the key, then an error about bad credentials (bad: don't want the key warning).
- With `quiet`, got nothing (bad: we want the credential error).
- With `ERROR`, got no warning but did get an error (good!).
Not sure this always gives us exactly what we want, but it seems like an improvement over "quiet".
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13121
Differential Revision: https://secure.phabricator.com/D20240
Summary:
Depends on D19919. Ref T11351. This method appeared in D8802 (note that "get...Object" was renamed to "get...Transaction" there, so this method was actually "new" even though a method of the same name had existed before).
The goal at the time was to let Harbormaster post build results to Diffs and have them end up on Revisions, but this eventually got a better implementation (see below) where the Harbormaster-specific code can just specify a "publishable object" where build results should go.
The new `get...Object` semantics ultimately broke some stuff, and the actual implementation in Differential was removed in D10911, so this method hasn't really served a purpose since December 2014. I think that broke the Harbormaster thing by accident and we just lived with it for a bit, then Harbormaster got some more work and D17139 introduced "publishable" objects which was a better approach. This was later refined by D19281.
So: the original problem (sending build results to the right place) has a good solution now, this method hasn't done anything for 4 years, and it was probably a bad idea in the first place since it's pretty weird/surprising/fragile.
Note that `Comment` objects still have an unrelated method with the same name. In that case, the method ties the `Comment` storage object to the related `Transaction` storage object.
Test Plan: Grepped for `getApplicationTransactionObject`, verified that all remaining callsites are related to `Comment` objects.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T11351
Differential Revision: https://secure.phabricator.com/D19920
Summary:
Depends on D19918. Ref T11351. In D19918, I removed all calls to this method. Now, remove all implementations.
All of these implementations just `return $timeline`, only the three sites in D19918 did anything interesting.
Test Plan: Used `grep willRenderTimeline` to find callsites, found none.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T11351
Differential Revision: https://secure.phabricator.com/D19919
Summary:
Ref T13222. Fixes T12588. See PHI683. In several cases, we present the user with a choice between multiple major options: Alamnac service types, Drydock blueprint types, Repository VCS types, Herald rule types, etc.
Today, we generally do this with radio buttons and a "Submit" button. This isn't terrible, but often it means users have to click twice (once on the radio; once on submit) when a single click would be sufficient. The radio click target can also be small.
In other cases, we have a container with a link and we'd like to link the entire container: notifications, the `/drydock/` console, etc. We'd like to just link the entire container, but this causes some problems:
- It's not legal to link block eleements like `<a><div> ... </div></a>` and some browsers actually get upset about it.
- We can `<a><span> ... </span></a>` instead, then turn the `<span>` into a block element with CSS -- and this sometimes works, but also has some drawbacks:
- It's not great to do that for screenreaders, since the readable text in the link isn't necessarily very meaningful.
- We can't have any other links inside the element (e.g., details or documentation).
- We can `<form><button> ... </button></form>` instead, but this has its own set of problems:
- You can't right-click to interact with a button in the same way you can with a link.
- Also not great for screenreaders.
Instead, try adding a `linked-container` behavior which just means "when users click this element, pretend they clicked the first link inside it".
This gives us natural HTML (real, legal HTML with actual `<a>` tags) and good screenreader behavior, but allows the effective link target to be visually larger than just the link.
If no issues crop up with this, I'd plan to eventually use this technique in more places (Repositories, Herald, Almanac, Drydock, Notifications menu, etc).
Test Plan:
{F6053035}
- Left-clicked and command-left-clicked the new JS fanciness, got sensible behaviors.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13222, T12588
Differential Revision: https://secure.phabricator.com/D19855
Summary: Depends on D19830. Ref T13216. See PHI908. See PHI750. See PHI885. Allow users to configure a filesize limit, and allow them to adjust the clone/fetch timeout.
Test Plan:
{F6021356}
- Configured a filesize limit and pushed, hit it. Made the limit larger and pushed, change went through.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13216
Differential Revision: https://secure.phabricator.com/D19831
Summary:
Ref T13222. See PHI683. Currently, you can "Change subtype..." via Conduit and the bulk editor, but not via the comment action stack or edit forms.
In PHI683 an install is doing this often enough that they'd like it to become a first-class action. I've generally been cautious about pushing this action to become a first-class action (there are some inevitable rough edges and I don't want to add too much complexity if there isn't a use case for it) but since we have evidence that users would find it useful and nothing has exploded yet, I'm comfortable taking another step forward.
Currently, `EditEngine` has this sort of weird `setIsConduitOnly()` method. This actually means more like "this doesn't show up on forms". Make it better align with that. In particular, a "conduit only" field can already show up in the bulk editor, which is goofy. Change this to `setIsFormField()` and convert/simplify existing callsites.
Test Plan:
There are a lot of ways to reach EditEngine so this probably isn't entirely exhaustive, but I think I got pretty much anything which is likely to break:
- Searched for `setIsConduitOnly()` and `getIsConduitOnly()`, converted all callsites to `setIsFormField()`.
- Searched for `setIsLockable()`, `setIsReorderable()` and `setIsDefaultable()` and aligned these calls to intent where applicable.
- Created an Almanac binding.
- Edited an Almanac binding.
- Created an Almanac service.
- Edited an Almanac service.
- Edited a binding property.
- Deleted a binding property.
- Created and edited a badge.
- Awarded and revoked a badge.
- Created and edited an event.
- Made an event recurring.
- Created and edited a Conpherence thread.
- Edited and updated the diff for a revision.
- Created and edited a repository.
- Created and disabled repository URIs.
- Created and edited a blueprint.
- Created and edited tasks.
- Created a paste, edited/archived a paste.
- Created/edited/archived a package.
- Created/edited a project.
- Made comments.
- Moved tasks on workboards via comment action stack.
- Changed task subtype via comment action stack.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13222
Differential Revision: https://secure.phabricator.com/D19842
Summary:
Depends on D19814. Ref T13216. See PHI885. For various eldritch reasons, `git fetch` can hang. Although we'd probably like to fix this with `git fetch --require-sustained-network-transfer-rate=512KB/5s` or similar, that flag doesn't exist and we don't have a reasonable way to build it.
Short of that, move toward formalizing a repository "copy time limit": the longest amount of time anything may spend trying to make a copy of this repository.
This grows out of the existing intracluster sync limit, which is effectively the same thing. Here, apply it to `git clone` and `git fetch` in Drydock working copy construction, too. A future change may make it configurable.
Test Plan:
- Set the limit to 0.001.
- Tried to build and lease working copies, got sensible timeout errors (see D19815).
```
<Activation Failed> Lease activation failed: [CommandException] Command killed by timeout after running for more than 0.001 seconds.
COMMAND
ssh '-o' 'LogLevel=quiet' '-o' 'StrictHostKeyChecking=no' '-o' 'UserKnownHostsFile=/dev/null' '-o' 'BatchMode=yes' -l '********' -p '2222' -i '********' '127.0.0.1' -- '(cd '\''/var/drydock/workingcopy-163/repo/spellbook/'\'' && git clean -d --force && git fetch && git reset --hard)'
```
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13216
Differential Revision: https://secure.phabricator.com/D19816
Summary:
Ref T13216. When a repository is clustered, we run this cleanup code (to tell the repository to update, and log some timing information) on both nodes. Currently, we do slightly too much work, which is unnecessary and can be a bit confusing to human readers.
The double update message doesn't hurt anything, but there's no reason to write it twice.
Likewise, the second timing information update query doesn't do anything: there's no PushEvent object with the right identifier, so it just updates nothing. We don't need to run it, and it's confusing that we do.
Instead, only do these writes if we're actually the final node with the repository on it.
Test Plan: Added some logging, saw double writes/updates before the change and no doubles afterwards, with no other behavioral changes.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13216
Differential Revision: https://secure.phabricator.com/D19778
Summary:
Fixes T12145. Ref T13210. See PHI570. See PHI536.
Currently, when you give Drydock an Almanac host pool with more than one host, it never voluntarily builds a second host resource: there is no way to say "maximum X working copies per host" (only "maximum X global working copies") to make the first host overflow, and the allocator tries to pack resources as tightly as possible.
If you can force it to allocate the 2nd..Nth host, things will work reasonably well from there (it will spread working copies across the hosts randomly), but tricking it is very hard, especially before D19761.
To deal with this, give blueprints a new behavior around "supplemental allocations". The idea here is that a blueprint may decide that it would prefer to allocate a fresh new resource instead of allowing an otherwise valid acquisition to occur.
These supplemental allocations follow all the normal allocation rules (they can't exceed limits or actually replace existing resources), so they can only happen if there's free space in the resource pool. But a blueprint can elect for a supplemental allocation to provide a "grow the pool" hint.
The only useful policies here are probably "true" (immediately use all resources, like Almanac) or "false" (pack resources as efficiently as possible) but some other policies //might// be useful (perhaps "start growing the pool when we're getting a bit full even if we aren't at the limit yet, since our workload is bursty").
Then, give Almanac host resources a "true" policy (always allocate supplemental resources) so they use all hosts once a similar number of concurrent jobs arrive.
One aspect of this approach is that we only do supplemental resources if the normal allocation algorithm already decided that the best resource to acquire was part of the same blueprint. I started with an approach like "look at all the blueprints and see if any of them want to be greedy", but then a not-very-desirable blueprint would end up filling up its whole pool before we skipped the supplemental allocation part and ended up picking a different resource. That felt a bit silly and this feels a little cleaner and more focused.
Test Plan:
- Without changing the Almanac blueprint policy, allocated hosts. Got A, A, A, A, ... (second host never used).
- Changed the Almanac policy.
- Allocated hosts, got A, B, random mix of A and B.
- Destroyed B. Destroyed all leases on A. Allocated. Got A. This tests the "don't build a supplemental resource if there are no leases on the natural resource".
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13210, T12145
Differential Revision: https://secure.phabricator.com/D19762
Summary:
Ref T13210. Ref T12145. The "Almanac Host" blueprint currently hands out new leases on a given host even if the binding has been disabled.
Although there are some more complicated cases here (e.g., involving cleanup of the existing resource and existing leases), this one seems clear cut: if the binding has been disabled, we should stop handing out new leases on it.
Test Plan:
- Created a service with two hosts.
- Requested a lease, got host A.
- Requested more leases, always got host A (we never build a new host when we don't have to, and we currently never have to).
- Disabled the binding to host A.
- Requested a lease.
- Before patch: got host A.
- After patch: got host B.
- Also disabled the other binding to host B, requested a lease, got an indefinite wait for resources (which is expected and reasonable).
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13210, T12145
Differential Revision: https://secure.phabricator.com/D19761
Summary:
See T13212 for some context and discussion on this being revived.
See T11694 for original context.
Add a query constraint for lease owners and implement the conduit search method for Drydock leases.
Ref T11694. Fixes T13212.
Test Plan:
- Called the API method from conduit and browsed lease queries from the UI.
- Used the new "ownerPHIDs" constraint via API console.
{F5963044}
Reviewers: yelirekim, amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam, epriestley
Maniphest Tasks: T11694, T13212
Differential Revision: https://secure.phabricator.com/D16594
Summary:
Depends on D19753. Ref T13210. This is a small optimization that saves us from waiting up to 15 seconds for a yield.
When there are no Working Copy resources and a new lease comes in, we'll allocate one and yield until it activates.
If activating it (SSH'ing and running `git clone`) takes less than 15 seconds, the resource will activate (say, at T+4) but the lease won't update again for a while (say, until T+15). This leaves us with a pointless wait (in this example, we're sitting around for 9 seconds when we could move forward).
To improve this a little bit, let resources wake up the lease update tasks that triggered allocation after they activate. In the best case, that task runs ~15 seconds sooner. In the worst case, the awaken is just a no-op.
With a more-full queue, this has a smaller effect (it's likely something else will run and be able to use the resource in those 9 seconds).
With already-activated resources, this has no effect (when resources are already activated, we can lease immediately).
Test Plan:
- Cleaned up all working copy resources.
- Requested a new "A" working copy.
- Before patch: got a working copy after 17-18 seconds, most of which was spent yielded.
- After patch: got a working copy after 3-4 seconds.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13210
Differential Revision: https://secure.phabricator.com/D19754
Summary:
Depends on D19752. Ref T13210. If resources take a long time to reclaim/destroy (normally, more than 15 seconds) a single new lease may update several times during the reclaim/destroy process and end up reclaiming multiple resources.
Instead: after a lease triggers a reclaim, prevent it from triggering another reclaim as long as the resource it is reclaiming hasn't finished its reclaim/destroy cycle. Basically, each lease only gets to destroy one resource at a time.
Test Plan:
- Added a `sleep(120)` to `destroyResource()` to simulate a long reclaim/destroy cycle.
- Allocated A, A, A working copies. Leased a B working copy.
- Before patch: saw "B" lease destroy all three "A" working copies after ~0, ~15, and ~30 seconds, then build a new "B" resource after ~120 seconds (when the first reclaim/destroy finished).
- After patch: saw "B" lease destroy one "A" working copy after ~0 seconds, then wait patiently until it finished up, then build a new "B" resource.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13210
Differential Revision: https://secure.phabricator.com/D19753
Summary:
Depends on D19751. Ref T13210. When Drydock needs to reclaim an existing unused resource in order to build a new resource to satisfy a lease, the lease which triggered the reclaim currently gets thrown back into the pool with a 15-second yield.
If the queue is pretty empty and the reclaim is quick, this can mean that we spend up to 15 extra seconds just waiting for the lease update task to get another shot at building a resource (the resource reclaim may complete in a second or two, but nothing else happens until the yield ends).
Instead, when a lease triggers a reclaim, have the reclaim reawaken the lease task when it completes. In the best case, this saves us 15 seconds of waiting. In other cases (the task already completed some other way, the resource gets claimed before the lease gets to it), it's harmless.
Test Plan:
- Allocated A, A, A working copies with limit 3. Leased a B working copy.
- Before patch: allocation took ~32 seconds.
- After patch: allocation takes ~17 seconds (i.e., about 15 seconds less).
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13210
Differential Revision: https://secure.phabricator.com/D19752
Summary:
Depends on D19750. See T13210. The `bin/drydock lease` command makes it easier to request ad-hoc leases, but currently takes lease attributes in the form `--attributes x=y,a=b`.
This was okay for all leases at the time, but doesn't really work for modern WorkingCopy resources since they take a `repositories.map` which has a dictionary as a value. You can't specify that with `repositories.map=...`.
Instead, point `--attributes` at a JSON file or use `--attributes -` to read from stdin.
Test Plan: Used `--attributes` with a file and stdin to allocate working copy leases with repositories.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D19751
Summary:
Ref T13210. We currently "git reset --hard HEAD" during working copy leasing, mostly by convention/familiarity.
However, this command does not work in an empty repository, because there is no HEAD yet.
The command "git reset --hard" appears to have the same meaning and effect in all cases, except that it also works correctly in an empty repository.
The manual suggests that omitting HEAD should be the same as specifying HEAD, too:
> The <tree-ish>/<commit> defaults to HEAD in all forms.
Test Plan: Successfully leased a working copy for an empty repository using Drydock.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13210
Differential Revision: https://secure.phabricator.com/D19750
Summary:
Depends on D19672. Ref T13197. See PHI873. This writes a trivial log when we begin acting on a working copy and makes it look reasonable in the UI.
This is mostly just to prove that logging works properly.
Test Plan: {F5887697}
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13197
Differential Revision: https://secure.phabricator.com/D19673
Summary:
Depends on D19671. Ref T13197. See PHI873.
Expose logs in the RepositoryOperation UI. Nothing writes the logs yet, so these interfaces are currently always empty.
Test Plan:
{F5887102}
{F5887103}
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13197
Differential Revision: https://secure.phabricator.com/D19672
Summary: Ref T13197. See PHI873. I want to give RepositoryOperation objects access to Drydock logging like leases, resources, and blueprints currently have. This just does the schema/query changes, no actual UI or new logging yet.
Test Plan: Ran storage upgrade, poked around the UI looking for anything broken.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13197
Differential Revision: https://secure.phabricator.com/D19671
Summary: Ref T13195. See PHI845. For custom OperationTypes, provide access to the Interface and Operation via getters. This is just for convenience, since passing these around everywhere can be a bit of a pain if you have a deep-ish stack of things or love using callbacks or whatever.
Test Plan: Landed a revision via upstream Drydock operations.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13195
Differential Revision: https://secure.phabricator.com/D19636
Summary: Ref T13189. See PHI690. When a lease is first acquired or activated, note the time. This supports better visibility into queue lengths. For now, this is only queryable via DB and visible in the UI, but can be more broadly exposed in the future.
Test Plan: Landed a revision, saw the leases get sensible timestamps for acquisition/activation.
Reviewers: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13189
Differential Revision: https://secure.phabricator.com/D19613
Summary:
Ref T13164. See PHI788. The issue requests a "created" timestamp.
Also add filtering for repository, state, and author.
Test Plan:
Used all filters.
{F5795085}
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13164
Differential Revision: https://secure.phabricator.com/D19574
Summary: See PHI513. `fprintf()` takes `(thing, pattern, args, ...)` but we aren't passing a `pattern`, so if the command returns a "%" in the output we get an error.
Test Plan:
- Installed `bytes`, a great useful program which prints all the bytes, on my HoaxOS(tm) system (see D19102).
```
epriestley@orbital ~/dev/phabricator $ ./bin/drydock command --lease 76287 -- bytes # Before patch.
[2018-03-29 02:09:08] ERROR 2: fprintf(): Too few arguments at [/Users/epriestley/dev/core/lib/phabricator/src/applications/drydock/management/DrydockManagementCommandWorkflow.php:60]
arcanist(head=experimental, ref.master=b8c9c385a7f5, ref.experimental=925c60e7b837), corgi(head=master, ref.master=6371578c9d32), instances(head=master, ref.master=d983b9517924), ledger(head=master, ref.master=4da4a24b8779), libcore(), phabricator(head=hoax1, ref.master=b586ee065a75, ref.hoax1=f8d7480bbdd1, custom=4), phutil(head=master, ref.master=1ad42491e44a), secure(head=master, ref.master=988cf9bd7958), services(head=master, ref.master=6b3fb8d8dd0a)
#0 fprintf(resource, string) called at [<phabricator>/src/applications/drydock/management/DrydockManagementCommandWorkflow.php:60]
#1 DrydockManagementCommandWorkflow::execute(PhutilArgumentParser) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:441]
#2 PhutilArgumentParser::parseWorkflowsFull(array) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:333]
#3 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/drydock/drydock_control.php:21]
epriestley@orbital ~/dev/phabricator $ ./bin/drydock command --lease 76287 -- bytes # After patch.
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
```
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D19264
Summary: Ref T13073. The new log output from `bin/drydock lease` currently uses HTML handle rendering, but should render to text.
Test Plan: Ran `bin/drydock lease` and saw normal text in log output. Viewed the same logs from the web UI and saw HTML.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13073
Differential Revision: https://secure.phabricator.com/D19101
Summary:
Depends on D19078. Ref T13073. Currently, there is a narrow window where we can acquire a resource after a reclaim has started against it.
To prevent this, briefly lock resources before acquiring them and make sure they're still good. If a resource isn't good, throw the lease back in the pool.
Test Plan:
This is tricky. You need:
- Hoax blueprint with limits and a rule where leases of a given "flavor" can only be satisfied by resources of the same flavor.
- Reduce the 3-minute "wait before resources can be released" to 3 seconds.
- Limit Hoaxes to 1.
- Allocate one "cherry" flavored Hoax and release the lease.
- Add a `sleep(15)` to `releaseResource()` in `DrydockResourceUpdateWorker`, after the `canReclaimResource()` check, with a `print`.
Now:
- Run `bin/phd debug task` in two windows.
- Run `bin/drydock lease --type host --attributes flavor=banana` in a third window.
- This will start to reclaim the existing "cherry" resource. Once one of the `phd` windows prints the "RECLAIMING" message run `bin/drydock lease --type host --attributes flavor=cherry` in a fourth window.
- Before patch: the "cherry" lease acquired immediately, then was released and destroyed moments later.
- After patch: the "cherry" lease yields.
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13073
Differential Revision: https://secure.phabricator.com/D19080
Summary:
Depends on D19077. Ref T13073. When we're using slot locks to enforce a limit (e.g., maximum of 5 simultaneous things) we currently load locks owned by the blueprint to identify which slots are likely to be free.
However, this isn't right: the blueprint doesn't own these locks. The resources do.
We still get the right behavior eventually, but we incorrectly identify that every slot lock is always free, so as the slots fill up we'll tend to guess wrong more and more often.
Instead, load the slot locks by name explicitly.
Test Plan: Implemented lock-based limiting on `HoaxBlueprint`, `var_dump()`'d the candidate locks, saw correct test state for locks. Acquired leases without releasing, got all of the slots filled without any slot lock collisions (previously, the last slot or two tended to collide a lot).
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13073
Differential Revision: https://secure.phabricator.com/D19078
Summary:
Depends on D19076. Ref T13073. Blueprints are stored as an attribute and `setAttributes()` overwrites all attributes.
This is sorta junk but make it less obviously broken, at least.
Test Plan: Ran `bin/drydock lease --type working-copy --attributes x=y` without instantly getting a fatal about "no blueprint PHIDs".
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13073
Differential Revision: https://secure.phabricator.com/D19077