Summary: Ref T182. Nothing fancy, just make these slightly easier to work with.
Test Plan: {F884754}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T182
Differential Revision: https://secure.phabricator.com/D14295
Summary: If you `!assign cahd` when you meant to `!assign chad`, we'll hit an "Undefined variable: assign_phid" a little further down.
Test Plan: Eyeballed it. See IRC.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D14291
Summary:
Fixes T9273. Remarkup has reasonably good fundamentals but the API is a giant pain to work with.
Provide a `PHUIRemarkupView` to make it easier. This object is way simpler to use by default.
It's not currently as powerful, but we can expand the power level later by adding more setters.
Eventually I'd expect to replace `PhabricatorRemarkupInterface` and `PhabricatorMarkupOneOff` with this, but no rush on those.
I converted a few callsites as a sanity check that it works OK.
Test Plan:
- Viewed remarkup in Passphrase.
- Viewed remarkup in Badges.
- Viewed a Conduit method.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9273
Differential Revision: https://secure.phabricator.com/D14289
Summary: Ref T182. Make a reasonable attempt to get the commit message, author, and committer data correct.
Test Plan: BEHOLD: rGITTEST810b7f17cd0c909256a45d29a5062fcf417d0489
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T182
Differential Revision: https://secure.phabricator.com/D14280
Summary: Fixes T9568. We just weren't setting this properly so it would default away from the proper value.
Test Plan:
- Edited a credential in a non-default space, edit form populated properly.
- Changed "Space", introduced an error, saved form, got error with sticky value for "Space" properly.
- Saved form with new space value.
- Created a new credential.
Reviewers: chad
Reviewed By: chad
Subscribers: revi
Maniphest Tasks: T9568
Differential Revision: https://secure.phabricator.com/D14278
Summary:
Ref T9252. This fixes a bug from D14236. D14272 discusses the observable effects of the bug, primarily that the window for racing is widened from ~a few milliseconds to several minutes under our configuration.
This SQL query is missing a `GROUP BY` clause, so all of the resources get counted as having the same status (specifically, the alphabetically earliest status any resource had, I think). For test cases this often gets the right result since the number of resources may be small and they may all have the same status, but in production this isn't true. In particular, the allocator would sometimes see "35 destroyed resources" (or whatever), when the real counts were "32 destroyed resources + 3 pending resources".
Since this allocator behavior is soft/advisory this didn't cause any actual problems, per se (we do expect races here occasionally), it just made the race very very easy to hit. For example, Drydock in production currently has three pending working copy resources. Although we do expect this to be //possible//, getting 4 resources when the configured limit is 1 should be hard (not lightning strike / cosmic radiaion hard, but "happens once a year" hard).
Also exclude destroyed resources since we never care about them.
Test Plan:
Followed the plan from D14272 and restarted two Harbormaster workers at the same time.
After this patch was applied, they no longer created two different resources (we expect it to be possible for this to happen, just very hard).
We should still be able to force this race by putting something like `sleep(10)` right before the query, then `sleep(10)` right after it. That would prevent the allocators from seeing one another (so they would both think there were no other resources) and push us down the pathway where we exceed the soft limit.
Reviewers: chad, hach-que
Reviewed By: hach-que
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14274
Summary:
Ref T9252. This is mostly a fix for an edge case from D14236. Here's the setup:
- There are no resources.
- A request for a new resource arrives.
- We build a new resource.
Now, if we were leasing an existing resource, we'd call `canAcquireLeaseOnResource()` before acquiring a lease on the new resource.
However, for new resources we don't do that: we just acquire a lease immediately. This is wrong, because we now allow and expect some resources to be unleasable when created.
In a more complex workflow, this can also produce the wrong result and leave the lease acquired sub-optimally (and, today, deadlocked).
Make the "can we acquire?" pathway consistent for new and existing resources, so we always do the same set of checks.
Test Plan:
- Started daemons.
- Deleted all working copy resources.
- Ran two working-copy-using build plans at the same time.
- Before this change, one would often [1] acquire a lease on a pending resource which never allocated, then deadlock.
- After this change, the same thing happens except that the lease remains pending and the work completes.
[1] Although the race this implies is allowed (resource pool limits are soft/advisory, and it is expected that we may occasionally run over them), it's MUCH easier to hit right now than I would expect it to be, so I think there's probably at least one more small bug here somewhere. I'll see if I can root it out after this change.
Reviewers: chad, hach-que
Reviewed By: hach-que
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14272
Summary:
Ref T182. If 35 other things are configured completely correctly, make it remotely possible that this button may do something approximating the thing that the user wanted.
This primarily fleshes out the idea that "operations" (like landing, merging or cherry-picking) can have some beahavior, and when we run an operation we do whatever that behavior is instead of just running `git show`.
Broadly, this isn't too terrible because Drydock seems like it actually works properly for the most part (???!?!).
Test Plan: {F876431}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T182
Differential Revision: https://secure.phabricator.com/D14270
Summary:
Ref T182. This doesn't do anything interesting yet and is mostly scaffolding, but here's roughly the workflow. From previous revision, you can configure "Repository Automation" for a repository:
{F875741}
If it's configured, a new "Land Revision" button shows up:
{F875743}
Once you click it you get a big warning dialog that it won't work, and then this shows up at the top of the revision (completely temporary/placeholder UI, some day a nice progress bar or whatever):
{F875747}
If you're lucky, the operation eventually sort of works:
{F875750}
It only runs `git show` right now, doesn't actually do any writes or anything.
Test Plan:
- Clicked "Land Revision".
- Watched `phd debug task`.
- Saw it log `git show` to output.
- Verified operation success in UI (by fiddling URL, no way to get there normally yet).
Reviewers: chad
Reviewed By: chad
Subscribers: revi
Maniphest Tasks: T182
Differential Revision: https://secure.phabricator.com/D14266
Summary:
Ref T182. This allows you to assign blueprints that a repository can use to perform working copy operations. Eventually, this will support "merge this" in Differential, etc.
This is just UI for now, with no material effects.
Most of this diff is just taking logic that was in the existing "Blueprints" CustomField and putting it in more general places so Diffusion (which does not use CustomFields) can also access it.
Test Plan:
- Configured repository automation for a repository.
- Removed repository automation for a repository.
Reviewers: chad
Reviewed By: chad
Subscribers: avivey
Maniphest Tasks: T182
Differential Revision: https://secure.phabricator.com/D14259
Summary: Fixes T9552. We need to set a questionID and the question object (for policy) when initializing a new Answer.
Test Plan: Write an answer that mentions another user.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T9552
Differential Revision: https://secure.phabricator.com/D14263
Summary:
Fixes T9558. The recent changes to validate PHID fields don't work cleanly with this gross hack.
This can probably be unwound now but it will definitely get fixed in T9132 so I may just wait for that.
Test Plan: Edited a custom "users" field in Maniphest. This should only affect Maniphest because it has a weird hack.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9558
Differential Revision: https://secure.phabricator.com/D14264
Summary:
In Mercurial 3.2 the `locate` command was deprecated in favor of `files` command. This change updates the DiffusionLowLevelMercurialPathsQuery command to conditionally use `locate` or `files` based on the version of Mercurial used.
Closes T7375
Test Plan:
My test/develop Phabricator instance is setup to run Mercurial 3.5.1.
The test procedure to verify valid file listings are being returned:
1. I navigated to `http://192.168.0.133/conduit/method/diffusion.querypaths/`
2. I populated the following fields:
- path: `"/"`
- commit: `"d721d5b57fc9ef72e47ff9d4e0c583d74a46590c"`
- callsign: `"HGTEST"`
3. I submitted request and verified that result contained all files in the repository:
```
{
"0": "README",
"1": "alpha/beta/trifle",
"2": "test/Chupacabra.cow",
"3": "test/socket.ks"
}
```
I repeated the above steps after setting up Mercurial 2.6.2, which I installed in the following manner:
1. I downloaded Mercurial 2.6.2 source and run `make local` which will only compile it to work from its own directory (`/opt/mercurial-2.6.2`)
2. I linked `/usr/local/bin/hg -> /opt/mercurial-2.6.2/hg` (there's also a `/usr/bin/hg` which is a link to `/usr/local/bin/hg`)
3. I navigated to my home directory and verify that `hg --version` returns 2.6.2.
4. I restarted phabricator services (probably unnecessary).
With the Multimeter application active
1. I verified that `/usr/local/bin/hg` referred to version 2.6
2. I ran the same conduit call from the conduit application
3. I verified that `http://192.168.0.133/multimeter/?type=2&group=label` incremented values for `bin.hg locate`.
4. I swapped out mercurial versions for 3.5.1
5. I ran the same conduit call from the conduit application
6. I verified that `http://192.168.0.133/multimeter/?type=2&group=label` incremented values for `bin.hg files`
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Maniphest Tasks: T7375
Differential Revision: https://secure.phabricator.com/D14253
Summary:
Ref T9551. We currently use the same logic for generating project hashtags and Phriction slugs, but should be a little more conservative with project hashtags.
Stop them from generating with stuff that won't parse in a "Reviewers:" field or generally in commments (commas, colons, etc).
Test Plan:
Created a bunch of projects with nonsense in them and saw them generate pretty reasonable hashtags.
{F873456}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9551
Differential Revision: https://secure.phabricator.com/D14261
Summary:
Ref T9551. To set things up:
- Name a project `aa bb`. This will have the tag `aa_bb`.
- Try to visit `/tag/aa%20bb`.
Here's what happens now:
- You get an Aphront redirect error as it tries to add the trailing `/`. Add `phutil_escape_uri()` so that works again.
- Then, you 404, even though this tag is reasonably equivalent to the real project tag and could be redirected. Add a fallback to lookup, resolve, and redirect if we can find a hit for the tag.
This also fixes stuff like `/tag/AA_BB/`.
Test Plan: Visited URIs like `/tag/aa%20bb`, `/tag/aa%20bb/`, `/tag/Aa_bB/`, etc. None of them worked before and now they all do.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9551
Differential Revision: https://secure.phabricator.com/D14260
Summary:
Ref T9519. When acquiring leases on resources:
- Only consider resources created by authorized blueprints.
- Only consider authorized blueprints when creating new resources.
- Fail with a tailored error if no blueprints are allowed.
- Fail with a tailored error if missing authorizations are causing acquisition failure.
One somewhat-substantial issue with this is that it's pretty hard to figure out from the Harbormaster side. Specifically, the Build step UI does not show field value anywhere, so the presence of unapproved blueprints is not communicated. This is much more clear in Drydock. I'll plan to address this in future changes to Harbormaster, since there are other related/similar issues anyway.
Test Plan: {F872527}
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T9519
Differential Revision: https://secure.phabricator.com/D14254
Summary: Updates metamta for handleRequest
Test Plan: Unable to test this, but looks safe?
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D14256
Summary: Updates Harbormaster for handleRequest over processRequest
Test Plan: Went through various Harbormaster areas, buildables, actions.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D14255
Summary:
Ref T9519. This is like 80% of the way there and doesn't fully work yet, but roughly shows the shape of things to come. Here's how it works:
First, there's a new custom field type for blueprints which works like a normal typeahead but has some extra logic. It's implemented this way to make it easy to add to Blueprints in Drydock and Build Plans in Harbormaster. Here, I've added a "Use Blueprints" field to the "WorkingCopy" blueprint, so you can control which hosts the working copies are permitted to allocate on:
{F869865}
This control has a bit of custom rendering logic. Instead of rendering a normal list of PHIDs, it renders an annotated list with icons:
{F869866}
These icons show whether the blueprint on the other size of the authorization has approved this object. Once you have a green checkmark, you're good to go.
On the blueprint side, things look like this:
{F869867}
This table shows all the objects which have asked for access to this blueprint. In this case it's showing that one object is approved to use the blueprint since I already approved it, but by default new requests come in here as "Authorization Requested" and someone has to go approve them.
You approve them from within the authorization detail screen:
{F869868}
You can use the "Approve" or "Decline" buttons to allow or prevent use of the blueprint.
This doesn't actually do anything yet -- objects don't need to be authorized in order to use blueprints quite yet. That will come in the next diff, I just wanted to get the UI in reasonable shape first.
The authorization also has a second piece of state, which is whether the request from the object is active or inactive. We use this to keep track of the authorization if the blueprint is (maybe temporarily) deleted.
For example, you might have a Build Plan that uses Blueprints A and B. For a couple days, you only want to use A, so you remove B from the "Use Blueprints: ..." field. Later, you can add B back and it will connect to its old authorization again, so you don't need to go re-approve things (and if you're declined, you stay declined instead of being able to request authorization over and over again). This should make working with authorizations a little easier and less labor intensive.
Stuff not in this diff:
- Actually preventing any allocations (next diff).
- Probably should have transactions for approve/decline, at least, at some point, so there's a log of who did approvals and when.
- Maybe should have a more clear/loud error state when no blueprints are approved?
- Should probably restrict the typeahead to specific blueprint types.
Test Plan:
- Added the field.
- Typed some stuff into it.
- Saw the UI update properly.
- Approved an authorization.
- Declined an authorization.
- Saw active authorizations on a blueprint page.
- Didn't see any inactive authroizations there.
- Clicked "View All Authorizations", saw all authorizations.
Reviewers: chad, hach-que
Reviewed By: chad
Maniphest Tasks: T9519
Differential Revision: https://secure.phabricator.com/D14251
Summary:
If Mercurial 3.4+ is used to host repositories in Phabricator, any clients using 3.5+ will receive an exception after the bundle is pushed up. Clients will also fail to update phases for changesets pushed up.
Before directly responding to mercurial clients with all capabilities, this change filters out the 'bundle2' capability so the client negotiates using a legacy bundle wire format instead.
Test Plan:
Server: Mercurial 3.5
Client: Mercurial 3.4
Test with both HTTP and SSH protocols:
1. Create a local commit on client
2. Push commit to server
3. Verify the client emits something like:
```
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
```
Closes T9450
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Maniphest Tasks: T9450
Differential Revision: https://secure.phabricator.com/D14241
Summary:
Ref T9123. Two major Harbormaster-related UI changes in Diffusion:
- Tags table now shows tag build status.
- Branches table now shows branch build status.
Then some minor consistency / qualtiy of life changes:
- Picked a nicer looking "history" icon?
- Branches table now uses the same "history" icon as other tables.
- Tags table now has a "history" link.
- Browse table now has a "history" link.
- Dates now use more consistent formatting.
- Column order is now more consistent.
- Use of style is now more consistent.
Test Plan:
{F865056}
{F865057}
{F865058}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9123
Differential Revision: https://secure.phabricator.com/D14242
Summary: Ref T9252. Move these into a new "Drydock" group.
Test Plan: Clicked "Add Build Step", saw Drydock steps in a Drydock group.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14237
Summary:
Ref T9252. I think there's a more complex version of this problem discussed elsewhere, but here's what we hit today:
- 5 commits land at the same time and trigger 5 builds.
- All of them go to acquire a working copy.
- Working copies have a limit of 1 right now, so 1 of them gets the lease on it.
- The other 4 all trigger allocation of //new// working copies. So now we have: 1 active, leased working copy and 4 pending, leased working copies.
- The 4 pending working copies will never activate without manual intervention, so these 4 builds are stuck forever.
To fix this, prevent WorkingCopies from giving out leases until they activate. So now the leases won't acquire until we know the working copy is good, which solves the first problem.
However, this creates a secondary problem:
- As above, all 5 go to acquire a working copy.
- One gets it.
- The other 4 trigger allocations, but no longer acquire leases. This is an improvement.
- Every time the leases update, they trigger another allocation, but never acquire. They trigger, say, a few thousand allocations.
- Eventually the first build finishes up and the second lease acquires the working copy. After some time, all of the builds finish.
- However, they generated an unboundedly large number of pending working copy resources during this time.
This is technically "okay-ish", in that it did work correctly, it just generated a gigantic mess as a side effect.
To solve this, at least for now, provide a mechanism to impose allocation rate limits and put a cap on the number of allocating resources of a given type. As hard-coded, this the greater of "1" or "25% of the active resources in the pool".
So if there are 40 working copies active, we'll start allocating up to 10 more and then cut new allocations off until those allocations get sorted out. This prevents us from getting runaway queues of limitless size.
This also imposes a total active working copy resource limit of 1, which incidentally also fixes the problem, although I expect to raise this soon.
These mechanisms will need refinement, but the basic idea is:
- Resources which aren't sure if they can actually activate should wait until they do activate before allowing leases to acquire them. I'm fairly confident this rule is a reasonable one.
- Then we limit how many bookkeeping side effects Drydock can generate once it starts encountering limits.
Broadly, some amount of mess is inevitable because Drydock is allowed to try things that might not work. In an extreme case we could prevent this mess by setting all these limits at "1" forever, which would degrade Drydock to effectively be a synchronous, blocking queue.
The idea here is to put some amount of slack in the system (more than zero, but less than infinity) so we get the performance benefits of having a parallel, asyncronous system without a finite, manageable amount of mess.
Numbers larger than 0 but less than infinity are pretty tricky, but I think rules like "X% of active resources" seem fairly reasonable, at least for resources like working copies.
Test Plan:
Ran something like this:
```
for i in `seq 1 5`; do sh -c '(./bin/harbormaster build --plan 10 rX... &) &'; done;
```
Saw 5 plans launch, acquire leases, proceed in an orderly fashion, and eventually finish successfully.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14236
Summary:
Ref T9513. I checked this briefly but didn't do a very thorough job of it.
- Don't try to query merges for Subversion, since it doesn't support them.
- Fix up "existsquery" to work properly (and efficiently) for both hosted and imported repositories.
- Fix up "parentsquery" to have similar behavior on invalid commits to other VCSes (throw an exception).
Test Plan:
- No more merges warning on SVN.
- Hosted SVN gets the right exists result now.
- Visiting "r23980283789287" now 404's instead of "not parsed yet".
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9513
Differential Revision: https://secure.phabricator.com/D14239
Summary:
Ref T9252. Currently, Harbormaster and Drydock work like this in some cases:
# Queue a lease for activation.
# Then, a little later, save the lease PHID somewhere.
# When the target/resource is destroyed, destroy the lease.
However, something can happen between (1) and (2). In Drydock this window is very short and the "something" would have to be a lighting strike or something similar, but in Harbormaster we wait until the resource activates to do (2) so the window can be many minutes long. In particular, a user can use "Abort Build" during those many minutes.
If they do, the target is destroyed but it doesn't yet have a record of the artifact, so the artifact isn't cleaned up.
Make these things work like this instead:
# Create a new lease and pre-generate a PHID for it.
# Save that PHID as something that needs to be cleaned up.
# Queue the lease for activation.
# When the target/resource is destroyed, destroy the lease if it exists.
This makes sure there's no step in the process where we might lose track of a lease/resource.
Also, clean up and standardize some other stuff I hit.
Test Plan:
- Stopped daemons.
- Restarted a build in Harbormaster.
- Stepped through the build one stage at a time using `bin/worker execute ...`.
- After the lease was queued, but before it activated, aborted the build.
- Processed the Harbormaster side of things only.
- Saw the lease get destroyed properly.
Reviewers: chad, hach-que
Reviewed By: hach-que
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14234
Summary:
Ref T9028. When users push a commit, then later delete it (e.g., by deleting the branch which contained it) we currently explode when trying to view it.
Instead, degrade gradually if some information is not available.
Test Plan:
- Looked at valid commits with parents, refs, branches and merges.
- Looked at invalid commits.
- Looked at a previously valid, now-deleted + gc'd commit:
{F859273}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9028
Differential Revision: https://secure.phabricator.com/D14227
Summary: Ref T9478. This should probably be configurable eventually, but for now treat any 200-block status as success. Also show the result code.
Test Plan:
- Hit a bad URI, saw "HTTP 503" + failure.
- Hit a good URI, saw "HTTP 200" + success.
Reviewers: chad, hach-que
Reviewed By: chad, hach-que
Maniphest Tasks: T9478
Differential Revision: https://secure.phabricator.com/D14226
Summary:
Fixes T9494. This:
- Removes all the random GC.x.y.z config.
- Puts it all in one place that's locked and which you use `bin/garbage set-policy ...` to adjust.
- Makes every TTL-based GC configurable.
- Simplifies the code in the actual GCs.
Test Plan:
- Ran `bin/garbage collect` to collect some garbage, until it stopped collecting.
- Ran `bin/garbage set-policy ...` to shorten policy. Saw change in web UI. Ran `bin/garbage collect` again and saw it collect more garbage.
- Set policy to indefinite and saw it not collect garabge.
- Set policy to default and saw it reflected in web UI / `collect`.
- Ran `bin/phd debug trigger` and saw all GCs fire with reasonable looking queries.
- Read new docs.
{F857928}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9494
Differential Revision: https://secure.phabricator.com/D14219
Summary: Ref T9252. This variable was always wrong but we fell back to just resetting to `HEAD` before. Use the correct variable name.
Test Plan: Verified variable name.
Reviewers: chad, hach-que
Reviewed By: hach-que
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14224
Summary: Ref T9352. See D13635. Build targets can have variables already, but let builds have them too. This mostly enables future use cases (sub-builds, more sophisticated build triggers).
Test Plan: With a custom Herald rule + action like the one in T9352, updated a revision and saw it generate multiple builds with varying parameters.
Reviewers: chad, hach-que
Reviewed By: hach-que
Maniphest Tasks: T9352
Differential Revision: https://secure.phabricator.com/D14222
Summary: Ref T9494. Depends on D14216. Remove 10 copies of this code.
Test Plan: Ran `arc unit --everything`, browsed Config > Modules, clicked around Herald / etc.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9494
Differential Revision: https://secure.phabricator.com/D14217
Summary:
Ref T9252. This primarily allows Harbormaster to request (and Drydock to fulfill) working copies with a patch from a staging area. Doing this means we can do builds on in-review changes from `arc diff`.
This is a little cobbled-together but should basically work.
Also fix some other issues:
- Yielded, awakend workers are fine to update but could complain.
- We can't log slot lock failures to resources if we don't end up saving them.
- Killing the transaction would wipe out the log.
- Fix some TODOs, etc.
Test Plan: Ran Harbormaster builds on a local revision.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14214
Summary:
Ref T9252. Long ago you sometimes manually created resources, so they had human-enterable names. However, users never make resources manually any more, so this field isn't really useful any more.
In particular, it means we write a lot of untranslatable strings like "Working Copy" to the database in the default locale. Instead, do the call at runtime so resource names are translatable.
Also clean up a few minor things I hit while kicking the tires here.
It's possible we might eventually want to introduce a human-choosable label so you can rename your favorite resources and this would just be a default name. I don't really have much of a use case for that yet, though, and I'm not sure there will ever be one.
Test Plan:
- Restarted a Harbormaster build, got a clean build.
- Released all leases/resources, restarted build, got a clean build with proper resource names.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14213
Summary:
Ref T9252. See companion change in D14211. This does the same thing for leases.
Particularly, most of the TODOs about error handling can just be removed because they'll do the right things by default now.
This and D14211 also move slot lock release to after resource destruction. This feels cleaner than trying to release early at release/break.
Test Plan: Restarted a Harbormaster build, got a clean build result. This needs more vetting but I'll clean up any issues as I hit them.
Reviewers: chad, hach-que
Reviewed By: hach-que
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14212
Summary:
Ref T9252. Currently, error handling behavior isn't great and a lot of errors aren't dealt with properly. Try to improve this by making default behaviors better:
- Yields, slot lock exceptions, and aggregate or proxy exceptions containing an excpetion of these types turn into yields.
- All other exceptions are considered permanent failures. They break the resource and
This feels a little bit "magical" but I want to try to get the default behaviors to align reasonably well with expectations so that blueprints mostly don't need to have a ton of error handling. This will probably need at least some refinement down the road, but it's a reasonable rule for all exception/error conditions we currently have.
Test Plan: I did a clean build, but haven't vetted this super thoroughly. Next diff will do the same thing to leases, then I'll work on stabilizing this code better.
Reviewers: chad, hach-que
Reviewed By: hach-que
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14211
Summary: Ref T9252. Add a bit more logging and improve some behaviors.
Test Plan: Restarted a build, got a good result.
Reviewers: chad, hach-que
Reviewed By: hach-que
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14210
Summary:
Ref T9252. This is the same as D14201, but for lease stuff instead of resource stuff.
This one is a little heavier but still feels pretty reasonable to me at the end of the day (worker is <1K lines and has a ton of comment stuff).
Also fixes a few random bugs I hit in the task queue.
Test Plan:
- Restarted some Harbormaster builds, saw them go through cleanly.
- Released pre-activation resources/leases.
- Probably still kinda buggy but I'll iron the details out over time.
Logs are starting to look somewhat plausible:
{F855747}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14202
Summary:
Ref T9252. Currently, Drydock Leases and Resources have several workers:
- Resources: ResourceWorker, ResourceUpdateWorker, ResourceDestroyWorker
- Leases: AllocatorWorker, LeaseWorker, LeaseUpdateWorker, LeaseDestroyWorker
This is kind of a lot of stuff, and it creates some problems.
In particular, leases and resources in early lifecycle phases (pending/allocating/acquiring) can't process commands yet, because that code is only in the "UpdateWorker" classes. If they aren't able to move forward because of a bug, they also can't be released because they can't react to the release command until later in their lifecycle. This creates a soft hang where I have to go wipe stuff out of the database since there's no other way to get rid of it.
Instead, I want leases and resources to be releasable from any (pre-release / pre-destroy) phase of their lifecycle. To support this, all the workers before the "UpdateWorker" need to be able to process commands.
A second, similar issue is that logging and exception handling behaviors are underpowered right now. Elsewhere I began improving this, but ran into issues where all of the workers needed to share very similar exception code. Merging them will make this future change simpler.
This diff fixes this for resources: it merges the Worker, UpdateWorker and DestroyWorker logic into UpdateWorker and throws away the other two workers.
Test Plan: Nothing substantive yet, see next diff. I'll do the same thing for Leases, then test both more thoroughly.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14201
Summary: Ref T9252. Make log types modular so they can be translated and have complicated rendering logic if necessary (currently, none have this).
Test Plan: {F855330}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14198
Summary:
Ref T9252. Drydock logs are almost exclusively useful as a diagnostic tool for debugging immediate problems, so GC them fairly aggressively.
(I expect 99% of the usefulness of these logs to be within the first 24 hours, basically "why isn't my thing working". I can't really think of any cases where having old logs would be useful.)
Test Plan:
- Ran GC, saw it hit the log table (with no effect).
- Changed TTL from 30 days to 30 seconds, ran GC, saw it wipe recent logs.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14197
Summary:
Ref T9252. Several general changes here:
- Moves logs to use PHIDs instead of IDs. This generally improves flexibility (for example, it's a lot easier to render handles).
- Adds `blueprintPHID` to logs. Although you can usually figure this out from the leasePHID or resourcePHID, it lets us query relevant logs on Blueprint views.
- Instead of making logs a top-level object, make them strictly a sub-object of Blueprints, Resources and Leases. So you go Drydock > Lease > Logs, etc., to get to logs.
- I might restore the "everything" view eventually, but it doesn't interact well with policies and I'm not sure it's very useful. A policy-violating `bin/drydock log` might be cleaner.
- Policy-wise, we always show you that logs exist, we just don't show you log content if it's about something you can't see. This is similar to seeing restricted handles in other applications.
- Instead of just having a message, give logs "type" + "data". This will let logs be more structured and translatable. This is similar to recent changes to Herald which seem to have worked well.
Test Plan:
Added some placeholder log writes, viewed those logs in the UI.
{F855199}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14196
Summary: Ref T9252. We're currently resetting to the local branch, but should be resetting to the origin branch.
Test Plan: Restarted a build, had it run `git show`, saw proper HEAD.
Reviewers: chad
Reviewed By: chad
Subscribers: hach-que
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14194
Summary: Ref T9271, maybe fixes it. This restores feed publishing for answers (broken in D13951) and sends the author of the question an email for new answers. Also, unsure how to pull all question subsribers to the answer email, or is it automagical?
Test Plan: notchad asks a question, chad answers, log into notchad and see that mail was delivered, see feed story.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: revi, Korvin
Maniphest Tasks: T9271
Differential Revision: https://secure.phabricator.com/D14171
Summary: This is to make it more obvious how to ignore a folder that may be at the root, and it resolves https://secure.phabricator.com/T8894
Test Plan: Just a documentation change
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin, nornagon
Differential Revision: https://secure.phabricator.com/D14193
Summary: Replaces D13687. Leases track an owner but don't currently show it.
Test Plan:
Looked at a lease.
{F851223}
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D14191
Summary: Fixes T9481. If the viewer does not have access to Differential (for example, because it is not installed), hide the "Revision" column in Diffusion.
Test Plan:
- Viewed history, saw "Revision" column.
- Uninstalled Differential, reloaded, no "Revision" column.
Reviewers: chad
Reviewed By: chad
Subscribers: revi
Maniphest Tasks: T9481
Differential Revision: https://secure.phabricator.com/D14188
Summary:
Ref T9482. This button goes to the wrong place and this table conditionally hides itself so I missed it.
Instead:
- Always show the table, with an empty string if there are no relevant commits.
- Link to a working UI.
Test Plan: Saw table. Clicked button.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9482
Differential Revision: https://secure.phabricator.com/D14189
Summary:
Fixes T9479. Currently, `@aaaaaaaa` may try to match as a commit hash, and `@C123456` may try to match as a Countdown reference. These should only match as user mentions.
Prevent object mention rules from matching after `@`. We already prevent them after `-` and `#`, and already prevented the username rule after `@` (i.e., preventing `@@user`).
Test Plan:
Created some "interesting" users locally and `@mentioned` them:
{F850779}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9479
Differential Revision: https://secure.phabricator.com/D14186
Summary:
Ref T9123. After recent changes, viewing a live build log from the web UI throws a CSRF exception.
Check `start` ("this object is an active log"), not `live` ("this log is open somewhere").
Test Plan: Will push / etc.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9123
Differential Revision: https://secure.phabricator.com/D14185
Summary:
Ref T9123. To run upstream builds in Harbormaster/Drydock, we need to be able to check out `libphutil`, `arcanist` and `phabricator` next to one another.
This adds an "Also Clone: ..." field to Harbormaster working copy build steps so I can type all three repos into it and get a proper clone with everything we need.
This is somewhat upstream-centric and a bit narrow, but I don't think it's totally unreasonable, and most of the underlying stuff is relatively general.
This adds some more typechecking and improves data/type handling for custom fields, too. In particular, it prevents users from entering an invalid/restricted value in a field (for example, you can't "Also Clone" a repository you don't have permission to see).
Test Plan: Restarted build, got a Drydock resource with multiple repositories in it.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9123
Differential Revision: https://secure.phabricator.com/D14183
Summary:
Fixes T8728. As far as I can tell, I simply got this wrong in D11826. This is not the proper name for the preference.
That change primarily focused on the "spammy junk during import" issue, and the code did get the importing flag right. It looks like my testing in D11827 focused on "during import" and just missed this case.
Test Plan: Grepped for `disable-herald`. Grepped for `herald-disable`.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T8728
Differential Revision: https://secure.phabricator.com/D14181
Summary:
Ref T9252. For building Phabricator itself, we need to have `libphutil/`, `arcanist/` and `phabricator/` next to one another on disk.
Expand the Drydock WorkingCopy resource so that it can have multiple repositories if the caller needs them.
I'm not sure if I'm going to put the actual config for this in Harbormaster or Drydock yet, but the WorkingCopy resource itself should work the same way in either case.
Test Plan: Restarted a Harbormaster build which leases a working copy, saw it build as expected.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14180
Summary:
Ref T9252. Currently, Harbormaster does this when trying to acquire a working copy:
- Ask for a working copy.
- Yield for 15 seconds.
- Check if we have a working copy yet.
That's OK, but Drydock takes ~1s to acquire a working copy lease if a resource is already available, so we end up doing this:
- T+0: Ask for a working copy.
- T+0: Yield for 15 seconds.
- T+1: Working copy lease activates.
- T+15: Working copy lease is used.
- T+16: Build finishes.
So we end up spending about 2 seconds doing work and 14 seconds sleeping.
One way to fix this would be to fiddle with the yield duration, so we yield for 1, 2, 4, ... seconds or something. This probably isn't a bad idea for longer leases (i.e., wait for 15, 30, 45 ... seconds or similar) but it implies a lot of churn for short leases.
Instead, let tasks "awaken" other tasks when they complete. The "awaken" operation means: if a task is in a yielded state (no failures, no owner, explicitly yielded, future expires time), pretend it only yielded until right now instead of whenever it really yielded to.
Basically, this rewrites history so that even though Harbormaster did a `yield(15)`, we pretend it did a `yield(4)` after we activate the lease if lease activation took 4 seconds.
If this misses, it's fine: we fall back to the normal yield behavior and things move forward normally a few seconds later.
If it hits, we get a more nimble process pretty cleanly.
Test Plan:
- Restarted a build plan (lease working copy + run `ls`) with this patch no-op'd, took about 16 seconds.
- Restarted a build plan with this patch active, took about 1 second.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14178
Summary: Ref T9252. Show the user when a resource or lease has a pending release command in queue.
Test Plan: Released a resource and lease from the web UI. In both cases, saw a "releasing" tag and the action disable.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14177
Summary:
Fixes T6569. This implements an expiry mechanism for Drydock resources which parallels the mechanism for leases.
A few things are missing that we'll probably need in the future:
- An "EXPIRES" command to update the expiration time. This would let resources be permanent while leased, then expire after, say, 24 hours without any leases.
- A callback like `shouldActuallyExpireRightNow()` for resources and leases that lets them decide not to expire at the last second.
- A callback like `didAcquireLease()` for resource blueprints, to parallel `didReleaseLease()`, letting them clear or extend their timer.
However, this stuff would mostly just let us tune behaviors, not really open up new capabilities.
Test Plan: Changed host resources to expire after 60 seconds, leased one, saw it vanish 60 seconds later.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T6569
Differential Revision: https://secure.phabricator.com/D14176
Summary: Fixes T9474
Test Plan: Make a poll, log out, still see it.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T9474
Differential Revision: https://secure.phabricator.com/D14179
Summary: Fixes T5991. If //all requested documents// failed to index, consider this a catastrophic failure and exit with an error code.
Test Plan:
- Ran `bin/search index --type TASK`, observed successful exit despite a small number of un-indexable documents.
- Ran `bin/search index PHID-TASK-xxx` for an invalid task, observed exception on exit after complete failure.
- Ran normal indexing through daemons.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T5991
Differential Revision: https://secure.phabricator.com/D14174
Summary: Ref T9272. This doesn't fix anything, just a little cleanup while I was looking at it.
Test Plan: Clicked "Show Details" on a couple description changes, got the same effect for less code.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9272
Differential Revision: https://secure.phabricator.com/D14168
Summary: Fixes T9312. This is a bit fluff, but does simplify the view controller slightly and seems reasonable/useful in general.
Test Plan: Clicked "View Raw File" on a paste, got redirected to the raw file via a stable URI.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9312
Differential Revision: https://secure.phabricator.com/D14167
Summary: Ref T9252. This mostly cleans up future and log handling, and edges us closer to being able to do useful work with Harbormaster / Drydock.
Test Plan:
- Added a "Run `ls -alh`" step to my trivial build plan.
- Ran it a bunch of times.
- Worked great.
- Also did an HTTP plan.
{F835227}
{F835228}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14161
Summary: Fixes T9434. I'm not sure exactly what changed behavior here, but we need a `needAuditRequests()`.
Test Plan: Ran a query which hit the exception (empty query was good enough, locally), then applied this patch; saw exception go away.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9434
Differential Revision: https://secure.phabricator.com/D14162
Summary:
Fixes T9427. Currently, replies to audits/commits go to "Cxxx", but so do replies to countdowns.
There is non real non-disruptive approach available here and this seems least-bad.
Test Plan:
- Made a comment on a commit.
- Fished the reply-to address out of `bin/mail list-oubound` + `bin/mail show-outbound` (it was now "COMMIT...").
- Sent mail to that address.
- Grabbed the raw message and wrote it to `mail.txt`.
- Ran `cat mail.txt | ./scripts/mail/mail_handler.php --process-duplicates`.
- Used `bin/mail list-inbound` + `bin/mail show-inbound` to verify receipt.
- Saw comment appear on audit.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9427
Differential Revision: https://secure.phabricator.com/D14163
Summary:
Fixes T9328. There's no way to hit these error states by clicking things in the UI that I could find, but if you mash stuff into your URL bar or "Inspect Element..." and then edit the form to be full of garbage you can hit them.
Make them a little more informative and don't send them to the log, since these are pretty much just fancy 404s.
Test Plan: Bashed my fist on the URL bar to hit all these messages.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9328
Differential Revision: https://secure.phabricator.com/D14164
Summary:
Fixes T9145. We currently restrict lint codes to 32 bytes, but PHPCS generates codes like "PHPCS.E.PEAR.Comments.Messages.Line.TooLong".
These codes seem reasonable as codes, and we don't currently have any key-length problems or other technical concerns with simply raising the size of this column.
Test Plan: Ran `bin/storage upgrade` to pick up adjustments.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9145
Differential Revision: https://secure.phabricator.com/D14166
Summary: Ref T9252. This is still crude in a few ways but basically works, at least for commits.
Test Plan:
- Made a build plan with just this build step.
- Ran `bin/harbormaster build --plan 10 ...` on a commit.
- It actually built a working copy, leased it, took no action, and released the lease. MAGIC~~~
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14160
Summary: Ref T9252. This is the same as D14157, just for Resources and their leases.
Test Plan: Viewed a resource, saw only active leases, clicked "View All Leases", queried, clicked around, used crumbs.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14158
Summary:
Ref T9252. Currently, Drydock blueprint pages:
- show all resources, even if there are a million;
- show resources in all states, although destroyed resources are usually uninteresting;
- have some junky `$pager` code.
Instead, show the few most recent active resources and link to a filtered resource view in ApplicationSearch.
Test Plan:
- Viewed some blueprints.
- Clicked "View All Resources".
- Saw all resources.
- Used query / crumbs / etc.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14157
Summary: Ref T9252. If you have a blueprint and you do not like that blueprint very much, you can disable it.
Test Plan: Disabled / enabled some blueprints.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14156
Summary:
Ref T9252. Move these to the more modern stuff to pick up ordering and interface support for free.
Also work around the blueprint / custom field integration a little more gracefully.
Test Plan: Searched for blueprints, resources and leases.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14155
Summary: Ref T9252. Resources always have a corresponding blueprint, and it makes sense to use the same policies for both.
Test Plan: Viewed resources in web UI.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14154
Summary:
Ref T9252. Drydock currently uses integer statuses, but there's no reason for this (they don't need to be ordered) and it makes debugging them, working with them, future APIs, etc., more cumbersome.
Switch to string instead.
Also rename `STATUS_OPEN` to `STATUS_ACTIVE` and `STATUS_CLOSED` to `STATUS_RELEASED` for consistency. This makes resources and leases have more similar states, and gives resource states more accurate names.
Test Plan: Browsed web UI, grepped for changed constants, applied patch, inspected database.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14153
Summary:
Ref T9252. Leases currently have a `resourceID`, but this is a bit nonstandard and generally less flexible than giving them a `resourcePHID`.
In particular, a `resourcePHID` is easier to use when rendering interfaces, since you can get handles out of a PHID.
Add a PHID column, copy over all the PHIDs that correspond to existing IDs, then drop the ID column.
Test Plan:
- Browsed web UIs.
- Inspected database during/after migration.
- Grepped for `resourceID`.
- Allocated a new lease with `bin/drydock lease`.
Reviewers: chad, hach-que
Reviewed By: hach-que
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14151
Summary: Ref T9252. This is now more consistent (same as the equivalent Resource state) and accurate (leases can end up in this state a bunch of ways, including by expiring).
Test Plan: `grep`, browsed around web UI.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14150
Summary: Ref T6569. If a lease is activated with an expiration date, schedule a task to try to clean it up after that time.
Test Plan:
- Used `bin/drydock lease ... --until ...` to activate a lease in the near future.
- Waited for a bit.
- Saw it expire and get destroyed at the scheduled time.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T6569
Differential Revision: https://secure.phabricator.com/D14148
Summary:
Ref T9252. This simplifies some Drydock code.
Most of this code relates to the old notion of Drydock being able to enumerate all the tasks it needs to complete in order to acquire a lease. The code has stepped back from this, since it's unnecessary, the queue is more powerful than it used to be, and it would be a lot of work to keep track of.
The ~only thing that should ever wait for leases in modern code is `bin/drydock lease`, and it's fine for it to just sit there sleeping, so this just does that.
This reduces the granularity of logging, but I'll address that separately in future logging-focused changes.
Test Plan: Used `bin/drydock lease` to acquire a lease, saw it acquire cleanly.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14147
Summary: Fixes T9388, lays in basic ApplicationSearch.
Test Plan: Build a dashboard with Posts and Blogs, click on search icon, get sent to correct page.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Maniphest Tasks: T9388
Differential Revision: https://secure.phabricator.com/D14146
Summary: Ref T9252. Some leases or resources may need to remove data, tear down VMs, etc., during cleanup. After they are released, queue a "destroy" phase for performing teardown.
Test Plan:
- Used `bin/drydock lease ...` to create a working copy lease.
- Used `bin/drydock release-lease` and `bin/drydock release-resource` to release the lease and then the working copy and host.
- Saw working copy and host get destroyed and cleaned up properly.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T6569, T9252
Differential Revision: https://secure.phabricator.com/D14144
Summary:
Ref T9252. Broadly, Drydock currently races on releasing objects from the "active" state. To reproduce this:
- Scatter some sleep()s pretty much anywhere in the release code.
- Release several times from web UI or CLI in quick succession.
Resources or leases will execute some release code twice or otherwise do inconsistent things.
(I didn't chase down a detailed reproduction scenario for this since inspection of the code makes it clear that there are no meaningful locks or mechanisms preventing this.)
Instead, add a Harbormaster-style command queue to resources and leases. When something wants to do a release, it adds a command to the queue and schedules a worker. The workers acquire a lock, then try to consume commands from the queue.
This guarantees that only one process is responsible for writes to active resource/leases.
This is the last major step to giving resources and leases a single writer during all states:
- Resource, Unsaved: AllocatorWorker
- Resource, Pending: ResourceWorker (Possible rename to "Allocated?")
- Resource, Open: This diff, ResourceUpdateWorker. (Likely rename to "Active").
- Resource, Closed/Broken: Future destruction worker. (Likely rename to "Released" / "Broken"; maybe remove "Broken").
- Resource, Destroyed: No writes.
- Lease, Unsaved: Whatever wants the lease.
- Lease, Pending: AllocatorWorker
- Lease, Acquired: LeaseWorker
- Lease, Active: This diff, LeaseUpdateWorker.
- Lease, Released/Broken: Future destruction worker (Maybe remove "Broken"?)
- Lease, Expired: No writes. (Likely rename to "Destroyed").
In most phases, we can already guarantee that there is a single writer without doing any extra work. This is more complicated in the "Active" case because the release buttons on the web UI, the release tools on the CLI, the lease requestor itself, the garbage collector, and any other release process cleaning up related objects may try to effect a release. All of these could race one another (and, in many cases, race other processes from other phases because all of these get to act immediately) as this code is currently written. Using a queue here lets us make sure there's only a single writer in this phase.
One thing which is notable is that whatever acquires a lease **can not write to it**! It is never the writer once it queues the lease for activation. It can not write to any resources, either. And, likewise, Blueprints can not write to resources while acquiring or releasing leases.
We may need to provide a mechinism so that blueprints and/or resource/lease holders get to attach some storage to resources/leases for bookkeeping. For example, a blueprint might need to keep some kind of cache on a resource to help it manage state. But I think we can cross that bridge when we come to it, and nothing else would need to write to this storage so it's technically straightforward to introduce such a mechanism if we need one.
Test Plan:
- Viewed buttons in web UI, checked enabled/disabled states.
- Clicked the buttons.
- Saw commands show up in the command queue.
- Saw some daemon stuff get scheduled.
- Ran CLI tools, saw commands get consumed and resources/leases release.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14143
Summary:
Ref T9458. This is basically the same as D13319, but the "Tags" field didn't get covered in that change.
Specifically, the issue is:
- We try to generate mail to a disabled user (later, we'll drop it without delivering it, but that filtering doesn't happen yet).
- The disabled user doesn't have permission to use Conduit (or any other Conduit-related problem occurs).
- We fail here, then retry generating the mail again later.
Instead, just degrade to not building the field and showing what went wrong.
Test Plan:
- Pushed some commits, saw mail generate.
- Added a fake exception to the field, saw the mail generate with an error message.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9458
Differential Revision: https://secure.phabricator.com/D14142
Summary: This UI should have a Collapsed PHUIObjectBoxView.
Test Plan: review a file in sandbox
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D14139
Summary:
Ref T9253. For resources and leases that need to do something which takes a lot of time or requires waiting, allow them to allocate/acquire first and then activate later.
When we allocate a resource or acquire a lease, the blueprint can either activate it immediately (if all the work can happen quickly/inline) or activate it later. If the blueprint activates it later, we queue a worker to handle activating it.
Rebuild the "working copy" blueprint to work with this model: it allocates/acquires and activates in a separate step, once it is able to acquire a host.
Test Plan: With some power of imagination, brought up a bunch of working copies with `bin/drydock lease --type working-copy ...`
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14127
Summary: Ref T9253. Provide a meaningful command interface for Almanac hosts.
Test Plan:
Configued and leased a real host (`sbuild001.phacility.net`) and ran a command on it.
```
$ ./bin/drydock command --lease 90 -- ls /
bin
boot
core
dev
etc
home
initrd.img
lib
lib64
lost+found
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
vmlinuz
```
Reviewers: chad, hach-que
Reviewed By: chad, hach-que
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14126
Summary: Ref T9253. We had some un-modern use of UI elements, clean that up. Add a tab for showing slot locks so you don't have to fish around in the database.
Test Plan: Looked at blueprints, resources and leases. Looked at slot locks.
Reviewers: chad, hach-que
Reviewed By: chad, hach-que
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14119
Summary:
See discussion in D10304. There's a lot of context there, but the general idea is:
- Blueprints should manage locks in a granular way during the actual allocation/acquisition phase.
- Optimistic "slot locks" might a pretty good primitive to make that easy to implement and reason about in most cases.
The way these locks work is that you just pick some name for the lock (like the PHID of a resource) and say that it needs to be acquired for the allocation/acquisition to work:
```
...
->needSlotLock("mylock(PHID-XYZQ-...)")
...
```
When you fire off the acquisition or allocation, it fails unless it could acquire the slot with that name. This is really simple (no explicit lock management) and a pretty good fit for most of the locking that blueprints and leases need to do.
If you need to do limit-based locks (e.g., maximum of 3 locks) you could acquire a lock like this:
```
mylock(whatever).slot(2)
```
Blueprints generally only contend with themselves, so it's normally OK for them to pick whatever strategy works best for them in naming locks.
This may not work as well if you have a huge number of slots (e.g., 100TB you want to give out in 1MB chunks), or other complex needs for locks (like you have to synchronize access to some external resource), but slot locks don't need to be the only mechanism that blueprints use. If they run into a problem that slot locks aren't a good fit for, they can use something else instead. For now, slot locks seem like a good fit for the problems we currently face and most of the problems I anticipate facing.
(The release workflows have other race issues which I'm not addressing here. They work fine if nothing races, but aren't race-safe.)
Test Plan:
To create a race where the same binding is allocated as a resource twice:
- Add `sleep(10)` near the beginning of `allocateResource()`, after the free bindings are loaded but before resources are allocated.
- (Comment out slot lock acquisition if you have this patch.)
- Run `bin/drydock lease ...` in two windows, within 10 seconds of one another.
This will reliably double-allocate the binding because both blueprints see a view of the world where the binding is free.
To verify the lock works, un-comment it (or apply this patch) and run the same test again. Now, the lock fails in one process and only one resource is allocated.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Differential Revision: https://secure.phabricator.com/D14118
Summary:
Ref T9253. Broadly, this realigns Allocator behavior to be more consistent and straightforward and amenable to intended future changes.
This attempts to make language more consistent: resources are "allocated" and leases are "acquired".
This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will need to perform other locking eventually, this does feel like a good fit for most of the locking blueprints need to do.
In particular, I've made the blueprint operations on `$resource` and `$lease` objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like:
$lease
->setActivateWhenAcquired(true)
->needSlotLock('x')
->needSlotLock('y')
->acquireOnResource($resource);
In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward.
This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes:
- The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode.
- The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster.
- This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a `sleep(15 * 60)` in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks.
Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem).
The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier.
Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to.
This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself:
- A bunch of code from that (e.g., interfaces) doesn't exist yet.
- I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing.
This also removes `bin/drydock create-resource`, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and can not be made correct.
NOTE: This technically works but doesn't do anything useful yet.
Test Plan: Used `bin/drydock lease --type host` to acquire leases against these blueprints.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Subscribers: Mnkras
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14117
Summary:
Ref T9253. The Drydock allocator is very pseudocodey right now. Particularly, it was written before Blueprints were concrete.
Reorganize it to make its responsibilities and error handling behaviors more clear.
In particular, the Allocator does not manage locks. It's primarily trying to reject allocations which can not possibly work. Blueprints are responsible for locks. See some discussion in D10304.
NOTE: This code probably doesn't work as written, see future diffs.
Test Plan: See future diffs.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Subscribers: cburroughs
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14114
Summary:
Ref T9253. Some of the Drydock code is pretty old. This applies standard modernizations to it:
- Modernize Query classes to use stuff like `buildWhereClauseParts()` and `loadStandardPage()`.
- Modernize all the getX() / attachX() stuff. In particular:
- Require and attach implementations to Blueprints.
- Require and attach Blueprints to Resources.
- BlueprintImplementations are now always unique per-Blueprint so they can store/cache state if they want without running over one another.
- BlueprintImplementations are now passed a `$blueprint`, like other similar APIs (this could go various ways but I generally like this as a balance of concerns).
NOTE: This probably doesn't run on its own, I'm just trying to split the next diff (core allocator stuff) up a bit and these pieces are all pretty standard.
Test Plan:
- Not much; see next revision or two.
- Clicked around Resource and Blueprint lists.
Reviewers: chad, hach-que
Reviewed By: chad, hach-que
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14113
Summary:
Ref T9253. See discussion in D13843.
I want to let Drydock blueprints for Almanac services choose those services from a typeahead, but only list appropriate services in the typeahead. To do this:
- Provide a StandardCustomField for an arbitrary datasource.
- Adjust the AlmanacServiceDatasource to allow filtering by service class.
This implementation is substantially the same as the one in D13843, with some adjustments:
- I lifted most of the code in the `Users` standard custom field into a new `Tokenizer` standard custom field.
- The `Users` and `Datasource` custom fields now extend the `Tokenizer` custom field and can share most of the code it uses.
- I exposed this field fully as a configurable field. I don't think anyone will ever use it, but this generality costs us nearly nothing and improves consistency.
- The code in D13843 didn't actually pass the parameters over the wire, since the object that responds to the request is not the same object that renders the field. Use the "parameters" mechanism in datasources to get things passed over the wire.
Test Plan:
- Created a custom "users" field in Maniphest and made sure it still wokred.
- Created a custom "almanc services" field in Maniphest and selected some services for a task.
- With additional changes from D13843, selected an appropriate Almanac service in a new Drydock blueprint.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14111
Summary:
Ref T9253. This comes from a time before Almanac. Now that we have Almanac, it makes much more sense to put this logic there than to try to put it in Drydock itself.
Remove the preallocated host blueprint, a relic of a bygone time.
Test Plan: Grepped for callsites.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14110
Summary: Ref T9253. See D13843 for some discussion. This is very bare-bones for now since I believe that almost all interesting configuration (e.g., credentials) should live in Drydock, although I imagine it getting some configuration eventually.
Test Plan: Used {nav Almanac > Services > Create Service} to create a new service of this type.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14109
Summary:
Fixes T9446. We allow administrators to send "Welcome" mail to bots and mailing lists.
This is harmless (these links do not function), but confusing.
Instead, disable this option in the UI and explain why it is disabled when it is clicked. Also prevent generation of this mail lower in the stack.
Test Plan:
- Viewed a bot page, saw action disabled, clicked it, got explanation.
- Viewed a normal user page, saw action enabled, clicked it, sent welcome email.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9446
Differential Revision: https://secure.phabricator.com/D14134