1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-11-28 17:52:43 +01:00
Commit graph

71 commits

Author SHA1 Message Date
epriestley
426d648681 In Drydock, don't reset current branch to point at unrelated commit
Summary:
Fixes T10037. When we're building commit `aabbccdd`, we currently do this to check it out:

  git reset --hard aabbccdd

However, this has an undesirable side effect of moving the current branch pointer to point at `aabbccdd`. The current branch pointer may be some totally different branch which `aabbccdd` is not part of, so this is confusing and misleading.

Instead, use `git reset --hard HEAD` to get the primary effect we want (destroying staged changes) and then `git checkout aabbccdd` to checkout the commit in a detached HEAD state.

Test Plan:
  - Ran a build (a commit-focused operation) successfully.
  - Verified working copy was pointed at a detached HEAD afterward:

```
builder@sbuild001:/var/drydock/workingcopy-167/repo/git-test-ii$ git status
HEAD detached at ffc7635
nothing to commit, working directory clean
```

  - Ran a land (a branch-foused operation) successfully.
  - Verified working copy was pointed at a branch afterward:

```
builder@sbuild001:/core/data/drydock/workingcopy-168/repo/git-test$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

nothing to commit, working directory clean
```

Reviewers: chad

Reviewed By: chad

Subscribers: yelirekim

Maniphest Tasks: T10037

Differential Revision: https://secure.phabricator.com/D14850
2015-12-22 06:47:47 -08:00
epriestley
bbf4ce79e3 Provide a username and email when running git merge --squash
Summary:
Ref T182. This command should never actually generate a commit because `--squash` prevents that, but `git` seems to sometimes hit a check for username/email configuration (maybe when merging a non-fastforward?).

Give it some dummy values to placate it. This command shouldn't commit anything so these values should never actually be used.

Test Plan: Landed rGITTESTd8c8643cb02bbe60048c6c206afc2940c760a77e.

Reviewers: chad, Mnkras

Reviewed By: Mnkras

Maniphest Tasks: T182

Differential Revision: https://secure.phabricator.com/D14347
2015-10-26 21:12:16 +00:00
epriestley
5a35dd233b Don't use --ff-only inside "Land Revision"
Summary:
Ref T182. I lifted this logic out of `arc`, but the context is a little different there, and this option is too strict in "Land Revision".

Specifically, it prevents `git` from merging unless the merge is //strictly// a fast-foward, even with `--squash`. That means revisions can't merge unless they're rebased on the current `master`, even if they have no conflicts.

(This whole process will probably need additional refinement, but the behavior without this flag is more reasonable overall than the behavior with it for now.)

Test Plan: Will land stuff in production~~

Reviewers: chad, Mnkras

Reviewed By: Mnkras

Maniphest Tasks: T182

Differential Revision: https://secure.phabricator.com/D14346
2015-10-26 20:26:56 +00:00
epriestley
0b24a6e200 Make "Land Revision" show merge conflicts more clearly
Summary:
Ref T182. We just show "an error happened" right now. Improve this behavior.

This error handling chain is a bit ad-hoc for now but we can formalize it as we hit other cases.

Test Plan:
{F910247}

{F910248}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T182

Differential Revision: https://secure.phabricator.com/D14343
2015-10-26 20:11:21 +00:00
epriestley
9c39493796 Make WorkingCopyBlueprint responsible for performing merges
Summary:
Ref T182. Currently, the "RepositoryLand" operation is responsible for performing merges when landing a revision.

However, we'd like to be able to perform these merges in a larger set of cases in the future. For example:

  - After Releeph is revamped, when someone says "I want to merge bug fix X into stable branch Y", it would probably be nice to make that a Buildable and let tests run against it without requring that it actually be pushed anywhere.
  - Same deal if we want a merge-from-Diffusion or cherry-pick-from-Diffusion operation.
  - Similar deal if we want a "random web UI edits from Diffusion".

Move the merging part into WorkingCopy so more applications can share/use it in the future.

A big chunk of this is me making stuff up for now (the ol' undocumented dictionary full of arbitrary magic keys), but I anticipate formalizing it as we move along.

Test Plan: Pushed rGITTEST0d58eef3ce0fa5a10732d2efefc56aec126bc219 up from my local install via "Land Revision".

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T182

Differential Revision: https://secure.phabricator.com/D14337
2015-10-26 12:40:16 -07:00
epriestley
c059149eb9 Remove Drydock host resource limits and give working copies simple limits
Summary:
Ref T9252. Right now, we have very strict limits on Drydock: one lease per host, and one working copy per working copy blueprint.

These are silly and getting in the way of using "Land Revision" more widely, since we need at least one working copy for each landable repository.

For now, just remove the host limit and put a simple limit on working copies. This might need to be fancier some day (e.g., limit working copies per-host) but it is generally reasonable for the use cases of today.

Also add a `--background` flag to make testing a little easier.

(Limits are also less important nowadays than they were in the past, because pools expand slowly now and we seem to have stamped out all the "runaway train" bugs where allocators go crazy and allocate a million things.)

Test Plan:
  - With a limit of 5, ran 10 concurrent builds and saw them finish after allocating 5 total resources.
  - Removed limit, raised taskmaster concurrency to 128, ran thousands of builds in blocks of 128 or 256.
    - Saw Drydock gradually expand the pool, allocating a few more working copies at first and a lot of working copies later.
    - Got ~256 builds in ~140 seconds, which isn't a breakneck pace or anything but isn't too bad.
    - This stuff seems to be mostly bottlenecked on `sbuild` throttling inbound SSH connections. I haven't tweaked it.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14334
2015-10-26 12:39:47 -07:00
epriestley
ac7edf54af Fix bad counting in SQL when enforcing Drydock allocator soft limits
Summary:
Ref T9252. This fixes a bug from D14236. D14272 discusses the observable effects of the bug, primarily that the window for racing is widened from ~a few milliseconds to several minutes under our configuration.

This SQL query is missing a `GROUP BY` clause, so all of the resources get counted as having the same status (specifically, the alphabetically earliest status any resource had, I think). For test cases this often gets the right result since the number of resources may be small and they may all have the same status, but in production this isn't true. In particular, the allocator would sometimes see "35 destroyed resources" (or whatever), when the real counts were "32 destroyed resources + 3 pending resources".

Since this allocator behavior is soft/advisory this didn't cause any actual problems, per se (we do expect races here occasionally), it just made the race very very easy to hit. For example, Drydock in production currently has three pending working copy resources. Although we do expect this to be //possible//, getting 4 resources when the configured limit is 1 should be hard (not lightning strike / cosmic radiaion hard, but "happens once a year" hard).

Also exclude destroyed resources since we never care about them.

Test Plan:
Followed the plan from D14272 and restarted two Harbormaster workers at the same time.

After this patch was applied, they no longer created two different resources (we expect it to be possible for this to happen, just very hard).

We should still be able to force this race by putting something like `sleep(10)` right before the query, then `sleep(10)` right after it. That would prevent the allocators from seeing one another (so they would both think there were no other resources) and push us down the pathway where we exceed the soft limit.

Reviewers: chad, hach-que

Reviewed By: hach-que

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14274
2015-10-14 06:18:10 -07:00
epriestley
1bdf225354 Use Drydock authorizations when acquiring leases
Summary:
Ref T9519. When acquiring leases on resources:

  - Only consider resources created by authorized blueprints.
  - Only consider authorized blueprints when creating new resources.
  - Fail with a tailored error if no blueprints are allowed.
  - Fail with a tailored error if missing authorizations are causing acquisition failure.

One somewhat-substantial issue with this is that it's pretty hard to figure out from the Harbormaster side. Specifically, the Build step UI does not show field value anywhere, so the presence of unapproved blueprints is not communicated. This is much more clear in Drydock. I'll plan to address this in future changes to Harbormaster, since there are other related/similar issues anyway.

Test Plan: {F872527}

Reviewers: hach-que, chad

Reviewed By: chad

Maniphest Tasks: T9519

Differential Revision: https://secure.phabricator.com/D14254
2015-10-12 17:02:35 -07:00
epriestley
2f6d3119f5 Rough cut of "Blueprint Authorizations"
Summary:
Ref T9519. This is like 80% of the way there and doesn't fully work yet, but roughly shows the shape of things to come. Here's how it works:

First, there's a new custom field type for blueprints which works like a normal typeahead but has some extra logic. It's implemented this way to make it easy to add to Blueprints in Drydock and Build Plans in Harbormaster. Here, I've added a "Use Blueprints" field to the "WorkingCopy" blueprint, so you can control which hosts the working copies are permitted to allocate on:

{F869865}

This control has a bit of custom rendering logic. Instead of rendering a normal list of PHIDs, it renders an annotated list with icons:

{F869866}

These icons show whether the blueprint on the other size of the authorization has approved this object. Once you have a green checkmark, you're good to go.

On the blueprint side, things look like this:

{F869867}

This table shows all the objects which have asked for access to this blueprint. In this case it's showing that one object is approved to use the blueprint since I already approved it, but by default new requests come in here as "Authorization Requested" and someone has to go approve them.

You approve them from within the authorization detail screen:

{F869868}

You can use the "Approve" or "Decline" buttons to allow or prevent use of the blueprint.

This doesn't actually do anything yet -- objects don't need to be authorized in order to use blueprints quite yet. That will come in the next diff, I just wanted to get the UI in reasonable shape first.

The authorization also has a second piece of state, which is whether the request from the object is active or inactive. We use this to keep track of the authorization if the blueprint is (maybe temporarily) deleted.

For example, you might have a Build Plan that uses Blueprints A and B. For a couple days, you only want to use A, so you remove B from the "Use Blueprints: ..." field. Later, you can add B back and it will connect to its old authorization again, so you don't need to go re-approve things (and if you're declined, you stay declined instead of being able to request authorization over and over again). This should make working with authorizations a little easier and less labor intensive.

Stuff not in this diff:

  - Actually preventing any allocations (next diff).
  - Probably should have transactions for approve/decline, at least, at some point, so there's a log of who did approvals and when.
  - Maybe should have a more clear/loud error state when no blueprints are approved?
  - Should probably restrict the typeahead to specific blueprint types.

Test Plan:
  - Added the field.
  - Typed some stuff into it.
  - Saw the UI update properly.
  - Approved an authorization.
  - Declined an authorization.
  - Saw active authorizations on a blueprint page.
  - Didn't see any inactive authroizations there.
  - Clicked "View All Authorizations", saw all authorizations.

Reviewers: chad, hach-que

Reviewed By: chad

Maniphest Tasks: T9519

Differential Revision: https://secure.phabricator.com/D14251
2015-10-10 07:15:25 -07:00
epriestley
ee937e99fb Fix unbounded expansion of allocating resource pool
Summary:
Ref T9252. I think there's a more complex version of this problem discussed elsewhere, but here's what we hit today:

  - 5 commits land at the same time and trigger 5 builds.
  - All of them go to acquire a working copy.
  - Working copies have a limit of 1 right now, so 1 of them gets the lease on it.
  - The other 4 all trigger allocation of //new// working copies. So now we have: 1 active, leased working copy and 4 pending, leased working copies.
  - The 4 pending working copies will never activate without manual intervention, so these 4 builds are stuck forever.

To fix this, prevent WorkingCopies from giving out leases until they activate. So now the leases won't acquire until we know the working copy is good, which solves the first problem.

However, this creates a secondary problem:

  - As above, all 5 go to acquire a working copy.
  - One gets it.
  - The other 4 trigger allocations, but no longer acquire leases. This is an improvement.
  - Every time the leases update, they trigger another allocation, but never acquire. They trigger, say, a few thousand allocations.
  - Eventually the first build finishes up and the second lease acquires the working copy. After some time, all of the builds finish.
  - However, they generated an unboundedly large number of pending working copy resources during this time.

This is technically "okay-ish", in that it did work correctly, it just generated a gigantic mess as a side effect.

To solve this, at least for now, provide a mechanism to impose allocation rate limits and put a cap on the number of allocating resources of a given type. As hard-coded, this the greater of "1" or "25% of the active resources in the pool".

So if there are 40 working copies active, we'll start allocating up to 10 more and then cut new allocations off until those allocations get sorted out. This prevents us from getting runaway queues of limitless size.

This also imposes a total active working copy resource limit of 1, which incidentally also fixes the problem, although I expect to raise this soon.

These mechanisms will need refinement, but the basic idea is:

  - Resources which aren't sure if they can actually activate should wait until they do activate before allowing leases to acquire them. I'm fairly confident this rule is a reasonable one.
  - Then we limit how many bookkeeping side effects Drydock can generate once it starts encountering limits.

Broadly, some amount of mess is inevitable because Drydock is allowed to try things that might not work. In an extreme case we could prevent this mess by setting all these limits at "1" forever, which would degrade Drydock to effectively be a synchronous, blocking queue.

The idea here is to put some amount of slack in the system (more than zero, but less than infinity) so we get the performance benefits of having a parallel, asyncronous system without a finite, manageable amount of mess.

Numbers larger than 0 but less than infinity are pretty tricky, but I think rules like "X% of active resources" seem fairly reasonable, at least for resources like working copies.

Test Plan:
Ran something like this:

```
for i in `seq 1 5`; do sh -c '(./bin/harbormaster build --plan 10 rX... &) &'; done;
```

Saw 5 plans launch, acquire leases, proceed in an orderly fashion, and eventually finish successfully.

Reviewers: hach-que, chad

Reviewed By: chad

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14236
2015-10-05 15:59:16 -07:00
epriestley
4cf1270ecd In Harbormaster, make sure artifacts are destroyed even if a build is aborted
Summary:
Ref T9252. Currently, Harbormaster and Drydock work like this in some cases:

  # Queue a lease for activation.
  # Then, a little later, save the lease PHID somewhere.
  # When the target/resource is destroyed, destroy the lease.

However, something can happen between (1) and (2). In Drydock this window is very short and the "something" would have to be a lighting strike or something similar, but in Harbormaster we wait until the resource activates to do (2) so the window can be many minutes long. In particular, a user can use "Abort Build" during those many minutes.

If they do, the target is destroyed but it doesn't yet have a record of the artifact, so the artifact isn't cleaned up.

Make these things work like this instead:

  # Create a new lease and pre-generate a PHID for it.
  # Save that PHID as something that needs to be cleaned up.
  # Queue the lease for activation.
  # When the target/resource is destroyed, destroy the lease if it exists.

This makes sure there's no step in the process where we might lose track of a lease/resource.

Also, clean up and standardize some other stuff I hit.

Test Plan:
  - Stopped daemons.
  - Restarted a build in Harbormaster.
  - Stepped through the build one stage at a time using `bin/worker execute ...`.
  - After the lease was queued, but before it activated, aborted the build.
  - Processed the Harbormaster side of things only.
  - Saw the lease get destroyed properly.

Reviewers: chad, hach-que

Reviewed By: hach-que

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14234
2015-10-05 05:58:53 -07:00
epriestley
4496176924 Add staging area support to Harbormaster/Drydock + various fixes
Summary:
Ref T9252. This primarily allows Harbormaster to request (and Drydock to fulfill) working copies with a patch from a staging area. Doing this means we can do builds on in-review changes from `arc diff`.

This is a little cobbled-together but should basically work.

Also fix some other issues:

  - Yielded, awakend workers are fine to update but could complain.
  - We can't log slot lock failures to resources if we don't end up saving them.
  - Killing the transaction would wipe out the log.
  - Fix some TODOs, etc.

Test Plan: Ran Harbormaster builds on a local revision.

Reviewers: hach-que, chad

Reviewed By: chad

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14214
2015-10-01 16:55:01 -07:00
epriestley
d4a0b1c870 Remove names from Drydock resources
Summary:
Ref T9252. Long ago you sometimes manually created resources, so they had human-enterable names. However, users never make resources manually any more, so this field isn't really useful any more.

In particular, it means we write a lot of untranslatable strings like "Working Copy" to the database in the default locale. Instead, do the call at runtime so resource names are translatable.

Also clean up a few minor things I hit while kicking the tires here.

It's possible we might eventually want to introduce a human-choosable label so you can rename your favorite resources and this would just be a default name. I don't really have much of a use case for that yet, though, and I'm not sure there will ever be one.

Test Plan:
  - Restarted a Harbormaster build, got a clean build.
  - Released all leases/resources, restarted build, got a clean build with proper resource names.

Reviewers: hach-que, chad

Reviewed By: hach-que, chad

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14213
2015-10-01 08:13:43 -07:00
epriestley
e589d15231 Improve error and exception handling for Drydock resources
Summary:
Ref T9252. Currently, error handling behavior isn't great and a lot of errors aren't dealt with properly. Try to improve this by making default behaviors better:

  - Yields, slot lock exceptions, and aggregate or proxy exceptions containing an excpetion of these types turn into yields.
  - All other exceptions are considered permanent failures. They break the resource and

This feels a little bit "magical" but I want to try to get the default behaviors to align reasonably well with expectations so that blueprints mostly don't need to have a ton of error handling. This will probably need at least some refinement down the road, but it's a reasonable rule for all exception/error conditions we currently have.

Test Plan: I did a clean build, but haven't vetted this super thoroughly. Next diff will do the same thing to leases, then I'll work on stabilizing this code better.

Reviewers: chad, hach-que

Reviewed By: hach-que

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14211
2015-10-01 08:12:51 -07:00
epriestley
6b775e6090 Add more Drydock log types and some additional logging
Summary: Ref T9252. Add a bit more logging and improve some behaviors.

Test Plan: Restarted a build, got a good result.

Reviewers: chad, hach-que

Reviewed By: hach-que

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14210
2015-10-01 08:11:42 -07:00
epriestley
2ef5b5321d Move Drydock logs to PHIDs and increased structure
Summary:
Ref T9252. Several general changes here:

  - Moves logs to use PHIDs instead of IDs. This generally improves flexibility (for example, it's a lot easier to render handles).
  - Adds `blueprintPHID` to logs. Although you can usually figure this out from the leasePHID or resourcePHID, it lets us query relevant logs on Blueprint views.
  - Instead of making logs a top-level object, make them strictly a sub-object of Blueprints, Resources and Leases. So you go Drydock > Lease > Logs, etc., to get to logs.
    - I might restore the "everything" view eventually, but it doesn't interact well with policies and I'm not sure it's very useful. A policy-violating `bin/drydock log` might be cleaner.
  - Policy-wise, we always show you that logs exist, we just don't show you log content if it's about something you can't see. This is similar to seeing restricted handles in other applications.
  - Instead of just having a message, give logs "type" + "data". This will let logs be more structured and translatable. This is similar to recent changes to Herald which seem to have worked well.

Test Plan:
Added some placeholder log writes, viewed those logs in the UI.

{F855199}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14196
2015-10-01 08:06:23 -07:00
epriestley
9d997df964 Reset Drydock git working copies better
Summary: Ref T9252. We're currently resetting to the local branch, but should be resetting to the origin branch.

Test Plan: Restarted a build, had it run `git show`, saw proper HEAD.

Reviewers: chad

Reviewed By: chad

Subscribers: hach-que

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14194
2015-09-30 07:45:02 -07:00
epriestley
33be8f719f Allow WorkingCopy resources to have multiple working copies
Summary:
Ref T9252. For building Phabricator itself, we need to have `libphutil/`, `arcanist/` and `phabricator/` next to one another on disk.

Expand the Drydock WorkingCopy resource so that it can have multiple repositories if the caller needs them.

I'm not sure if I'm going to put the actual config for this in Harbormaster or Drydock yet, but the WorkingCopy resource itself should work the same way in either case.

Test Plan: Restarted a Harbormaster build which leases a working copy, saw it build as expected.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14180
2015-09-28 09:35:58 -07:00
epriestley
e117ace8c7 Convert Drydock lease and resource constants to strings
Summary:
Ref T9252. Drydock currently uses integer statuses, but there's no reason for this (they don't need to be ordered) and it makes debugging them, working with them, future APIs, etc., more cumbersome.

Switch to string instead.

Also rename `STATUS_OPEN` to `STATUS_ACTIVE` and `STATUS_CLOSED` to `STATUS_RELEASED` for consistency. This makes resources and leases have more similar states, and gives resource states more accurate names.

Test Plan: Browsed web UI, grepped for changed constants, applied patch, inspected database.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14153
2015-09-24 07:57:05 -07:00
epriestley
fcb6d1e2fa Strip some obsolete code out of Drydock
Summary:
Ref T9252. This simplifies some Drydock code.

Most of this code relates to the old notion of Drydock being able to enumerate all the tasks it needs to complete in order to acquire a lease. The code has stepped back from this, since it's unnecessary, the queue is more powerful than it used to be, and it would be a lot of work to keep track of.

The ~only thing that should ever wait for leases in modern code is `bin/drydock lease`, and it's fine for it to just sit there sleeping, so this just does that.

This reduces the granularity of logging, but I'll address that separately in future logging-focused changes.

Test Plan: Used `bin/drydock lease` to acquire a lease, saw it acquire cleanly.

Reviewers: hach-que, chad

Reviewed By: chad

Maniphest Tasks: T9252

Differential Revision: https://secure.phabricator.com/D14147
2015-09-23 13:21:41 -07:00
epriestley
1f311d64c6 Give Drydock resources and leases a real "destroy" lifecycle phase
Summary: Ref T9252. Some leases or resources may need to remove data, tear down VMs, etc., during cleanup. After they are released, queue a "destroy" phase for performing teardown.

Test Plan:
  - Used `bin/drydock lease ...` to create a working copy lease.
  - Used `bin/drydock release-lease` and `bin/drydock release-resource` to release the lease and then the working copy and host.
  - Saw working copy and host get destroyed and cleaned up properly.

Reviewers: hach-que, chad

Reviewed By: chad

Maniphest Tasks: T6569, T9252

Differential Revision: https://secure.phabricator.com/D14144
2015-09-23 11:20:20 -07:00
epriestley
f1119ffcf5 Support working copies and separate allocate + activate steps for resources/leases in Drydock
Summary:
Ref T9253. For resources and leases that need to do something which takes a lot of time or requires waiting, allow them to allocate/acquire first and then activate later.

When we allocate a resource or acquire a lease, the blueprint can either activate it immediately (if all the work can happen quickly/inline) or activate it later. If the blueprint activates it later, we queue a worker to handle activating it.

Rebuild the "working copy" blueprint to work with this model: it allocates/acquires and activates in a separate step, once it is able to acquire a host.

Test Plan: With some power of imagination, brought up a bunch of working copies with `bin/drydock lease --type working-copy ...`

Reviewers: hach-que, chad

Reviewed By: hach-que, chad

Maniphest Tasks: T9253

Differential Revision: https://secure.phabricator.com/D14127
2015-09-21 04:46:24 -07:00
epriestley
6a0eb9d84b Allow AlmanacHost blueprints to build a meaningful CommandInterface
Summary: Ref T9253. Provide a meaningful command interface for Almanac hosts.

Test Plan:
Configued and leased a real host (`sbuild001.phacility.net`) and ran a command on it.

```
$ ./bin/drydock command --lease 90 -- ls /
bin
boot
core
dev
etc
home
initrd.img
lib
lib64
lost+found
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
vmlinuz
```

Reviewers: chad, hach-que

Reviewed By: chad, hach-que

Maniphest Tasks: T9253

Differential Revision: https://secure.phabricator.com/D14126
2015-09-21 04:46:02 -07:00
epriestley
3ac99006bf Implement optimistic "slot locks" in Drydock
Summary:
See discussion in D10304. There's a lot of context there, but the general idea is:

  - Blueprints should manage locks in a granular way during the actual allocation/acquisition phase.
  - Optimistic "slot locks" might a pretty good primitive to make that easy to implement and reason about in most cases.

The way these locks work is that you just pick some name for the lock (like the PHID of a resource) and say that it needs to be acquired for the allocation/acquisition to work:

```
...
->needSlotLock("mylock(PHID-XYZQ-...)")
...
```

When you fire off the acquisition or allocation, it fails unless it could acquire the slot with that name. This is really simple (no explicit lock management) and a pretty good fit for most of the locking that blueprints and leases need to do.

If you need to do limit-based locks (e.g., maximum of 3 locks) you could acquire a lock like this:

```
mylock(whatever).slot(2)
```

Blueprints generally only contend with themselves, so it's normally OK for them to pick whatever strategy works best for them in naming locks.

This may not work as well if you have a huge number of slots (e.g., 100TB you want to give out in 1MB chunks), or other complex needs for locks (like you have to synchronize access to some external resource), but slot locks don't need to be the only mechanism that blueprints use. If they run into a problem that slot locks aren't a good fit for, they can use something else instead. For now, slot locks seem like a good fit for the problems we currently face and most of the problems I anticipate facing.

(The release workflows have other race issues which I'm not addressing here. They work fine if nothing races, but aren't race-safe.)

Test Plan:
To create a race where the same binding is allocated as a resource twice:

  - Add `sleep(10)` near the beginning of `allocateResource()`, after the free bindings are loaded but before resources are allocated.
  - (Comment out slot lock acquisition if you have this patch.)
  - Run `bin/drydock lease ...` in two windows, within 10 seconds of one another.

This will reliably double-allocate the binding because both blueprints see a view of the world where the binding is free.

To verify the lock works, un-comment it (or apply this patch) and run the same test again. Now, the lock fails in one process and only one resource is allocated.

Reviewers: hach-que, chad

Reviewed By: hach-que, chad

Differential Revision: https://secure.phabricator.com/D14118
2015-09-21 04:45:25 -07:00
epriestley
6e03419593 Implement a rough AlmanacService blueprint in Drydock
Summary:
Ref T9253. Broadly, this realigns Allocator behavior to be more consistent and straightforward and amenable to intended future changes.

This attempts to make language more consistent: resources are "allocated" and leases are "acquired".

This prepares for (but does not implement) optimistic "slot locking", as discussed in D10304. Although I suspect some blueprints will need to perform other locking eventually, this does feel like a good fit for most of the locking blueprints need to do.

In particular, I've made the blueprint operations on `$resource` and `$lease` objects more purposeful: they need to invoke an activator on the appropriate object to be implemented correctly. Before they invoke this activator method, they configure the object. In a future diff, this configuration will include specifying slot locks that the lease or resource must acquire. So the API will be something like:

  $lease
    ->setActivateWhenAcquired(true)
    ->needSlotLock('x')
    ->needSlotLock('y')
    ->acquireOnResource($resource);

In the common case where slot locks are a good fit, I think this should make correct blueprint implementation very straightforward.

This prepares for (but does not implement) resources and leases which need significant setup steps. I've basically carved out two modes:

  - The "activate immediately" mode, as here, immediately opens the resource or activates the lease. This is appropriate if little or no setup is required. I expect many leases to operate in this mode, although I expect many resources will operate in the other mode.
  - The "allocate now, activate later" mode, which is not fully implemented yet. This will queue setup workers when the allocator exits. Overall, this will work very similarly to Harbormaster.
  - This new structure makes it acceptable for blueprints to sleep as long as they want during resource allocation and lease acquisition, so long as they are not waiting on anything which needs to be completed by the queue. Putting a `sleep(15 * 60)` in your EC2Blueprint to wait for EC2 to bring a machine up will perform worse than using delayed activation, but won't deadlock the queue or block any locks.

Overall, this flow is more similar to Harbormaster's flow. Having consistency between Harbormaster's model and Drydock's model is good, and I think Harbormaster's model is also simply much better than Drydock's (what exists today in Drydock was implemented a long time ago, and we had more support and infrastructure by the time Harbormaster was implemented, as well as a more clearly defined problem).

The particular strength of Harbormaster is that objects always (or almost always, at least) have a single, clearly defined writer. Ensuring objects have only one writer prevents races and makes reasoning about everything easier.

Drydock does not currently have a clearly defined single writer, but this moves us in that direction. We'll probably need more primitives eventually to flesh this out, like Harbormaster's command queue for messaging objects which you can't write to.

This blueprint was originally implemented in D13843. This makes a few changes to the blueprint itself:

  - A bunch of code from that (e.g., interfaces) doesn't exist yet.
  - I let the blueprint have multiple services. This simplifies the code a little and seems like it costs us nothing.

This also removes `bin/drydock create-resource`, which no longer makes sense to expose. It won't get locking, leasing, etc., correct, and can not be made correct.

NOTE: This technically works but doesn't do anything useful yet.

Test Plan: Used `bin/drydock lease --type host` to acquire leases against these blueprints.

Reviewers: hach-que, chad

Reviewed By: hach-que, chad

Subscribers: Mnkras

Maniphest Tasks: T9253

Differential Revision: https://secure.phabricator.com/D14117
2015-09-21 04:43:53 -07:00
epriestley
bb28f94f9b Reduce garbage-level of Drydock Allocator implementation
Summary:
Ref T9253. The Drydock allocator is very pseudocodey right now. Particularly, it was written before Blueprints were concrete.

Reorganize it to make its responsibilities and error handling behaviors more clear.

In particular, the Allocator does not manage locks. It's primarily trying to reject allocations which can not possibly work. Blueprints are responsible for locks. See some discussion in D10304.

NOTE: This code probably doesn't work as written, see future diffs.

Test Plan: See future diffs.

Reviewers: hach-que, chad

Reviewed By: hach-que, chad

Subscribers: cburroughs

Maniphest Tasks: T9253

Differential Revision: https://secure.phabricator.com/D14114
2015-09-21 04:43:25 -07:00
epriestley
c44f9d80de Remove DrydockPreallocatedHostBlueprintImplementation
Summary:
Ref T9253. This comes from a time before Almanac. Now that we have Almanac, it makes much more sense to put this logic there than to try to put it in Drydock itself.

Remove the preallocated host blueprint, a relic of a bygone time.

Test Plan: Grepped for callsites.

Reviewers: hach-que, chad

Reviewed By: hach-que, chad

Maniphest Tasks: T9253

Differential Revision: https://secure.phabricator.com/D14110
2015-09-21 04:41:40 -07:00
Joshua Spence
f695dcea9e Use PhutilClassMapQuery
Summary: Use `PhutilClassMapQuery` where appropriate.

Test Plan: Browsed around the UI to verify things seemed somewhat working.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: epriestley, Korvin

Differential Revision: https://secure.phabricator.com/D13429
2015-07-07 22:51:57 +10:00
J Rhodes
70a82017b3 Drop Windows-specific escaping in preallocated host
Summary: This drops the Windows-specific escaping code for the creation of directories when acquiring a lease.  This is basically the change from D10378 without the other, no longer necessary changes.

Test Plan: This code hasn't been run in a production environment for a while (any instances of Phabricator using Drydock / Harbormaster with Windows have had this code removed by the D10378 patch for a while).

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin

Projects: #drydock

Differential Revision: https://secure.phabricator.com/D13341
2015-06-19 15:06:32 +10:00
Joshua Spence
1239cfdeaf Add a bunch of tests for subclass implementations
Summary: Add a bunch of tests to ensure that subclasses behave.

Test Plan: `arc unit`

Reviewers: eadler, #blessed_reviewers, epriestley

Reviewed By: eadler, #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin

Differential Revision: https://secure.phabricator.com/D13272
2015-06-15 18:13:27 +10:00
Joshua Spence
b6d745b666 Extend from Phobject
Summary: All classes should extend from some other class. See D13275 for some explanation.

Test Plan: `arc unit`

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: epriestley, Korvin

Differential Revision: https://secure.phabricator.com/D13283
2015-06-15 18:02:27 +10:00
Joshua Spence
f47e69c015 Mark some strings for translation
Summary: Add some more `pht`izations.

Test Plan: Eyeball it.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: hach-que, Korvin, epriestley

Differential Revision: https://secure.phabricator.com/D13200
2015-06-09 23:06:52 +10:00
Joshua Spence
36e2d02d6e phtize all the things
Summary: `pht`ize a whole bunch of strings in rP.

Test Plan: Intense eyeballing.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: hach-que, Korvin, epriestley

Differential Revision: https://secure.phabricator.com/D12797
2015-05-22 21:16:39 +10:00
Joshua Spence
f3d382cec4 Fix some method signatures
Summary: Fix some method signatures so that arguments with default values are at the end of the argument list (see D12418).

Test Plan: Eyeballed the callsites.

Reviewers: epriestley, #blessed_reviewers, hach-que

Reviewed By: epriestley, #blessed_reviewers, hach-que

Subscribers: hach-que, Korvin, epriestley

Differential Revision: https://secure.phabricator.com/D12782
2015-05-19 06:58:43 +10:00
Joshua Spence
acb45968d8 Use __CLASS__ instead of hard-coding class names
Summary: Use `__CLASS__` instead of hard-coding class names. Depends on D12605.

Test Plan: Eyeball it.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: hach-que, Korvin, epriestley

Differential Revision: https://secure.phabricator.com/D12806
2015-05-14 07:21:13 +10:00
Bob Trahan
53d7868c6d Policy - convert Drydock query for repository to policy-based query
Summary: Ref T7094. Switch to OmnipotentUser policy-based query since this is usually done offline, etc.

Test Plan: pretty simple code change so I just have my fingers crossed while I am typing this

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin, epriestley

Maniphest Tasks: T7094

Differential Revision: https://secure.phabricator.com/D11655
2015-02-03 12:28:11 -08:00
Joshua Spence
3cf9a5820f Minor formatting changes
Summary: Apply some autofix linter rules.

Test Plan: `arc lint` and `arc unit`

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin, hach-que

Differential Revision: https://secure.phabricator.com/D10585
2014-10-08 08:39:49 +11:00
James Rhodes
ddfa5cbdf6 Remove setWorkingDirectory call on SFTP interface
Summary: I derped on this; the SFTP interface doesn't have setWorkingDirectory because it implements DrydockFilesystemInterface and not DrydockCommandInterface.  So when you use the Upload File build step, the daemon will crash due to an undefined method.

Test Plan: Tested on my live server.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: epriestley, Korvin

Differential Revision: https://secure.phabricator.com/D10351
2014-08-27 08:27:46 +10:00
James Rhodes
2a4a30044b Set the working directory when providing SSH / SFTP interfaces
Summary: Ref T1049.  Set the working directory when executing commands on Drydock hosts.  Without this set, they execute in the user's default home directory.

Test Plan: Ran a build and saw the correct working directory when running `pwd`.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: CanadianBadass, epriestley, Korvin

Maniphest Tasks: T1049

Differential Revision: https://secure.phabricator.com/D10293
2014-08-22 14:40:31 +10:00
James Rhodes
df7fb09845 Remove localhost Drydock allocator
Summary: This has never been enabled by default, and isn't safe.  Remove it since people can use preallocated or EC2 hosts.

Test Plan: Removed it; didn't see it appear on the "Create Blueprint" page.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: epriestley, Korvin

Differential Revision: https://secure.phabricator.com/D10287
2014-08-20 08:29:32 +10:00
James Rhodes
e48aaa563a Allow Drydock blueprints to define and use custom fields
Summary: This allows Drydock blueprints to define custom fields for blueprint settings.

Test Plan: Pulled out of EC2 allocator diff.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: epriestley, Korvin

Differential Revision: https://secure.phabricator.com/D10224
2014-08-12 08:39:00 +10:00
Joshua Spence
8756d82cf6 Remove @group annotations
Summary: I'm pretty sure that `@group` annotations are useless now... see D9855. Also fixed various other minor issues.

Test Plan: Eye-ball it.

Reviewers: #blessed_reviewers, epriestley, chad

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin, hach-que

Differential Revision: https://secure.phabricator.com/D9859
2014-07-10 08:12:48 +10:00
Joshua Spence
0a62f13464 Change double quotes to single quotes.
Summary: Ran `arc lint --apply-patches --everything` over rP, mainly to change double quotes to single quotes where appropriate. These changes also validate that the `ArcanistXHPASTLinter::LINT_DOUBLE_QUOTE` rule is working as expected.

Test Plan: Eyeballed it.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin, hach-que

Differential Revision: https://secure.phabricator.com/D9431
2014-06-09 11:36:50 -07:00
epriestley
962aca664f Add names to Drydock blueprints
Summary:
Ref T2015. Adds human-readable names to Drydock blueprints.

Also the new patches stuff is so much nicer.

Test Plan: Edited, created, and reviewed migrated blueprints.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D7918
2014-01-09 10:56:34 -08:00
epriestley
5502fca5f4 Make Drydock blueprint create workflow somewhat more standard
Summary: Ref T2015. This workflow is a little weird (runs in a dialog, no edit-before-create step, lots of internal classnames). Make it a little more standard.

Test Plan: See screenshots.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D7908
2014-01-08 14:12:27 -08:00
epriestley
9b0fa5747b Make Drydock more broadly aware of policies
Summary:
Ref T2015. Moves a bunch of raw object loads into modern policy-aware queries.

Also straightens out the Log and Lease policies a little bit: there are legitimate states where these objects are not attached to a resource (particularly, while a lease is being acquired). Handle these more gracefully.

Test Plan: Lint / browsed stuff.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D7836
2013-12-27 13:15:19 -08:00
epriestley
6b2d480fe7 Make DrydockLease a policy-aware object
Summary: Ref T2015. DrydockLease predates widespread adoption of policies. Make it -- and its query -- policy aware.

Test Plan: Browsed leases from the web UI. Grepped for callsites.

Reviewers: btrahan

Reviewed By: btrahan

CC: hach-que, aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D7826
2013-12-26 10:41:36 -08:00
epriestley
d846f6508b Fix some repository URI handling issues in Git and Mercurial
Summary:
See <https://github.com/facebook/phabricator/issues/467>. @dctrwatson also ran into an issue where we were trying to `setPass()` a GitURI.

  - For Git and Mercurial, properly generate credential URIs where relevant.
  - Don't try to `setPass()` on Git-style URIs.

This isn't perfect but should clean things up a bit.

Test Plan: Added unit tests. Lots of `grep`.

Reviewers: btrahan

Reviewed By: btrahan

CC: dctrwatson, aran

Differential Revision: https://secure.phabricator.com/D7759
2013-12-12 09:45:27 -08:00
James Rhodes
dd01535ed6 Implement "Upload Artifact" build step
Summary: This implements a build step for uploading an artifact from a build machine to Phabricator.  It uses SFTP so that it will work on both UNIX and Windows build machines.

Test Plan: Ran an "Upload Artifact" build against a Windows machine (with FreeSSHD installed).  The artifact uploaded to Phabricator, appeared on the build view and the file contents could be viewed from Phabricator.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley

CC: Korvin, epriestley, aran

Maniphest Tasks: T1049

Differential Revision: https://secure.phabricator.com/D7582
2013-12-06 14:11:05 +11:00
James Rhodes
9c6f6043f0 Update preallocated hosts to use Passphrase credentials
Summary: Depends on D7695.  This updates preallocated hosts to use Passphrase credentials.  Due to the way SSH private key text credentials work (the TempFile disappears before SSH commands can be executed), this only supports file-based private keys at the moment.

Test Plan:
Created a Passphrase credential for a file-based SSH key.  Allocated a resource with:

```
bin/drydock create-resource --blueprint 1 --name "My Linux Host" --attributes platform=linux,host=localhost,port=22,path=/var/drydock,credential=2
```

and successfully leased it.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley

CC: Korvin, epriestley, aran

Maniphest Tasks: T4111, T1049

Differential Revision: https://secure.phabricator.com/D7697
2013-12-05 08:17:23 +11:00