Summary:
Ref T6183. Ref T10054. Historically, only members could watch projects because there were some weird special cases with policies. These policy issues have been resolved and Herald is generally powerful enough to do equivalent watches on most objects anyway.
Also puts a "Watch Project" button on the feed panel to make the behavior and meaning more obvious.
Test Plan:
- Watched a project I was not a member of.
- Clicked the feed watch/unwatch button.
{F1064909}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T6183, T10054
Differential Revision: https://secure.phabricator.com/D15063
Summary: See Q266.
Test Plan: Created a bulk job, clicked "Details" instead of "Confirm", clicked "Continue" to get back to confirmation dialog.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D14985
Summary: Ref T9979. Convert all DestructionEngine behaviors to extensions.
Test Plan:
{F1033244}
Destroyed an object, verifying:
- Herald transcripts were destroyed;
- edges were destroyed;
- flags were destroyed;
- tokens were destroyed;
- transactions were destroyed;
- worker tasks were cancelled.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9979
Differential Revision: https://secure.phabricator.com/D14832
Summary:
Ref T9994. This fixes the first issue discussed on that task, which is that when a merge fails after "arc land", we would not clean up all the leases properly.
Specifically, when a merge fails, we use `queueTask()` to schedule a followup task. This followup destroys the lease and frees the underlying resource.
However, the default behavior of `queueTask()` is to //not queue tasks// if the parent task fails. This is a reasonable, safe behavior that was originally introduced in D8774, where it kept us from sending too much mail if a task did "send some mail" and then failed a little later on and got retried.
Since I think the default behavior is correct, I just special cased the behavior for Drydock to make it queue even on failure. These are the only types of followup tasks we currently want to queue on main task failure.
(It's possible that future Blueprints might want some kind of more specialized behavior, where some tasks queue only on success, but we can cross that bridge when we come to it.)
Test Plan:
- See T9994#149878 for test case setup.
- I ran that test case again with this patch, and saw the followup task queue properly in the `--trace` log, a correspoinding update task show up in `/daemon/`, and the lease get destroyed when I ran it a moment later.
{F1029915}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9994
Differential Revision: https://secure.phabricator.com/D14818
Summary: This logic is flipped.
Test Plan:
- Before change: ran `bin/phd debug task`, saw queries to the config table every second.
- After change: ran `bin/phd debug task`, saw queries to the config table every 10 seconds.
Reviewers: chad, joshuaspence
Reviewed By: chad, joshuaspence
Differential Revision: https://secure.phabricator.com/D14542
Summary:
Fixes T9808.
An instance imported a very large repository, generating approximately 4 million tasks over the course of a few days. A week later, these tasks started expiring and became candidates for garbage collection.
The GC works by deleting 100 rows at at time over and over again. It finds the rows it's going to delete by querying for old rows.
Currently, this query generates a `WHERE dateCreated < X ORDER BY id DESC` query. This query can not efficiently execute using a single key, because it relies on `dateCreated` order to find the rows, then on `id` order to sort them. With a table with 4M rows, this is slow.
This would still be OK, except that the query has to execute a lot of times since it only deletes 100 rows each time. Particularly, it needs to execute a total of ~40K times.
Instead, generate `WHERE dateCreated < X ORDER BY dateCreated DESC, id DESC`. This should have the same effect in general and the GC definitely doesn't care about the difference, but it should be more efficient at large scales.
Test Plan:
I had to `TRUNCATE` the problem table so I don't have a perfect repro to completely convincingly test this anymore. Both queries behave fine at small scales, which is why we haven't seen this before.
I was able to run the newer query in production before I nuked the table and have it complete in a reasonable amount of time, while the old query hung longer than I wanted to wait (several minutes?). The query plan for the new query was also a good one, while the query plan for the old query was terrible.
I loaded the daemon console and ran `bin/garbage collect --collector worker.tasks --trace`. I verified the queries looked reasonable and produced reasonable results in production.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9808
Differential Revision: https://secure.phabricator.com/D14505
Summary: Fixes T7053. Depends on D14452.
Test Plan:
Created a custom daemon which dumps out the config hash (by querying `PhabricatorEnv::calculateEnvironmentHash()`). Ran this daemon with `./bin/phd debug PhabricatorDebugDaemon` and saw the config hash update within 30 seconds.
{P1886}
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: Korvin
Maniphest Tasks: T7053
Differential Revision: https://secure.phabricator.com/D14458
Summary:
Ref T9252. Currently, `queueTask()` accepts `$priority` as its third argument. Allow it to take a full range of `$options` instead. This API just never got updated after we expanded avialable options.
Arguably this whole API should be some kind of "TaskQueueRequest" object but I'll leave that for another day.
Test Plan:
- Grepped for `queueTask()` and verified no other callsites are affected by this API change.
- Ran some daemons.
- See also next diff.
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14235
Summary:
Ref T9252. Currently, Harbormaster and Drydock work like this in some cases:
# Queue a lease for activation.
# Then, a little later, save the lease PHID somewhere.
# When the target/resource is destroyed, destroy the lease.
However, something can happen between (1) and (2). In Drydock this window is very short and the "something" would have to be a lighting strike or something similar, but in Harbormaster we wait until the resource activates to do (2) so the window can be many minutes long. In particular, a user can use "Abort Build" during those many minutes.
If they do, the target is destroyed but it doesn't yet have a record of the artifact, so the artifact isn't cleaned up.
Make these things work like this instead:
# Create a new lease and pre-generate a PHID for it.
# Save that PHID as something that needs to be cleaned up.
# Queue the lease for activation.
# When the target/resource is destroyed, destroy the lease if it exists.
This makes sure there's no step in the process where we might lose track of a lease/resource.
Also, clean up and standardize some other stuff I hit.
Test Plan:
- Stopped daemons.
- Restarted a build in Harbormaster.
- Stepped through the build one stage at a time using `bin/worker execute ...`.
- After the lease was queued, but before it activated, aborted the build.
- Processed the Harbormaster side of things only.
- Saw the lease get destroyed properly.
Reviewers: chad, hach-que
Reviewed By: hach-que
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14234
Summary:
Fixes T9494. This:
- Removes all the random GC.x.y.z config.
- Puts it all in one place that's locked and which you use `bin/garbage set-policy ...` to adjust.
- Makes every TTL-based GC configurable.
- Simplifies the code in the actual GCs.
Test Plan:
- Ran `bin/garbage collect` to collect some garbage, until it stopped collecting.
- Ran `bin/garbage set-policy ...` to shorten policy. Saw change in web UI. Ran `bin/garbage collect` again and saw it collect more garbage.
- Set policy to indefinite and saw it not collect garabge.
- Set policy to default and saw it reflected in web UI / `collect`.
- Ran `bin/phd debug trigger` and saw all GCs fire with reasonable looking queries.
- Read new docs.
{F857928}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9494
Differential Revision: https://secure.phabricator.com/D14219
Summary:
Ref T9252. This primarily allows Harbormaster to request (and Drydock to fulfill) working copies with a patch from a staging area. Doing this means we can do builds on in-review changes from `arc diff`.
This is a little cobbled-together but should basically work.
Also fix some other issues:
- Yielded, awakend workers are fine to update but could complain.
- We can't log slot lock failures to resources if we don't end up saving them.
- Killing the transaction would wipe out the log.
- Fix some TODOs, etc.
Test Plan: Ran Harbormaster builds on a local revision.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14214
Summary:
Ref T9252. This is the same as D14201, but for lease stuff instead of resource stuff.
This one is a little heavier but still feels pretty reasonable to me at the end of the day (worker is <1K lines and has a ton of comment stuff).
Also fixes a few random bugs I hit in the task queue.
Test Plan:
- Restarted some Harbormaster builds, saw them go through cleanly.
- Released pre-activation resources/leases.
- Probably still kinda buggy but I'll iron the details out over time.
Logs are starting to look somewhat plausible:
{F855747}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14202
Summary:
Ref T9123. To run upstream builds in Harbormaster/Drydock, we need to be able to check out `libphutil`, `arcanist` and `phabricator` next to one another.
This adds an "Also Clone: ..." field to Harbormaster working copy build steps so I can type all three repos into it and get a proper clone with everything we need.
This is somewhat upstream-centric and a bit narrow, but I don't think it's totally unreasonable, and most of the underlying stuff is relatively general.
This adds some more typechecking and improves data/type handling for custom fields, too. In particular, it prevents users from entering an invalid/restricted value in a field (for example, you can't "Also Clone" a repository you don't have permission to see).
Test Plan: Restarted build, got a Drydock resource with multiple repositories in it.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9123
Differential Revision: https://secure.phabricator.com/D14183
Summary:
Ref T9252. Currently, Harbormaster does this when trying to acquire a working copy:
- Ask for a working copy.
- Yield for 15 seconds.
- Check if we have a working copy yet.
That's OK, but Drydock takes ~1s to acquire a working copy lease if a resource is already available, so we end up doing this:
- T+0: Ask for a working copy.
- T+0: Yield for 15 seconds.
- T+1: Working copy lease activates.
- T+15: Working copy lease is used.
- T+16: Build finishes.
So we end up spending about 2 seconds doing work and 14 seconds sleeping.
One way to fix this would be to fiddle with the yield duration, so we yield for 1, 2, 4, ... seconds or something. This probably isn't a bad idea for longer leases (i.e., wait for 15, 30, 45 ... seconds or similar) but it implies a lot of churn for short leases.
Instead, let tasks "awaken" other tasks when they complete. The "awaken" operation means: if a task is in a yielded state (no failures, no owner, explicitly yielded, future expires time), pretend it only yielded until right now instead of whenever it really yielded to.
Basically, this rewrites history so that even though Harbormaster did a `yield(15)`, we pretend it did a `yield(4)` after we activate the lease if lease activation took 4 seconds.
If this misses, it's fine: we fall back to the normal yield behavior and things move forward normally a few seconds later.
If it hits, we get a more nimble process pretty cleanly.
Test Plan:
- Restarted a build plan (lease working copy + run `ls`) with this patch no-op'd, took about 16 seconds.
- Restarted a build plan with this patch active, took about 1 second.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14178
Summary:
Fixes T6569. This implements an expiry mechanism for Drydock resources which parallels the mechanism for leases.
A few things are missing that we'll probably need in the future:
- An "EXPIRES" command to update the expiration time. This would let resources be permanent while leased, then expire after, say, 24 hours without any leases.
- A callback like `shouldActuallyExpireRightNow()` for resources and leases that lets them decide not to expire at the last second.
- A callback like `didAcquireLease()` for resource blueprints, to parallel `didReleaseLease()`, letting them clear or extend their timer.
However, this stuff would mostly just let us tune behaviors, not really open up new capabilities.
Test Plan: Changed host resources to expire after 60 seconds, leased one, saw it vanish 60 seconds later.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T6569
Differential Revision: https://secure.phabricator.com/D14176
Summary: Ref T6569. If a lease is activated with an expiration date, schedule a task to try to clean it up after that time.
Test Plan:
- Used `bin/drydock lease ... --until ...` to activate a lease in the near future.
- Waited for a bit.
- Saw it expire and get destroyed at the scheduled time.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T6569
Differential Revision: https://secure.phabricator.com/D14148
Summary:
Ref T9252. This simplifies some Drydock code.
Most of this code relates to the old notion of Drydock being able to enumerate all the tasks it needs to complete in order to acquire a lease. The code has stepped back from this, since it's unnecessary, the queue is more powerful than it used to be, and it would be a lot of work to keep track of.
The ~only thing that should ever wait for leases in modern code is `bin/drydock lease`, and it's fine for it to just sit there sleeping, so this just does that.
This reduces the granularity of logging, but I'll address that separately in future logging-focused changes.
Test Plan: Used `bin/drydock lease` to acquire a lease, saw it acquire cleanly.
Reviewers: hach-que, chad
Reviewed By: chad
Maniphest Tasks: T9252
Differential Revision: https://secure.phabricator.com/D14147
Summary:
Ref T9253. For resources and leases that need to do something which takes a lot of time or requires waiting, allow them to allocate/acquire first and then activate later.
When we allocate a resource or acquire a lease, the blueprint can either activate it immediately (if all the work can happen quickly/inline) or activate it later. If the blueprint activates it later, we queue a worker to handle activating it.
Rebuild the "working copy" blueprint to work with this model: it allocates/acquires and activates in a separate step, once it is able to acquire a host.
Test Plan: With some power of imagination, brought up a bunch of working copies with `bin/drydock lease --type working-copy ...`
Reviewers: hach-que, chad
Reviewed By: hach-que, chad
Maniphest Tasks: T9253
Differential Revision: https://secure.phabricator.com/D14127
Summary: Use `PhutilClassMaQuery` instead of `PhutilSymbolLoader`, mostly for consistency. Depends on D13588.
Test Plan: Poked around a bunch of pages.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D13589
Summary:
Fixes T7370. Two changes:
- Make the default to show nothing, instead of showing all the data. This is a better default because the data is sometimes sensitive. Workers should have to opt in to revealing it.
- For TargetWorkers, link to the target (technically the build, for now, since there's no dedicated target detail page).
Test Plan: {F698325}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T7370
Differential Revision: https://secure.phabricator.com/D13845
Summary: Move some `PhabricatorSearchField` subclasses to be adjacent to the application to which they belong. This seems generally better to me than lumping them all together in the `src/applications/search/field/` directory. I was also wondering if it makes sense to rename these subclasses as `PhabricatorXSearchField` rather than `PhabricatorSearchXField` (as per T5655), but wasn't really sure if these objects are meant to be search-fields, or just fields belonging to the #search application.
Test Plan: N/A.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D13374
Summary:
Ref T8637. This does nothing interesting, just has empty scaffolding for a bulk job queue.
Basic idea is that when you do something like a batch edit in Maniphest, we:
- Create a BulkJob with all the details.
- Queue a worker to start the job.
- Send you to a progress bar page for the job.
In the background:
- The "start job" worker creates a ton of Task objects, then queues worker tasks to do the work.
In the foreground:
- Fancy ajax animates the progress bar and it goes wooosh.
In general:
- Big jobs actually work.
- Jobs get logged.
- You can monitor jobs.
- Terrible junk like T8637 should be much harder to write and much easier to catch and diagnose.
Test Plan:
No interesting code/beahavior yet. Clean `storage adjust`.
{F526411}
Reviewers: chad, btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T8637
Differential Revision: https://secure.phabricator.com/D13392
Summary: Fixes T8599. I'm not sure how to reproduce the original issue, but I'm fairly confident that the issue is that the issue is that `execute()` is not called on the query object.
Test Plan: Created a Harbormaster build plan with a single "Lease Host" step. Ran `./bin/harbormaster build --plan 1 D1` from the command line and hit the exception as described in T8599. Applied patch and hit a different exception (which I think is just because I don't know how to use #drydock and #harbormaster).
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: hach-que, epriestley, Korvin
Maniphest Tasks: T8599
Differential Revision: https://secure.phabricator.com/D13335
Summary: All classes should extend from some other class. See D13275 for some explanation.
Test Plan: `arc unit`
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D13283
Summary:
Ref T8424. This adds a standard KeyValueCache to serve as a request cache.
In particular, I need to cache Spaces (they are frequently accessed, sometimes by multiple viewers) but not have them survive longer than the scope of one request.
This request cache is explicitly destroyed by each web request and each daemon request.
In the very long term, building this kind of construct supports reusing PHP interpreters to run web requests (see some discussion in T2312).
Test Plan:
- Added and executed unit tests.
- Ran every daemon.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T8424
Differential Revision: https://secure.phabricator.com/D13153
Summary: Use `__CLASS__` instead of hard-coding class names. Depends on D12605.
Test Plan: Eyeball it.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: hach-que, Korvin, epriestley
Differential Revision: https://secure.phabricator.com/D12806
Summary:
Ref T4100. Ref T5595. These functions are trivial for now, but move us toward being able to define more default query behavior by default.
Future changes will give these methods meaningful, nontrivial behaviors.
Test Plan: `arc unit --everything`
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T5595, T4100
Differential Revision: https://secure.phabricator.com/D12454
Summary:
Ref T4100. Ref T5595.
To support a unified "Projects:" query across all applications, a future diff is going to add a set of "Edge Logic" capabilities to `PolicyAwareQuery` which write the required SELECT, JOIN, WHERE, HAVING and GROUP clauses for you.
With the addition of "Edge Logic", we'll have three systems which may need to build components of query claues: ordering/paging, customfields/applicationsearch, and edge logic.
For most clauses, queries don't currently call into the parent explicitly to get default components. I want to move more query construction logic up the class tree so it can be shared.
For most methods, this isn't a problem, but many subclasses define a `buildWhereClause()`. Make all such definitions protected and consistent.
This causes no behavioral changes.
Test Plan: Ran `arc unit --everything`, which does a pretty through job of verifying this statically.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: yelirekim, hach-que, epriestley
Maniphest Tasks: T4100, T5595
Differential Revision: https://secure.phabricator.com/D12453
Summary:
Ref T7352. Make sure modern `phd stop` can still read the old PID file format and stop the daemons, at least for now.
Without this, `stop` still detects them and tells you to `stop --force`, which works, but this makes things a good deal cleaner.
Test Plan: Ran `phd stop` from master, then `phd stop` from this revision. Saw old daemons stop cleanly.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11873
Summary:
Fixes T7352. This reduces the memory footprint for instances by combining these two similar daemons into one daemon which handles the responsibilities of both.
The fit isn't 100% perfect here but it's pretty close, and the GC daemon is fairly trivial.
Test Plan:
- Adjusted all the numbers to small numbers (5 second sleep, 120 second GC length).
- Added a ton of logging.
- Started trigger daemon.
- Saw it run a GC cycle.
- Saw it reschedule another cycle after 120 seconds (adjusted down from 4 hours).
- Reverted all the logging/small numbers.
- Ran `bin/phd start`, saw stable trigger daemon running.
- Grepped for removed daemon class name.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11872
Summary: Ref T7352. This is pretty straightforward. I renamed `phd.start-taskmasters` to `phd.taskmasters` for clarity.
Test Plan:
- Ran `phd start`, `phd start --autoscale-reserve 0.25`, `phd restart --autoscale-reserve 0.25`, etc.
- Examined PID file to see options were passed.
- I'm defaulting this off (0 reserve) and making it a flag rather than an option because it's a very advanced feature which is probably not useful outside of instancing.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11871
Summary:
Ref T7352. We were previously identifying things by `<daemonClass, overseerPID, startTime>` but that's not unique in a world where one overseer can run multiple daemons.
We already have an internal "daemonID", it just doesn't get written into the DB right now.
Start writing it, then use it to clean up `phd status`.
Test Plan: Ran `phd status`, got more accurate/useful output than previously.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11865
Summary: Ref T7352. This makes `phd stop` and `phd status` produce more reasonable output with the new PID file format.
Test Plan: Ran `phd stop`, `phd status`, etc.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11856
Summary:
Ref T7352. This changes `phd` to pass configuration to overseers over stdin. We still run one overseer per daemon.
The "status" stuff needs some cleanup, but it's mostly just UI/cosmetic.
Test Plan:
- Ran `phd debug`, `phd launch`, `phd start`, `phd status`, `phd stop`, etc.
- Verified PID files write in a reasonable format.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11855
Summary:
Right now, taskmasters on empty queues sleep for 30 seconds. With a default setup (4 taskmasters), this averages out to 7.5 seconds between the time you do anything that queues something and the time that the taskmasters start work on it.
On instances, which currently launch a smaller number of taskmasters, this wait is even longer.
Instead, sleep for the number of seconds that there are taskmasters, with a random offset. This makes the average wait to start a task from an empty queue 1 second, and the average maximum load of an empty queue also one query per second.
On loaded instances this doesn't matter, but this should dramatically improve behavior for less-loaded instances without any real tradeoffs.
Test Plan: Started several taskmasters, saw them jitter out of sync and then use short sleeps to give an empty queue about a 1s delay.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D11772
Summary:
Ref T7152. Ref T3554.
- When an administrator clicks "send invites", queue tasks to send the invites.
- Then, actually send the invites.
- Make the links in the invites work properly.
- Also provide `bin/worker execute` to make debugging one-off workers like this easier.
- Clean up some UI, too.
Test Plan:
We now get as far as the exception which is a placeholder for a registration workflow.
{F291213}
{F291214}
{F291215}
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T3554, T7152
Differential Revision: https://secure.phabricator.com/D11736