Summary:
Fixes T6702. Ref T3554. Currently, tasks can be cancelled, retried and freed from the web UI by any logged in user.
This isn't appreciably dangerous (I can't come up with a way that a user could do anything security-affecting), but I think I probably intended this to be admin-only, but these actions should move to the CLI anyway.
Move them to the CLI. Lay some groundwork for some future `bin/worker cancel --class SomeTaskClass`, but don't implement that yet.
Test Plan: Used `cancel`, `retry` and `free` from the CLI. Hit all the error/success states.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T3554, T6702
Differential Revision: https://secure.phabricator.com/D10939
Summary:
Ref T6615. Mixing ASC and DESC ordering on a multipart key makes it dramatically less effective (or perhaps totally ineffective).
Reverse the meaning of the `priority` column so it goes in the same direction as the `id` column (both ascending, lower values execute sooner).
Test Plan:
- Queued 1.2M tasks with `bin/worker flood`.
- Processed ~1 task/second with `bin/phd debug taskmaster` before patch.
- Applied patch, took ~5 seconds for ~1.2M rows.
- Processed ~100-200 tasks/second with `bin/phd debug taskmaster` after patch.
- "Next in Queue" query on daemon page dropped from 1.5s to <1ms.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: aklapper, 20after4, epriestley
Maniphest Tasks: T6615
Differential Revision: https://secure.phabricator.com/D10895
Summary: Ref T6615. Ref T3554. We need better tooling around the queue eventually, so start here.
Test Plan: Added 100K+ tasks locally with `bin/worker flood`. Executed some of them with `bin/phd debug taskmaster` (we already have a TestWorker, used in unit tests).
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T3554, T6615
Differential Revision: https://secure.phabricator.com/D10894
Summary:
Fixes an issue with T5336 / D9871. We did 99% of the work here but didn't actually turn on the priority sorting. The unit test passed by default, which didn't catch this.
- Fix the unit test (it failed).
- Fix the query (test now passes).
- Add a "Next in Queue" element to the UI to make this kind of thing easier to spot/understand.
Test Plan: Ran unit test. Viewed "Next in Queue". Queued some tasks, flushed the queue. Web UI tracked the state sensibly.
Reviewers: joshuaspence, btrahan
Reviewed By: btrahan
Subscribers: cburroughs, epriestley
Differential Revision: https://secure.phabricator.com/D10766
Summary:
Ref T1191. Now that the whole database is covered, we don't need to do as much work to build expected schemata. Doing them database-by-database was helpful in converting, but is just reudndant work now.
Instead of requiring every application to build its Lisk objects, just build all Lisk objects.
I removed `harbormaster.lisk_counter` because it is unused.
It would be nice to autogenerate edge schemata, too, but that's a little trickier.
Test Plan: Database setup issues are all green.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley, hach-que
Maniphest Tasks: T1191
Differential Revision: https://secure.phabricator.com/D10620
Summary:
Ref T1191. When changing the column type of an AUTO_INCREMENT column, we currently may lose the autoincrement attribute.
Instead, support it. This is a bit messy because AUTO_INCREMENT columns interact with PRIMARY KEY columns (tables may only have one AUTO_INCREMENT column, and it must be a primary key). We need to migrate in more phases to avoid this issue.
Introduce new `auto` and `auto64` types to represent autoincrement IDs.
Test Plan:
- Saw autoincrement show up correctly in web UI.
- Fixed an autoincrement issue on the XHProf storage table with `bin/storage adjust` safely.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T1191
Differential Revision: https://secure.phabricator.com/D10607
Summary:
Ref T1191. Ref T6203. While generating expected schemata, I ran into these columns which seem to have sketchy nullability.
- Mark most of them for later resolution (T6203). They work fine today and don't need to block T1191. Changing them can break the application, so we can't autofix them.
- Forgive a couple of them that are sort-of reasonable or going to get wiped out.
Test Plan: Saw 94 remaining warnings.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: hach-que, epriestley
Maniphest Tasks: T1191, T6203
Differential Revision: https://secure.phabricator.com/D10593
Summary:
Ref T1191. Some notes here:
- Drops the old LDAP and OAuth info tables. These were migrated to the ExternalAccount table a very long time ago.
- Separates surplus/missing keys from other types of surplus/missing things. In the long run, my plan is to have only two notice levels:
- Error: something we can't fix (missing database, table, or column; overlong key).
- Warning: something we can fix (surplus anything, missing key, bad column type, bad key columns, bad uniqueness, bad collation or charset).
- For now, retaining three levels is helpful in generating all the expected scheamta.
Test Plan:
- Saw ~200 issues resolve, leaving ~1,300.
- Grepped for removed tables.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T1191
Differential Revision: https://secure.phabricator.com/D10580
Summary: T1191. Nothing very notable here.
Test Plan: Saw more blue in web UI.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D10522
Summary:
Fixes T5855. Adds a `--graceful N` flag to `phd stop` and `phd restart`.
`phd` will send SIGINT, wait `N` seconds, SIGTERM, wait 15 seconds, and SIGKILL. By default, `N` is 15.
Test Plan:
- Ran `bin/phd debug ...` and used `^C` to interrupt daemons. Saw graceful shutdown behavior, and abrupt termination on multiple `^C`.
- Ran `bin/phd start`, `bin/phd stop` and `bin/phd restart` with `--graceful` set to various things, notably `0`. Saw graceful shutdowns on the CLI and in the web UI. With `0`, abrupt shutdowns.
Reviewers: btrahan, hach-que
Reviewed By: hach-que
Subscribers: epriestley
Maniphest Tasks: T5855
Differential Revision: https://secure.phabricator.com/D10228
Summary: Fixes T5336. Currently, `PhabricatorWorkerLeaseQuery` is basically FIFO. It makes more sense for the queue to be a priority-queue, and to assign higher priorities to alerts (email and SMS).
Test Plan: Created dummy tasks in the queue (with different priorities). Verified that the priority field was set correctly in the DB and that the priority was shown on the `/daemon/` page. Started a `PhabricatorTaskmasterDaemon` and verified that the higher priority tasks were executed before lower priority tasks.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin
Maniphest Tasks: T5336
Differential Revision: https://secure.phabricator.com/D9871
Summary: I'm pretty sure that `@group` annotations are useless now... see D9855. Also fixed various other minor issues.
Test Plan: Eye-ball it.
Reviewers: #blessed_reviewers, epriestley, chad
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin, hach-que
Differential Revision: https://secure.phabricator.com/D9859
Summary: Ran `arc lint --apply-patches --everything` over rP, mainly to change double quotes to single quotes where appropriate. These changes also validate that the `ArcanistXHPASTLinter::LINT_DOUBLE_QUOTE` rule is working as expected.
Test Plan: Eyeballed it.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin, hach-que
Differential Revision: https://secure.phabricator.com/D9431
Summary: Ref D8930. My "send test" for SMS was failing before this patch, and now it works nicely.
Test Plan: Used new code in D8930 that uses $this->queueTask() to get some work done and it got done in process
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D9018
Summary:
For Harbormaster tasks which want to poll or wait, this lets them say "try again a little later" without having to sleep and hold a queue slot.
This is basically the same as failing, except that we don't increment the failure counter. Instead, we just set the current lease to the correct length and then exit. The task will be retried after the lease expires.
Test Plan: Using both `bin/harbormaster` and `phd debug taskmaster`, ran a lot of waiting tasks through the queue, faking them to either yield or not yield in a controlled manner. The queue responded as expected, yielding tasks appropraitely and retrying them later.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D8792
Summary:
See discussion in D8773. Three small adjustments which should help prevent this kind of issue:
- When queueing followup tasks, hold them on the worker until we finish the task, then queue them only if the work was successful.
- Increase the default lease time from 60 seconds to 2 hours. Although most tasks finish in far fewer than 60 seconds, the daemons are generally stable nowadays and these short leases don't serve much of a purpose. I think they also date from an era where lease expiry and failure were less clearly distinguished.
- Increase the default wait-after-failure from 60 seconds to 5 minutes. This largely dates from the MetaMTA era, where Facebook ran services with high failure rates and it was appropriate to repeatedly hammer them until things went through. In modern infrastructure, such failures are rare.
Test Plan:
- Verified that tasks queued properly after the main task was updated.
- Verified that leases default to 7200 seconds.
- Intentionally failed a task and verified default 300 second wait before retry.
- Removed all default leases shorter than 7200 seconds (there was only one).
- Checked all the wait before retry implementations for anything much shorter than 5 minutes (they all seem reasonable).
Reviewers: btrahan, sowedance
Reviewed By: sowedance
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D8774
Summary:
There are quite a few tests in Arcanist, libphutil and Phabricator that do something similar to `$this->assertEqual(false, ...)` or `$this->assertEqual(true, ...)`.
This is unnecessarily verbose and it would be cleaner if we had `assertFalse` and `assertTrue` methods.
Test Plan: I contemplated adding a unit test for the `getCallerInfo` method but wasn't sure if it was required / where it should live.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley
CC: Korvin, epriestley, aran
Differential Revision: https://secure.phabricator.com/D8460
Summary:
Ref T2015. Currently, Drydock has a `wait-for-lease` workflow which is invoked in the background by the `lease` workflow.
The goal of this mechanism is to allow `bin/drydock lease` to print out logs as the lease is acquired. However, this predates the `runAllTasksInProcess` flags, and they provide a simpler and more robust way (potentially with `--trace` and `PhutilConsole`) to do synchronous execution and debug logging.
Simplify this whole mechanism: just run everything in-process in `bin/drydock lease`, and do logging via `--trace`. We could thread a `PhutilConsole` through things too, but this seems good enough for now.
Also various cleanup/etc.
Test Plan: Ran `bin/drydock lease`. Ran `bin/harbormaster build X --plan Y`, for `Y` being a Drydock-dependent build plan.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D7835
Summary:
Fixes two issues:
- When rendering a task's details, we currently issue a policy-oblivious query. Instead, issue a policy-aware query.
- The formatting is a little bit weird, with the top half in a box and the bottom half with an older style. Make them consistent.
Test Plan: Looked at the detail pages for several tasks in queue.
Reviewers: btrahan, chad
Reviewed By: chad
CC: aran
Differential Revision: https://secure.phabricator.com/D7812
Summary:
Ref T1049. See discussion in D7745. We have some specific interest in this for D7745, but generally we want to consume tasks with expired leases in roughly FIFO order, just like we consume new tasks in roughly FIFO order. Currently, when we select an expired task we order them by `id`, but this is the original insert order, not lease expiration order. Instead, order by `leaseExpires`.
This query is actually much better than the old one was, since the WHERE part is `leaseExpries < VALUE`.
Test Plan: Ran `EXPLAIN` on the query. Ran a taskmaster in debug mode and saw it lease new and expired tasks successfully.
Reviewers: hach-que, btrahan
Reviewed By: hach-que
CC: aran
Maniphest Tasks: T1049
Differential Revision: https://secure.phabricator.com/D7746
Summary: Fixes T3610. This got un-scoped at some point, and is only used in Drydock so it escaped notice.
Test Plan: Ran `bin/drydock lease --type host` without hitting an exception about incorrect query construction.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T3610
Differential Revision: https://secure.phabricator.com/D6552
Summary: Fixes T3557. One thing which made T3557 kind of a mess was the lack of information about progress through temporary failures. Add a column which records a task's last failure time, and surface it in the console.
Test Plan: {F51277}
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T3557
Differential Revision: https://secure.phabricator.com/D6550
Summary: Fixes T2569. This is the other common exception source which is ambiguous. List the task ID explicitly to make debugging easier.
Test Plan: {F51268}
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2569
Differential Revision: https://secure.phabricator.com/D6549
Summary: Ref T2852. When a Differential revision is linked to an Asana task, show the related task in Differential.
Test Plan: {F49234}
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2852
Differential Revision: https://secure.phabricator.com/D6387
Summary:
Ref T2852. Asana sync tasks currently have a standard retry/backoff schedule, but the defaults are quite aggressive (retry every 60s forever). Instead, retry at increasing intervals and stop retrying after a few tries.
- Retry at intervals and stop retrying after a few iterations.
- Modernize some interfaces.
- Add better information about retry behaviors to the web UI.
Test Plan: {F49194}
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2852
Differential Revision: https://secure.phabricator.com/D6381
Summary:
Ref T2852. Add a `log()` method to `PhabricatorWorker` to make debugging easier.
I renamed the similar Drydock-specific method.
Test Plan:
Used logging in a future revision:
...
<<< [36] <http> 211,704 us
Updating main task.
>>> [37] <http> https://app.asana.com/api/1.0/tasks/6153776820388
...
Reviewers: btrahan, chad
Reviewed By: chad
CC: aran
Maniphest Tasks: T2852
Differential Revision: https://secure.phabricator.com/D6296
Summary:
Ref T2852. I want to model Asana integration as a response to feed events. Currently, we queue one feed event for each HTTP hook.
Instead, always queue one feed event and then have it queue any necessary followup events (now, http hooks; soon, asana).
Add a script to make it easy to reproducibly fire feed event publishing.
Test Plan:
Republished a feed event and verified it hit configured HTTP hooks correctly.
$ ./bin/feed republish 5765774156541908292 --trace
>>> [2] <connect> phabricator2_feed
<<< [2] <connect> 1,660 us
>>> [3] <query> SELECT story.* FROM `feed_storydata` story JOIN `feed_storyreference` ref ON ref.chronologicalKey = story.chronologicalKey WHERE (ref.chronologicalKey IN (5765774156541908292)) GROUP BY story.chronologicalKey ORDER BY story.chronologicalKey DESC
<<< [3] <query> 595 us
>>> [4] <connect> phabricator2_differential
<<< [4] <connect> 760 us
>>> [5] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
<<< [5] <query> 478 us
>>> [6] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
<<< [6] <query> 449 us
>>> [7] <connect> phabricator2_user
<<< [7] <connect> 1,062 us
>>> [8] <query> SELECT * FROM `user` WHERE phid in ('PHID-USER-lqiz3yd7wmk64ejugvov')
<<< [8] <query> 540 us
>>> [9] <connect> phabricator2_file
<<< [9] <connect> 951 us
>>> [10] <query> SELECT * FROM `file` WHERE phid IN ('PHID-FILE-gq6dlsysvxbn3dgwvky7')
<<< [10] <query> 498 us
>>> [11] <query> SELECT * FROM `user_status` WHERE userPHID IN ('PHID-USER-lqiz3yd7wmk64ejugvov') AND UNIX_TIMESTAMP() BETWEEN dateFrom AND dateTo
<<< [11] <query> 507 us
Republishing story...
>>> [12] <query> SELECT story.* FROM `feed_storydata` story JOIN `feed_storyreference` ref ON ref.chronologicalKey = story.chronologicalKey WHERE (ref.chronologicalKey IN (5765774156541908292)) GROUP BY story.chronologicalKey ORDER BY story.chronologicalKey DESC
<<< [12] <query> 685 us
>>> [13] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
<<< [13] <query> 489 us
>>> [14] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
<<< [14] <query> 512 us
>>> [15] <query> SELECT * FROM `user` WHERE phid in ('PHID-USER-lqiz3yd7wmk64ejugvov')
<<< [15] <query> 601 us
>>> [16] <query> SELECT * FROM `file` WHERE phid IN ('PHID-FILE-gq6dlsysvxbn3dgwvky7')
<<< [16] <query> 405 us
>>> [17] <query> SELECT * FROM `user_status` WHERE userPHID IN ('PHID-USER-lqiz3yd7wmk64ejugvov') AND UNIX_TIMESTAMP() BETWEEN dateFrom AND dateTo
<<< [17] <query> 551 us
>>> [18] <query> SELECT story.* FROM `feed_storydata` story JOIN `feed_storyreference` ref ON ref.chronologicalKey = story.chronologicalKey WHERE (ref.chronologicalKey IN (5765774156541908292)) GROUP BY story.chronologicalKey ORDER BY story.chronologicalKey DESC
<<< [18] <query> 507 us
>>> [19] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
<<< [19] <query> 428 us
>>> [20] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
<<< [20] <query> 419 us
>>> [21] <query> SELECT * FROM `user` WHERE phid in ('PHID-USER-lqiz3yd7wmk64ejugvov')
<<< [21] <query> 591 us
>>> [22] <query> SELECT * FROM `file` WHERE phid IN ('PHID-FILE-gq6dlsysvxbn3dgwvky7')
<<< [22] <query> 406 us
>>> [23] <query> SELECT * FROM `user_status` WHERE userPHID IN ('PHID-USER-lqiz3yd7wmk64ejugvov') AND UNIX_TIMESTAMP() BETWEEN dateFrom AND dateTo
<<< [23] <query> 593 us
>>> [24] <http> http://127.0.0.1/derp/
<<< [24] <http> 746,157 us
[2013-06-24 20:23:26] EXCEPTION: (HTTPFutureResponseStatusHTTP) [HTTP/500] Internal Server Error
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2852
Differential Revision: https://secure.phabricator.com/D6291
Summary: MySQL is not able to use indexes with searching for tuples.
Test Plan:
Explained the query before and after, saw `key_len` 16 instead of 8.
Also saw time 0.0 s instead of 2.9 s (but that was probably caused by warming up).
Reviewers: epriestley
Reviewed By: epriestley
CC: aran, Korvin
Differential Revision: https://secure.phabricator.com/D5580
Summary:
I wasn't able to reproduce the "recursion detected" in real web request but I saw lots of 1073741824 refcounts in `debug_zval_dump()` of $object.
I'm not sure how that happens.
Test Plan: D4807#4
Reviewers: epriestley
Reviewed By: epriestley
CC: aran, Korvin
Maniphest Tasks: T2432
Differential Revision: https://secure.phabricator.com/D4839
Summary: Fixes various array_combine() warnings for PHP < 5.4
Test Plan: lint/unit/grep
Reviewers: btrahan, vrana, chad
Reviewed By: chad
CC: aran
Differential Revision: https://secure.phabricator.com/D4660
Summary:
Some time ago, we added `ORDER BY id ASC` to the worker `UPDATE ...` query, because someone reported that their MySQL read slaves were complaining about the query (I can't find the exact error message, but something to the effect of the rows the query affected not being deterministic). This seemed harmless since it should be the same as the query's implicit order (I guess?), but actually made the query dramatically slower for large numbers of rows.
On my local machine, this query takes about 2 seconds with ~1M rows. If I run `SELECT`, or run `UPDATE` without ORDER BY, the query takes < 0.01s. I don't understand exactly what's happening -- my guess is something to do with the ORDER BY implying that a lot of rows need to be locked?
In T2372, a user is seeing 20-60s rumtimes on this query.
I solved this by doing a SELECT, followed by an UPDATE. Each query runs quickly. This introduces the possibility of a race (two processes SELECT the same rows, then try to UPDATE), which we currently recover from by having the second UPDATE fail and then having that daemon try again 1 second later. This seems generally reasonable. Some alternatives I considered:
- We could SELECT ... LOCK FOR UPDATE, but failing and retrying a little later seems at least as good as blocking.
- We could select more rows than we need, and then try to lock some of them randomly. I think this would work well, but it's a bit more complex than what we're doing now so I left it until we have a clearer need.
Test Plan:
Inserted ~1M tasks into the queue. Ran `phd debug taskmaster`, saw ~2s task updates. Applied patch. Ran `phd debug taskmaster`, saw <1ms updates. Ran `phd launch 8 taskmaster`, saw rapid completion of tasks.
This stuff also has fairly thorough unit test coverage.
Reviewers: vrana, btrahan
Reviewed By: vrana
CC: aran
Maniphest Tasks: T2372
Differential Revision: https://secure.phabricator.com/D4576
Summary:
Load the data for daemon worker tasks when viewing them, and present
the information in a useful way. This defaults to printing the json data,
but for some classes of worker it will also link to the corresponding
object, to make debugging problems with workers easier.
Test Plan:
load /daemon/task/NNN for a CommitParserWorker and a MetaMTAWorker, and
see the addition of a data field with useful content and link.
Reviewers: epriestley, vrana
Reviewed By: epriestley
CC: aran, Korvin
Differential Revision: https://secure.phabricator.com/D4226
Summary: Include task ID and class when raising this exception. I took a brief stab at doing this generically, but (a) we specifically raise this exception outside of normal try/catch because we can't follow normal recovery rules for it and (b) we don't have a reasonable PhutilProxyException or similar right now which would preserve stack traces, and don't have builtin exception nesting support until PHP 5.3.
Test Plan: Faked this exception, verified we get more information in the logs.
Reviewers: btrahan, vrana
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2193
Differential Revision: https://secure.phabricator.com/D4205
Summary:
People hit three issues with D3914:
- As per T2059, we applied a schema change from a `.php` patch, which currently does not work if you use a different user to make schema changes than for normal use.
- Since the change in question is idempotent, just move it to a `.sql` patch. We'll follow up in T2059 and fix it properly.
- Rogue daemons at several installs used old code (expecting autoincrement) to insert into the new table (no autoincrement), thereby creating tasks with ID 0.
- Rename the table so they'll fail.
- This also makes the code a little more consistent.
- Some installs now have tasks with ID 0.
- Use checks against null rather than against 0 so we can process these tasks.
The major issues this fixes are the schema upgrade failure in T2059, and the infinite loops in T2072 and elsewhere.
This isn't really a fully statisfactory fix. I'll discuss some next steps in T2072.
Test Plan: Created new tasks via MetaMTA/Differential. Ran tasks with `phd debug taskmaster`. Inserted a task 0 and verified it ran and archived correctly.
Reviewers: btrahan, vrana, nh
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2072, T2059
Differential Revision: https://secure.phabricator.com/D3973
Summary: See D3912 for discussion. InnoDB may reuse autoincrement IDs after restart; provide a way to avoid it.
Test Plan: Unit tests. Scheduled and executed tasks through `drydock lease --type host` and `phd debug taskmaster`.
Reviewers: vrana, btrahan
Reviewed By: vrana
CC: aran
Differential Revision: https://secure.phabricator.com/D3914
Summary:
- Remove EC2, RemoteHost, Application, etc., blueprints for now. They're very proof-of-concept and Blueprints are getting API changes I don't want to bother propagating for now. Leave the abstract base class and the LocalHost blueprint. I'll restore the more complicated ones once better foundations are in place.
- Remove the Allocate controller from the web UI. The original vision here was that you'd manually allocate resources in some cases, but it no longer makes sense to do so as all allocations come from leases now. This simplifies allocations and makes the rule for when we can clean up resources clear-cut (if a resource has no more active leases, it can be cleaned up). Instead, we'll build resources like the localhost and remote hosts lazily, when leases come in for them.
- Add some configuration to manage the localhost blueprint.
- Refactor `canAllocateResources()` into `isEnabled()` (for config checks) and `canAllocateMoreResources()` (for quota checks, e.g. too many resources are allocated already).
- Juggle some signatures to align better with a world where blueprints generally do allocate.
- Add some more logging and error handling.
- Fix an issue with log ordering.
Test Plan: Allocated some localhost leases.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D3902
Summary:
This commit doesn't change license of any file. It just makes the license implicit (inherited from LICENSE file in the root directory).
We are removing the headers for these reasons:
- It wastes space in editors, less code is visible in editor upon opening a file.
- It brings noise to diff of the first change of any file every year.
- It confuses Git file copy detection when creating small files.
- We don't have an explicit license header in other files (JS, CSS, images, documentation).
- Using license header in every file is not obligatory: http://www.apache.org/dev/apply-license.html#new.
This change is approved by Alma Chao (Lead Open Source and IP Counsel at Facebook).
Test Plan: Verified that the license survived only in LICENSE file and that it didn't modify externals.
Reviewers: epriestley, davidrecordon
Reviewed By: epriestley
CC: aran, Korvin
Maniphest Tasks: T2035
Differential Revision: https://secure.phabricator.com/D3886
Summary:
We lock tasks by setting `leaseOwner` to a unique value, but the value is currently unique-to-the-process rather than unique-to-the-query. This means that if a process leases a task, then leases another task, both tasks will have the same `leaseOwner`. This can cause an issue where we go to select the task we just leased and get the other task instead, if we aren't careful about the select construction.
We can avoid this by being clever and making sure the select is constructed correctly, but making the `leaseOwner` unique to the query is much simpler and more foolproof. This guarantees we always select only the rows we just leased.
Also remove `PhabricatorGoodForNothingWorker` since `PhabricatorTestWorker` fills its role of allowing things to be tested, and simplify the unit tests since we don't need to be clever about avoiding this issue any more.
Test Plan: Ran unit tests.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D3862
Summary:
This was the major goal of D3859/D3855, and to a lesser degree D3854/D3852.
As Drydock is allocating a resource, it may need to allocate other resources first. For example, if it's allocating a working copy, it may need to allocate a host first.
Currently, we have the process basically queue up the allocation (insert a task into the queue) and sleep() until it finishes. This is problematic for a bunch of reasons, but the major one is that if allocation takes more resources (host, port, machine, DNS) than you have daemons, they could all end up sleeping and waiting for some other daemon to do their work. This is really stupid. Even if you only take up some of them, you're spending slots sleeping when you could be doing useful work.
To partially get around this and make the CLI experience less dumb, there's this goofy `synchronous` flag that gets passed around everywhere and pushes the workflow through a pile of special cases. Basically the `synchronous` flag causes us to do everything in-process. But this is dumb too because we'd rather do things in parallel if we can, and we have to have a lot of special case code to make it work at all.
Get rid of all of this. Instead of sleep()ing, try to work on the tasks that need to be worked on. If another daemon grabbed them already that's fine, but in the worst case we just gracefully degrade and do everything in process. So we get the best of both worlds: if we have parallelizable tasks and free daemons, things will execute in parallel. If we have nonparallelizable tasks or no free daemons, things will execute in process.
Test Plan: Ran `drydock_control.php --trace` and saw it perform cascading allocations without sleeping or special casing.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D3861
Summary:
- Clean up a TODO about permanent failures.
- Clean up a TODO about failing tasks after too many retries.
- Clean up a TODO about testing for bad leases.
- Make the lease/retry implementation more flexible and natural.
- Make completely bogus tasks fail permanently.
- Make PhabricatorMetaMTAWorker use new `getWaitBeforeRetry()` (as intended), not hackily implement logic in `getRequiredLeaseTime()`.
- Document worker hooks for failures and retries.
- Provide coverage on everything.
Test Plan: Ran unit tests. Ran `bin/phd debug taskmaster`.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D3859
Summary: This simplifies the fairly thorny logic of leasing tasks a bit. I'm planning to introduce another callsite shortly for Drydock.
Test Plan: Ran `bin/phd debug taskmaster`, observed sensible queries and correct operation.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D3855
Summary: Make mobile-friendly and provide UI to cancel/retry tasks. Remove display of task data to arbitrary users, as it may be sensitive.
Test Plan:
{F22502}
{F22503}
{F22504}
{F22505}
{F22506}
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D3854
Summary:
Currently, when taskmasters complete a task it is immediately deleted. This prevents us from doing some general things, like:
- Supporting the idea of permanent failure (e.g., after N failures just stop trying).
- Showing the user how fast taskmasters are completing tasks.
- Showing the user how long tasks took to complete.
Having better visibility into this is important to Drydock, which builds on the task system. Also, generally buff debug output for task execution.
Test Plan: Ran `bin/phd debug taskmaster`. Ran `bin/phd debug garbage`. Queued some tasks via various systems.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D3852
Summary:
- `kill_init.php` said "Moving 1000 files" - I hope that this is not some limit in `FileFinder`.
- [src/infrastructure/celerity] `git mv utils.php map.php; git mv api/utils.php api.php`
- Comment `phutil_libraries` in `.arcconfig` and run `arc liberate`.
NOTE: `arc diff` timed out so I'm pushing it without review.
Test Plan:
/D1234
Browsed around, especially in `applications/repository/worker/commitchangeparser` and `applications/` in general.
Auditors: epriestley
Maniphest Tasks: T1103