1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-11 08:06:13 +01:00
Commit graph

55 commits

Author SHA1 Message Date
epriestley
139c63bd84 When a worker task fails permanently, log the reason
Summary: Ref T6238. This makes debugging permanent task failures easier (we log reasoning for temporary failures already, just not permanent ones).

Test Plan: Saw more useful permanent failure log.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Maniphest Tasks: T6238

Differential Revision: https://secure.phabricator.com/D10945
2014-12-08 11:27:10 -08:00
epriestley
9a7383121d Move cancel/retry/free task queue actions to bin/worker
Summary:
Fixes T6702. Ref T3554. Currently, tasks can be cancelled, retried and freed from the web UI by any logged in user.

This isn't appreciably dangerous (I can't come up with a way that a user could do anything security-affecting), but I think I probably intended this to be admin-only, but these actions should move to the CLI anyway.

Move them to the CLI. Lay some groundwork for some future `bin/worker cancel --class SomeTaskClass`, but don't implement that yet.

Test Plan: Used `cancel`, `retry` and `free` from the CLI. Hit all the error/success states.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Maniphest Tasks: T3554, T6702

Differential Revision: https://secure.phabricator.com/D10939
2014-12-06 09:14:16 -08:00
epriestley
b5f7e9eec6 Reverse meaning of task priority column
Summary:
Ref T6615. Mixing ASC and DESC ordering on a multipart key makes it dramatically less effective (or perhaps totally ineffective).

Reverse the meaning of the `priority` column so it goes in the same direction as the `id` column (both ascending, lower values execute sooner).

Test Plan:
  - Queued 1.2M tasks with `bin/worker flood`.
  - Processed ~1 task/second with `bin/phd debug taskmaster` before patch.
  - Applied patch, took ~5 seconds for ~1.2M rows.
  - Processed ~100-200 tasks/second with `bin/phd debug taskmaster` after patch.
  - "Next in Queue" query on daemon page dropped from 1.5s to <1ms.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: aklapper, 20after4, epriestley

Maniphest Tasks: T6615

Differential Revision: https://secure.phabricator.com/D10895
2014-11-24 11:10:35 -08:00
epriestley
7e1c312183 Add bin/worker flood, for flooding the task queue with work
Summary: Ref T6615. Ref T3554. We need better tooling around the queue eventually, so start here.

Test Plan: Added 100K+ tasks locally with `bin/worker flood`. Executed some of them with `bin/phd debug taskmaster` (we already have a TestWorker, used in unit tests).

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Maniphest Tasks: T3554, T6615

Differential Revision: https://secure.phabricator.com/D10894
2014-11-24 11:10:15 -08:00
epriestley
914b8bb32c Fix daemon task queue to respect task priority
Summary:
Fixes an issue with T5336 / D9871. We did 99% of the work here but didn't actually turn on the priority sorting. The unit test passed by default, which didn't catch this.

  - Fix the unit test (it failed).
  - Fix the query (test now passes).
  - Add a "Next in Queue" element to the UI to make this kind of thing easier to spot/understand.

Test Plan: Ran unit test. Viewed "Next in Queue". Queued some tasks, flushed the queue. Web UI tracked the state sensibly.

Reviewers: joshuaspence, btrahan

Reviewed By: btrahan

Subscribers: cburroughs, epriestley

Differential Revision: https://secure.phabricator.com/D10766
2014-10-31 09:27:04 -07:00
epriestley
8fa8415c07 Automatically build all Lisk schemata
Summary:
Ref T1191. Now that the whole database is covered, we don't need to do as much work to build expected schemata. Doing them database-by-database was helpful in converting, but is just reudndant work now.

Instead of requiring every application to build its Lisk objects, just build all Lisk objects.

I removed `harbormaster.lisk_counter` because it is unused.

It would be nice to autogenerate edge schemata, too, but that's a little trickier.

Test Plan: Database setup issues are all green.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley, hach-que

Maniphest Tasks: T1191

Differential Revision: https://secure.phabricator.com/D10620
2014-10-02 09:51:20 -07:00
epriestley
300172e799 Support AUTO_INCREMENT in bin/storage adjust
Summary:
Ref T1191. When changing the column type of an AUTO_INCREMENT column, we currently may lose the autoincrement attribute.

Instead, support it. This is a bit messy because AUTO_INCREMENT columns interact with PRIMARY KEY columns (tables may only have one AUTO_INCREMENT column, and it must be a primary key). We need to migrate in more phases to avoid this issue.

Introduce new `auto` and `auto64` types to represent autoincrement IDs.

Test Plan:
  - Saw autoincrement show up correctly in web UI.
  - Fixed an autoincrement issue on the XHProf storage table with `bin/storage adjust` safely.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Maniphest Tasks: T1191

Differential Revision: https://secure.phabricator.com/D10607
2014-10-01 08:24:51 -07:00
epriestley
4fcc634a99 Fix almost all remaining schemata issues
Summary:
Ref T1191. This fixes nearly every remaining blocker for utf8mb4 -- primarily, overlong keys.

Remaining issue is https://secure.phabricator.com/T1191#77467

Test Plan: I'll annotate inline.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley, hach-que

Maniphest Tasks: T6099, T6129, T6133, T6134, T6150, T6148, T6147, T6146, T6105, T1191

Differential Revision: https://secure.phabricator.com/D10601
2014-10-01 08:18:36 -07:00
epriestley
03519c53bb Mark questionable column nullability for later
Summary:
Ref T1191. Ref T6203. While generating expected schemata, I ran into these columns which seem to have sketchy nullability.

  - Mark most of them for later resolution (T6203). They work fine today and don't need to block T1191. Changing them can break the application, so we can't autofix them.
  - Forgive a couple of them that are sort-of reasonable or going to get wiped out.

Test Plan: Saw 94 remaining warnings.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: hach-que, epriestley

Maniphest Tasks: T1191, T6203

Differential Revision: https://secure.phabricator.com/D10593
2014-10-01 07:59:44 -07:00
epriestley
098d0d93d6 Generate expected schemata for User/People tables
Summary:
Ref T1191. Some notes here:

  - Drops the old LDAP and OAuth info tables. These were migrated to the ExternalAccount table a very long time ago.
  - Separates surplus/missing keys from other types of surplus/missing things. In the long run, my plan is to have only two notice levels:
    - Error: something we can't fix (missing database, table, or column; overlong key).
    - Warning: something we can fix (surplus anything, missing key, bad column type, bad key columns, bad uniqueness, bad collation or charset).
    - For now, retaining three levels is helpful in generating all the expected scheamta.

Test Plan:
  - Saw ~200 issues resolve, leaving ~1,300.
  - Grepped for removed tables.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Maniphest Tasks: T1191

Differential Revision: https://secure.phabricator.com/D10580
2014-10-01 07:36:47 -07:00
epriestley
7499cb24ce Generate expected schemata for Workers, XHProf, PHPAAST, Tokens, System, Slowvote
Summary: T1191. Nothing very notable here.

Test Plan: Saw more blue in web UI.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Differential Revision: https://secure.phabricator.com/D10522
2014-09-19 05:45:24 -07:00
Joshua Spence
0151c38b10 Apply some autofix linter rules
Summary: Self-explanatory.

Test Plan: Eyeball it.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin

Differential Revision: https://secure.phabricator.com/D10454
2014-09-10 06:55:05 +10:00
epriestley
9309723ac4 Send graceful shutdown signals to daemons in Phabricator
Summary:
Fixes T5855. Adds a `--graceful N` flag to `phd stop` and `phd restart`.

`phd` will send SIGINT, wait `N` seconds, SIGTERM, wait 15 seconds, and SIGKILL. By default, `N` is 15.

Test Plan:
  - Ran `bin/phd debug ...` and used `^C` to interrupt daemons. Saw graceful shutdown behavior, and abrupt termination on multiple `^C`.
  - Ran `bin/phd start`, `bin/phd stop` and `bin/phd restart` with `--graceful` set to various things, notably `0`. Saw graceful shutdowns on the CLI and in the web UI. With `0`, abrupt shutdowns.

Reviewers: btrahan, hach-que

Reviewed By: hach-que

Subscribers: epriestley

Maniphest Tasks: T5855

Differential Revision: https://secure.phabricator.com/D10228
2014-08-11 20:18:31 -07:00
Joshua Spence
9a679bf374 Allow worker tasks to have priorities
Summary: Fixes T5336. Currently, `PhabricatorWorkerLeaseQuery` is basically FIFO. It makes more sense for the queue to be a priority-queue, and to assign higher priorities to alerts (email and SMS).

Test Plan: Created dummy tasks in the queue (with different priorities). Verified that the priority field was set correctly in the DB and that the priority was shown on the `/daemon/` page. Started a `PhabricatorTaskmasterDaemon` and verified that the higher priority tasks were executed before lower priority tasks.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: epriestley, Korvin

Maniphest Tasks: T5336

Differential Revision: https://secure.phabricator.com/D9871
2014-07-12 03:02:06 +10:00
Joshua Spence
8756d82cf6 Remove @group annotations
Summary: I'm pretty sure that `@group` annotations are useless now... see D9855. Also fixed various other minor issues.

Test Plan: Eye-ball it.

Reviewers: #blessed_reviewers, epriestley, chad

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin, hach-que

Differential Revision: https://secure.phabricator.com/D9859
2014-07-10 08:12:48 +10:00
Joshua Spence
0a62f13464 Change double quotes to single quotes.
Summary: Ran `arc lint --apply-patches --everything` over rP, mainly to change double quotes to single quotes where appropriate. These changes also validate that the `ArcanistXHPASTLinter::LINT_DOUBLE_QUOTE` rule is working as expected.

Test Plan: Eyeballed it.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin, hach-que

Differential Revision: https://secure.phabricator.com/D9431
2014-06-09 11:36:50 -07:00
Bob Trahan
61dd5ab6c1 Worker - supporting running queued tasks in process
Summary: Ref D8930. My "send test" for SMS was failing before this patch, and now it works nicely.

Test Plan: Used new code in D8930 that uses $this->queueTask() to get some work done and it got done in process

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: epriestley, Korvin

Differential Revision: https://secure.phabricator.com/D9018
2014-05-08 13:14:58 -07:00
epriestley
4a6d2e9c97 Allow tasks to yield to other tasks
Summary:
For Harbormaster tasks which want to poll or wait, this lets them say "try again a little later" without having to sleep and hold a queue slot.

This is basically the same as failing, except that we don't increment the failure counter. Instead, we just set the current lease to the correct length and then exit. The task will be retried after the lease expires.

Test Plan: Using both `bin/harbormaster` and `phd debug taskmaster`, ran a lot of waiting tasks through the queue, faking them to either yield or not yield in a controlled manner. The queue responded as expected, yielding tasks appropraitely and retrying them later.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Differential Revision: https://secure.phabricator.com/D8792
2014-04-16 13:02:12 -07:00
epriestley
cb545856a9 Make task queue more robust against long-running tasks
Summary:
See discussion in D8773. Three small adjustments which should help prevent this kind of issue:

  - When queueing followup tasks, hold them on the worker until we finish the task, then queue them only if the work was successful.
  - Increase the default lease time from 60 seconds to 2 hours. Although most tasks finish in far fewer than 60 seconds, the daemons are generally stable nowadays and these short leases don't serve much of a purpose. I think they also date from an era where lease expiry and failure were less clearly distinguished.
  - Increase the default wait-after-failure from 60 seconds to 5 minutes. This largely dates from the MetaMTA era, where Facebook ran services with high failure rates and it was appropriate to repeatedly hammer them until things went through. In modern infrastructure, such failures are rare.

Test Plan:
  - Verified that tasks queued properly after the main task was updated.
  - Verified that leases default to 7200 seconds.
  - Intentionally failed a task and verified default 300 second wait before retry.
  - Removed all default leases shorter than 7200 seconds (there was only one).
  - Checked all the wait before retry implementations for anything much shorter than 5 minutes (they all seem reasonable).

Reviewers: btrahan, sowedance

Reviewed By: sowedance

Subscribers: epriestley

Differential Revision: https://secure.phabricator.com/D8774
2014-04-15 08:42:02 -07:00
Joshua Spence
e11adc4ad7 Added some additional assertion methods.
Summary:
There are quite a few tests in Arcanist, libphutil and Phabricator that do something similar to `$this->assertEqual(false, ...)` or `$this->assertEqual(true, ...)`.

This is unnecessarily verbose and it would be cleaner if we had `assertFalse` and `assertTrue` methods.

Test Plan: I contemplated adding a unit test for the `getCallerInfo` method but wasn't sure if it was required / where it should live.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley

CC: Korvin, epriestley, aran

Differential Revision: https://secure.phabricator.com/D8460
2014-03-08 19:16:21 -08:00
epriestley
1650874004 Modernize Drydock CLI management of task execution
Summary:
Ref T2015. Currently, Drydock has a `wait-for-lease` workflow which is invoked in the background by the `lease` workflow.

The goal of this mechanism is to allow `bin/drydock lease` to print out logs as the lease is acquired. However, this predates the `runAllTasksInProcess` flags, and they provide a simpler and more robust way (potentially with `--trace` and `PhutilConsole`) to do synchronous execution and debug logging.

Simplify this whole mechanism: just run everything in-process in `bin/drydock lease`, and do logging via `--trace`. We could thread a `PhutilConsole` through things too, but this seems good enough for now.

Also various cleanup/etc.

Test Plan: Ran `bin/drydock lease`. Ran `bin/harbormaster build X --plan Y`, for `Y` being a Drydock-dependent build plan.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D7835
2013-12-27 13:15:12 -08:00
epriestley
c462713584 Minor cleanup for task rendering in Daemons
Summary:
Fixes two issues:

  - When rendering a task's details, we currently issue a policy-oblivious query. Instead, issue a policy-aware query.
  - The formatting is a little bit weird, with the top half in a box and the bottom half with an older style. Make them consistent.

Test Plan: Looked at the detail pages for several tasks in queue.

Reviewers: btrahan, chad

Reviewed By: chad

CC: aran

Differential Revision: https://secure.phabricator.com/D7812
2013-12-20 18:02:32 -08:00
epriestley
0ed281d25e Make taskmaster consumption of failed tasks more FIFO-ey
Summary:
Ref T1049. See discussion in D7745. We have some specific interest in this for D7745, but generally we want to consume tasks with expired leases in roughly FIFO order, just like we consume new tasks in roughly FIFO order. Currently, when we select an expired task we order them by `id`, but this is the original insert order, not lease expiration order. Instead, order by `leaseExpires`.

This query is actually much better than the old one was, since the WHERE part is `leaseExpries < VALUE`.

Test Plan: Ran `EXPLAIN` on the query. Ran a taskmaster in debug mode and saw it lease new and expired tasks successfully.

Reviewers: hach-que, btrahan

Reviewed By: hach-que

CC: aran

Maniphest Tasks: T1049

Differential Revision: https://secure.phabricator.com/D7746
2013-12-08 21:07:13 -08:00
epriestley
af321df3c0 Fix a bad query in LeaseQuery
Summary: Fixes T3610. This got un-scoped at some point, and is only used in Drydock so it escaped notice.

Test Plan: Ran `bin/drydock lease --type host` without hitting an exception about incorrect query construction.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T3610

Differential Revision: https://secure.phabricator.com/D6552
2013-07-24 11:32:03 -07:00
epriestley
6750a48951 Surface task queue temporary failure rate in Daemon console
Summary: Fixes T3557. One thing which made T3557 kind of a mess was the lack of information about progress through temporary failures. Add a column which records a task's last failure time, and surface it in the console.

Test Plan: {F51277}

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T3557

Differential Revision: https://secure.phabricator.com/D6550
2013-07-23 16:58:22 -07:00
epriestley
45db6001ea Use PhutilProxyException to disambiguate work queue failures
Summary: Fixes T2569. This is the other common exception source which is ambiguous. List the task ID explicitly to make debugging easier.

Test Plan: {F51268}

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2569

Differential Revision: https://secure.phabricator.com/D6549
2013-07-23 16:58:21 -07:00
epriestley
13e2489739 Add a link to the main Asana task from Differential
Summary: Ref T2852. When a Differential revision is linked to an Asana task, show the related task in Differential.

Test Plan: {F49234}

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2852

Differential Revision: https://secure.phabricator.com/D6387
2013-07-09 16:22:33 -07:00
epriestley
984bc8ea63 Give Asana feed publishing tasks a less aggressive retry/backoff schedule
Summary:
Ref T2852. Asana sync tasks currently have a standard retry/backoff schedule, but the defaults are quite aggressive (retry every 60s forever). Instead, retry at increasing intervals and stop retrying after a few tries.

  - Retry at intervals and stop retrying after a few iterations.
  - Modernize some interfaces.
  - Add better information about retry behaviors to the web UI.

Test Plan: {F49194}

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2852

Differential Revision: https://secure.phabricator.com/D6381
2013-07-08 14:34:18 -07:00
epriestley
d4ca508d2b Add PhabricatorWorker->log()
Summary:
Ref T2852. Add a `log()` method to `PhabricatorWorker` to make debugging easier.

I renamed the similar Drydock-specific method.

Test Plan:
Used logging in a future revision:

  ...
  <<< [36] <http> 211,704 us
  Updating main task.
  >>> [37] <http> https://app.asana.com/api/1.0/tasks/6153776820388
  ...

Reviewers: btrahan, chad

Reviewed By: chad

CC: aran

Maniphest Tasks: T2852

Differential Revision: https://secure.phabricator.com/D6296
2013-06-25 16:31:37 -07:00
epriestley
42c0f060d5 Push feed publishing deeper into the task queue
Summary:
Ref T2852. I want to model Asana integration as a response to feed events. Currently, we queue one feed event for each HTTP hook.

Instead, always queue one feed event and then have it queue any necessary followup events (now, http hooks; soon, asana).

Add a script to make it easy to reproducibly fire feed event publishing.

Test Plan:
Republished a feed event and verified it hit configured HTTP hooks correctly.

  $ ./bin/feed republish 5765774156541908292 --trace
  >>> [2] <connect> phabricator2_feed
  <<< [2] <connect> 1,660 us
  >>> [3] <query> SELECT story.* FROM `feed_storydata` story JOIN `feed_storyreference` ref ON ref.chronologicalKey = story.chronologicalKey WHERE (ref.chronologicalKey IN (5765774156541908292)) GROUP BY story.chronologicalKey ORDER BY story.chronologicalKey DESC
  <<< [3] <query> 595 us
  >>> [4] <connect> phabricator2_differential
  <<< [4] <connect> 760 us
  >>> [5] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
  <<< [5] <query> 478 us
  >>> [6] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
  <<< [6] <query> 449 us
  >>> [7] <connect> phabricator2_user
  <<< [7] <connect> 1,062 us
  >>> [8] <query> SELECT * FROM `user` WHERE phid in ('PHID-USER-lqiz3yd7wmk64ejugvov')
  <<< [8] <query> 540 us
  >>> [9] <connect> phabricator2_file
  <<< [9] <connect> 951 us
  >>> [10] <query> SELECT * FROM `file` WHERE phid IN ('PHID-FILE-gq6dlsysvxbn3dgwvky7')
  <<< [10] <query> 498 us
  >>> [11] <query> SELECT * FROM `user_status` WHERE userPHID IN ('PHID-USER-lqiz3yd7wmk64ejugvov') AND UNIX_TIMESTAMP() BETWEEN dateFrom AND dateTo
  <<< [11] <query> 507 us
  Republishing story...
  >>> [12] <query> SELECT story.* FROM `feed_storydata` story JOIN `feed_storyreference` ref ON ref.chronologicalKey = story.chronologicalKey WHERE (ref.chronologicalKey IN (5765774156541908292)) GROUP BY story.chronologicalKey ORDER BY story.chronologicalKey DESC
  <<< [12] <query> 685 us
  >>> [13] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
  <<< [13] <query> 489 us
  >>> [14] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
  <<< [14] <query> 512 us
  >>> [15] <query> SELECT * FROM `user` WHERE phid in ('PHID-USER-lqiz3yd7wmk64ejugvov')
  <<< [15] <query> 601 us
  >>> [16] <query> SELECT * FROM `file` WHERE phid IN ('PHID-FILE-gq6dlsysvxbn3dgwvky7')
  <<< [16] <query> 405 us
  >>> [17] <query> SELECT * FROM `user_status` WHERE userPHID IN ('PHID-USER-lqiz3yd7wmk64ejugvov') AND UNIX_TIMESTAMP() BETWEEN dateFrom AND dateTo
  <<< [17] <query> 551 us
  >>> [18] <query> SELECT story.* FROM `feed_storydata` story JOIN `feed_storyreference` ref ON ref.chronologicalKey = story.chronologicalKey WHERE (ref.chronologicalKey IN (5765774156541908292)) GROUP BY story.chronologicalKey ORDER BY story.chronologicalKey DESC
  <<< [18] <query> 507 us
  >>> [19] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
  <<< [19] <query> 428 us
  >>> [20] <query> SELECT * FROM `differential_revision` WHERE phid IN ('PHID-DREV-ywqmrj5zgkdloqh5p3c5')
  <<< [20] <query> 419 us
  >>> [21] <query> SELECT * FROM `user` WHERE phid in ('PHID-USER-lqiz3yd7wmk64ejugvov')
  <<< [21] <query> 591 us
  >>> [22] <query> SELECT * FROM `file` WHERE phid IN ('PHID-FILE-gq6dlsysvxbn3dgwvky7')
  <<< [22] <query> 406 us
  >>> [23] <query> SELECT * FROM `user_status` WHERE userPHID IN ('PHID-USER-lqiz3yd7wmk64ejugvov') AND UNIX_TIMESTAMP() BETWEEN dateFrom AND dateTo
  <<< [23] <query> 593 us
  >>> [24] <http> http://127.0.0.1/derp/
  <<< [24] <http> 746,157 us
  [2013-06-24 20:23:26] EXCEPTION: (HTTPFutureResponseStatusHTTP) [HTTP/500] Internal Server Error

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2852

Differential Revision: https://secure.phabricator.com/D6291
2013-06-25 16:29:47 -07:00
Jakub Vrana
3391e3d34b Use (a = ? AND b = ?) instead of (a, b) IN (?, ?)
Summary: MySQL is not able to use indexes with searching for tuples.

Test Plan:
Explained the query before and after, saw `key_len` 16 instead of 8.
Also saw time 0.0 s instead of 2.9 s (but that was probably caused by warming up).

Reviewers: epriestley

Reviewed By: epriestley

CC: aran, Korvin

Differential Revision: https://secure.phabricator.com/D5580
2013-04-05 23:02:06 -07:00
vrana
f864d9e611 Fix double escaping in phutil_tag
Summary:
I wasn't able to reproduce the "recursion detected" in real web request but I saw lots of 1073741824 refcounts in `debug_zval_dump()` of $object.
I'm not sure how that happens.

Test Plan: D4807#4

Reviewers: epriestley

Reviewed By: epriestley

CC: aran, Korvin

Maniphest Tasks: T2432

Differential Revision: https://secure.phabricator.com/D4839
2013-02-06 15:21:05 -08:00
epriestley
f6622f43e6 Replace all array_combine(x, x) with array_fuse(x) in Phabricator
Summary: Fixes various array_combine() warnings for PHP < 5.4

Test Plan: lint/unit/grep

Reviewers: btrahan, vrana, chad

Reviewed By: chad

CC: aran

Differential Revision: https://secure.phabricator.com/D4660
2013-01-25 17:06:55 -08:00
epriestley
22c64c67ff Fix performance problem for large task queues
Summary:
Some time ago, we added `ORDER BY id ASC` to the worker `UPDATE ...` query, because someone reported that their MySQL read slaves were complaining about the query (I can't find the exact error message, but something to the effect of the rows the query affected not being deterministic). This seemed harmless since it should be the same as the query's implicit order (I guess?), but actually made the query dramatically slower for large numbers of rows.

On my local machine, this query takes about 2 seconds with ~1M rows. If I run `SELECT`, or run `UPDATE` without ORDER BY, the query takes < 0.01s. I don't understand exactly what's happening -- my guess is something to do with the ORDER BY implying that a lot of rows need to be locked?

In T2372, a user is seeing 20-60s rumtimes on this query.

I solved this by doing a SELECT, followed by an UPDATE. Each query runs quickly. This introduces the possibility of a race (two processes SELECT the same rows, then try to UPDATE), which we currently recover from by having the second UPDATE fail and then having that daemon try again 1 second later. This seems generally reasonable. Some alternatives I considered:

  - We could SELECT ... LOCK FOR UPDATE, but failing and retrying a little later seems at least as good as blocking.
  - We could select more rows than we need, and then try to lock some of them randomly. I think this would work well, but it's a bit more complex than what we're doing now so I left it until we have a clearer need.

Test Plan:
Inserted ~1M tasks into the queue. Ran `phd debug taskmaster`, saw ~2s task updates. Applied patch. Ran `phd debug taskmaster`, saw <1ms updates. Ran `phd launch 8 taskmaster`, saw rapid completion of tasks.

This stuff also has fairly thorough unit test coverage.

Reviewers: vrana, btrahan

Reviewed By: vrana

CC: aran

Maniphest Tasks: T2372

Differential Revision: https://secure.phabricator.com/D4576
2013-01-22 12:27:18 -08:00
Nick Harper
4a81ae6d6d Add data information to daemon task view
Summary:
Load the data for daemon worker tasks when viewing them, and present
the information in a useful way. This defaults to printing the json data,
but for some classes of worker it will also link to the corresponding
object, to make debugging problems with workers easier.

Test Plan:
load /daemon/task/NNN for a CommitParserWorker and a MetaMTAWorker, and
see the addition of a data field with useful content and link.

Reviewers: epriestley, vrana

Reviewed By: epriestley

CC: aran, Korvin

Differential Revision: https://secure.phabricator.com/D4226
2012-12-17 17:12:55 -08:00
epriestley
4c7c518c63 Throw a richer exception when updating tasks with expired leases
Summary: Include task ID and class when raising this exception. I took a brief stab at doing this generically, but (a) we specifically raise this exception outside of normal try/catch because we can't follow normal recovery rules for it and (b) we don't have a reasonable PhutilProxyException or similar right now which would preserve stack traces, and don't have builtin exception nesting support until PHP 5.3.

Test Plan: Faked this exception, verified we get more information in the logs.

Reviewers: btrahan, vrana

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2193

Differential Revision: https://secure.phabricator.com/D4205
2012-12-16 16:27:32 -08:00
epriestley
ee2e85a0bb Fix several migration issues with the Task/Counter patch
Summary:
People hit three issues with D3914:

  - As per T2059, we applied a schema change from a `.php` patch, which currently does not work if you use a different user to make schema changes than for normal use.
    - Since the change in question is idempotent, just move it to a `.sql` patch. We'll follow up in T2059 and fix it properly.
  - Rogue daemons at several installs used old code (expecting autoincrement) to insert into the new table (no autoincrement), thereby creating tasks with ID 0.
    - Rename the table so they'll fail.
    - This also makes the code a little more consistent.
  - Some installs now have tasks with ID 0.
    - Use checks against null rather than against 0 so we can process these tasks.

The major issues this fixes are the schema upgrade failure in T2059, and the infinite loops in T2072 and elsewhere.

This isn't really a fully statisfactory fix. I'll discuss some next steps in T2072.

Test Plan: Created new tasks via MetaMTA/Differential. Ran tasks with `phd debug taskmaster`. Inserted a task 0 and verified it ran and archived correctly.

Reviewers: btrahan, vrana, nh

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2072, T2059

Differential Revision: https://secure.phabricator.com/D3973
2012-11-16 10:19:22 -08:00
epriestley
7332599e03 Provide an IDS_COUNTER mechanism for ID assignment
Summary: See D3912 for discussion. InnoDB may reuse autoincrement IDs after restart; provide a way to avoid it.

Test Plan: Unit tests. Scheduled and executed tasks through `drydock lease --type host` and `phd debug taskmaster`.

Reviewers: vrana, btrahan

Reviewed By: vrana

CC: aran

Differential Revision: https://secure.phabricator.com/D3914
2012-11-07 13:33:07 -08:00
epriestley
7e0ce08154 Make various Drydock CLI/Allocator improvements
Summary:
  - Remove EC2, RemoteHost, Application, etc., blueprints for now. They're very proof-of-concept and Blueprints are getting API changes I don't want to bother propagating for now. Leave the abstract base class and the LocalHost blueprint. I'll restore the more complicated ones once better foundations are in place.
  - Remove the Allocate controller from the web UI. The original vision here was that you'd manually allocate resources in some cases, but it no longer makes sense to do so as all allocations come from leases now. This simplifies allocations and makes the rule for when we can clean up resources clear-cut (if a resource has no more active leases, it can be cleaned up). Instead, we'll build resources like the localhost and remote hosts lazily, when leases come in for them.
  - Add some configuration to manage the localhost blueprint.
  - Refactor `canAllocateResources()` into `isEnabled()` (for config checks) and `canAllocateMoreResources()` (for quota checks, e.g. too many resources are allocated already).
  - Juggle some signatures to align better with a world where blueprints generally do allocate.
  - Add some more logging and error handling.
  - Fix an issue with log ordering.

Test Plan: Allocated some localhost leases.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D3902
2012-11-06 15:30:11 -08:00
vrana
ef85f49adc Delete license headers from files
Summary:
This commit doesn't change license of any file. It just makes the license implicit (inherited from LICENSE file in the root directory).

We are removing the headers for these reasons:

- It wastes space in editors, less code is visible in editor upon opening a file.
- It brings noise to diff of the first change of any file every year.
- It confuses Git file copy detection when creating small files.
- We don't have an explicit license header in other files (JS, CSS, images, documentation).
- Using license header in every file is not obligatory: http://www.apache.org/dev/apply-license.html#new.

This change is approved by Alma Chao (Lead Open Source and IP Counsel at Facebook).

Test Plan: Verified that the license survived only in LICENSE file and that it didn't modify externals.

Reviewers: epriestley, davidrecordon

Reviewed By: epriestley

CC: aran, Korvin

Maniphest Tasks: T2035

Differential Revision: https://secure.phabricator.com/D3886
2012-11-05 11:16:51 -08:00
epriestley
0364ccdd5f Fix an issue where PhabricatorWorkerLeaseQuery may lease different tasks than it selects
Summary:
We lock tasks by setting `leaseOwner` to a unique value, but the value is currently unique-to-the-process rather than unique-to-the-query. This means that if a process leases a task, then leases another task, both tasks will have the same `leaseOwner`. This can cause an issue where we go to select the task we just leased and get the other task instead, if we aren't careful about the select construction.

We can avoid this by being clever and making sure the select is constructed correctly, but making the `leaseOwner` unique to the query is much simpler and more foolproof. This guarantees we always select only the rows we just leased.

Also remove `PhabricatorGoodForNothingWorker` since `PhabricatorTestWorker` fills its role of allowing things to be tested, and simplify the unit tests since we don't need to be clever about avoiding this issue any more.

Test Plan: Ran unit tests.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D3862
2012-11-01 11:30:49 -07:00
epriestley
f0fdcf1a51 Undumb the Drydock resource allocator pipeline
Summary:
This was the major goal of D3859/D3855, and to a lesser degree D3854/D3852.

As Drydock is allocating a resource, it may need to allocate other resources first. For example, if it's allocating a working copy, it may need to allocate a host first.

Currently, we have the process basically queue up the allocation (insert a task into the queue) and sleep() until it finishes. This is problematic for a bunch of reasons, but the major one is that if allocation takes more resources (host, port, machine, DNS) than you have daemons, they could all end up sleeping and waiting for some other daemon to do their work. This is really stupid. Even if you only take up some of them, you're spending slots sleeping when you could be doing useful work.

To partially get around this and make the CLI experience less dumb, there's this goofy `synchronous` flag that gets passed around everywhere and pushes the workflow through a pile of special cases. Basically the `synchronous` flag causes us to do everything in-process. But this is dumb too because we'd rather do things in parallel if we can, and we have to have a lot of special case code to make it work at all.

Get rid of all of this. Instead of sleep()ing, try to work on the tasks that need to be worked on. If another daemon grabbed them already that's fine, but in the worst case we just gracefully degrade and do everything in process. So we get the best of both worlds: if we have parallelizable tasks and free daemons, things will execute in parallel. If we have nonparallelizable tasks or no free daemons, things will execute in process.

Test Plan: Ran `drydock_control.php --trace` and saw it perform cascading allocations without sleeping or special casing.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D3861
2012-11-01 11:30:42 -07:00
epriestley
84ee4cd9f6 Factor out task execution and formalize permanent failures
Summary:
  - Clean up a TODO about permanent failures.
  - Clean up a TODO about failing tasks after too many retries.
  - Clean up a TODO about testing for bad leases.
  - Make the lease/retry implementation more flexible and natural.
  - Make completely bogus tasks fail permanently.
  - Make PhabricatorMetaMTAWorker use new `getWaitBeforeRetry()` (as intended), not hackily implement logic in `getRequiredLeaseTime()`.
  - Document worker hooks for failures and retries.
  - Provide coverage on everything.

Test Plan: Ran unit tests. Ran `bin/phd debug taskmaster`.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D3859
2012-11-01 11:30:23 -07:00
epriestley
88fad90c1c Move task leasing to a dedicated query
Summary: This simplifies the fairly thorny logic of leasing tasks a bit. I'm planning to introduce another callsite shortly for Drydock.

Test Plan: Ran `bin/phd debug taskmaster`, observed sensible queries and correct operation.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D3855
2012-11-01 11:30:16 -07:00
epriestley
fe329b9738 Modernize worker task detail view
Summary: Make mobile-friendly and provide UI to cancel/retry tasks. Remove display of task data to arbitrary users, as it may be sensitive.

Test Plan:
{F22502}
{F22503}
{F22504}
{F22505}
{F22506}

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D3854
2012-10-31 15:22:32 -07:00
epriestley
5903ed650c Move completed tasks to an "archive" table and delete them in the GC
Summary:
Currently, when taskmasters complete a task it is immediately deleted. This prevents us from doing some general things, like:

  - Supporting the idea of permanent failure (e.g., after N failures just stop trying).
  - Showing the user how fast taskmasters are completing tasks.
  - Showing the user how long tasks took to complete.

Having better visibility into this is important to Drydock, which builds on the task system. Also, generally buff debug output for task execution.

Test Plan: Ran `bin/phd debug taskmaster`. Ran `bin/phd debug garbage`. Queued some tasks via various systems.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D3852
2012-10-31 15:22:16 -07:00
vrana
2e484e257d Fix lint errors found by Nemo
Summary:
See also:

- https://github.com/tpyo/amazon-s3-php-class/pull/33
- https://github.com/stripe/stripe-php/pull/13

Test Plan: Ran a script analyzing sources by HPHP.

Reviewers: btrahan, jungejason, epriestley

Reviewed By: epriestley

CC: aran, Korvin

Differential Revision: https://secure.phabricator.com/D2713
2012-06-11 19:09:42 -07:00
vrana
6cc196a2e5 Move files in Phabricator one level up
Summary:
- `kill_init.php` said "Moving 1000 files" - I hope that this is not some limit in `FileFinder`.
- [src/infrastructure/celerity] `git mv utils.php map.php; git mv api/utils.php api.php`
- Comment `phutil_libraries` in `.arcconfig` and run `arc liberate`.

NOTE: `arc diff` timed out so I'm pushing it without review.

Test Plan:
/D1234
Browsed around, especially in `applications/repository/worker/commitchangeparser` and `applications/` in general.

Auditors: epriestley

Maniphest Tasks: T1103
2012-06-01 12:32:44 -07:00
vrana
1ebf9186b4 Depend on class autoloading
Test Plan:
Run setup.
/differential/

Reviewers: epriestley

Reviewed By: epriestley

CC: aran, Koolvin

Maniphest Tasks: T1103

Differential Revision: https://secure.phabricator.com/D2612
2012-05-30 16:57:21 -07:00
epriestley
09c8af4de0 Upgrade phabricator to libphutil v2
Summary: Mechanical changes from D2588. No "Class.php" moves yet.

Test Plan: See D2588.

Reviewers: vrana, btrahan, jungejason

Reviewed By: vrana

CC: aran

Maniphest Tasks: T1103

Differential Revision: https://secure.phabricator.com/D2591
2012-05-30 14:26:29 -07:00