1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-12 08:36:13 +01:00
Commit graph

13 commits

Author SHA1 Message Date
Joshua Spence
9a679bf374 Allow worker tasks to have priorities
Summary: Fixes T5336. Currently, `PhabricatorWorkerLeaseQuery` is basically FIFO. It makes more sense for the queue to be a priority-queue, and to assign higher priorities to alerts (email and SMS).

Test Plan: Created dummy tasks in the queue (with different priorities). Verified that the priority field was set correctly in the DB and that the priority was shown on the `/daemon/` page. Started a `PhabricatorTaskmasterDaemon` and verified that the higher priority tasks were executed before lower priority tasks.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: epriestley, Korvin

Maniphest Tasks: T5336

Differential Revision: https://secure.phabricator.com/D9871
2014-07-12 03:02:06 +10:00
Joshua Spence
8756d82cf6 Remove @group annotations
Summary: I'm pretty sure that `@group` annotations are useless now... see D9855. Also fixed various other minor issues.

Test Plan: Eye-ball it.

Reviewers: #blessed_reviewers, epriestley, chad

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin, hach-que

Differential Revision: https://secure.phabricator.com/D9859
2014-07-10 08:12:48 +10:00
Joshua Spence
0a62f13464 Change double quotes to single quotes.
Summary: Ran `arc lint --apply-patches --everything` over rP, mainly to change double quotes to single quotes where appropriate. These changes also validate that the `ArcanistXHPASTLinter::LINT_DOUBLE_QUOTE` rule is working as expected.

Test Plan: Eyeballed it.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin, hach-que

Differential Revision: https://secure.phabricator.com/D9431
2014-06-09 11:36:50 -07:00
epriestley
cb545856a9 Make task queue more robust against long-running tasks
Summary:
See discussion in D8773. Three small adjustments which should help prevent this kind of issue:

  - When queueing followup tasks, hold them on the worker until we finish the task, then queue them only if the work was successful.
  - Increase the default lease time from 60 seconds to 2 hours. Although most tasks finish in far fewer than 60 seconds, the daemons are generally stable nowadays and these short leases don't serve much of a purpose. I think they also date from an era where lease expiry and failure were less clearly distinguished.
  - Increase the default wait-after-failure from 60 seconds to 5 minutes. This largely dates from the MetaMTA era, where Facebook ran services with high failure rates and it was appropriate to repeatedly hammer them until things went through. In modern infrastructure, such failures are rare.

Test Plan:
  - Verified that tasks queued properly after the main task was updated.
  - Verified that leases default to 7200 seconds.
  - Intentionally failed a task and verified default 300 second wait before retry.
  - Removed all default leases shorter than 7200 seconds (there was only one).
  - Checked all the wait before retry implementations for anything much shorter than 5 minutes (they all seem reasonable).

Reviewers: btrahan, sowedance

Reviewed By: sowedance

Subscribers: epriestley

Differential Revision: https://secure.phabricator.com/D8774
2014-04-15 08:42:02 -07:00
epriestley
0ed281d25e Make taskmaster consumption of failed tasks more FIFO-ey
Summary:
Ref T1049. See discussion in D7745. We have some specific interest in this for D7745, but generally we want to consume tasks with expired leases in roughly FIFO order, just like we consume new tasks in roughly FIFO order. Currently, when we select an expired task we order them by `id`, but this is the original insert order, not lease expiration order. Instead, order by `leaseExpires`.

This query is actually much better than the old one was, since the WHERE part is `leaseExpries < VALUE`.

Test Plan: Ran `EXPLAIN` on the query. Ran a taskmaster in debug mode and saw it lease new and expired tasks successfully.

Reviewers: hach-que, btrahan

Reviewed By: hach-que

CC: aran

Maniphest Tasks: T1049

Differential Revision: https://secure.phabricator.com/D7746
2013-12-08 21:07:13 -08:00
epriestley
af321df3c0 Fix a bad query in LeaseQuery
Summary: Fixes T3610. This got un-scoped at some point, and is only used in Drydock so it escaped notice.

Test Plan: Ran `bin/drydock lease --type host` without hitting an exception about incorrect query construction.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T3610

Differential Revision: https://secure.phabricator.com/D6552
2013-07-24 11:32:03 -07:00
Jakub Vrana
3391e3d34b Use (a = ? AND b = ?) instead of (a, b) IN (?, ?)
Summary: MySQL is not able to use indexes with searching for tuples.

Test Plan:
Explained the query before and after, saw `key_len` 16 instead of 8.
Also saw time 0.0 s instead of 2.9 s (but that was probably caused by warming up).

Reviewers: epriestley

Reviewed By: epriestley

CC: aran, Korvin

Differential Revision: https://secure.phabricator.com/D5580
2013-04-05 23:02:06 -07:00
epriestley
22c64c67ff Fix performance problem for large task queues
Summary:
Some time ago, we added `ORDER BY id ASC` to the worker `UPDATE ...` query, because someone reported that their MySQL read slaves were complaining about the query (I can't find the exact error message, but something to the effect of the rows the query affected not being deterministic). This seemed harmless since it should be the same as the query's implicit order (I guess?), but actually made the query dramatically slower for large numbers of rows.

On my local machine, this query takes about 2 seconds with ~1M rows. If I run `SELECT`, or run `UPDATE` without ORDER BY, the query takes < 0.01s. I don't understand exactly what's happening -- my guess is something to do with the ORDER BY implying that a lot of rows need to be locked?

In T2372, a user is seeing 20-60s rumtimes on this query.

I solved this by doing a SELECT, followed by an UPDATE. Each query runs quickly. This introduces the possibility of a race (two processes SELECT the same rows, then try to UPDATE), which we currently recover from by having the second UPDATE fail and then having that daemon try again 1 second later. This seems generally reasonable. Some alternatives I considered:

  - We could SELECT ... LOCK FOR UPDATE, but failing and retrying a little later seems at least as good as blocking.
  - We could select more rows than we need, and then try to lock some of them randomly. I think this would work well, but it's a bit more complex than what we're doing now so I left it until we have a clearer need.

Test Plan:
Inserted ~1M tasks into the queue. Ran `phd debug taskmaster`, saw ~2s task updates. Applied patch. Ran `phd debug taskmaster`, saw <1ms updates. Ran `phd launch 8 taskmaster`, saw rapid completion of tasks.

This stuff also has fairly thorough unit test coverage.

Reviewers: vrana, btrahan

Reviewed By: vrana

CC: aran

Maniphest Tasks: T2372

Differential Revision: https://secure.phabricator.com/D4576
2013-01-22 12:27:18 -08:00
vrana
ef85f49adc Delete license headers from files
Summary:
This commit doesn't change license of any file. It just makes the license implicit (inherited from LICENSE file in the root directory).

We are removing the headers for these reasons:

- It wastes space in editors, less code is visible in editor upon opening a file.
- It brings noise to diff of the first change of any file every year.
- It confuses Git file copy detection when creating small files.
- We don't have an explicit license header in other files (JS, CSS, images, documentation).
- Using license header in every file is not obligatory: http://www.apache.org/dev/apply-license.html#new.

This change is approved by Alma Chao (Lead Open Source and IP Counsel at Facebook).

Test Plan: Verified that the license survived only in LICENSE file and that it didn't modify externals.

Reviewers: epriestley, davidrecordon

Reviewed By: epriestley

CC: aran, Korvin

Maniphest Tasks: T2035

Differential Revision: https://secure.phabricator.com/D3886
2012-11-05 11:16:51 -08:00
epriestley
0364ccdd5f Fix an issue where PhabricatorWorkerLeaseQuery may lease different tasks than it selects
Summary:
We lock tasks by setting `leaseOwner` to a unique value, but the value is currently unique-to-the-process rather than unique-to-the-query. This means that if a process leases a task, then leases another task, both tasks will have the same `leaseOwner`. This can cause an issue where we go to select the task we just leased and get the other task instead, if we aren't careful about the select construction.

We can avoid this by being clever and making sure the select is constructed correctly, but making the `leaseOwner` unique to the query is much simpler and more foolproof. This guarantees we always select only the rows we just leased.

Also remove `PhabricatorGoodForNothingWorker` since `PhabricatorTestWorker` fills its role of allowing things to be tested, and simplify the unit tests since we don't need to be clever about avoiding this issue any more.

Test Plan: Ran unit tests.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D3862
2012-11-01 11:30:49 -07:00
epriestley
f0fdcf1a51 Undumb the Drydock resource allocator pipeline
Summary:
This was the major goal of D3859/D3855, and to a lesser degree D3854/D3852.

As Drydock is allocating a resource, it may need to allocate other resources first. For example, if it's allocating a working copy, it may need to allocate a host first.

Currently, we have the process basically queue up the allocation (insert a task into the queue) and sleep() until it finishes. This is problematic for a bunch of reasons, but the major one is that if allocation takes more resources (host, port, machine, DNS) than you have daemons, they could all end up sleeping and waiting for some other daemon to do their work. This is really stupid. Even if you only take up some of them, you're spending slots sleeping when you could be doing useful work.

To partially get around this and make the CLI experience less dumb, there's this goofy `synchronous` flag that gets passed around everywhere and pushes the workflow through a pile of special cases. Basically the `synchronous` flag causes us to do everything in-process. But this is dumb too because we'd rather do things in parallel if we can, and we have to have a lot of special case code to make it work at all.

Get rid of all of this. Instead of sleep()ing, try to work on the tasks that need to be worked on. If another daemon grabbed them already that's fine, but in the worst case we just gracefully degrade and do everything in process. So we get the best of both worlds: if we have parallelizable tasks and free daemons, things will execute in parallel. If we have nonparallelizable tasks or no free daemons, things will execute in process.

Test Plan: Ran `drydock_control.php --trace` and saw it perform cascading allocations without sleeping or special casing.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D3861
2012-11-01 11:30:42 -07:00
epriestley
84ee4cd9f6 Factor out task execution and formalize permanent failures
Summary:
  - Clean up a TODO about permanent failures.
  - Clean up a TODO about failing tasks after too many retries.
  - Clean up a TODO about testing for bad leases.
  - Make the lease/retry implementation more flexible and natural.
  - Make completely bogus tasks fail permanently.
  - Make PhabricatorMetaMTAWorker use new `getWaitBeforeRetry()` (as intended), not hackily implement logic in `getRequiredLeaseTime()`.
  - Document worker hooks for failures and retries.
  - Provide coverage on everything.

Test Plan: Ran unit tests. Ran `bin/phd debug taskmaster`.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D3859
2012-11-01 11:30:23 -07:00
epriestley
88fad90c1c Move task leasing to a dedicated query
Summary: This simplifies the fairly thorny logic of leasing tasks a bit. I'm planning to introduce another callsite shortly for Drydock.

Test Plan: Ran `bin/phd debug taskmaster`, observed sensible queries and correct operation.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D3855
2012-11-01 11:30:16 -07:00