Summary:
Ref T7811. Fixes two minor issues I observed in the cluster:
- Sometimes APC doesn't give us key names. Not sure exactly what's up here, but we can do a better job with this.
- The `%` in `25%` actually needs more escaping, since it's interpreted by both `pht()` (immediately) and `console_format()` (later).
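A minimal sketch of why the literal percent sign needs two levels of escaping. It uses Python's printf-style `%` operator as a stand-in for both formatting passes; `translate()` and `render()` are hypothetical names for illustration, not the actual `pht()` or `console_format()` implementations.

```python
def translate(pattern, *args):
    # Stand-in for the first formatting pass, applied immediately.
    return pattern % args


def render(pattern, *args):
    # Stand-in for the second formatting pass, applied later.
    return pattern % args


# Each pass consumes one level of escaping, so a literal "%" that must
# survive two passes is written as "%%%%" in the source string.
once = translate("Reserve %s%%%% of the pool.", 25)
twice = render(once)

print(once)
print(twice)
```

After the first pass the string still contains `%%`, which the second pass reduces to a single `%`.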
Test Plan:
- First one is just from an error log, not sure how to repro offhand.
- Ran `bin/phd help start` for the second one.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7814, T7811
Differential Revision: https://secure.phabricator.com/D12395
Summary: Ref T7384. This just sends SIGHUP to specified overseers in a nice package.
Test Plan: See D11898.
Reviewers: hach-que, btrahan
Reviewed By: btrahan
Subscribers: joshuaspence, epriestley
Maniphest Tasks: T7384
Differential Revision: https://secure.phabricator.com/D11899
Summary:
Fixes T7352. This reduces the memory footprint for instances by combining these two similar daemons into one daemon which handles the responsibilities of both.
The fit isn't 100% perfect here but it's pretty close, and the GC daemon is fairly trivial.
Test Plan:
- Adjusted all the numbers to small numbers (5 second sleep, 120 second GC length).
- Added a ton of logging.
- Started trigger daemon.
- Saw it run a GC cycle.
- Saw it reschedule another cycle after 120 seconds (adjusted down from 4 hours).
- Reverted all the logging/small numbers.
- Ran `bin/phd start`, saw stable trigger daemon running.
- Grepped for removed daemon class name.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11872
Summary: Ref T7352. This is pretty straightforward. I renamed `phd.start-taskmasters` to `phd.taskmasters` for clarity.
Test Plan:
- Ran `phd start`, `phd start --autoscale-reserve 0.25`, `phd restart --autoscale-reserve 0.25`, etc.
- Examined PID file to see options were passed.
- I'm defaulting this off (0 reserve) and making it a flag rather than an option because it's a very advanced feature which is probably not useful outside of instancing.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11871
Summary:
Ref T7352. We were previously identifying things by `<daemonClass, overseerPID, startTime>` but that's not unique in a world where one overseer can run multiple daemons.
We already have an internal "daemonID", it just doesn't get written into the DB right now.
Start writing it, then use it to clean up `phd status`.
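The collision can be illustrated with a small sketch. The record layout and the ID values below are made up for illustration; only the idea (a tuple key collides, a per-daemon ID does not) comes from the description above.

```python
from collections import namedtuple

# Hypothetical record shape, for illustration only.
Daemon = namedtuple(
    "Daemon", ["daemon_id", "daemon_class", "overseer_pid", "start_time"])


def index_by(daemons, key):
    # Group daemons by the given key function.
    index = {}
    for daemon in daemons:
        index.setdefault(key(daemon), []).append(daemon)
    return index


# Two taskmasters started by the same overseer in the same second.
daemons = [
    Daemon("a1", "PhabricatorTaskmasterDaemon", 4046, 1424000000),
    Daemon("a2", "PhabricatorTaskmasterDaemon", 4046, 1424000000),
]

by_tuple = index_by(
    daemons, lambda d: (d.daemon_class, d.overseer_pid, d.start_time))
by_id = index_by(daemons, lambda d: d.daemon_id)

print(len(by_tuple), "distinct key(s) using the old tuple")
print(len(by_id), "distinct key(s) using the daemon ID")
```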
Test Plan: Ran `phd status`, got more accurate/useful output than previously.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11865
Summary:
Ref T7352. This isn't wildly useful for us but seems generally reasonable, can be helpful with testing, and @hach-que has a use case for it.
The only reason we issue this warning is to prevent user error; you can still launch all the daemons with `phd launch` manually and daemons all use locks to protect critical regions.
Test Plan: Ran `phd start --force` a bunch, saw zillions of daemons.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley, hach-que
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11861
Summary:
Ref T7352. This moves all the daemons under one overseer. The primary goal is to reduce the minimum footprint of an instance in the Phacility cluster, by reducing the number of processes each instance needs to run on daemon-tier hosts.
This improves scalability by roughly a factor of 2.
Test Plan:
- Ran `phd debug`, `phd launch`, `phd start`. Saw normal behavior, with only one total overseer.
- Fataled daemons and saw the overseer restart them normally.
- Used `phd status` and `phd stop` and got reasonable results (`phd status` is still a touch off).
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11857
Summary: Ref T7352. This makes `phd stop` and `phd status` produce more reasonable output with the new PID file format.
Test Plan: Ran `phd stop`, `phd status`, etc.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11856
Summary:
Ref T7352. This changes `phd` to pass configuration to overseers over stdin. We still run one overseer per daemon.
The "status" stuff needs some cleanup, but it's mostly just UI/cosmetic.
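As a rough sketch of the general technique (a parent passing configuration to a child process over stdin), assuming JSON as the encoding. The stub child and the `daemons` key are assumptions made for illustration and do not reflect the actual overseer format.

```python
import json
import subprocess
import sys

# Stand-in child process: reads a JSON config from stdin and reports
# how many daemons it was asked to run.
OVERSEER_STUB = (
    "import json, sys; "
    "config = json.load(sys.stdin); "
    "print(len(config['daemons']))"
)


def launch_overseer(config):
    # Serialize the config and hand it to the child on stdin, instead
    # of encoding every option as a command-line argument.
    result = subprocess.run(
        [sys.executable, "-c", OVERSEER_STUB],
        input=json.dumps(config),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(launch_overseer({"daemons": [{"class": "ExampleDaemon"}]}))
```

Passing a structured document this way scales to many options per daemon without growing the argument list.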
Test Plan:
- Ran `phd debug`, `phd launch`, `phd start`, `phd status`, `phd stop`, etc.
- Verified PID files write in a reasonable format.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11855
Summary: Even if you --force, we can't kill PID 0. This sends the process itself the signal, and terminates it.
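For context, on POSIX systems `kill(0, sig)` signals every process in the caller's process group, which includes the caller. A minimal sketch of the kind of guard this implies follows; `signalable_pids()` is a hypothetical helper, not the actual `phd` code.

```python
def signalable_pids(pids):
    """Keep only values that name one specific process."""
    safe = []
    for pid in pids:
        # PID 0 would target our own process group, negative values
        # target other groups, and missing values are not PIDs at all.
        if type(pid) is int and pid > 0:
            safe.append(pid)
    return safe


print(signalable_pids([0, 4046, None, -1]))
```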
Test Plan: See D11786.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D11787
Summary:
In the cluster, the box has a ton of stuff that "looks like a daemon" because it is some other instance's daemon.
Stop `phd restart` from complaining about this if given a "--gently" flag, which is like the opposite of "--force".
(I'll make it `stop --force` at the beginning of a whole-box restart to kill stragglers.)
Test Plan: Ran `bin/phd restart --gently`, etc.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D11784
Summary: Ref T6881. This won't do much of interest on third party installs yet, but it's stable and we don't need to hold it back any longer.
Test Plan: Ran `phd start`, saw the trigger daemon start up.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T6881
Differential Revision: https://secure.phabricator.com/D11603
Summary: Ref T6822.
Test Plan: `grep`. This method is only called from within `PhutilArgumentWorkflow::__construct`.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin, epriestley
Maniphest Tasks: T6822
Differential Revision: https://secure.phabricator.com/D11415
Summary: See rP2fedb6f941d8. We might need a more general version of this since we do some `sudo` stuff elsewhere, but at least on my machine `sudo -n` exits with code 0 when the target user exists but needs a password.
Test Plan:
- Tried to run daemons as root, with no automatic sudo to root. Got a bad result before (phd believed it had executed the daemons) and a good result afterward (phd recognized that sudo failed).
- Tried to run daemons from root, as a non-root user. Got a good result in both cases.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: fabe, epriestley
Differential Revision: https://secure.phabricator.com/D11041
Summary:
Fixes T5196
If no phd.user is configured, the behaviour is unchanged besides printing a warning when run as root (usually I would add an exit(1) here, but that would break existing installs that do that).
If phd.user is set and the current user is root, it will run the daemon as: su USER -c "command" (I'm not sure if this works on every platform needed).
Otherwise it will refuse to start if the configured and current users mismatch.
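The three cases can be summarized as a small decision function. This is only a sketch of the rules as described; `launch_mode()` and its return values are hypothetical names.

```python
def launch_mode(configured_user, current_user):
    """Decide how to start daemons, following the rules described above."""
    if not configured_user:
        # Nothing configured: behaviour is unchanged, apart from a
        # warning when running as root.
        if current_user == "root":
            return "run-with-warning"
        return "run"
    if current_user == configured_user:
        return "run"
    if current_user == "root":
        # Root can switch to the configured user before launching.
        return "switch-user"
    # Any other mismatch is refused.
    return "refuse"


for configured, current in [(None, "root"), ("phd", "root"), ("phd", "alice")]:
    print(configured, current, launch_mode(configured, current))
```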
Test Plan: Stopped & Started phd daemon with various users and different phd.user settings including root
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: vinzent, epriestley
Maniphest Tasks: T5196
Differential Revision: https://secure.phabricator.com/D11036
Summary:
Ref T2374. Fixes T5988.
Keep track of what's been killed and not been killed, and surface that maybe you need sudo if things don't get killed with --force
...also basically make this force thing work. I managed to convince myself stuff was getting killed with --force when it mostly wasn't. Make sure the --force parameter gets pushed as low as it needs to go to have things get killed.
Test Plan:
- `sudo ./bin/phd restart`
- `rm -rf /var/tmp/phd/pid/*`
- `./bin/phd stop` --> get warning about rogue daemons
- `./bin/phd stop X` --> get warning about no running daemons
- `./bin/phd stop --force` --> get warning about not being able to kill daemons
- `sudo ./bin/phd stop --force` --> kill daemons successfully
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T2374, T5988
Differential Revision: https://secure.phabricator.com/D10386
Summary:
Ref T5405.
- `--limit` wasn't actually used anywhere.
- Make it mean "the N newest lines".
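A minimal sketch of "the N newest lines", assuming the log lines are ordered oldest first. `newest_lines()` is a hypothetical helper, and treating a non-positive limit as "no lines" is an assumption rather than documented behavior.

```python
def newest_lines(lines, limit=None):
    lines = list(lines)
    if limit is None:
        # No limit given: return everything.
        return lines
    if limit <= 0:
        # Guard explicitly: lines[-0:] would return the whole list.
        return []
    return lines[-limit:]


log = ["started", "leased task", "finished task", "sleeping"]
print(newest_lines(log, 3))
```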
Test Plan: Ran `bin/phd log`, `bin/phd log --limit 3`.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T5405
Differential Revision: https://secure.phabricator.com/D10385
Summary: Ref T2374. While building D10367 I noticed that phd was finding rogue daemons way more than it should be. Re-jigger this code path so rogue daemons are checked for *after* we've dealt with known daemons. This keeps the logic pretty simple overall.
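One way to picture the reordering: once the known daemons have been handled, whatever still looks like a daemon in the process list is rogue. The sketch below uses a hypothetical helper and plain PID lists, not the actual `phd` data structures.

```python
def find_rogue_pids(process_list_pids, handled_pids):
    # Anything that looks like a daemon but was not among the daemons
    # we already dealt with is considered rogue.
    return sorted(set(process_list_pids) - set(handled_pids))


looks_like_daemon = [3909, 3914, 3943, 4046]
known_and_handled = [3909, 3914, 3943]
print(find_rogue_pids(looks_like_daemon, known_and_handled))
```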
Test Plan: Ran `phd start`, removed the PID files, then ran `phd stop` and got the right warning; `phd stop --force` killed the rogue daemons. Under normal conditions, `phd stop` no longer erroneously reports rogue daemons.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T2374
Differential Revision: https://secure.phabricator.com/D10368
Summary:
If daemon data is mangled, `bin/phd restart` will SIGINT process `0`, which kills `phd` itself.
uh oh T.T so sad
Test Plan: Used `bin/phd start` to start daemons; removed PID information from one; saw `bin/phd stop` shut down cleanly and not kill itself.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: mholden, epriestley
Differential Revision: https://secure.phabricator.com/D10308
Summary:
Fixes T5855. Adds a `--graceful N` flag to `phd stop` and `phd restart`.
`phd` will send SIGINT, wait `N` seconds, SIGTERM, wait 15 seconds, and SIGKILL. By default, `N` is 15.
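The escalation sequence can be sketched as follows. The signal sender, liveness check, and sleep function are injected so the logic can be exercised without real processes. Skipping SIGINT when the grace period is `0` is an assumption about how an abrupt shutdown is realized, and `stop_process()` is a hypothetical name.

```python
def stop_process(pid, is_running, send_signal, sleep, graceful=15, term_wait=15):
    """Escalate SIGINT -> SIGTERM -> SIGKILL until the process exits."""
    steps = []
    if graceful > 0:
        # Assumption: a grace period of 0 skips the gentle SIGINT step.
        steps.append(("SIGINT", graceful))
    steps.append(("SIGTERM", term_wait))
    steps.append(("SIGKILL", 0))

    sent = []
    for name, wait in steps:
        if not is_running(pid):
            break
        send_signal(pid, name)
        sent.append(name)
        if wait > 0:
            # A real implementation would likely poll during this
            # window rather than sleep for the full duration.
            sleep(wait)
    return sent
```

For example, a daemon that exits on SIGINT only ever receives that one signal, while a wedged daemon receives all three.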
Test Plan:
- Ran `bin/phd debug ...` and used `^C` to interrupt daemons. Saw graceful shutdown behavior, and abrupt termination on multiple `^C`.
- Ran `bin/phd start`, `bin/phd stop` and `bin/phd restart` with `--graceful` set to various things, notably `0`. Saw graceful shutdowns on the CLI and in the web UI. With `0`, abrupt shutdowns.
Reviewers: btrahan, hach-que
Reviewed By: hach-que
Subscribers: epriestley
Maniphest Tasks: T5855
Differential Revision: https://secure.phabricator.com/D10228
Summary: It is sometimes useful to use `./bin/phd status` as a means to determine if daemons //are// actually running on the current host. For example, a common practice in upstart scripts is something similar to `./bin/phd status || ./bin/phd start`.
Test Plan:
```
> ./bin/phd status
ID    Host             PID   Started                  Daemon                                Arguments
1162  ip-10-127-58-93  4046  Jun 20 2014, 3:17:43 AM  PhabricatorFactDaemon
1161  ip-10-127-58-93  3984  Jun 20 2014, 3:17:43 AM  PhabricatorTaskmasterDaemon
1160  ip-10-127-58-93  3973  Jun 20 2014, 3:17:42 AM  PhabricatorTaskmasterDaemon
1159  ip-10-127-58-93  3968  Jun 20 2014, 3:17:42 AM  PhabricatorTaskmasterDaemon
1158  ip-10-127-58-93  3943  Jun 20 2014, 3:17:42 AM  PhabricatorTaskmasterDaemon
1157  ip-10-127-58-93  3914  Jun 20 2014, 3:17:41 AM  PhabricatorGarbageCollectorDaemon
1156  ip-10-127-58-93  3909  Jun 20 2014, 3:17:41 AM  PhabricatorRepositoryPullLocalDaemon
> ./bin/phd status --local
There are no running Phabricator daemons.
```
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D9645
Summary:
Ref T4209. Unifies the local (`./bin/phd status`) and global (`./bin/phd status --all`) view into a single table. This generally makes it easy to administer daemons running across multiple hosts.
Depends on D9606.
Test Plan:
```
> sudo ./bin/phd status
ID  Host       PID   Started                  Daemon                                Arguments
38  localhost  2282  Jun 18 2014, 7:52:56 AM  PhabricatorRepositoryPullLocalDaemon
39  localhost  2289  Jun 18 2014, 7:52:57 AM  PhabricatorGarbageCollectorDaemon
40  localhost  2294  Jun 18 2014, 7:52:57 AM  PhabricatorTaskmasterDaemon
41  localhost  2314  Jun 18 2014, 7:52:58 AM  PhabricatorTaskmasterDaemon
42  localhost  2319  Jun 18 2014, 7:52:59 AM  PhabricatorTaskmasterDaemon
43  localhost  2328  Jun 18 2014, 7:53:00 AM  PhabricatorTaskmasterDaemon
44  localhost  2354  Jun 18 2014, 7:53:08 AM  PhabricatorRepositoryPullLocalDaemon  X --not Y
```
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T4209
Differential Revision: https://secure.phabricator.com/D9607
Summary: This was previously submitted as D9497, but I had accidentally `arc land`ed some not-reviewed not-yet-complete changes in addition to the accepted diff.
Test Plan: Same as D9497.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin
Maniphest Tasks: T5388, T4209
Differential Revision: https://secure.phabricator.com/D9589
Summary: Ref T4209. Currently, `./bin/phd status` prints a table showing the daemons that are executing on the current host. It would be useful to be able to conveniently query the daemons running across all hosts. This would also (theoretically) make it possible to conditionally start daemons on a host depending upon the current state and on the daemons running on other hosts.
Test Plan:
```
> ./bin/phd status --all
ID  Host         PID   Started                  Daemon                                Arguments
18  phabricator  6969  Jun 12 2014, 4:44:22 PM  PhabricatorTaskmasterDaemon
17  phabricator  6961  Jun 12 2014, 4:44:19 PM  PhabricatorTaskmasterDaemon
16  phabricator  6955  Jun 12 2014, 4:44:15 PM  PhabricatorTaskmasterDaemon
15  phabricator  6950  Jun 12 2014, 4:44:14 PM  PhabricatorTaskmasterDaemon
14  phabricator  6936  Jun 12 2014, 4:44:13 PM  PhabricatorGarbageCollectorDaemon
13  phabricator  6931  Jun 12 2014, 4:44:12 PM  PhabricatorRepositoryPullLocalDaemon
```
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T4209
Differential Revision: https://secure.phabricator.com/D9497
Summary: Ran `arc lint --apply-patches --everything` over rP, mainly to change double quotes to single quotes where appropriate. These changes also validate that the `ArcanistXHPASTLinter::LINT_DOUBLE_QUOTE` rule is working as expected.
Test Plan: Eyeballed it.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin, hach-que
Differential Revision: https://secure.phabricator.com/D9431
Summary:
Fixes T5154. Currently, "phd stop" terminates daemons relatively abruptly (and other things do too, like killing them). This can leave them with long leases that won't expire any time soon. Normally this isn't a big deal, since it just means an email or an import takes a bit longer (often 2 hours, but up to 24 hours) to run. However:
- We've increased default lease durations a lot fairly recently -- the 2 hours used to be 15 minutes.
- Harbormaster and Drydock add new types of tasks which are more dependent on other tasks, so waiting 2 hours for something to free up can hold up more stuff in queue.
When `phd start` is run, we can be confident (at least, in normal circumstances) that leases are safe to free, since we do a check. This undoes any damage done by abrupt stops in "phd stop" or by users or systems killing stuff.
(It would be nice to make "phd stop" more graceful at some point, but we always have to deal with abrupt termination in some cases no matter how gentle "phd stop" is.)
One sort-of-questionable thing here is that we don't distinguish between tasks which had an active lease and tasks which had been released, since the system itself does not make a distinction. So, for example, if you have a task that retries 5 times and waits an hour between retries, you'll get a retry on every `phd start` now, and could exhaust them all in a few minutes if you cycle `phd start` aggressively. I think this is OK. In the future, we could try to distinguish between these types of tasks, and only free the ones with active leases.
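A rough sketch of the behavior described above, assuming each task carries a lease expiry timestamp. The field names and `free_leases()` are hypothetical and do not reflect the actual worker task schema. Note that the sketch frees both an actively leased task and a task that is merely waiting for its next retry, which is the "sort-of-questionable" case.

```python
def free_leases(tasks, now, keep_leases=False):
    """Release every unexpired lease; return how many were freed."""
    if keep_leases:
        # Mirrors the opt-out flag: leave all leases untouched.
        return 0
    freed = 0
    for task in tasks:
        expires = task["lease_expires"]
        if expires is not None and expires > now:
            # No distinction is made between an active lease and a
            # task waiting out a retry delay.
            task["lease_owner"] = None
            task["lease_expires"] = None
            freed += 1
    return freed


now = 1_000_000
tasks = [
    {"id": 1, "lease_owner": "taskmaster-1", "lease_expires": now + 7200},
    {"id": 2, "lease_owner": None, "lease_expires": now + 3600},  # retry wait
    {"id": 3, "lease_owner": None, "lease_expires": None},
]
print(free_leases(tasks, now))
```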
Test Plan:
- Used `phd start` normally, saw it free leases.
- Used `phd start`, killed it real quick so no taskmasters spawned, ran it again and saw no leases freed.
- Used `phd start --keep-leases`.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T5154
Differential Revision: https://secure.phabricator.com/D9256
Summary: Fixes T4735. When running `./bin/phd status`, show daemon arguments.
Test Plan:
```
./bin/phd status
PID    Started                  Daemon                                Arguments
12711  May 20 2014, 9:02:52 AM  PhabricatorRepositoryPullLocalDaemon  []
12716  May 20 2014, 9:02:52 AM  PhabricatorGarbageCollectorDaemon     []
12733  May 20 2014, 9:02:53 AM  PhabricatorTaskmasterDaemon           []
12768  May 20 2014, 9:02:53 AM  PhabricatorTaskmasterDaemon           []
12775  May 20 2014, 9:02:53 AM  PhabricatorTaskmasterDaemon           []
12780  May 20 2014, 9:02:54 AM  PhabricatorTaskmasterDaemon           []
12838  May 20 2014, 9:02:54 AM  PhabricatorFactDaemon                 []
13436  May 20 2014, 9:03:23 AM  PhabricatorRepositoryPullLocalDaemon  ["X","--not","Y"]
```
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin
Maniphest Tasks: T4735
Differential Revision: https://secure.phabricator.com/D9208
Summary:
See <http://github.com/facebook/phabricator/issues/487>. By default, we perform a write in this query to move daemons to "dead" status after a timeout. This is normally reasonable, but after D7964 we do a setup check against the daemons, which means this query is invoked very early in the stack, before we have a write guard.
Since doing this write unconditionally is unnecessary, surprising, and overly ambitious, make the write conditional and do not attempt to perform it from the setup check.
(We could also move this to a GC/cron sort of thing eventually, maybe -- it's a bit awkward here, but we don't have other infrastructure which is a great fit right now.)
Test Plan: Hit setup issues and daemon pages. Will confirm with user that this fixes things.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Differential Revision: https://secure.phabricator.com/D8023
Summary:
Ref T2015. Not directly related to Drydock, but I've wanted to do this for a bit.
Introduce a common base class for all the workflows in the scripts in `bin/*`. This slightly reduces code duplication by moving `isExecutable()` to the base, but also provides `getViewer()`. This is a little nicer than `PhabricatorUser::getOmnipotentUser()` and gives us a layer of indirection if we ever want to introduce more general viewer mechanisms in scripts.
Test Plan: Lint; ran some of the scripts.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D7838
Summary: Fixes T3680. One description was wrong; this also cleans up some of the other stuff.
Test Plan: Ran `phd`.
Reviewers: btrahan, Korvin
Reviewed By: Korvin
CC: aran, jifriedman, Korvin
Maniphest Tasks: T3680
Differential Revision: https://secure.phabricator.com/D6683
Summary: Ref T3557. Make it easier to access full daemon logs from the CLI.
Test Plan: {F51265}
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T3557
Differential Revision: https://secure.phabricator.com/D6547
Summary: Ref T1670. Use events and direct database writes instead of Conduit. Deprecate the Conduit methods.
Test Plan: Ran daemons, used the console to review daemon event logs.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1670
Differential Revision: https://secure.phabricator.com/D6536
Summary:
Ref T1670. Prepare for the overseers to talk directly to the database instead of using Conduit. See T1670 for discussion.
This shouldn't impact anything, except it has a very small chance of destabilizing the overseers.
Test Plan:
Ran `phd launch`, `phd debug`, `phd start`.
Ran with `--trace-memory` and verified elevated but mostly steady memory usage (8MB / overseer). This climbed by 0.05KB / sec (4MB / day) but the source of the leaks seems to be the cURL calls we're making over Conduit so this will actually fix that. Disabling `--conduit-uri` reported steady memory usage. I wasn't able to identify anything leaking within code we control. This may be something like a dynamic but capped buffer in cURL, since we haven't seen any issues in the wild.
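The per-day figure follows directly from the per-second rate. A quick check of the arithmetic, assuming 1 MB = 1024 KB:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def daily_growth_mb(kb_per_second):
    # Convert a steady leak rate in KB/sec into MB/day.
    return kb_per_second * SECONDS_PER_DAY / 1024


print(round(daily_growth_mb(0.05), 2))
```

This works out to a little over 4 MB per day per overseer, in line with the rounded figure quoted above.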
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1670
Differential Revision: https://secure.phabricator.com/D6534
Summary:
Ref T1670. Mostly, use PhutilArgumentParser. This breaks up the mishmash of functional stuff and PhabricatorDaemonControl into proper argument parser workflows.
There are no functional changes, except that I removed the "pingConduit()" call prior to starting daemons, because I intend to remove all Conduit integration.
Test Plan:
- Ran `phd list`.
- Ran `phd status` (running daemons).
- Ran `phd status` (no running daemons).
- Ran `phd stop <pid>` (dead task).
- Ran `phd stop <pid>` (live task).
- Ran `phd stop zebra` (invalid PID).
- Ran `phd stop 1` (bad PID).
- Ran `phd stop`.
- Ran `phd debug zebra` (no match).
- Ran `phd debug e` (ambiguous).
- Ran `phd debug task`.
- Ran `phd launch task`.
- Ran `phd launch 0 task` (invalid arg).
- Ran `phd launch 2 task`.
- Ran `phd help`.
- Ran `phd help list`.
- Ran `phd start`.
- Ran `phd restart`.
- Looked at Repositories (daemon running).
- Looked at Repositories (daemon not running).
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1670
Differential Revision: https://secure.phabricator.com/D6490