Summary:
Right now, if a daemon dies it can leave the setup warning around for like 10 minutes or something until we reap it.
Tighten the warning so we only care about actively running daemons.
Test Plan: Checked setup issues.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D12088
Summary: Swaps out to modern UI components, update for mobile, fix some phts.
Test Plan: Test each Daemon page on desktop and mobile. Verify modern layout.
Reviewers: btrahan, epriestley
Reviewed By: epriestley
Subscribers: Korvin, epriestley
Maniphest Tasks: T7427
Differential Revision: https://secure.phabricator.com/D11958
Summary: Since this element isn't strictly about errors, re-label as info view instead.
Test Plan: Grepped for all callsites, tested UIExamples and a few other random pages.
Reviewers: btrahan, epriestley
Reviewed By: epriestley
Subscribers: hach-que, Korvin, epriestley
Differential Revision: https://secure.phabricator.com/D11867
Summary: Ref T7384. This just sends SIGHUP to specified overseers in a nice package.
Test Plan: See D11898.
Reviewers: hach-que, btrahan
Reviewed By: btrahan
Subscribers: joshuaspence, epriestley
Maniphest Tasks: T7384
Differential Revision: https://secure.phabricator.com/D11899
Summary:
Fixes T7352. This reduces the memory footprint for instances by combining these two similar daemons into one daemon which handles the responsibilities of both.
The fit isn't 100% perfect here but it's pretty close, and the GC daemon is fairly trivial.
Test Plan:
- Adjusted all the numbers to small numbers (5 second sleep, 120 second GC length).
- Added a ton of logging.
- Started trigger daemon.
- Saw it run a GC cycle.
- Saw it reschedule another cycle after 120 seconds (adjusted down from 4 hours).
- Reverted all the logging/small numbers.
- Ran `bin/phd start`, saw stable trigger daemon running.
- Grepped for removed daemon class name.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11872
Summary: Ref T7352. This is pretty straightforward. I renamed `phd.start-taskmasters` to `phd.taskmasters` for clarity.
Test Plan:
- Ran `phd start`, `phd start --autoscale-reserve 0.25`, `phd restart --autoscale-reserve 0.25`, etc.
- Examined PID file to see options were passed.
- I'm defaulting this off (0 reserve) and making it a flag rather than an option because it's a very advanced feature which is probably not useful outside of instancing.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11871
Summary:
Ref T7352. We were previously identifying things by `<daemonClass, overseerPID, startTime>` but that's not unique in a world where one overseer can run multiple daemons.
We already have an internal "daemonID", it just doesn't get written into the DB right now.
Start writing it, then use it to clean up `phd status`.
Test Plan: Ran `phd status`, got more accurate/useful output than previously.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11865
Summary:
Ref T7352. This isn't wildly useful for us but seems generally reasonable, can be helpful with testing, and @hach-que has a use case for it.
The only reason we issue this warning is to prevent user error; you can still launch all the daemons with `phd launch` manually and daemons all use locks to protect critical regions.
Test Plan: Ran `phd start --force` a bunch, saw zillions of daemons.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley, hach-que
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11861
Summary:
Ref T7352. This moves all the daemons under one overseer. The primary goal is to reduce the minimum footprint of an instance in the Phacility cluster, by reducing the number of processes each instance needs to run on daemon-tier hosts.
This improves scalability by roughly a factor of 2.
Test Plan:
- Ran `phd debug`, `phd launch, `phd start`. Saw normal behavior, with only one total overseer.
- Fataled dameons and saw the overseer restar them normally.
- Used `phd status` and `phd stop` and got reasonable results (`phd status` is still a touch off).
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11857
Summary: Ref T7352. This makes `phd stop` and `phd status` produce more reasonable output with the new PID file format.
Test Plan: Ran `phd stop`, `phd status`, etc.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11856
Summary:
Ref T7352. This changes `phd` to pass configuration to overseers over stdin. We still run one overseer per daemon.
The "status" stuff needs some cleanup, but it's mostly just UI/cosmetic.
Test Plan:
- Ran `phd debug`, `phd launch`, `phd start`, `phd status`, `phd stop`, etc.
- Verified PID files write in a reasonable format.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11855
Summary: Ref T7352. A couple of the APIs changed slightly with D11851.
Test Plan: See D11851.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7352
Differential Revision: https://secure.phabricator.com/D11852
Summary: Even if you --force, we can't kill PID 0. This sends the process itself the signal, and terminates it.
Test Plan: See D11786.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D11787
Summary:
In the cluster, the box has a ton of stuff that "looks like a daemon" beacuse it is some other instance's daemon.
Stop `phd restart` from complaining about this if given a "--gently" flag, which is like the opposite of "--force".
(I'll make it `stop --force` at the beginning of a whole-box restart to kill stragglers.)
Test Plan: Ran `bin/phd restart --gently`, etc.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D11784
Summary:
I'm hitting this in the cluster and couldn't figure it out after staring at it for a couple minutes. Produce a better error.
This dumps a hash of each configuration key value which is set to a non-default value into the daemon log. This is much more compact than the full config, and doesn't spread secrets around, so it seems like a good balance between providing information and going crazy with it.
Test Plan: {F284139}
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D11699
Summary: Clean up the error view styling.
Test Plan:
Tested as many as I could find, built additional tests in UIExamples
{F280452}
{F280453}
Reviewers: btrahan, epriestley
Reviewed By: epriestley
Subscribers: hach-que, Korvin, epriestley
Differential Revision: https://secure.phabricator.com/D11605
Summary: Ref T6881. This won't do much of interest on third party installs yet, but it's stable and we don't need to hold it back any longer.
Test Plan: Ran `phd start`, saw the trigger daemon start up.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T6881
Differential Revision: https://secure.phabricator.com/D11603
Summary: Select a similar or better FontAwesome icon to represent each application
Test Plan: Visual inspection
Reviewers: epriestley, btrahan
Subscribers: hach-que, Korvin, epriestley
Differential Revision: https://secure.phabricator.com/D11489
Summary:
Ref T6881. I tried to cheat here by not implementing this, but we need it for destroying triggers directly with `bin/remove destroy`, since that needs to load them by PHID.
So, cheat slightly less. Implement PolicyAware but not CursorPagedPolicyAware.
Test Plan:
- Used `bin/remove destroy` to destroy a trigger by PHID.
- Browsed daemon console.
- Ran trigger daemon.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T6881
Differential Revision: https://secure.phabricator.com/D11445
Summary:
Ref T6881. Hopefully, this is the hard part.
This adds a new daemon (the "trigger" daemon) which processes triggers, schedules them, and then executes them at the scheduled time. The design is a little complicated, but has these goals:
- High resistance to race conditions: only the application writes to the trigger table; only the daemon writes to the event table. We won't lose events if someone saves a meeting at the same time as we're sending a reminder out for it.
- Execution guarantees: scheduled events are guaranteed to execute exactly once.
- Support for arbitrarily large queues: the daemon will make progress even if there are millions of triggers in queue. The cost to update the queue is proportional to the number of changes in it; the cost to process the queue is proportional to the number of events to execute.
- Relatively good observability: you can monitor the state of the trigger queue reasonably well from the web UI.
- Modular Infrastructure: this is a very low-level construct that Calendar, Phortune, etc., should be able to build on top of.
It doesn't have this stuff yet:
- Not very robust to bad actions: a misbehaving trigger can stop the queue fairly easily. This is OK for now since we aren't planning to make it part of any other applications for a while. We do still get execute-exaclty-once, but it might not happen for a long time (until someone goes and fixes the queue), when we could theoretically continue executing other events.
- Doesn't start automatically: normal users don't need to run this thing yet so I'm not starting it by default.
- Not super well tested: I've vetted the basics but haven't run real workloads through this yet.
- No sophisticated tooling: I added some basic stuff but it's missing some pieces we'll have to build sooner or later, e.g. `bin/trigger cancel` or whatever.
- Intentionally not realtime: This design puts execution guarantees far above realtime concerns, and will not give you precise event execution at 1-second resolution. I think this is the correct goal to pursue architecturally, and certainly correct for subscriptions and meeting reminders. Events which execute after they have become irrelevant can simply decline to do anything (like a meeting reminder which executes after the meeting is over).
In general, the expectation for applications is:
- When creating an object (like a calendar event) that needs to trigger a scheduled action, write a trigger (and save the PHID if you plan to update it later).
- The daemon will process the event and schedule the action efficiently, in a race-free way.
- If you want to move the action, update the trigger and the daemon will take care of it.
- Your action will eventually dump a task into the task queue, and the task daemons will actually perform it.
Test Plan:
Using a test script like this:
```
<?php
require_once 'scripts/__init_script__.php';
$trigger = id(new PhabricatorWorkerTrigger())
->setAction(
new PhabricatorLogTriggerAction(
array(
'message' => 'test',
)))
->setClock(
new PhabricatorMetronomicTriggerClock(
array(
'period' => 33,
)))
->save();
var_dump($trigger);
```
...I queued triggers and ran the daemon:
- Verified triggers fire;
- verified triggers reschedule;
- verified trigger events show up in the web UI;
- tried different periods;
- added some triggers while the daemon was running;
- examined `phd debug` output for anything suspicious.
It seems to work in trivial use case, at least.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T6881
Differential Revision: https://secure.phabricator.com/D11419
Summary: Ref T6822.
Test Plan: `grep`. This method is only called from within `PhutilArgumentWorkflow::__construct`.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin, epriestley
Maniphest Tasks: T6822
Differential Revision: https://secure.phabricator.com/D11415
Summary: Ref T5402.
Test Plan:
- Queried archived tasks.
- Grepped for use sites and verified no other callsites are order-sensitive.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T5402
Differential Revision: https://secure.phabricator.com/D11089
Summary: Ref T5402, T6238. Pull this out into a class so the Instances app can embed task views.
Test Plan: Loaded `/daemon/` and examined the content in the tables.
Reviewers: btrahan, chad
Reviewed By: chad
Subscribers: epriestley
Maniphest Tasks: T6238, T5402
Differential Revision: https://secure.phabricator.com/D11090
Summary: Ref T5402. This cleans up some code and sets us up to use this sort of data more easily later.
Test Plan: viewed the daemon console from the web and the log of a specific archived daemon. both looked good. for other callsites looked really, really carefully.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin, epriestley
Maniphest Tasks: T5402
Differential Revision: https://secure.phabricator.com/D11042
Summary: See rP2fedb6f941d8. We might need a more general version of this since we do some `sudo` stuff elsewhere, but at least on my machine `sudo -n` exits with code 0 when the target user exists but needs a password.
Test Plan:
- Tried to run daemons as root, with no automatic sudo to root. Got a bad result before (phd believed it had executed the daemons) and a good result afterward (phd recognized that sudo failed).
- Tried to run daemons from root, as a non-root user. Got a good result in both cases.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: fabe, epriestley
Differential Revision: https://secure.phabricator.com/D11041
Summary:
Fixes T5196
If no phd.user is configured the behaviour is unchanged besides printing a warning when run as root (Usually i would add an exit(1) here but that would break existing installs who do that).
If phd.user is set and the current user is root it will run the daemon as: su USER -c "command" (I'm not sure if this works for every platform needed)
Otherwise it will refuse to start if configured and current user mismatch.
Test Plan: Stopped & Started phd daemon with various users and different phd.user settings including root
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: vinzent, epriestley
Maniphest Tasks: T5196
Differential Revision: https://secure.phabricator.com/D11036
Summary:
Fixes T6702. Ref T3554. Currently, tasks can be cancelled, retried and freed from the web UI by any logged in user.
This isn't appreciably dangerous (I can't come up with a way that a user could do anything security-affecting), but I think I probably intended this to be admin-only, but these actions should move to the CLI anyway.
Move them to the CLI. Lay some groundwork for some future `bin/worker cancel --class SomeTaskClass`, but don't implement that yet.
Test Plan: Used `cancel`, `retry` and `free` from the CLI. Hit all the error/success states.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T3554, T6702
Differential Revision: https://secure.phabricator.com/D10939
Summary:
Fixes an issue with T5336 / D9871. We did 99% of the work here but didn't actually turn on the priority sorting. The unit test passed by default, which didn't catch this.
- Fix the unit test (it failed).
- Fix the query (test now passes).
- Add a "Next in Queue" element to the UI to make this kind of thing easier to spot/understand.
Test Plan: Ran unit test. Viewed "Next in Queue". Queued some tasks, flushed the queue. Web UI tracked the state sensibly.
Reviewers: joshuaspence, btrahan
Reviewed By: btrahan
Subscribers: cburroughs, epriestley
Differential Revision: https://secure.phabricator.com/D10766
Summary:
Ref T1191. Now that the whole database is covered, we don't need to do as much work to build expected schemata. Doing them database-by-database was helpful in converting, but is just reudndant work now.
Instead of requiring every application to build its Lisk objects, just build all Lisk objects.
I removed `harbormaster.lisk_counter` because it is unused.
It would be nice to autogenerate edge schemata, too, but that's a little trickier.
Test Plan: Database setup issues are all green.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley, hach-que
Maniphest Tasks: T1191
Differential Revision: https://secure.phabricator.com/D10620
Summary:
Ref T1191.
- Adds definitions for missing keys and keys with wrong uniqueness. Generally, I defined these before fixing the key query to actually pull all keys and support uniqueness.
- Moves "key uniqueness" to note severity; this is fixable (probably?) and there are no remaining issues.
- Moves "Missing Key" to note severity; missing keys are fixable and all remaining missing keys are really missing (either missing edge keys, or missing PHID keys):
{F210089}
- Moves "Surplus Key" to note seveirty; surplus keys are fixable all remaining surplus keys are really surplus (duplicate key in Harbormaster, key on unused column in Worker):
{F210090}
Test Plan:
- Vetted missing/surplus/unique messages.
- 146 issues remaining.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T1191
Differential Revision: https://secure.phabricator.com/D10590
Summary: Ref T1191. Fills in some more of the databases. Nothing very notable here. I didn't encounter any issues or overlong keys.
Test Plan: Used web UI to click around and verify expected schemata match up against actual schemata well.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T1191
Differential Revision: https://secure.phabricator.com/D10516
Summary:
Ref T2374. Fixes T5988.
Keep track of what's been killed and not been killed, and surface that maybe you need sudo if things don't get killed with --force
...also basically make this force thing work. I managed to convinced myself stuff was getting killed with --force when it mostly wasn't. Make sure the --force parameter gets pushed as low as it needs to go to have things get killed.
Test Plan:
- `sudo ./bin/phd restart`
- `rm -rf /var/tmp/phd/pid/*`
- `./bin/phd stop` --> get warning about rogue daemons
- `./bin/phd stop X` --> get warning about no running daemons
- `./bin/phd stop --force` --> get warning about not being able to kill daemons
- `sudo ./bin/phd stop --force` --> kill daemons successfully
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T2374, T5988
Differential Revision: https://secure.phabricator.com/D10386
Summary:
Ref T5405.
- `--limit` wasn't actually used anywhere.
- Make it mean "the N newest lines".
Test Plan: Ran `bin/phd log`, `bin/phd log --limit 3`.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T5405
Differential Revision: https://secure.phabricator.com/D10385
Summary: Ref T2374. While building D10367 I noticed that phd was finding rogue daemons way more than it should be. Re-jigger this code path so rogue daemons are checked for *after* we've dealt with known daemons. This keeps the logic pretty simple overall.
Test Plan: phd start; kill pid files; phd stop and get the right warning; phd stop --force and it kills the rogue demons. phd stop in normal conditions no longer reporting rogue daemons erroneously
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T2374
Differential Revision: https://secure.phabricator.com/D10368
Summary: Shows the UI everywhere. Also asort() the keys before calculating the environment hash as that is probably an issue for someone at some point we just don't need to have. Ref T5968.
Test Plan: Viewed the setup check and saw a link to the daemon console. Viewed the daemon console and saw the various stale config daemons. Clicked a daemon and saw a "stale config" header icon where expected. Restarted daemons and all of this went away.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T5968
Differential Revision: https://secure.phabricator.com/D10367
Summary: Fixes T4881.
Test Plan: made a config change, saw the issue, restarted daemons and it went away
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T4881
Differential Revision: https://secure.phabricator.com/D10339
Summary:
If daemon data is mangled, `bin/phd restart` will SIGINT process `0`, which kills it.
uh oh T.T so sad
Test Plan: Used `bin/phd start` to start daemons; removed PID information from one; saw `bin/phd stop` shut down cleanly and not kill itself.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: mholden, epriestley
Differential Revision: https://secure.phabricator.com/D10308
Summary:
Fixes T5855. Adds a `--graceful N` flag to `phd stop` and `phd restart`.
`phd` will send SIGINT, wait `N` seconds, SIGTERM, wait 15 seconds, and SIGKILL. By default, `N` is 15.
Test Plan:
- Ran `bin/phd debug ...` and used `^C` to interrupt daemons. Saw graceful shutdown behavior, and abrupt termination on multiple `^C`.
- Ran `bin/phd start`, `bin/phd stop` and `bin/phd restart` with `--graceful` set to various things, notably `0`. Saw graceful shutdowns on the CLI and in the web UI. With `0`, abrupt shutdowns.
Reviewers: btrahan, hach-que
Reviewed By: hach-que
Subscribers: epriestley
Maniphest Tasks: T5855
Differential Revision: https://secure.phabricator.com/D10228
Summary: Ref T5655. Some discussion in D9839. Generally speaking, `Phabricator{$name}Application` is clearer than `PhabricatorApplication{$name}`.
Test Plan:
# Pinned and uninstalled some applications.
# Applied patch and performed migrations.
# Verified that the pinned applications were still pinned and that the uninstalled applications were still uninstalled.
# Performed a sanity check on the database contents.
Reviewers: btrahan, epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: hach-que, epriestley, Korvin
Maniphest Tasks: T5655
Differential Revision: https://secure.phabricator.com/D9982
Summary: These have been moved into libphutil.
Test Plan: Browsed Phabricator, didn't see a crash.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D9907
Summary: Fixes T5336. Currently, `PhabricatorWorkerLeaseQuery` is basically FIFO. It makes more sense for the queue to be a priority-queue, and to assign higher priorities to alerts (email and SMS).
Test Plan: Created dummy tasks in the queue (with different priorities). Verified that the priority field was set correctly in the DB and that the priority was shown on the `/daemon/` page. Started a `PhabricatorTaskmasterDaemon` and verified that the higher priority tasks were executed before lower priority tasks.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin
Maniphest Tasks: T5336
Differential Revision: https://secure.phabricator.com/D9871
Summary: The rest of this code works if we hand off `array()`, and fataling here, while more correct, is harder for users to get out of (they have to go manually remove files) and not obvious.
Test Plan: Corrupted pid file and ran `phd stop`.
Reviewers: joshuaspence
Reviewed By: joshuaspence
Subscribers: epriestley
Differential Revision: https://secure.phabricator.com/D9749