phorge-phorge

mirror of https://we.phorge.it/source/phorge.git synced 2024-12-16 10:30:56 +01:00

Author	SHA1	Message	Date
epriestley	5c1e4488de	Remove all "Phabricator Bot" code Summary: Closes T7829 as wontfix. Closes T7965 as wontfix. Closes T7800 as wontfix. Closes T2731 as wontfix. Closes T1271 as wontfix. We aren't maintaining this at all (see, e.g., T7829) and a user reported a technically accurate security issue via HackerOne: <https://hackerone.com/reports/222870> Just throw it away until we get to the eventual Conpherence bot/API update and can do this stuff correctly. Test Plan: Grepped for `phabricatorbot`. Reviewers: chad Reviewed By: chad Maniphest Tasks: T7965, T7829, T7800, T2731, T1271 Differential Revision: https://secure.phabricator.com/D17756	2017-04-21 12:48:35 -07:00
epriestley	a41d158490	Only hibernate the Taskmaster after 15 seconds of inactivity Under some workloads, the taskmaster may hibernate and launch more rapidly than it should. Require 15 seconds of inactivity before hibernating. Also hibernate for longer. Auditors: chad	2017-03-25 05:01:32 -07:00
epriestley	2cda280cde	Make the default Trigger hibernation 3 minutes instead of 5 seconds The `min()` vs `max()` fix in D17560 meant that the Trigger daemon only hibernates for 5 seconds, so we do a full GC sweep every 5 seconds. This ends up eating a fair amount of CPU for no real benefit. The GC cursors should move to persistent storage, but just bump this default up in the meantime. Auditors: chad	2017-03-25 04:14:32 -07:00
epriestley	8b553d2f18	Allow taskmaster daemons to hibernate Summary: Ref T12298. Like PullLocal daemons, this allows the last daemon in the pool to hibernate if there's no work to be done, and awakens the pool when work arrives. Test Plan: - Ran `bin/phd debug task --trace`. - Saw the pool hibernate and look for tasks. - Commented on an object. - Saw the pool wake up and process the queue. Reviewers: chad Reviewed By: chad Maniphest Tasks: T12298 Differential Revision: https://secure.phabricator.com/D17559	2017-03-24 13:51:37 -07:00
epriestley	f13637627d	Improve daemon "waiting" message, config reload behavior Summary: Ref T12298. Two minor daemon improvements: - Make the "waiting" message reflect hibernation. - Don't trigger a reload right after launching. Test Plan: - Read "waiting" message. - Ran "bin/phd start", didn't see an immediate SIGHUP in the log. Reviewers: chad Reviewed By: chad Maniphest Tasks: T12298 Differential Revision: https://secure.phabricator.com/D17550	2017-03-24 08:32:08 -07:00
epriestley	9099485a71	Allow the PullLocal daemon to hibernate, and wake it when repositories need an update Summary: Ref T12298. This allows the PullLocal daemon to hibernate like the Trigger daemon, but automatically wakes it back up when it needs to do something. Test Plan: - Ran `bin/phd debug pulllocal --trace`. - Saw the daemon hibernate after doing a checkup on repositories. - Saw periodic queries to look for new update messages. - After clicking "Update Now" in the web UI to schedule an update, saw the daemon wake up immediately. Reviewers: chad Reviewed By: chad Maniphest Tasks: T12298 Differential Revision: https://secure.phabricator.com/D17540	2017-03-23 10:52:28 -07:00
epriestley	90ec21f999	Add "--pool" and "--duration" flags to daemon CLI tools Summary: Ref T12331. These changes are intended to make it easier to debug T12331 since I'm having difficulty reproducing the issue locally. Test Plan: - Ran `bin/phd debug task --pool 4` and got an autoscaling pool. - Ran `bin/worker flood --duration 3` and got some 3-second-long tasks to execute with `bin/worker execute ...`. Reviewers: chad Reviewed By: chad Maniphest Tasks: T12331 Differential Revision: https://secure.phabricator.com/D17431	2017-02-28 07:43:46 -08:00
epriestley	40cc403d23	Allow the Trigger daemon to hibernate, reducing processes to 0 Summary: Ref T12298. The trigger daemon already has routine long-term sleep, and few external events can impact when it should ideally wake up. The relevant events are: - Someone creates a new Nuance source (ideally, we should wake up right away and start polling it). - Someone creates a Calendar event about 16 minutes in the future (ideally, we should send them a reminder in about a minute). - Someone changes GC config to be extremely aggressive (ideally, we should immediately respect the change). None of these cases are very important. We don't hibernate for more than 3 minutes, so the worst case is that your Nuance source takes 3 minutes to start importing or your Calendar notification comes two minutes too late (13 minutes before the event instead of 15). This change makes GC slightly more CPU-expensive on average: currently, we do a GC sweep every 4 hours. After this change, we'll end up doing one every 3 minutes, because we lose the fact that we did a sweep recently when the daemon restarts. We could fix this by keeping track of when the last GC sweep was in the database, instead of in the Daemon process, but the cost of a sweep is normally very small so I don't plan to do this anytime soon. Test Plan: - Ran `bin/phd debug trigger`, saw daemon go through 3-minute hibernate + restart cycles. - Ran `bin/phd debug task`, saw daemon run normally. Reviewers: chad Reviewed By: chad Maniphest Tasks: T12298 Differential Revision: https://secure.phabricator.com/D17408	2017-02-24 10:54:05 -08:00
Chad Little	bf44210dc8	Reduce application search engine results list for Dashboards Summary: Ref T10390. Simplifies dropdown by rolling out canUseInPanel in useless panels Test Plan: Add a query panel, see less options. Reviewers: epriestley Reviewed By: epriestley Subscribers: Korvin Maniphest Tasks: T10390 Differential Revision: https://secure.phabricator.com/D17341	2017-02-22 12:42:43 -08:00
Jakub Vrana	a778151f28	Fix errors found by PHPStan Test Plan: Ran `phpstan analyze -a autoload.php phabricator/src`. Reviewers: #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: Korvin, hach-que Differential Revision: https://secure.phabricator.com/D17371	2017-02-17 10:10:15 +00:00
Josh Cox	ac66522c2e	Add a flag to ./bin/worker to select tasks based on their failureCount Summary: I frequently run into a situation where I want to kill tasks that have accumulated a lot of failures regardless of what class they are. Or I'll want to kill every worker of a certain class but only if it has failed at least once. This change allows me to run `./bin/worker cancel --class <MYCLASS> --min-failure-count 5` to only kill tasks with at least 5 failed attempts. The `--min-failure-count N` argument can be used by itself as well as with `--class CLASSNAME`. I don't think it makes sense for it to work with `--id ID`, but I'm not dead set on that or anything. Test Plan: I ran the worker management workflow with and without the `--min-failure-count` argument and it worked as expected. Reviewers: #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: Korvin, epriestley, yelirekim Differential Revision: https://secure.phabricator.com/D16906	2016-10-12 09:49:29 -04:00
epriestley	706c21375e	Remove empty implementations of `describeAutomaticCapabilities()` Summary: This has been replaced by `PolicyCodex` after D16830. Also: - Rebuild Celerity map to fix grumpy unit test. - Fix one issue on the policy exception workflow to accommodate the new code. Test Plan: - `arc unit --everything` - Viewed policy explanations. - Viewed policy errors. Reviewers: chad Reviewed By: chad Subscribers: hach-que, PHID-OPKG-gm6ozazyms6q6i22gyam Differential Revision: https://secure.phabricator.com/D16831	2016-11-09 15:24:22 -08:00
epriestley	960c0be689	Fix some issues with Phabricator i18n string extraction Summary: Ref T5267. Fix one minor bug (paths were not being resolved properly) and one minor string issue (missing `%d` in a string). Test Plan: Extracted strings, got a cleaner result. Reviewers: chad Reviewed By: chad Maniphest Tasks: T5267 Differential Revision: https://secure.phabricator.com/D16808	2016-11-06 11:12:45 -08:00
epriestley	6b16f930c4	Automatically send (not-so-great) email notifications for upcoming events Summary: Ref T7931. This is still quite rough, but should technically send vaguely-useful email as part of the standard trigger infrastructure. Test Plan: Ran `bin/phd start`, created an event shortly, saw reminder email send in `bin/mail list-outbound`. Reviewers: chad Reviewed By: chad Maniphest Tasks: T7931 Differential Revision: https://secure.phabricator.com/D16784	2016-11-01 13:24:40 -07:00
epriestley	7678f412be	Hold a lock while collecting garbage Summary: Fixes T11771. Adds a lock around each GC process so we don't try to, e.g., delete old files on two machines at once just because they're both running trigger daemons. The other aspects of this daemon (actual triggers; nuance importers) already have separate locks. Test Plan: Ran `bin/phd debug trigger --trace`, saw daemon acquire locks and collect garbage. Reviewers: chad Reviewed By: chad Maniphest Tasks: T11771 Differential Revision: https://secure.phabricator.com/D16739	2016-10-20 13:40:00 -07:00
epriestley	db2425b300	Do initial repository imports at a lower priority and finish importing commits before starting new ones Summary: Fixes T11677. This makes two minor adjustments to the repository import daemons: - The first step ("Message") now queues at a slightly-lower-than-default (for already-imported repositories) or very-low (for newly importing repositories) priority level. - The other steps now queue at "default" priority level. This is actually what they already did, but without this change their behavior would be to inherit the priority level of their parents. This has two effects: - When adding new repositories to an existing install, they shouldn't block other things from happening anymore. - The daemons will tend to start one commit and run through all of its steps before starting another commit. This makes progress through the queue more even and predictable. - Before, they did ALL the message tasks, then ALL the change tasks, etc. This works fine but is confusing/uneven/less-predictable because each type of task takes a different amount of time. Test Plan: - Added a new repository. - Saw all of its "message" steps queue at priority 4000. - Saw followups queue at priority 2000. - Saw progress generally "finish what you started" -- go through the queue one commit at a time, instead of one type of task at a time. Reviewers: chad Reviewed By: chad Maniphest Tasks: T11677 Differential Revision: https://secure.phabricator.com/D16585	2016-09-21 16:41:01 -07:00
Josh Cox	8cdf1a890a	Updated the docs so chatbots can use the Conduit API Summary: Previously, the chatbot docs instructed users to get certificates for the conduit API and put the cert in a `conduit.cert` config key. In order to get the chatbot to work, I needed to instead get an API key and put it in the `conduit.token` config entry. Test Plan: Doc fix. Tried the new documented way and it worked. Reviewers: epriestley, #blessed_reviewers Reviewed By: epriestley, #blessed_reviewers Subscribers: Korvin, epriestley Differential Revision: https://secure.phabricator.com/D16443	2016-08-24 19:05:30 -04:00
Josh Cox	605210bc95	Make the chatbot obey the object name blacklist Summary: Fixes T11508. The config entry `remarkup.ignored-object-names` already contains a blacklist of object names that should be ignored in the web UI. This change makes that blacklist also apply to the chatbot. This makes it possible to have a chatbot ignore things like V1, V2, Q1 and any other phrases the user may not want to generate links to objects. Test Plan: Create objects (tasks, slowvotes, etc.) then mention the object names in chat (with the bot running). The bot should respond with helpful links to the given objects. Then add the object names to the blacklist through the config web UI. This apparently triggers the bot to restart itself. Then mention the object names in chat again. The bot should no longer respond with links because those object names have been added to the blacklist regex. Reviewers: epriestley, #blessed_reviewers Reviewed By: epriestley, #blessed_reviewers Subscribers: epriestley Maniphest Tasks: T11508 Differential Revision: https://secure.phabricator.com/D16442	2016-08-23 07:38:27 -05:00
epriestley	3bd0da0ec2	Add a missing table key to improve performance of "Recently Completed Tasks" query Summary: Fixes T11490. Currently, this query can not use a key and the table size may be quite large. Adjust the query so it can use a key for both selection and ordering, and add that key. Test Plan: Ran `EXPLAIN` on the old query in production, then added the key and ran `EXPLAIN` on the new query. Saw key in use, and "rows" examined drop from 29,273 to 15. Reviewers: chad Reviewed By: chad Maniphest Tasks: T11490 Differential Revision: https://secure.phabricator.com/D16423	2016-08-19 11:53:09 -07:00
epriestley	ca78c1825a	When already running as the daemon user, don't "sudo" daemon commands Summary: The cluster synchronization code runs either actively (before returning a response to `git clone`, for example) or passively (routinely, as the daemons update repositories). The active sync runs as the web user (if running `git clone http://...`) or the VCS user (if running `git clone ssh://...`). But the passive sync runs as the daemon user. All of these sync processes need to run actual commands as the daemon user (`git fetch ...`). For the active ones, we must `sudo`. For the passive ones, we're already the right user. We run the same code, and end up trying to sudo to ourselves, which `sudo` isn't happy about by default. Depending on how `sudo` is configured and which users things are running as this might work anyway, but it's silly and if it doesn't work it requires you to go make non-obvious, weird config changes that are unintuitive and somewhat nonsensical. This is probably worse on the balance than adding a bit of complexity to the code. Instead, test which user we're running as. If it's already the right user, don't sudo. Test Plan: - Ran `bin/repository update --trace` as daemon user, saw no more `sudo`. - Ran a `git clone` to make sure that didn't break. Reviewers: chad, avivey Reviewed By: avivey Differential Revision: https://secure.phabricator.com/D16391	2016-08-11 16:41:19 -07:00
epriestley	5e3efca08a	In taskmaster daemons, only close connections which were not used recently Summary: Ref T11458. Depends on D16388. Currently, we're very aggressive about closing connections in the taskmaster daemons. This can end up taking up a lot of resources. In particular, because the outgoing port for outbound connections normally can not be reused for 60 seconds after a connection closes, we may exhaust outbound ports on the host if there's a big queue full of stuff that's being processed very quickly. At a minimum, we //always// are holding open a `worker` connection, which we always need again right away. So even in the best case we end up opening/closing this about once per second and each daemon takes up about ~60 outbound ports when it should take up ~1. So, make two adjustments: - First, only close connections which we haven't issued a query on in the last 60 seconds. This should prevent us from closing connections that we'll need again immediately in most cases. In the worst case, we shouldn't be eating up any extra ports under default TCP behavior. - Second, explicitly close connections. We were relying on implicit/GC behavior (maybe as a holdover from very long ago, before we got connection wrappers in place?), which probably did about the same thing but isn't as predictable and can't be profiled or instrumented. Test Plan: This is somewhat difficult to test completely convincingly in isolation since the problem behavior depends on production scales and the workload, and to some degree on configuration. I tested that this stuff basically works by adding logging to connect/close and running the daemons, verifying that they churned connections a lot before this change (e.g., ~1/s even at no load) and churn rarely afterward (e.g., almost never at no load). I ran some workload through them to make sure I didn't completely break anything. The best real test is just seeing how production responds. Current inbound/outbound connections on `secure001` are 1,200: ``` secure001 $ netstat -t | grep :mysql | wc -l 1164 ``` Current outbound from `repo001` are 18,600: ``` repo001 $ netstat -t | grep :mysql | wc -l 18663 ``` Reviewers: chad Reviewed By: chad Maniphest Tasks: T11458 Differential Revision: https://secure.phabricator.com/D16389	2016-08-11 12:03:56 -07:00
epriestley	4068ee2a75	Make permanent worker failures more user-friendly Summary: Ref T11309. In that task, a user misunderstood two parts of this error: - They took "exception" to mean "unexpected failure", when it was intended to mean "rare circumstance". - They interpreted the internal ID number of a commit to mean that Phabricator was malfunctioning. Make the language of this condition more direct, explaining what the situation means in greater detail. Additionally, we would previously re-throw this exception, which would make the daemon exit, wait a moment, and restart. This was normal and expected. When //unexpected// failures occur, it's important to do this: it prevents a daemon failing in a loop from causing too many side effects (e.g., limit of 1 email per 5 seconds instead of thousands per second). When expected, permanent failures occur, we do not need to do this: the task will not be retried. I just did it because it was slightly more consistent ("failures restart daemons") and we had few permanent failure types at the time. We have more now, and restarting the daemons generates some additional logs which have the potential to confuse. Cycling the daemon also (intentionally) reduces the rate at which we process tasks, which can be bad for permanent failures like "deleted commit" because users can delete a huge number of commits and possibly clog up the queue with cycle-after-failure actions. Test Plan: Tried to process a deleted commit, saw a new message: ``` 2016-07-11 9:30:22 AM [STDE] <VERB> PhabricatorTaskmasterDaemon Task 1428658 was cancelled: Commit "R55:6c46b7d0fb82a859ca3f87a95dc8dcceef8088c9" (with internal ID "282161") is no longer reachable from any branch, tag, or ref in this repository, so it will not be imported. This usually means that the branch the commit was on was deleted or overwritten. ``` Reviewers: chad Reviewed By: chad Maniphest Tasks: T11309 Differential Revision: https://secure.phabricator.com/D16268	2016-07-11 09:21:39 -07:00
epriestley	c510c925cf	Allow worker tasks to be cancelled by classname Summary: Ref T3554. Makes `bin/worker cancel --class <classname>` work (cancel all tasks with that type). This is useful in development if your queue is full of a bunch of gunk, and a need has occasionally arisen in production environments (usually "one option is cancel everything and move on"). Test Plan: Ran `bin/worker cancel` to cancel blocks of tasks by class name. Reviewers: chad Reviewed By: chad Maniphest Tasks: T3554 Differential Revision: https://secure.phabricator.com/D16267	2016-07-11 09:21:16 -07:00
Aviv Eyal	a3bb35e9d2	make Trigger Daemon sleep correctly when one-time triggers exist Summary: Trigger daemon is trying to find the next event to invoke before sleeping, but the query includes already-elapsed triggers. It then tries to sleep for 0 seconds. Test Plan: On a new instance, schedule a single trigger of type `PhabricatorOneTimeTriggerClock` to a very near time. Use top to see trigger daemon not going to 100% CPU once the event has elapsed. Reviewers: #blessed_reviewers, epriestley Subscribers: Korvin Differential Revision: https://secure.phabricator.com/D15750	2016-04-18 14:17:10 -07:00
epriestley	601aaa5a86	Modularize content sources Summary: Ref T10537. For Nuance, I want to introduce new sources (like "GitHub" or "GitHub via Nuance" or something) but this needs to modularize eventually. Split ContentSource apart so applications can add new content sources. Test Plan: This change has huge surface area, so I'll hold it until post-release. I think it's fairly safe (and if it does break anything, the breaks should be fatals, not anything subtle or difficult to fix), there's just no reason not to hold it for a few hours. - Viewed new module page. - Grepped for all removed functions/constants. - Viewed some transactions. - Hovered over timestamps to get content source details. - Added a comment via Conduit. - Added a comment via web. - Ran `bin/storage upgrade --namespace XXXXX --no-quickstart -f` to re-run all historic migrations. - Generated some objects with `bin/lipsum`. - Ran a bulk job on some tasks. - Ran unit tests. {F1190182} Reviewers: chad Reviewed By: chad Maniphest Tasks: T10537 Differential Revision: https://secure.phabricator.com/D15521	2016-03-26 11:59:45 -07:00
epriestley	de23ba0002	Fix a minor issue in Nuance which could cause the trigger daemon to poll too often Summary: Ref T10537. Currently, when you have at least two cursors, the daemon can poll too frequently when processing the last source because it never hits the end-of-list condition. Test Plan: - Ran `bin/phd debug trigger`. - Observed huge volumes of output before change as triggers fired as fast as possible. - Observed reasonable poll frequency after change. Reviewers: chad Reviewed By: chad Maniphest Tasks: T10537 Differential Revision: https://secure.phabricator.com/D15464	2016-03-12 05:04:42 -08:00
epriestley	2a3c3b2b98	Provide `bin/nuance import` and ngram indexes for sources Summary: Ref T10537. More infrastructure: - Put a `bin/nuance` in place with `bin/nuance import`. This has no useful behavior yet. - Allow sources to be searched by substring. This supports `bin/nuance import --source whatever` so you don't have to dig up PHIDs. Test Plan: - Applied migrations. - Ran `bin/nuance import --source ...` (no meaningful effect, but works fine). - Searched for sources by substring in the UI. Reviewers: chad Reviewed By: chad Maniphest Tasks: T10537 Differential Revision: https://secure.phabricator.com/D15436	2016-03-08 10:30:24 -08:00
epriestley	3f4cc3ad6e	Allow Nuances sources to provide import cursors Summary: Ref T10537. Some sources (like the future "GitHub Repository" source) need to poll remotes. - Provide a mechanism for sources to emit import cursors. - Hook them into the trigger daemon so they'll fire periodically. - Provide some storage. This diff does nothing useful or interesting, and is pure infrastructure. Test Plan: - Ran `bin/storage upgrade -f`, no adjustment issues. - Poked around Nuance. - Ran the trigger daemon, verified it didn't crash and checked for Nuance stuff to do. Reviewers: chad Reviewed By: chad Maniphest Tasks: T10537 Differential Revision: https://secure.phabricator.com/D15435	2016-03-08 10:30:04 -08:00
epriestley	abb4c03b47	Remove shouldShowSubscribersProperty() from SubscribableInterface Summary: Every caller returns `true`. This was added a long time ago for Projects, but projects are no longer subscribable. I don't anticipate needing this in the future. Test Plan: Grepped for this method. Reviewers: chad Reviewed By: chad Differential Revision: https://secure.phabricator.com/D15409	2016-03-06 06:01:36 -08:00
Sébastien Santoro	a4db6f387d	Fix typo: discsussions → discussions Test Plan: Read again the sentence. Reviewers: joshuaspence, #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: epriestley Differential Revision: https://secure.phabricator.com/D15316	2016-02-21 01:51:03 -08:00
epriestley	5c2e49a812	Allow any user to watch any project they can see Summary: Ref T6183. Ref T10054. Historically, only members could watch projects because there were some weird special cases with policies. These policy issues have been resolved and Herald is generally powerful enough to do equivalent watches on most objects anyway. Also puts a "Watch Project" button on the feed panel to make the behavior and meaning more obvious. Test Plan: - Watched a project I was not a member of. - Clicked the feed watch/unwatch button. {F1064909} Reviewers: chad Reviewed By: chad Maniphest Tasks: T6183, T10054 Differential Revision: https://secure.phabricator.com/D15063	2016-01-19 19:38:30 -08:00
epriestley	96b1665eaa	Link "continue" action to confirm dialog in bulk jobs that are unconfirmed Summary: See Q266. Test Plan: Created a bulk job, clicked "Details" instead of "Confirm", clicked "Continue" to get back to confirmation dialog. Reviewers: chad Reviewed By: chad Differential Revision: https://secure.phabricator.com/D14985	2016-01-10 10:55:58 -08:00
epriestley	4bba3fd4c1	Fully modularize DestructionEngine Summary: Ref T9979. Convert all DestructionEngine behaviors to extensions. Test Plan: {F1033244} Destroyed an object, verifying: - Herald transcripts were destroyed; - edges were destroyed; - flags were destroyed; - tokens were destroyed; - transactions were destroyed; - worker tasks were cancelled. Reviewers: chad Reviewed By: chad Maniphest Tasks: T9979 Differential Revision: https://secure.phabricator.com/D14832	2015-12-21 17:03:44 -08:00
epriestley	e9af4f8970	Fix an issue where Drydock followup tasks would not queue if the main task failed Summary: Ref T9994. This fixes the first issue discussed on that task, which is that when a merge fails after "arc land", we would not clean up all the leases properly. Specifically, when a merge fails, we use `queueTask()` to schedule a followup task. This followup destroys the lease and frees the underlying resource. However, the default behavior of `queueTask()` is to //not queue tasks// if the parent task fails. This is a reasonable, safe behavior that was originally introduced in D8774, where it kept us from sending too much mail if a task did "send some mail" and then failed a little later on and got retried. Since I think the default behavior is correct, I just special cased the behavior for Drydock to make it queue even on failure. These are the only types of followup tasks we currently want to queue on main task failure. (It's possible that future Blueprints might want some kind of more specialized behavior, where some tasks queue only on success, but we can cross that bridge when we come to it.) Test Plan: - See T9994#149878 for test case setup. - I ran that test case again with this patch, and saw the followup task queue properly in the `--trace` log, a corresponding update task show up in `/daemon/`, and the lease get destroyed when I ran it a moment later. {F1029915} Reviewers: chad Reviewed By: chad Maniphest Tasks: T9994 Differential Revision: https://secure.phabricator.com/D14818	2015-12-18 08:17:04 -08:00
epriestley	b964f8873b	Fix daemon restart behavior to check once every 10 seconds Summary: This logic is flipped. Test Plan: - Before change: ran `bin/phd debug task`, saw queries to the config table every second. - After change: ran `bin/phd debug task`, saw queries to the config table every 10 seconds. Reviewers: chad, joshuaspence Reviewed By: chad, joshuaspence Differential Revision: https://secure.phabricator.com/D14542	2015-11-23 05:59:04 -08:00
epriestley	2e09a93dc1	Improve efficiency of worker task GC for huge loads Summary: Fixes T9808. An instance imported a very large repository, generating approximately 4 million tasks over the course of a few days. A week later, these tasks started expiring and became candidates for garbage collection. The GC works by deleting 100 rows at a time over and over again. It finds the rows it's going to delete by querying for old rows. Currently, this query generates a `WHERE dateCreated < X ORDER BY id DESC` query. This query can not efficiently execute using a single key, because it relies on `dateCreated` order to find the rows, then on `id` order to sort them. With a table with 4M rows, this is slow. This would still be OK, except that the query has to execute a lot of times since it only deletes 100 rows each time. Particularly, it needs to execute a total of ~40K times. Instead, generate `WHERE dateCreated < X ORDER BY dateCreated DESC, id DESC`. This should have the same effect in general and the GC definitely doesn't care about the difference, but it should be more efficient at large scales. Test Plan: I had to `TRUNCATE` the problem table so I don't have a perfect repro to completely convincingly test this anymore. Both queries behave fine at small scales, which is why we haven't seen this before. I was able to run the newer query in production before I nuked the table and have it complete in a reasonable amount of time, while the old query hung longer than I wanted to wait (several minutes?). The query plan for the new query was also a good one, while the query plan for the old query was terrible. I loaded the daemon console and ran `bin/garbage collect --collector worker.tasks --trace`. I verified the queries looked reasonable and produced reasonable results in production. Reviewers: chad Reviewed By: chad Maniphest Tasks: T9808 Differential Revision: https://secure.phabricator.com/D14505	2015-11-17 17:05:10 -08:00
Joshua Spence	a07a8aca24	Add a daemon overseer module to restart daemons when config changes Summary: Fixes T7053. Depends on D14452. Test Plan: Created a custom daemon which dumps out the config hash (by querying `PhabricatorEnv::calculateEnvironmentHash()`). Ran this daemon with `./bin/phd debug PhabricatorDebugDaemon` and saw the config hash update within 30 seconds. {P1886} Reviewers: #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: Korvin Maniphest Tasks: T7053 Differential Revision: https://secure.phabricator.com/D14458	2015-11-11 08:44:18 +11:00
Joshua Spence	495cb7a2e0	Mark `PhabricatorPHIDType::getPHIDTypeApplicationClass()` as abstract Summary: Fixes T9625. As explained in a `TODO` comment, seems reasonable enough. Test Plan: Unit tests. Reviewers: #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: Korvin, hach-que Maniphest Tasks: T9625 Differential Revision: https://secure.phabricator.com/D14068	2015-11-03 06:47:12 +11:00
epriestley	de2bbfef7d	Allow PhabricatorWorker->queueTask() to take full $options Summary: Ref T9252. Currently, `queueTask()` accepts `$priority` as its third argument. Allow it to take a full range of `$options` instead. This API just never got updated after we expanded available options. Arguably this whole API should be some kind of "TaskQueueRequest" object but I'll leave that for another day. Test Plan: - Grepped for `queueTask()` and verified no other callsites are affected by this API change. - Ran some daemons. - See also next diff. Reviewers: hach-que, chad Reviewed By: hach-que, chad Maniphest Tasks: T9252 Differential Revision: https://secure.phabricator.com/D14235	2015-10-05 09:46:29 -07:00
epriestley	4cf1270ecd	In Harbormaster, make sure artifacts are destroyed even if a build is aborted Summary: Ref T9252. Currently, Harbormaster and Drydock work like this in some cases: # Queue a lease for activation. # Then, a little later, save the lease PHID somewhere. # When the target/resource is destroyed, destroy the lease. However, something can happen between (1) and (2). In Drydock this window is very short and the "something" would have to be a lightning strike or something similar, but in Harbormaster we wait until the resource activates to do (2) so the window can be many minutes long. In particular, a user can use "Abort Build" during those many minutes. If they do, the target is destroyed but it doesn't yet have a record of the artifact, so the artifact isn't cleaned up. Make these things work like this instead: # Create a new lease and pre-generate a PHID for it. # Save that PHID as something that needs to be cleaned up. # Queue the lease for activation. # When the target/resource is destroyed, destroy the lease if it exists. This makes sure there's no step in the process where we might lose track of a lease/resource. Also, clean up and standardize some other stuff I hit. Test Plan: - Stopped daemons. - Restarted a build in Harbormaster. - Stepped through the build one stage at a time using `bin/worker execute ...`. - After the lease was queued, but before it activated, aborted the build. - Processed the Harbormaster side of things only. - Saw the lease get destroyed properly. Reviewers: chad, hach-que Reviewed By: hach-que Maniphest Tasks: T9252 Differential Revision: https://secure.phabricator.com/D14234	2015-10-05 05:58:53 -07:00
epriestley	9c798e5cca	Provide `bin/garbage` for interacting with garbage collection Summary: Fixes T9494. This: - Removes all the random GC.x.y.z config. - Puts it all in one place that's locked and which you use `bin/garbage set-policy ...` to adjust. - Makes every TTL-based GC configurable. - Simplifies the code in the actual GCs. Test Plan: - Ran `bin/garbage collect` to collect some garbage, until it stopped collecting. - Ran `bin/garbage set-policy ...` to shorten policy. Saw change in web UI. Ran `bin/garbage collect` again and saw it collect more garbage. - Set policy to indefinite and saw it not collect garbage. - Set policy to default and saw it reflected in web UI / `collect`. - Ran `bin/phd debug trigger` and saw all GCs fire with reasonable looking queries. - Read new docs. {F857928} Reviewers: chad Reviewed By: chad Maniphest Tasks: T9494 Differential Revision: https://secure.phabricator.com/D14219	2015-10-02 09:17:24 -07:00
epriestley	878a493301	Begin standardizing garbage collectors Summary: Ref T9494. Improve support infrastructure for garbage collectors. Test Plan: - Ran `bin/phd debug trigger`, saw collectors execute. {F857852} Reviewers: chad Reviewed By: chad Maniphest Tasks: T9494 Differential Revision: https://secure.phabricator.com/D14218	2015-10-01 16:58:43 -07:00
epriestley	4496176924	Add staging area support to Harbormaster/Drydock + various fixes Summary: Ref T9252. This primarily allows Harbormaster to request (and Drydock to fulfill) working copies with a patch from a staging area. Doing this means we can do builds on in-review changes from `arc diff`. This is a little cobbled-together but should basically work. Also fix some other issues: - Yielded, awakened workers are fine to update but could complain. - We can't log slot lock failures to resources if we don't end up saving them. - Killing the transaction would wipe out the log. - Fix some TODOs, etc. Test Plan: Ran Harbormaster builds on a local revision. Reviewers: hach-que, chad Reviewed By: chad Maniphest Tasks: T9252 Differential Revision: https://secure.phabricator.com/D14214	2015-10-01 16:55:01 -07:00
epriestley	4ac82be5ed	Merge the DrydockLease workers into a single worker Summary: Ref T9252. This is the same as D14201, but for lease stuff instead of resource stuff. This one is a little heavier but still feels pretty reasonable to me at the end of the day (worker is <1K lines and has a ton of comment stuff). Also fixes a few random bugs I hit in the task queue. Test Plan: - Restarted some Harbormaster builds, saw them go through cleanly. - Released pre-activation resources/leases. - Probably still kinda buggy but I'll iron the details out over time. Logs are starting to look somewhat plausible: {F855747} Reviewers: chad Reviewed By: chad Maniphest Tasks: T9252 Differential Revision: https://secure.phabricator.com/D14202	2015-10-01 08:11:02 -07:00
epriestley	55767aac0f	Fix an issue where followup tasks could fail to queue with string priorities Auditors: chad	2015-09-28 19:46:41 -07:00
epriestley	bfaa93aa9b	Allow Harbormaster build plans to request additional working copies Summary: Ref T9123. To run upstream builds in Harbormaster/Drydock, we need to be able to check out `libphutil`, `arcanist` and `phabricator` next to one another. This adds an "Also Clone: ..." field to Harbormaster working copy build steps so I can type all three repos into it and get a proper clone with everything we need. This is somewhat upstream-centric and a bit narrow, but I don't think it's totally unreasonable, and most of the underlying stuff is relatively general. This adds some more typechecking and improves data/type handling for custom fields, too. In particular, it prevents users from entering an invalid/restricted value in a field (for example, you can't "Also Clone" a repository you don't have permission to see). Test Plan: Restarted build, got a Drydock resource with multiple repositories in it. Reviewers: chad Reviewed By: chad Maniphest Tasks: T9123 Differential Revision: https://secure.phabricator.com/D14183	2015-09-28 17:57:41 -07:00
epriestley	9b29d46e60	Make Drydock lease infrastructure more nimble Summary: Ref T9252. Currently, Harbormaster does this when trying to acquire a working copy: - Ask for a working copy. - Yield for 15 seconds. - Check if we have a working copy yet. That's OK, but Drydock takes ~1s to acquire a working copy lease if a resource is already available, so we end up doing this: - T+0: Ask for a working copy. - T+0: Yield for 15 seconds. - T+1: Working copy lease activates. - T+15: Working copy lease is used. - T+16: Build finishes. So we end up spending about 2 seconds doing work and 14 seconds sleeping. One way to fix this would be to fiddle with the yield duration, so we yield for 1, 2, 4, ... seconds or something. This probably isn't a bad idea for longer leases (i.e., wait for 15, 30, 45 ... seconds or similar) but it implies a lot of churn for short leases. Instead, let tasks "awaken" other tasks when they complete. The "awaken" operation means: if a task is in a yielded state (no failures, no owner, explicitly yielded, future expires time), pretend it only yielded until right now instead of whenever it really yielded to. Basically, this rewrites history so that even though Harbormaster did a `yield(15)`, we pretend it did a `yield(4)` after we activate the lease if lease activation took 4 seconds. If this misses, it's fine: we fall back to the normal yield behavior and things move forward normally a few seconds later. If it hits, we get a more nimble process pretty cleanly. Test Plan: - Restarted a build plan (lease working copy + run `ls`) with this patch no-op'd, took about 16 seconds. - Restarted a build plan with this patch active, took about 1 second. Reviewers: hach-que, chad Reviewed By: chad Maniphest Tasks: T9252 Differential Revision: https://secure.phabricator.com/D14178	2015-09-28 09:35:40 -07:00
epriestley	ec6d69e74d	Give Drydock resources a proper expiry mechanism Summary: Fixes T6569. This implements an expiry mechanism for Drydock resources which parallels the mechanism for leases. A few things are missing that we'll probably need in the future: - An "EXPIRES" command to update the expiration time. This would let resources be permanent while leased, then expire after, say, 24 hours without any leases. - A callback like `shouldActuallyExpireRightNow()` for resources and leases that lets them decide not to expire at the last second. - A callback like `didAcquireLease()` for resource blueprints, to parallel `didReleaseLease()`, letting them clear or extend their timer. However, this stuff would mostly just let us tune behaviors, not really open up new capabilities. Test Plan: Changed host resources to expire after 60 seconds, leased one, saw it vanish 60 seconds later. Reviewers: hach-que, chad Reviewed By: chad Maniphest Tasks: T6569 Differential Revision: https://secure.phabricator.com/D14176	2015-09-28 09:35:14 -07:00
epriestley	3379904237	Allow Drydock leases to expire after a time limit Summary: Ref T6569. If a lease is activated with an expiration date, schedule a task to try to clean it up after that time. Test Plan: - Used `bin/drydock lease ... --until ...` to activate a lease in the near future. - Waited for a bit. - Saw it expire and get destroyed at the scheduled time. Reviewers: hach-que, chad Reviewed By: chad Maniphest Tasks: T6569 Differential Revision: https://secure.phabricator.com/D14148	2015-09-23 13:54:27 -07:00
epriestley	fcb6d1e2fa	Strip some obsolete code out of Drydock Summary: Ref T9252. This simplifies some Drydock code. Most of this code relates to the old notion of Drydock being able to enumerate all the tasks it needs to complete in order to acquire a lease. The code has stepped back from this, since it's unnecessary, the queue is more powerful than it used to be, and it would be a lot of work to keep track of. The ~only thing that should ever wait for leases in modern code is `bin/drydock lease`, and it's fine for it to just sit there sleeping, so this just does that. This reduces the granularity of logging, but I'll address that separately in future logging-focused changes. Test Plan: Used `bin/drydock lease` to acquire a lease, saw it acquire cleanly. Reviewers: hach-que, chad Reviewed By: chad Maniphest Tasks: T9252 Differential Revision: https://secure.phabricator.com/D14147	2015-09-23 13:21:41 -07:00

312 commits