phorge-phorge/src/infrastructure/daemon/PhabricatorDaemon.php

<?php

abstract class PhabricatorDaemon extends PhutilDaemon {

  protected function willRun() {
    parent::willRun();

    $phabricator = phutil_get_library_root('phabricator');
    $root = dirname($phabricator);
    require_once $root.'/scripts/__init_script__.php';
  }

  protected function willSleep($duration) {
    LiskDAO::closeInactiveConnections(60);
    return;
  }

  public function getViewer() {
    return PhabricatorUser::getOmnipotentUser();
  }


  /**
   * Format a command so it executes as the daemon user, if a daemon user is
   * defined. This wraps the provided command in `sudo -u ...`, roughly.
   *
   * @param   PhutilCommandString Command to execute.
   * @return  PhutilCommandString `sudo` version of the command.
   */
  public static function sudoCommandAsDaemonUser($command) {
    $user = PhabricatorEnv::getEnvConfig('phd.user');
    if (!$user) {
      // No daemon user is set, so just run this as ourselves.
      return $command;
    }

    // We may reach this method while already running as the daemon user: for
    // example, active and passive synchronization of clustered repositories
    // run the same commands through the same code, but as different users.

    // By default, `sudo` won't let you sudo to yourself, so we can get into
    // trouble if we're already running as the daemon user unless the host has
    // been configured to let the daemon user run commands as itself.

    // Since this is silly and more complicated than doing this check, don't
    // use `sudo` if we're already running as the correct user.
    if (function_exists('posix_getuid')) {
      $uid = posix_getuid();
      $info = posix_getpwuid($uid);
      if ($info && $info['name'] == $user) {
        return $command;
      }
    }

    // Get the absolute path so we're safe against the caller wiping out
    // PATH.
    $sudo = Filesystem::resolveBinary('sudo');
    if (!$sudo) {
      throw new Exception(pht("Unable to find 'sudo'!"));
    }

    // Flags here are:
    //
    //   -E: Preserve the environment.
    //   -n: Non-interactive. Exit with an error instead of prompting.
    //   -u: Which user to sudo to.

    return csprintf('%s -E -n -u %s -- %C', $sudo, $user, $command);
  }

}
Rough cut of repository tracking Summary: Basic scaffolding for repository tracking, plus daemon infrastructure (Timelines, Cursors) and some fixes (memory usage, mysql_connect() junk). Test Plan: parsed Javelin git commit history via daemon Reviewers: CC: 2011-03-07 07:29:22 +01:00			`<?php`

			`abstract class PhabricatorDaemon extends PhutilDaemon {`

			`protected function willRun() {`
			`parent::willRun();`

			`$phabricator = phutil_get_library_root('phabricator');`
			`$root = dirname($phabricator);`
Merge __init_env__.php into __init_script__.php Summary: There are currently two files, but all scripts require both of them, which is clearly silly. In the longer term I want to rewrite all of this init stuff to be more structured (e.g., merge webroot/index.php and __init_script__ better) but this reduces the surface area of the ad-hoc "include files" API we have now, at least. Test Plan: - Grepped for __init_env__.php (no hits) - Ran a unit test (to test unit changes) - Ran a daemon (to test daemon changes) Reviewers: jungejason, nh, tuomaspelkonen, aran Reviewed By: jungejason CC: aran, jungejason Differential Revision: 976 2011-10-01 17:59:42 +02:00			`require_once $root.'/scripts/__init_script__.php';`
Rough cut of repository tracking Summary: Basic scaffolding for repository tracking, plus daemon infrastructure (Timelines, Cursors) and some fixes (memory usage, mysql_connect() junk). Test Plan: parsed Javelin git commit history via daemon Reviewers: CC: 2011-03-07 07:29:22 +01:00			`}`
Close all DB connections when Daemons sleep Summary: Fixes T2933 Test Plan: As guided by Evan - by setting $tasks = array(); in PhabricatorTaskmasterDaemon.php and running 'phd debug taskmaster' and 'show full processlist' on mysql as root. No extra connections detected. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T2933 Differential Revision: https://secure.phabricator.com/D5654 2013-04-10 23:52:29 +02:00
			`protected function willSleep($duration) {`
In taskmaster daemons, only close connections which were not used recently Summary: Ref T11458. Depends on D16388. Currently, we're very aggressive about closing connections in the taskmaster daemons. This can end up taking up a lot of resources. In particular, because the outgoing port for outbound connections normally cannot be reused for 60 seconds after a connection closes, we may exhaust outbound ports on the host if there's a big queue full of stuff that's being processed very quickly. At a minimum, we //always// are holding open a `worker` connection, which we always need again right away. So even in the best case we end up opening/closing this about once per second and each daemon takes up about ~60 outbound ports when it should take up ~1. So, make two adjustments: - First, only close connections which we haven't issued a query on in the last 60 seconds. This should prevent us from closing connections that we'll need again immediately in most cases. In the worst case, we shouldn't be eating up any extra ports under default TCP behavior. - Second, explicitly close connections. We were relying on implicit/GC behavior (maybe as a holdover from very long ago, before we got connection wrappers in place?), which probably did about the same thing but isn't as predictable and can't be profiled or instrumented. Test Plan: This is somewhat difficult to test completely convincingly in isolation since the problem behavior depends on production scales and the workload, and to some degree on configuration. I tested that this stuff basically works by adding logging to connect/close and running the daemons, verifying that they churned connections a lot before this change (e.g., ~1/s even at no load) and churn rarely afterward (e.g., almost never at no load). I ran some workload through them to make sure I didn't completely break anything. The best real test is just seeing how production responds. 
Current inbound/outbound connections on `secure001` are 1,200: ``` secure001 $ netstat -t | grep :mysql | wc -l 1164 ``` Current outbound from `repo001` are 18,600: ``` repo001 $ netstat -t | grep :mysql | wc -l 18663 ``` Reviewers: chad Reviewed By: chad Maniphest Tasks: T11458 Differential Revision: https://secure.phabricator.com/D16389 2016-08-11 17:47:47 +02:00			`LiskDAO::closeInactiveConnections(60);`
Close all DB connections when Daemons sleep Summary: Fixes T2933 Test Plan: As guided by Evan - by setting $tasks = array(); in PhabricatorTaskmasterDaemon.php and running 'phd debug taskmaster' and 'show full processlist' on mysql as root. No extra connections detected. Reviewers: epriestley Reviewed By: epriestley CC: aran, Korvin Maniphest Tasks: T2933 Differential Revision: https://secure.phabricator.com/D5654 2013-04-10 23:52:29 +02:00			`return;`
			`}`
Remove PhabricatorRepository::loadAllByPHIDOrCallsign() Summary: Ref T603. Move to real Query classes. Test Plan: - Ran `phd debug pull X` (where `X` does not match a repository). - Ran `phd debug pull Y` (where `Y` does match a repository). - Ran `phd debug pull`. - Ran `repository pull`. - Ran `repository pull X`. - Ran `repository pull Y`. - Ran `repository discover`. - Ran `repository delete`. - Ran `grep`. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T603 Differential Revision: https://secure.phabricator.com/D7137 2013-09-26 21:36:24 +02:00
			`public function getViewer() {`
			`return PhabricatorUser::getOmnipotentUser();`
			`}`

Add "phd.user" with `sudo` hooks for SSH/HTTP writes Summary: Ref T2230. When fully set up, we have up to three users who all need to write into the repositories: - The webserver needs to write for HTTP receives. - The SSH user needs to write for SSH receives. - The daemons need to write for "git fetch", "git clone", etc. These three users don't need to be different, but in practice they are often not likely to all be the same user. If for no other reason, making them all the same user requires you to "git clone httpd@host.com", and installs are likely to prefer "git clone git@host.com". Using three different users also allows better privilege separation. Particularly, the daemon user can be the //only// user with write access to the repositories. The webserver and SSH user can accomplish their writes through `sudo`, with a whitelisted set of commands. This means that even if you compromise the `ssh` user, you need to find a way to escalate from there to the daemon user in order to, e.g., write arbitrary stuff into the repository or bypass commit hooks. This lays some of the groundwork for a highly-separated configuration where the SSH and HTTP users have the fewest privileges possible and use `sudo` to interact with repositories. Some future work which might make sense: - Make `bin/phd` respect this (require start as the right user, or as root and drop privileges, if this configuration is set). - Execute all `git/hg/svn` commands via sudo? Users aren't expected to configure this yet so I haven't written any documentation. Test Plan: Added an SSH user ("dweller") and gave it sudo by adding this to `/etc/sudoers`: dweller ALL=(epriestley) SETENV: NOPASSWD: /usr/bin/git-upload-pack, /usr/bin/git-receive-pack Then I ran git pushes and pulls over SSH via "dweller@localhost". They successfully interacted with the repository on disk as the "epriestley" user. 
Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T2230 Differential Revision: https://secure.phabricator.com/D7589 2013-11-18 17:58:35 +01:00
			`/**`
			`* Format a command so it executes as the daemon user, if a daemon user is`
			* defined. This wraps the provided command in `sudo -u ...`, roughly.
			`*`
			`* @param PhutilCommandString Command to execute.`
			* @return PhutilCommandString `sudo` version of the command.
			`*/`
			`public static function sudoCommandAsDaemonUser($command) {`
			`$user = PhabricatorEnv::getEnvConfig('phd.user');`
			`if (!$user) {`
			`// No daemon user is set, so just run this as ourselves.`
			`return $command;`
			`}`

When already running as the daemon user, don't "sudo" daemon commands Summary: The cluster synchronization code runs either actively (before returning a response to `git clone`, for example) or passively (routinely, as the daemons update repositories). The active sync runs as the web user (if running `git clone http://...`) or the VCS user (if running `git clone ssh://...`). But the passive sync runs as the daemon user. All of these sync processes need to run actual commands as the daemon user (`git fetch ...`). For the active ones, we must `sudo`. For the passive ones, we're already the right user. We run the same code, and end up trying to sudo to ourselves, which `sudo` isn't happy about by default. Depending on how `sudo` is configured and which users things are running as this might work anyway, but it's silly and if it doesn't work it requires you to go make non-obvious, weird config changes that are unintuitive and somewhat nonsensical. This is probably worse on the balance than adding a bit of complexity to the code. Instead, test which user we're running as. If it's already the right user, don't sudo. Test Plan: - Ran `bin/repository update --trace` as daemon user, saw no more `sudo`. - Ran a `git clone` to make sure that didn't break. Reviewers: chad, avivey Reviewed By: avivey Differential Revision: https://secure.phabricator.com/D16391 2016-08-12 01:02:57 +02:00			`// We may reach this method while already running as the daemon user: for`
			`// example, active and passive synchronization of clustered repositories`
			`// run the same commands through the same code, but as different users.`

			// By default, `sudo` won't let you sudo to yourself, so we can get into
			`// trouble if we're already running as the daemon user unless the host has`
			`// been configured to let the daemon user run commands as itself.`

			`// Since this is silly and more complicated than doing this check, don't`
			// use `sudo` if we're already running as the correct user.
			`if (function_exists('posix_getuid')) {`
			`$uid = posix_getuid();`
			`$info = posix_getpwuid($uid);`
			`if ($info && $info['name'] == $user) {`
			`return $command;`
			`}`
			`}`

Add "phd.user" with `sudo` hooks for SSH/HTTP writes Summary: Ref T2230. When fully set up, we have up to three users who all need to write into the repositories: - The webserver needs to write for HTTP receives. - The SSH user needs to write for SSH receives. - The daemons need to write for "git fetch", "git clone", etc. These three users don't need to be different, but in practice they are often not likely to all be the same user. If for no other reason, making them all the same user requires you to "git clone httpd@host.com", and installs are likely to prefer "git clone git@host.com". Using three different users also allows better privilege separation. Particularly, the daemon user can be the //only// user with write access to the repositories. The webserver and SSH user can accomplish their writes through `sudo`, with a whitelisted set of commands. This means that even if you compromise the `ssh` user, you need to find a way to escalate from there to the daemon user in order to, e.g., write arbitrary stuff into the repository or bypass commit hooks. This lays some of the groundwork for a highly-separated configuration where the SSH and HTTP users have the fewest privileges possible and use `sudo` to interact with repositories. Some future work which might make sense: - Make `bin/phd` respect this (require start as the right user, or as root and drop privileges, if this configuration is set). - Execute all `git/hg/svn` commands via sudo? Users aren't expected to configure this yet so I haven't written any documentation. Test Plan: Added an SSH user ("dweller") and gave it sudo by adding this to `/etc/sudoers`: dweller ALL=(epriestley) SETENV: NOPASSWD: /usr/bin/git-upload-pack, /usr/bin/git-receive-pack Then I ran git pushes and pulls over SSH via "dweller@localhost". They successfully interacted with the repository on disk as the "epriestley" user. 
Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T2230 Differential Revision: https://secure.phabricator.com/D7589 2013-11-18 17:58:35 +01:00			`// Get the absolute path so we're safe against the caller wiping out`
			`// PATH.`
			`$sudo = Filesystem::resolveBinary('sudo');`
			`if (!$sudo) {`
			`throw new Exception(pht("Unable to find 'sudo'!"));`
			`}`

			`// Flags here are:`
			`//`
			`// -E: Preserve the environment.`
			`// -n: Non-interactive. Exit with an error instead of prompting.`
			`// -u: Which user to sudo to.`

			`return csprintf('%s -E -n -u %s -- %C', $sudo, $user, $command);`
			`}`

Rough cut of repository tracking Summary: Basic scaffolding for repository tracking, plus daemon infrastructure (Timelines, Cursors) and some fixes (memory usage, mysql_connect() junk). Test Plan: parsed Javelin git commit history via daemon Reviewers: CC: 2011-03-07 07:29:22 +01:00			`}`
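
The branch order in `sudoCommandAsDaemonUser()` (no daemon user configured, already running as the daemon user, otherwise wrap in `sudo`) can be illustrated with a minimal standalone sketch. Everything here is an assumption made for illustration: `sketch_sudo_wrap()` and its parameters are hypothetical names that do not exist in Phorge, and plain `sprintf()` with `escapeshellarg()` stands in for `csprintf()`, so the quoting of the result may differ from what the real method produces.

```php
<?php

/**
 * Hypothetical standalone helper (not part of Phorge) that mirrors the
 * branch order of sudoCommandAsDaemonUser(). The configured daemon user,
 * the current user name, and the sudo path are passed in explicitly
 * instead of being read from config, POSIX calls, or PATH resolution.
 */
function sketch_sudo_wrap($daemon_user, $current_user, $sudo_path, $command) {
  // No daemon user is configured: run the command as the current user.
  if (!$daemon_user) {
    return $command;
  }

  // Already running as the daemon user: skip sudo entirely.
  if ($current_user !== null && $current_user === $daemon_user) {
    return $command;
  }

  // Assumption: sprintf() and escapeshellarg() stand in for csprintf().
  // The command itself is passed through unchanged.
  return sprintf(
    '%s -E -n -u %s -- %s',
    escapeshellarg($sudo_path),
    escapeshellarg($daemon_user),
    $command);
}

// Example: a web user ("www-data") wrapping a fetch for daemon user "phd".
echo sketch_sudo_wrap('phd', 'www-data', '/usr/bin/sudo', 'git fetch --all');
echo "\n";
```

Passing the current user in as an argument keeps the sketch deterministic. The real method instead looks it up with `posix_getuid()` and `posix_getpwuid()`, and falls through to `sudo` when the POSIX functions are unavailable.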