phorge-phorge/src/applications/diffusion/ssh/DiffusionSSHMercurialServeWorkflow.php

<?php

final class DiffusionSSHMercurialServeWorkflow
  extends DiffusionSSHMercurialWorkflow {

  protected $didSeeWrite;

  public function didConstruct() {
    $this->setName('hg');
    $this->setArguments(
      array(
        array(
          'name' => 'repository',
          'short' => 'R',
          'param' => 'repo',
        ),
        array(
          'name' => 'stdio',
        ),
        array(
          'name' => 'command',
          'wildcard' => true,
        ),
      ));
  }

  protected function executeRepositoryOperations() {
    $args = $this->getArgs();
    $path = $args->getArg('repository');
    $repository = $this->loadRepository($path);

    if (!$args->getArg('stdio')) {
      throw new Exception('Expected `hg ... --stdio`!');
    }

    if ($args->getArg('command') !== array('serve')) {
      throw new Exception('Expected `hg ... serve`!');
    }

    $command = csprintf('hg -R %s serve --stdio', $repository->getLocalPath());
    $command = PhabricatorDaemon::sudoCommandAsDaemonUser($command);

    $future = id(new ExecFuture('%C', $command))
      ->setEnv($this->getEnvironment());

    $io_channel = $this->getIOChannel();
    $protocol_channel = new DiffusionSSHMercurialWireClientProtocolChannel(
      $io_channel);

    $err = id($this->newPassthruCommand())
      ->setIOChannel($protocol_channel)
      ->setCommandChannelFromExecFuture($future)
      ->setWillWriteCallback(array($this, 'willWriteMessageCallback'))
      ->execute();

    // TODO: It's apparently technically possible to communicate errors to
    // Mercurial over SSH by writing a special "\n<error>\n-\n" string. However,
    // my attempt to implement that resulted in Mercurial closing the socket and
    // then hanging, without showing the error. This might be an issue on our
    // side (we need to close our half of the socket?), or maybe the code
    // for this in Mercurial doesn't actually work, or maybe something else
    // is afoot. At some point, we should look into doing this more cleanly.
    // For now, when we reject a write (for example, for policy reasons), the
    // user sees a diagnostically useful message such as "remote: This
    // repository is read-only over SSH.", followed by the unhelpful
    // "abort: unexpected response: empty string".

    if (!$err && $this->didSeeWrite) {
      $repository->writeStatusMessage(
        PhabricatorRepositoryStatusMessage::TYPE_NEEDS_UPDATE,
        PhabricatorRepositoryStatusMessage::CODE_OKAY);
    }

    return $err;
  }

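  /**
   * Inspect a parsed protocol message before it is written to `hg`.
   *
   * Messages which are not recognized as read-only require write access.
   *
   * @param PhabricatorSSHPassthruCommand $command Passthru command.
   * @param dict $message Message with `command`, `arguments`, `raw` keys.
   * @return string Raw message data to write.
   */
  public function willWriteMessageCallback(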
    PhabricatorSSHPassthruCommand $command,
    $message) {

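    // Keep the protocol command name in its own variable instead of
    // overwriting the `$command` parameter.
    $name = $message['command'];

    // Check if this is a readonly command. A "batch" message carries its
    // commands in the "cmds" argument, so classify it by that argument.

    $is_readonly = false;
    if ($name === 'batch') {
      $cmds = idx($message['arguments'], 'cmds');
      if (DiffusionMercurialWireProtocol::isReadOnlyBatchCommand($cmds)) {
        $is_readonly = true;
      }
    } else if (DiffusionMercurialWireProtocol::isReadOnlyCommand($name)) {
      $is_readonly = true;
    }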

    if (!$is_readonly) {
      $this->requireWriteAccess();
      $this->didSeeWrite = true;
    }

    // If we're good, return the raw message data.
    return $message['raw'];
  }

}