1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-11 16:16:14 +01:00
Commit graph

49 commits

Author SHA1 Message Date
epriestley
dda5c13ef5 Parse "shallow" frames in the Git "upload-pack" wire protocol parser
Summary:
Fixes T13309. If you void the warranty on a repository on disk and turn it into a shallow clone, Phabricator currently can't serve it.

We don't support hosting shallow working copies, but we should still parse and proxy the protocol rather than breaking in a mysterious way.

Test Plan:
  - Created a shallow working copy with `mv X X.full; git clone --depth Y file://.../X.full X` in the storage directory on disk.
  - Cloned it with `git clone <uri>`.
  - Deleted all the refs inside it so the wire only has "shallow" frames; cloned it.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13309

Differential Revision: https://secure.phabricator.com/D20577
2019-06-18 15:14:52 -07:00
epriestley
904dbf0db6 Make the "git upload-pack" proxy more robust
Summary:
Depends on D20381. Ref T8093. This makes minor improvements to the protocol proxy to handle cases where we add, remove, or replace refs and may need to move the "capabilities" section.

Rather than invoking a callback on every ref: parse the whole ref list into a data structure, mutate it if necessary (in a future diff), then dump it back into wire format.

This allows us to shift the capabilities data (which needs to be coupled with the first ref) around if we modify the first ref, and reorder the reflist alphabetically like git does.

When the server has no refs, Git sends no capabilities data. This is easy to emulate, just surprising.

Test Plan:
Tested the cases not covered by D20381:

  - Fetching where the fetch actually fetches data.
  - `ls-remote` when we hide the first ref (capabilities data properly moves to the first visible ref).
  - `ls-remote` when the remote is empty (we just drop the capabilities frame completely).

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T8093

Differential Revision: https://secure.phabricator.com/D20436
2019-04-18 05:04:05 -07:00
epriestley
e08ba99dd3 Proxy the "git upload-pack" wire protocol
Summary:
Depends on D20380. Ref T8093. When prototypes are enabled, inject a (hopefully?) no-op proxy into the Git wire protocol.

This proxy decodes "git upload-pack" and allows the list of references to be rewritten, in a similar way to how we already proxy the Subversion protocol to rewrite URIs and proxy the Mercurial protocol to distinguish between read and write operations.

The piece we care about comes at the beginning, and looks like this:

```
<frame-length><ref-hash> <ref-name>\0<server-capabilities>\n
<frame-length><ref-hash> <ref-name>\n
<frame-length><ref-hash> <ref-name>\n
...
<0000>
```

We can add, remove, or modify this section to make it appear that the server has different refs than the refs that exist on disk.

Things I have tried:

  - `git ls-remote`
  - `git ls-remote` where the server hides some refs.
  - `git fetch` where the fetch is a no-op.

Things I have not tried:

  - `git fetch` where the fetch is not a no-op.
  - Tricking things into doing protocol v2. Or: I tried this, I wasn't successful. In v2, additional "\0" tricks are used to hide data in the capabilities, I think?
  - `git ls-remote` where we rewrite/hide the first ref in the list, and need to move the capabilities frame elsewhere.
  - `git ls-remote` where the server has no refs at all, or we remove every ref.

So the "interesting" piece of this works, but it almost certainly needs some cleanup to survive interaction with the real world.

Test Plan: See above.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T8093

Differential Revision: https://secure.phabricator.com/D20381
2019-04-18 04:57:51 -07:00
epriestley
d254c1f8b1 Use "null", not "-1", as a local "no version" marker when performing intracluster repository sync
Summary:
Ref T13242. See <https://discourse.phabricator-community.org/t/out-of-range-value-for-column-deviceversion/2218>.

The synchronization log column is `uint32?` and `-1` doesn't go into that column.

Since we're only using `-1` for convenience to cheat through `$max_version > $this_version` checks, use `null` instead and make the checks more explicit.

Test Plan: Reproducing this is a bit tricky and I cheated fairly heavily to force the code down this pathway without actually building a multi-device cluster, but I did reproduce the original exception, apply the patch, and observe that it fixed things.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13242

Differential Revision: https://secure.phabricator.com/D20047
2019-01-28 18:50:01 -08:00
epriestley
c86c5749ba Make the repository "Filesize Limit" and "Clone/Fetch Timeout" configurable in the UI
Summary: Depends on D19830. Ref T13216. See PHI908. See PHI750. See PHI885. Allow users to configure a filesize limit, and allow them to adjust the clone/fetch timeout.

Test Plan:
{F6021356}

  - Configured a filesize limit and pushed, hit it. Made the limit larger and pushed, change went through.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19831
2018-11-28 14:34:00 -08:00
epriestley
cb033673b6 Unify intracluster sync and Drydock working copy construction timeouts as a repository "copy time limit"
Summary:
Depends on D19814. Ref T13216. See PHI885. For various eldritch reasons, `git fetch` can hang. Although we'd probably like to fix this with `git fetch --require-sustained-network-transfer-rate=512KB/5s` or similar, that flag doesn't exist and we don't have a reasonable way to build it.

Short of that, move toward formalizing a repository "copy time limit": the longest amount of time anything may spend trying to make a copy of this repository.

This grows out of the existing intracluster sync limit, which is effectively the same thing. Here, apply it to `git clone` and `git fetch` in Drydock working copy construction, too. A future change may make it configurable.

Test Plan:
  - Set the limit to 0.001.
  - Tried to build and lease working copies, got sensible timeout errors (see D19815).

```
<Activation Failed> Lease activation failed: [CommandException] Command killed by timeout after running for more than 0.001 seconds.
COMMAND
ssh '-o' 'LogLevel=quiet' '-o' 'StrictHostKeyChecking=no' '-o' 'UserKnownHostsFile=/dev/null' '-o' 'BatchMode=yes' -l '********' -p '2222' -i '********' '127.0.0.1' -- '(cd '\''/var/drydock/workingcopy-163/repo/spellbook/'\'' && git clean -d --force && git fetch && git reset --hard)'
```

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19816
2018-11-16 13:08:12 -08:00
epriestley
1d7c960531 Put push log "hookWait" to data export and add all wait values to UI
Summary:
Depends on D19797. Ref T13216.

  - Put the new `hookWait` in the export and the UI.
  - Put the existing waits in the UI, not just the export.
  - Make order consistent: host, write, read, hook (this is the order the timers start in).

Test Plan: Pushed some stuff, viewed web UI and saw sensible numbers, exported data and got the same values.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19798
2018-11-10 04:47:38 -08:00
epriestley
c32fa06266 Use phutil_microseconds_since(...) to simplify some timing arithmetic
Summary: Depends on D19796. Simplify some timing code by using phutil_microseconds_since() instead of duplicate casting and arithmetic.

Test Plan: Grepped for `1000000` to find these. Pulled, pushed, made a conduit call. This isn't exhaustive but it should be hard for these to break in a bad way since they're all just diagnostic.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D19797
2018-11-08 16:46:32 -08:00
epriestley
966db4d38e Add an intracluster synchronization log for cluster repositories
Summary:
Depends on D19778. Ref T13216. See PHI943, PHI889, et al.

We currently have a push log and a pull log, but do not separately log intracluster synchronization events. We've encountered several specific cases where having this kind of log would be helpful:

  - In PHI943, an install was accidentally aborting locks early. Having timing information in the sync log would let us identify this more quickly.
  - In PHI889, an install hit an issue with `MaxStartups` configuration in `sshd`. A log would let us identify when this is an issue.
  - In PHI889, I floated a "push the linux kernel + fetch timeout" theory. A sync log would let us see sync/fetch timeouts to confirm this being a problem in practice.
  - A sync log will help us assess, develop, test, and monitor intracluster routing sync changes (likely those in T13211) in the future.

Some of these events are present in the pull log already, but only if they make it as far as running a `git upload-pack` subprocess (not the case with `MaxStartups` problems) -- and they can't record end-to-end timing.

No UI yet, I'll add that in a future change.

Test Plan:
  - Forced all operations to synchronize by adding `|| true` to the version check.
  - Pulled, got a sync log in the database.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19779
2018-11-07 18:24:20 -08:00
epriestley
bc6c8c0e93 Explicitly shuffle nodes before selecting one for cluster sync
Summary:
Depends on D19734. Ref T13202. Ref T13109. Ref T10884. See PHI905. See PHI889. We currently rank cluster nodes in three cases:

  # when performing a write, we can go to any node (D19734 should make our ranking good);
  # when performing a read, we can go to any node (currently random, but T10884 discusses ideas to improve our ranking);
  # when performing an internal synchronization before a read or a write, we must go to an up-to-date node.

Currently, case (3) is not-exactly-deterministic but not random, and we won't spread intracluster traffic acrosss the cluster evenly if, say, half of it is up to date and half of it is still synchronizing. For a given write, I believe all nodes will tend to synchronize from whichever node first received the write today.

Instead, shuffle the list and synchronize from any up-to-date node.

(I think we could improve upon this only by knowing which nodes actually have load and selecting the least-loaded -- doable, but not trivial.)

Test Plan: Poked at it locally, will deploy to `secure`. This is hard to measure/test terribly convincingly.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13202, T13109, T10884

Differential Revision: https://secure.phabricator.com/D19735
2018-10-17 08:11:23 -07:00
Austin McKinley
dbf2302b6c Fix self-cancelling typo
Summary: Ref D18268. This typo cancelled itself out, and I can't find any other callers.

Test Plan: arc unit

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin

Differential Revision: https://secure.phabricator.com/D19718
2018-09-28 17:44:33 -07:00
epriestley
021c612cb2 When we fail to acquire a repository lock, try to provide a hint about why
Summary:
Ref T13202. See PHI889. If the lock log is enabled, we can try to offer more details about lock holders.

When we fail to acquire a lock:

  - check for recent acquisitions and suggest that this is a bottleneck issue;
  - if there are no recent acquisitions, check for the last acquisition and print details about it (what process, how long ago, whether or not we believe it was released).

Test Plan:
  - Enabled the lock log.
  - Changed the lock wait time to 1 second.
  - Added a `sleep(10)` after grabbing the lock.
  - In one window, ran a Conduit call or a `git fetch`.
  - In another window, ran another operation.
  - Got useful/sensible errors for both ssh and web lock holders, for example:

> PhutilProxyException: Failed to acquire read lock after waiting 1 second(s). You may be able to retry later. (This lock was most recently acquired by a process (pid=12609, host=orbital-3.local, sapi=apache2handler, controller=PhabricatorConduitAPIController, method=diffusion.rawdiffquery) 3 second(s) ago. There is no record of this lock being released.)

> PhutilProxyException: Failed to acquire read lock after waiting 1 second(s). You may be able to retry later. (This lock was most recently acquired by a process (pid=65251, host=orbital-3.local, sapi=cli, argv=/Users/epriestley/dev/core/lib/phabricator/bin/ssh-exec --phabricator-ssh-device local.phacility.net --phabricator-ssh-key 2) 2 second(s) ago. There is no record of this lock being released.)

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13202

Differential Revision: https://secure.phabricator.com/D19702
2018-09-24 15:20:07 -07:00
epriestley
7ef2bb1b56 Support Mercurial "protocaps" wire command
Summary:
Ref T13187. See PHI834. Mercurial has somewhat-recently (changeset is from Jan 2018) introduced a new "protocaps" command, that appears in Mercurial 4.7 and possibly before then.

We must explicitly enumerate all protocol commands because you can't decode the protocol without knowing how many arguments the command expects, so enumerate it.

(Also fix an issue where the related error message had an extra apostrophe.)

Test Plan:
  - Ran `hg clone ...` with client and server on Mercurial 4.7.
  - Before: fatal on unknown "protocaps" command.
  - Midway: better typography in error message.
  - After: clean clone.

Reviewers: amckinley

Maniphest Tasks: T13187

Differential Revision: https://secure.phabricator.com/D19596
2018-08-23 15:06:25 -07:00
epriestley
3544620209 Parse unusual Subversion protocol frames which contain extra whitespace
Summary:
Fixes T13140. See PHI660.

Recent versions of Subversion can send a `(get-file true false  false )` protocol frame with extra space between "false" and "false". This is allowed by the protocol spec, but never normally happens, and we do not parse it correctly.

Instead, parse it correctly.

Test Plan:
  - Added unit tests.
  - Ran `svn proplist svn+ssh://.../diffusion/X/file.c` under SVN 1.10 before and after the change.
    - Before: indefinite hang.
    - After: completed in finite time.

Reviewers: amckinley, asherkin

Reviewed By: amckinley, asherkin

Maniphest Tasks: T13140

Differential Revision: https://secure.phabricator.com/D19451
2018-05-16 17:12:41 -07:00
epriestley
df3c937dab Record lock timing information on PushEvents
Summary:
Depends on D19249. Ref T13109. Add timing information to the `PushEvent`:

  - `writeWait`: Time spent waiting for a write lock.
  - `readWait`: Time spent waiting for a read lock.
  - `hostWait`: Roughly, total time spent on the leaf node.

The primary goal here is to see if `readWait` is meaningful in the wild. If it is, that motivates smarter routing, and the value of smarter routing can be demonstrated by looking for a reduction in read wait times.

Test Plan: Pushed some stuff, saw reasonable timing values in the table. Saw timing information in "Export Data".

Maniphest Tasks: T13109

Differential Revision: https://secure.phabricator.com/D19250
2018-03-22 13:46:01 -07:00
epriestley
859b274970 Provide more information to users during git push while waiting for write locks
Summary:
Ref T13109. Make it slightly more clear what the scope of the write and read locks are, and slightly more clear that we're actively acquiring locks, not just sitting around waiting.

While waiting on another writer, show who we're waiting on so you can walk over to their desk and glare at them.

Test Plan:
Added `sleep(15)` after `willWrite()`. Pushed in two windows. Saw new, more informative messages. In the second window, saw the new guidance:

> # Waiting for hector to finish writing (on device "repo1.local.phacility.net" for 11s)...

Reviewers: asherkin

Reviewed By: asherkin

Subscribers: asherkin

Maniphest Tasks: T13109

Differential Revision: https://secure.phabricator.com/D19247
2018-03-22 13:42:18 -07:00
epriestley
7e43b74055 Give all commands from DiffusionCommandEngine a default 15 minute timeout
Summary:
Ref T13108. See PHI364. See the task and issue for discussion.

If a `git fetch` during synchronization hangs, the whole node currently hangs. While the causes of a `git fetch` hang aren't clear, we don't expect synchronization to ever reasonably take more than 15 minutes, so add a default timeout.

Test Plan: Will deploy and observe; this is difficult to reproduce or test directly.

Maniphest Tasks: T13108

Differential Revision: https://secure.phabricator.com/D19235
2018-03-16 17:22:03 -07:00
epriestley
7e7720803c Set GIT_SSH_VARIANT alongside GIT_SSH
A recent version of Git has changed some piece of behavior here and we
now get "fatal: ssh variant 'simple' does not support setting port"
when using a port. Explicitly setting GIT_SSH_VARIANT to `ssh` likely
fixes this.
2018-01-26 13:21:10 -08:00
epriestley
a7921a4448 Filter and reject "--config" and "--debugger" flags to Mercurial in any position
Summary:
Ref T13012. These flags can be exploited by attackers to execute code remotely. See T13012 for discussion and context.

Additionally, harden some Mercurial commands where possible (by using additional quoting or embedding arguments in other constructs) so they resist these flags and behave properly when passed arguments with these values.

Test Plan:
  - Added unit tests.
  - Verified "--config" and "--debugger" commands are rejected.
  - Verified more commands now work properly even with branches and files named `--debugger`, although not all of them do.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13012

Differential Revision: https://secure.phabricator.com/D18769
2017-11-10 08:42:07 -08:00
Dmitri Iouchtchenko
9bd6a37055 Fix spelling
Summary: Noticed a couple of typos in the docs, and then things got out of hand.

Test Plan:
  - Stared at the words until my eyes watered and the letters began to swim on the screen.
  - Consulted a dictionary.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Differential Revision: https://secure.phabricator.com/D18693
2017-10-09 10:48:04 -07:00
epriestley
794e185bf9 Pass SSH wrappers to VCS commands unconditonally, not just if there's an SSH remote
Summary:
Ref T12961. In Mercurial, it's possible to have "subrepos" which may use a different protocol than the main repository.

By putting an SSH repository inside an HTTP repository, an attacker can theoretically get us to execute `hg` without overriding `ui.ssh`, then execute code via the SSH hostname attack.

As an immediate mitigation to this attack, specify `ui.ssh` unconditionally. Normally, this will have no effect (it will just be ignored). In the specific case of an SSH repo inside an HTTP repo, it will defuse the `ssh` protocol.

For good measure and consistency, do the same for Subversion and Git. However, we don't normally maintain working copies for either Subversion or Git so it's unlikely that similar attacks exist there.

Test Plan:
  - Put an SSH subrepo with an attack URI inside an HTTP outer repo in Mercurial.
  - Ran `hg up` with and without `ui.ssh` specified.
  - Got dangerous badness without `ui.ssh` and safe `ssh` subprocesses with `ui.ssh`.

I'm not yet able to confirm that `hg pull -u -- <uri>` can actually trigger this, but this can't hurt and our SSH wrapper is safer than the native behavior for all Subversion, Git and Mercurial versions released prior to today.

Reviewers: chad

Reviewed By: chad

Subscribers: cspeckmim

Maniphest Tasks: T12961

Differential Revision: https://secure.phabricator.com/D18389
2017-08-10 17:49:55 -07:00
epriestley
8034b9d819 Don't require a device be registered in Almanac to do cluster init/resync steps
Summary:
Fixes T12893. See also PHI15. This is complicated but:

  - In the documentation, we say "register your web devices with Almanac". We do this ourselves on `secure` and in the production Phacility cluster.
  - We don't actually require you to do this, don't detect that you didn't, and there's no actual reason you need to.
  - If you don't register your "web" devices, the only bad thing that really happens is that creating repositories skips version initialization, creating the bug in T12893. This process does not actually require the devices be registered, but the code currently just kind of fails silently if they aren't.

Instead, just move forward on these init/resync phases even if the device isn't registered. These steps are safe to run from unregistered hosts since they just wipe the whole table and don't affect specific devices.

If this sticks, I'll probably update the docs to not tell you to register `web` devices, or at least add "Optionally, ...". I don't think there's any future reason we'd need them to be registered.

Test Plan:
This is a bit tough to test without multiple hosts, but I added this piece of code to `AlmanacKeys` so we'd pretend to be a nameless "web" device when creating a repository:

```
if ($_REQUEST['__path__'] == '/diffusion/edit/form/default/') {
  return null;
}
```

Then I created some Git repositories. Before the patch, they came up with `-` versions (no version information). After the patch, they came up with `0` versions (correctly initialized).

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12893

Differential Revision: https://secure.phabricator.com/D18273
2017-07-25 05:12:10 -07:00
epriestley
b8cd5b0eb8 Use a less-esoteric spelling of "capabilities" in several places
Summary: This spelling can definitely feel a little overplayed at times, but I still think it's a gold standard in spellings of "capabilities".

Test Plan: Felt old and uncool.

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D18215
2017-07-12 15:27:57 -07:00
epriestley
0d5538672c Detect unsynchronizable repositories on multiple cluster hosts
Summary:
Ref T12613. Currently, the SVNTEST and HGTEST repositories are improperly configured on `secure`. These repositories use VCS systems which do not support synchronization, so they can not be served from cluster services with multiple hosts.

However, I've incorrectly configured them the same way as all the Git repositories, which support synchronization. This causes about 50% of requests to randomly fail (when they reach the wrong host).

Detect this issue and warn the user that the configuration is not valid.

It should be exceptionally difficult for normal installs to run into this.

Test Plan:
  - Mostly faked these conditions locally, verified that `secure` really has this configuration.
  - I'll push this, verify that the issue is detected correctly in production, then fix the config which should resolve the intermittent issues with SVNTEST.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12613

Differential Revision: https://secure.phabricator.com/D17774
2017-04-24 10:43:05 -07:00
Josh Cox
e0675b28d8 Pass exception to PhutilProxyException
Summary: Fixes T12243. That error occured due to network flakiness with some mounted filesystems so I'm not sure how best to simulate it. But you can look and see that the PhutilProxyException does indeed expect an exception as its second arg.

Test Plan: Look at method signature... look at callsite... now back at the method. Smile and nod.

Reviewers: #blessed_reviewers, yelirekim, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley

Maniphest Tasks: T12243

Differential Revision: https://secure.phabricator.com/D17335
2017-02-08 13:24:44 -05:00
epriestley
ccff47682f Provide more useful guidance if a repository is clusterized into an existing multi-device cluster
Summary:
Fixes T12087. When transitioning into a clustered configuration for the first time, the documentation recommends using a one-device cluster as a transitional step.

However, installs may not do this for whatever reason, and we aren't as clear as we could be in warning about clusterizing directly into a multi-device cluster.

Roughly, when you do this, we end up believing that working copies exist on several different devices, but have no information about which copy or copies are up to date. //Usually// they all were already synchronized and are all up to date, but we can't make this assumption safely without risking data.

Instead, we err on the side of caution, and require a human to tell us which copy we should consider to be up-to-date, using `bin/repository thaw --promote`.

Test Plan:
```
$ ./bin/repository clusterize rLOCKS --service repos001.phacility.net
Service "repos001.phacility.net" is actively bound to more than one device
(local002.local, local001.phacility.net).

If you clusterize a repository onto this service it will be unclear which
devices have up-to-date copies of the repository. This leader/follower
ambiguity will freeze the repository. You may need to manually promote a
device to unfreeze it. See "Ambiguous Leaders" in the documentation for
discussion.

    Continue anyway? [y/N]
```

Read other changes.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12087

Differential Revision: https://secure.phabricator.com/D17169
2017-01-10 12:45:55 -08:00
epriestley
005d8493b0 Pass GIT_ENVIRONMENTAL_MAGIC through to hook subprocesses to support Git 2.11.0
Summary:
Fixes T11940. In 2.11.0, Git has made a change so that newly-pushed changes are held in a temporary area until the hook accepts or rejects them.

This magic temporary area is only readable if the appropriate `GIT_ENVIRONMENTAL_MAGIC` variables are available. When executing `git` commands, pass them through from the calling context.

We're intentionally conservative about which variables we pass, and with good reason (see "httpoxy" in T11359). I think this continues to be the correct default behavior.

Test Plan:
  - Upgraded to Git 2.11.0.
  - Tried to push over SSH, got a hook error.
  - Applied patch.
  - Pulled and pushed over SSH.
  - Pulled and pushed over HTTP.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11940

Differential Revision: https://secure.phabricator.com/D16988
2016-12-05 12:45:30 -08:00
epriestley
0ed767b967 Fix a couple of partition migration bugs
Summary:
Ref T11044. Few issues here:

  - The `PhutilProxyException` is missing an argument (hit this while in read-only mode).
  - The `$ref_key` is unused.
  - When you add a new master to an existing cluster, we can incorrectly apply `.php` patches which we should not reapply. Instead, mark them as already-applied.

Test Plan:
  - Poked this locally, but will initialize `secure004` as an empty master to be sure.

Reviewers: chad, avivey

Reviewed By: avivey

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16916
2016-11-22 10:57:24 -08:00
epriestley
4dc37bcee0 Ignore repository versions on inactive devices in "Repository Servers" panel in Config
Summary:
Fixes T11590. Currently, we incorrectly consider cluster repository versions that are (or were) on devices which are no longer part of the active cluster service when building this status screen.

Instead, ignore them. This is just a display bug; the actual `ClusterEngine` already had similar logic.

Test Plan:
  - Added a bad leader record to `repository_workingcopyversion`.
  - Before patch, got a bad "Partial (1w)" sync:

{F1802292}

  - After patch, got a good "Sycnchronized":

{F1802293}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11590

Differential Revision: https://secure.phabricator.com/D16492
2016-09-05 11:10:16 -07:00
epriestley
39d4e21eec Fix a bad DiffusionCommandEngine parameter from HTTPEngine conversion
Summary:
I converted this call incorrectly in D16092. We should pass the `PhutilURI` object, not the string version of it.

Specifically, this resulted in hitting an error like this if a replica needed synchronization:

```
[2016-08-11 21:22:37] EXCEPTION: (InvalidArgumentException) Argument 1 passed to DiffusionCommandEngine::setURI() must be an instance of PhutilURI, string given, called in...
#0 PhutilErrorHandler::handleError(integer, string, string, integer, array) called at [<phabricator>/src/applications/diffusion/protocol/DiffusionCommandEngine.php:52]
#1 DiffusionCommandEngine::setURI(string) called at [<phabricator>/src/applications/diffusion/protocol/DiffusionRepositoryClusterEngine.php:601]
...
```

Test Plan: Clusterized an observed repository, demoted a node, ran `bin/repository update Rxxx` to update, saw no typehint fatal.

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D16390
2016-08-11 16:41:09 -07:00
epriestley
55a698a28a Use HTTPEngineExtension proxy for git HTTP operations
Summary: Ref T10227. When we perform `git` http operations (fetch, mirror) check if we should use a proxy; if we should, set `http_proxy` or `https_proxy` in the environment to make `git` have `curl` use it.

Test Plan:
  - Configured a proxy extension to run stuff through a local instance of Charles.
  - Ran `repository pull` and `repository mirror`.
  - Saw `git` HTTP requests route through the proxy.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10227

Differential Revision: https://secure.phabricator.com/D16092
2016-06-09 12:17:10 -07:00
epriestley
f5f784f4c1 Version clustered, observed repositories in a reasonable way (by largest discovered HEAD)
Summary:
Ref T4292. For hosted, clustered repositories we have a good way to increment the internal version of the repository: every time a user pushes something, we increment the version by 1.

We don't have a great way to do this for observed/remote repositories because when we `git fetch` we might get nothing, or we might get some changes, and we can't easily tell //what// changes we got.

For example, if we see that another node is at "version 97", and we do a fetch and see some changes, we don't know if we're in sync with them (i.e., also at "version 97") or ahead of them (at "version 98").

This implements a simple way to version an observed repository:

  - Take the head of every branch/tag.
  - Look them up.
  - Pick the biggest internal ID number.

This will work //except// when branches are deleted, which could cause the version to go backward if the "biggest commit" is the one that was deleted. This should be OK, since it's rare and the effects are minor and the repository will "self-heal" on the next actual push.

Test Plan:
  - Created an observed repository.
  - Ran `bin/repository update` and observed a sensible version number appear in the version table.
  - Pushed to the remote, did another update, saw a sensible update.
  - Did an update with no push, saw no effect on version number.
  - Toggled repository to hosted, saw the version reset.
  - Simulated read traffic to out-of-sync node, saw it do a remote fetch.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4292

Differential Revision: https://secure.phabricator.com/D15986
2016-05-30 09:53:01 -07:00
epriestley
bb16a1b0e2 Fix a possible fatal on the first push to a cluster repository
Summary:
Fixes T11020. I think this resolves things -- `$new_version` (set above) should be used, not `$new_log` directly.

Specifically, we would get into trouble if the initial push failed for some reason (working copy not initialized yet, commit hook rejected, etc).

Test Plan: Made a bad push to a new repository. Saw it freeze before the patch and succeed afterwards.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11020

Differential Revision: https://secure.phabricator.com/D15969
2016-05-23 17:54:54 -07:00
epriestley
de4312bcde Before executing svnserve, change the CWD to a readable directory
Summary: Fixes T10941. This avoids a confusing dead end when configuring Subversion hosting, where `svnserve` will fail to execute hooks if the CWD isn't readable by the vcs-user.

Test Plan:
  - Updated and committed in a hosted SVN repository.
  - Ran some git operations, too.
  - @dpotter confirmed this locally in T10941.

Reviewers: chad

Reviewed By: chad

Subscribers: dpotter

Maniphest Tasks: T10941

Differential Revision: https://secure.phabricator.com/D15879
2016-05-11 06:48:18 -07:00
epriestley
99718b61d8 Fill in new URI credential edit web UI interfaces
Summary:
Ref T10748. Ref T10366. Allows users to set credential for new URIs.

  - Ref T7221. Our handling of the "git://" protocol is currently incorrect. This protocol is not authenticated, but is considered an SSH protocol. In the new UI, it is considered an anonymous/unauthenticated protocol instead.
  - Ref T10241. This fixes the `PassphraseCredentialControl` so it doesn't silently edit the value if the current value is not visible to you and/or not valid.

Test Plan:
Performed a whole lot of credential edits, removals, and adjustments. I'll give this additional vetting before cutting over to it.

{F1253207}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T7221, T10241, T10366, T10748

Differential Revision: https://secure.phabricator.com/D15829
2016-05-02 04:26:13 -07:00
epriestley
892a9a1f07 Make cluster repositories more resistant to freezing
Summary:
Ref T10860. This allows us to recover if the connection to the database is lost during a push.

If we lose the connection to the master database during a push, we would previously freeze the repository. This is very safe, but not very operator-friendly since you have to go manually unfreeze it.

We don't need to be quite this aggressive about freezing things. The repository state is still consistent after we've "upgraded" the lock by setting `isWriting = 1`, so we're actually fine even if we lost the global lock.

Instead of just freezing the repository immediately, sit there in a loop waiting for the master to come back up for a few minutes. If it recovers, we can release the lock and everything will be OK again.

Basically, the changes are:

  - If we can't release the lock at first, sit in a loop trying really hard to release it for a while.
  - Add a unique lock identifier so we can be certain we're only releasing //our// lock no matter what else is going on.
  - Do the version reads on the same connection holding the lock, so we can be sure we haven't lost the lock before we do that read.

Test Plan:
  - Added a `sleep(10)` after accepting the write but before releasing the lock so I could run `mysqld stop` and force this issue to occur.
  - Pushed like this:

```
$ echo D >> record && git commit -am D && git push
[master 707ecc3] D
 1 file changed, 1 insertion(+)
# Push received by "local001.phacility.net", forwarding to cluster host.
# Waiting up to 120 second(s) for a cluster write lock...
# Acquired write lock immediately.
# Waiting up to 120 second(s) for a cluster read lock on "local001.phacility.net"...
# Acquired read lock immediately.
# Device "local001.phacility.net" is already a cluster leader and does not need to be synchronized.
# Ready to receive on cluster host "local001.phacility.net".
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 254 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
BEGIN SLEEP
```

  - Here, I stopped `mysqld` from the CLI in another terminal window.

```
END SLEEP
# CRITICAL. Failed to release cluster write lock!
# The connection to the master database was lost while receiving the write.
# This process will spend 300 more second(s) attempting to recover, then give up.
```

  - Here, I started `mysqld` again.

```
# RECOVERED. Link to master database was restored.
# Released cluster write lock.
To ssh://local@localvault.phacility.com/diffusion/26/locktopia.git
   2cbf87c..707ecc3  master -> master
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10860

Differential Revision: https://secure.phabricator.com/D15792
2016-04-25 11:37:31 -07:00
epriestley
d0b5dac36b Make cluster repositories more chatty
Summary:
Ref T10860. At least in Git over SSH, we can freely echo a bunch of stuff to stderr and Git will print it to the console, so we can tell users what's going on.

This should make debugging, etc., easier. We could tone this down a little bit once things are more stable if it's a little too chatty.

Test Plan:
```
$ echo D >> record && git commit -am D && git push
[master ca5efff] D
 1 file changed, 1 insertion(+)
# Push received by "local001.phacility.net", forwarding to cluster host.
# Waiting up to 120 second(s) for a cluster write lock...
# Acquired write lock immediately.
# Waiting up to 120 second(s) for a cluster read lock on "local001.phacility.net"...
# Acquired read lock immediately.
# Device "local001.phacility.net" is already a cluster leader and does not need to be synchronized.
# Ready to receive on cluster host "local001.phacility.net".
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 256 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
To ssh://local@localvault.phacility.com/diffusion/26/locktopia.git
   8616189..ca5efff  master -> master
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10860

Differential Revision: https://secure.phabricator.com/D15791
2016-04-25 11:20:57 -07:00
epriestley
dc75b4bd06 Move all cluster locking logic to a separate class
Summary: Ref T10860. This doesn't change anything, it just separates all this stuff out of `PhabricatorRepository` since I'm planning to add a bit more state to it and it's already pretty big and fairly separable.

Test Plan: Pulled, pushed, browsed Diffusion.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10860

Differential Revision: https://secure.phabricator.com/D15790
2016-04-25 11:20:29 -07:00
epriestley
d87c500002 Synchronize (hosted, clustered, Git) repositories over Conduit + HTTP
Summary:
Ref T4292. We currently synchronize hosted, clustered, Git repositories when we receive an SSH pull or push.

Additionally:

  - Synchronize before HTTP reads and writes.
  - Synchronize reads before Conduit requests.

We could relax Conduit eventually and allow Diffusion to say "it's OK to give me stale data".

We could also redirect some set of these actions to just go to the up-to-date host instead of connecting to a random host and synchronizing it. However, this potentially won't work as well at scale: if you have a larger number of servers, it sends all of the traffic to the leader immediately following a write. That can cause "thundering herd" issues, and isn't efficient if replicas are in different geographical regions and the write just went to the east coast but most clients are on the west coast. In large-scale cases, it's better to go to the local replica, wait for an update, then serve traffic from it -- particularly given that writes are relatively rare. But we can finesse this later once things are solid.

Test Plan:
  - Pushed and pulled a Git repository over HTTP.
  - Browsed a Git repository from the web UI.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4292

Differential Revision: https://secure.phabricator.com/D15758
2016-04-19 13:05:45 -07:00
epriestley
c70f4815a9 Allow cluster devices to SSH to one another without acting as a user
Summary:
Ref T4292. When you run `git fetch` and connect to, say, `repo001.west.company.com`, we'll look at the current version of the repository in other nodes in the cluster.

If `repo002.east.company.com` has a newer version of the repository, we'll fetch that version first, then respond to your request.

To do this, we need to run `git fetch repo002.east.company.com ...` and have that connect to the other host and be able to fetch data.

This change allows us to run `PHABRICATOR_AS_DEVICE=1 git fetch ...` to use device credentials to do this fetch. (Device credentials are already supported and used, they just always connect as a user right now, but these fetches should be doable without having a user. We will have a valid user when you run `git fetch` yourself, but we won't have one if the daemons notice that a repository is out of date and want to update it, so the update code should not depend on having a user.)

Test Plan:
```
$ PHABRICATOR_AS_DEVICE=1 ./bin/ssh-connect local.phacility.com
Warning: Permanently added 'local.phacility.com' (RSA) to the list of known hosts.
PTY allocation request failed on channel 0
phabricator-ssh-exec: Welcome to Phabricator.

You are logged in as device/daemon.phacility.net.

You haven't specified a command to run. This means you're requesting an interactive shell, but Phabricator does not provide an interactive shell over SSH.

Usually, you should run a command like `git clone` or `hg push` rather than connecting directly with SSH.

Supported commands are: conduit, git-lfs-authenticate, git-receive-pack, git-upload-pack, hg, svnserve.
Connection to local.phacility.com closed.
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4292

Differential Revision: https://secure.phabricator.com/D15755
2016-04-19 13:04:41 -07:00
epriestley
575c01373e Extract repository command construction from Repositories
Summary:
Ref T4292. Ref T10366. Depends on D15751. Today, generating repository commands is purely a function of the repository, so they use protocols and credentials based on the repository configuration.

For example, a repository with an SSH "remote URI" always generate SSH "remote commands".

This needs to change in the future:

  - After T10366, repositories won't necessarily just have one type of remote URI. They can only have one at a time still, but the repository itself won't change based on which one is currently active.
  - For T4292, I need to generate intracluster commands, regardless of repository configuration. These will have different protocols and credentials.

Prepare for these cases by separating out command construction, so they'll be able to generate commands in a more flexible way.

Test Plan:
  - Added unit tests.
  - Browsed diffusion.
  - Ran `bin/phd debug pull` to pull a bunch of repos.
  - Ran daemons.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T4292, T10366

Differential Revision: https://secure.phabricator.com/D15752
2016-04-19 04:51:48 -07:00
Christopher Speck
32d4ae8cb2 Added an intercept to Mercurial's capabilities command to remove bundle2.
Summary:
If Mercurial 3.4+ is used to host repositories in Phabricator, any clients using 3.5+ will receive an exception after the bundle is pushed up. Clients will also fail to update phases for changesets pushed up.

Before directly responding to mercurial clients with all capabilities, this change filters out the 'bundle2' capability so the client negotiates using a legacy bundle wire format instead.

Test Plan:
Server: Mercurial 3.5
Client: Mercurial 3.4

Test with both HTTP and SSH protocols:
1. Create a local commit on client
2. Push commit to server
3. Verify the client emits something like:
```
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
```

Closes T9450

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin

Maniphest Tasks: T9450

Differential Revision: https://secure.phabricator.com/D14241
2015-10-10 07:14:48 -07:00
Joshua Spence
b6d745b666 Extend from Phobject
Summary: All classes should extend from some other class. See D13275 for some explanation.

Test Plan: `arc unit`

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: epriestley, Korvin

Differential Revision: https://secure.phabricator.com/D13283
2015-06-15 18:02:27 +10:00
Joshua Spence
36e2d02d6e phtize all the things
Summary: `pht`ize a whole bunch of strings in rP.

Test Plan: Intense eyeballing.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: hach-que, Korvin, epriestley

Differential Revision: https://secure.phabricator.com/D12797
2015-05-22 21:16:39 +10:00
Joshua Spence
d0128afa29 Applied various linter fixes.
Summary: Applied some more linter fixes that I previously missed because my global `arc` install was out-of-date.

Test Plan: Will run `arc unit` on another host.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: epriestley, Korvin

Differential Revision: https://secure.phabricator.com/D9443
2014-06-09 16:04:12 -07:00
epriestley
f37832aed7 Fix loop in svnserve workflow for large binaries
Summary: If you push a large binary and the data crosses multiple data frames, we can end up in a loop in the parser.

Test Plan:
After this change, I was able to push a 95MB binary in 7s, which seems reasonable:

  >>> orbital ~/repos/INIS $ svn st
  A       large2.bin
  >>> orbital ~/repos/INIS $ ls -alh
  total 390648
  drwxr-xr-x   6 epriestley  admin   204B Dec 18 17:14 .
  drwxr-xr-x  98 epriestley  admin   3.3K Dec 16 11:19 ..
  drwxr-xr-x   7 epriestley  admin   238B Dec 18 17:14 .svn
  -rw-r--r--   1 epriestley  admin    80B Dec 18 15:07 README
  -rw-r--r--   1 epriestley  admin    95M Dec 18 16:53 large.bin
  -rw-r--r--   1 epriestley  admin    95M Dec 18 17:14 large2.bin
  >>> orbital ~/repos/INIS $ time svn commit -m 'another large binary'
  Adding  (bin)  large2.bin
  Transmitting file data .
  Committed revision 25.

  real	0m7.215s
  user	0m5.327s
  sys	0m0.407s
  >>> orbital ~/repos/INIS $

There may be room to improve this by using `PhutilRope`.

Reviewers: wrotte, btrahan, wotte

Reviewed By: wotte

CC: aran

Differential Revision: https://secure.phabricator.com/D7798
2013-12-18 17:48:29 -08:00
epriestley
85f505465e Support serving SVN repositories over SSH
Summary:
Ref T2230. The SVN protocol has a sensible protocol format with a good spec here:

http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra_svn/protocol

Particularly, compare this statement to the clown show that is the Mercurial wire protocol:

> It is possible to parse an item without knowing its type in advance.

WHAT A REASONABLE STATEMENT TO BE ABLE TO MAKE ABOUT A WIRE PROTOCOL

Although it makes substantially more sense than Mercurial, it's much heavier-weight than the Git or Mercurial protocols, since it isn't distributed.

It's also not possible to figure out if a request is a write request (or even which repository it is against) without proxying some of the protocol frames. Finally, several protocol commands embed repository URLs, and we need to reach into the protocol and translate them.

Test Plan: Ran various SVN commands over SSH (`svn log`, `svn up`, `svn commit`, etc).

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2230

Differential Revision: https://secure.phabricator.com/D7556
2013-11-11 12:19:06 -08:00
epriestley
8840f60218 Enable Mercurial reads and writes over SSH
Summary:
Ref T2230. This is substantially more complicated than Git, but mostly because Mercurial's protocol is a like 50 ad-hoc extensions cobbled together. Because we must decode protocol frames in order to determine if a request is read or write, 90% of this is implementing a stream parser for the protocol.

Mercurial's own parser is simpler, but relies on blocking reads. Since we don't even have methods for blocking reads right now and keeping the whole thing non-blocking is conceptually better, I made the parser nonblocking. It ends up being a lot of stuff. I made an effort to cover it reasonably well with unit tests, and to make sure we fail closed (i.e., reject requests) if there are any parts of the protocol I got wrong.

A lot of the complexity is sharable with the HTTP stuff, so it ends up being not-so-bad, just very hard to verify by inspection as clearly correct.

Test Plan:
  - Ran `hg clone` over SSH.
  - Ran `hg fetch` over SSH.
  - Ran `hg push` over SSH, to a read-only repo (error) and a read-write repo (success).

Reviewers: btrahan, asherkin

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2230

Differential Revision: https://secure.phabricator.com/D7553
2013-11-11 12:18:27 -08:00
epriestley
6324669748 Allow Phabricator to serve Mercurial repositories over HTTP
Summary: Ref T2230. This is easily the worst thing I've had to write in a while. I'll leave some notes inline.

Test Plan: Ran `hg clone http://...` on a hosted repo. Ran `hg push` on the same. Changed sync'd both ways.

Reviewers: asherkin, btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2230

Differential Revision: https://secure.phabricator.com/D7520
2013-11-06 18:00:42 -08:00