Summary:
Fixes T13309. If you void the warranty on a repository on disk and turn it into a shallow clone, Phabricator currently can't serve it.
We don't support hosting shallow working copies, but we should still parse and proxy the protocol rather than breaking in a mysterious way.
Test Plan:
- Created a shallow working copy with `mv X X.full; git clone --depth Y file://.../X.full X` in the storage directory on disk.
- Cloned it with `git clone <uri>`.
- Deleted all the refs inside it so the wire only has "shallow" frames; cloned it.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13309
Differential Revision: https://secure.phabricator.com/D20577
Summary:
Depends on D20381. Ref T8093. This makes minor improvements to the protocol proxy to handle cases where we add, remove, or replace refs and may need to move the "capabilities" section.
Rather than invoking a callback on every ref: parse the whole ref list into a data structure, mutate it if necessary (in a future diff), then dump it back into wire format.
This allows us to shift the capabilities data (which needs to be coupled with the first ref) around if we modify the first ref, and reorder the reflist alphabetically like git does.
When the server has no refs, Git sends no capabilities data. This is easy to emulate, just surprising.
Test Plan:
Tested the cases not covered by D20381:
- Fetching where the fetch actually fetches data.
- `ls-remote` when we hide the first ref (capabilities data properly moves to the first visible ref).
- `ls-remote` when the remote is empty (we just drop the capabilities frame completely).
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T8093
Differential Revision: https://secure.phabricator.com/D20436
Summary:
Depends on D20380. Ref T8093. When prototypes are enabled, inject a (hopefully?) no-op proxy into the Git wire protocol.
This proxy decodes "git upload-pack" and allows the list of references to be rewritten, in a similar way to how we already proxy the Subversion protocol to rewrite URIs and proxy the Mercurial protocol to distinguish between read and write operations.
The piece we care about comes at the beginning, and looks like this:
```
<frame-length><ref-hash> <ref-name>\0<server-capabilities>\n
<frame-length><ref-hash> <ref-name>\n
<frame-length><ref-hash> <ref-name>\n
...
<0000>
```
We can add, remove, or modify this section to make it appear that the server has different refs than the refs that exist on disk.
Things I have tried:
- `git ls-remote`
- `git ls-remote` where the server hides some refs.
- `git fetch` where the fetch is a no-op.
Things I have not tried:
- `git fetch` where the fetch is not a no-op.
- Tricking things into doing protocol v2. Or: I tried this, I wasn't successful. In v2, additional "\0" tricks are used to hide data in the capabilities, I think?
- `git ls-remote` where we rewrite/hide the first ref in the list, and need to move the capabilities frame elsewhere.
- `git ls-remote` where the server has no refs at all, or we remove every ref.
So the "interesting" piece of this works, but it almost certainly needs some cleanup to survive interaction with the real world.
Test Plan: See above.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T8093
Differential Revision: https://secure.phabricator.com/D20381
Summary:
Ref T13242. See <https://discourse.phabricator-community.org/t/out-of-range-value-for-column-deviceversion/2218>.
The synchronization log column is `uint32?` and `-1` doesn't go into that column.
Since we're only using `-1` for convenience to cheat through `$max_version > $this_version` checks, use `null` instead and make the checks more explicit.
Test Plan: Reproducing this is a bit tricky and I cheated fairly heavily to force the code down this pathway without actually building a multi-device cluster, but I did reproduce the original exception, apply the patch, and observe that it fixed things.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13242
Differential Revision: https://secure.phabricator.com/D20047
Summary: Depends on D19830. Ref T13216. See PHI908. See PHI750. See PHI885. Allow users to configure a filesize limit, and allow them to adjust the clone/fetch timeout.
Test Plan:
{F6021356}
- Configured a filesize limit and pushed, hit it. Made the limit larger and pushed, change went through.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13216
Differential Revision: https://secure.phabricator.com/D19831
Summary:
Depends on D19814. Ref T13216. See PHI885. For various eldritch reasons, `git fetch` can hang. Although we'd probably like to fix this with `git fetch --require-sustained-network-transfer-rate=512KB/5s` or similar, that flag doesn't exist and we don't have a reasonable way to build it.
Short of that, move toward formalizing a repository "copy time limit": the longest amount of time anything may spend trying to make a copy of this repository.
This grows out of the existing intracluster sync limit, which is effectively the same thing. Here, apply it to `git clone` and `git fetch` in Drydock working copy construction, too. A future change may make it configurable.
Test Plan:
- Set the limit to 0.001.
- Tried to build and lease working copies, got sensible timeout errors (see D19815).
```
<Activation Failed> Lease activation failed: [CommandException] Command killed by timeout after running for more than 0.001 seconds.
COMMAND
ssh '-o' 'LogLevel=quiet' '-o' 'StrictHostKeyChecking=no' '-o' 'UserKnownHostsFile=/dev/null' '-o' 'BatchMode=yes' -l '********' -p '2222' -i '********' '127.0.0.1' -- '(cd '\''/var/drydock/workingcopy-163/repo/spellbook/'\'' && git clean -d --force && git fetch && git reset --hard)'
```
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13216
Differential Revision: https://secure.phabricator.com/D19816
Summary:
Depends on D19797. Ref T13216.
- Put the new `hookWait` in the export and the UI.
- Put the existing waits in the UI, not just the export.
- Make order consistent: host, write, read, hook (this is the order the timers start in).
Test Plan: Pushed some stuff, viewed web UI and saw sensible numbers, exported data and got the same values.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13216
Differential Revision: https://secure.phabricator.com/D19798
Summary: Depends on D19796. Simplify some timing code by using phutil_microseconds_since() instead of duplicate casting and arithmetic.
Test Plan: Grepped for `1000000` to find these. Pulled, pushed, made a conduit call. This isn't exhaustive but it should be hard for these to break in a bad way since they're all just diagnostic.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D19797
Summary:
Depends on D19778. Ref T13216. See PHI943, PHI889, et al.
We currently have a push log and a pull log, but do not separately log intracluster synchronization events. We've encountered several specific cases where having this kind of log would be helpful:
- In PHI943, an install was accidentally aborting locks early. Having timing information in the sync log would let us identify this more quickly.
- In PHI889, an install hit an issue with `MaxStartups` configuration in `sshd`. A log would let us identify when this is an issue.
- In PHI889, I floated a "push the linux kernel + fetch timeout" theory. A sync log would let us see sync/fetch timeouts to confirm this being a problem in practice.
- A sync log will help us assess, develop, test, and monitor intracluster routing sync changes (likely those in T13211) in the future.
Some of these events are present in the pull log already, but only if they make it as far as running a `git upload-pack` subprocess (not the case with `MaxStartups` problems) -- and they can't record end-to-end timing.
No UI yet, I'll add that in a future change.
Test Plan:
- Forced all operations to synchronize by adding `|| true` to the version check.
- Pulled, got a sync log in the database.
Reviewers: amckinley
Reviewed By: amckinley
Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T13216
Differential Revision: https://secure.phabricator.com/D19779
Summary:
Depends on D19734. Ref T13202. Ref T13109. Ref T10884. See PHI905. See PHI889. We currently rank cluster nodes in three cases:
# when performing a write, we can go to any node (D19734 should make our ranking good);
# when performing a read, we can go to any node (currently random, but T10884 discusses ideas to improve our ranking);
# when performing an internal synchronization before a read or a write, we must go to an up-to-date node.
Currently, case (3) is not-exactly-deterministic but not random, and we won't spread intracluster traffic acrosss the cluster evenly if, say, half of it is up to date and half of it is still synchronizing. For a given write, I believe all nodes will tend to synchronize from whichever node first received the write today.
Instead, shuffle the list and synchronize from any up-to-date node.
(I think we could improve upon this only by knowing which nodes actually have load and selecting the least-loaded -- doable, but not trivial.)
Test Plan: Poked at it locally, will deploy to `secure`. This is hard to measure/test terribly convincingly.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13202, T13109, T10884
Differential Revision: https://secure.phabricator.com/D19735
Summary: Ref D18268. This typo cancelled itself out, and I can't find any other callers.
Test Plan: arc unit
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin
Differential Revision: https://secure.phabricator.com/D19718
Summary:
Ref T13202. See PHI889. If the lock log is enabled, we can try to offer more details about lock holders.
When we fail to acquire a lock:
- check for recent acquisitions and suggest that this is a bottleneck issue;
- if there are no recent acquisitions, check for the last acquisition and print details about it (what process, how long ago, whether or not we believe it was released).
Test Plan:
- Enabled the lock log.
- Changed the lock wait time to 1 second.
- Added a `sleep(10)` after grabbing the lock.
- In one window, ran a Conduit call or a `git fetch`.
- In another window, ran another operation.
- Got useful/sensible errors for both ssh and web lock holders, for example:
> PhutilProxyException: Failed to acquire read lock after waiting 1 second(s). You may be able to retry later. (This lock was most recently acquired by a process (pid=12609, host=orbital-3.local, sapi=apache2handler, controller=PhabricatorConduitAPIController, method=diffusion.rawdiffquery) 3 second(s) ago. There is no record of this lock being released.)
> PhutilProxyException: Failed to acquire read lock after waiting 1 second(s). You may be able to retry later. (This lock was most recently acquired by a process (pid=65251, host=orbital-3.local, sapi=cli, argv=/Users/epriestley/dev/core/lib/phabricator/bin/ssh-exec --phabricator-ssh-device local.phacility.net --phabricator-ssh-key 2) 2 second(s) ago. There is no record of this lock being released.)
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13202
Differential Revision: https://secure.phabricator.com/D19702
Summary:
Ref T13187. See PHI834. Mercurial has somewhat-recently (changeset is from Jan 2018) introduced a new "protocaps" command, that appears in Mercurial 4.7 and possibly before then.
We must explicitly enumerate all protocol commands because you can't decode the protocol without knowing how many arguments the command expects, so enumerate it.
(Also fix an issue where the related error message had an extra apostrophe.)
Test Plan:
- Ran `hg clone ...` with client and server on Mercurial 4.7.
- Before: fatal on unknown "protocaps" command.
- Midway: better typography in error message.
- After: clean clone.
Reviewers: amckinley
Maniphest Tasks: T13187
Differential Revision: https://secure.phabricator.com/D19596
Summary:
Fixes T13140. See PHI660.
Recent versions of Subversion can send a `(get-file true false false )` protocol frame with extra space between "false" and "false". This is allowed by the protocol spec, but never normally happens, and we do not parse it correctly.
Instead, parse it correctly.
Test Plan:
- Added unit tests.
- Ran `svn proplist svn+ssh://.../diffusion/X/file.c` under SVN 1.10 before and after the change.
- Before: indefinite hang.
- After: completed in finite time.
Reviewers: amckinley, asherkin
Reviewed By: amckinley, asherkin
Maniphest Tasks: T13140
Differential Revision: https://secure.phabricator.com/D19451
Summary:
Depends on D19249. Ref T13109. Add timing information to the `PushEvent`:
- `writeWait`: Time spent waiting for a write lock.
- `readWait`: Time spent waiting for a read lock.
- `hostWait`: Roughly, total time spent on the leaf node.
The primary goal here is to see if `readWait` is meaningful in the wild. If it is, that motivates smarter routing, and the value of smarter routing can be demonstrated by looking for a reduction in read wait times.
Test Plan: Pushed some stuff, saw reasonable timing values in the table. Saw timing information in "Export Data".
Maniphest Tasks: T13109
Differential Revision: https://secure.phabricator.com/D19250
Summary:
Ref T13109. Make it slightly more clear what the scope of the write and read locks are, and slightly more clear that we're actively acquiring locks, not just sitting around waiting.
While waiting on another writer, show who we're waiting on so you can walk over to their desk and glare at them.
Test Plan:
Added `sleep(15)` after `willWrite()`. Pushed in two windows. Saw new, more informative messages. In the second window, saw the new guidance:
> # Waiting for hector to finish writing (on device "repo1.local.phacility.net" for 11s)...
Reviewers: asherkin
Reviewed By: asherkin
Subscribers: asherkin
Maniphest Tasks: T13109
Differential Revision: https://secure.phabricator.com/D19247
Summary:
Ref T13108. See PHI364. See the task and issue for discussion.
If a `git fetch` during synchronization hangs, the whole node currently hangs. While the causes of a `git fetch` hang aren't clear, we don't expect synchronization to ever reasonably take more than 15 minutes, so add a default timeout.
Test Plan: Will deploy and observe; this is difficult to reproduce or test directly.
Maniphest Tasks: T13108
Differential Revision: https://secure.phabricator.com/D19235
A recent version of Git has changed some piece of behavior here and we
now get "fatal: ssh variant 'simple' does not support setting port"
when using a port. Explicitly setting GIT_SSH_VARIANT to `ssh` likely
fixes this.
Summary:
Ref T13012. These flags can be exploited by attackers to execute code remotely. See T13012 for discussion and context.
Additionally, harden some Mercurial commands where possible (by using additional quoting or embedding arguments in other constructs) so they resist these flags and behave properly when passed arguments with these values.
Test Plan:
- Added unit tests.
- Verified "--config" and "--debugger" commands are rejected.
- Verified more commands now work properly even with branches and files named `--debugger`, although not all of them do.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13012
Differential Revision: https://secure.phabricator.com/D18769
Summary: Noticed a couple of typos in the docs, and then things got out of hand.
Test Plan:
- Stared at the words until my eyes watered and the letters began to swim on the screen.
- Consulted a dictionary.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D18693
Summary:
Ref T12961. In Mercurial, it's possible to have "subrepos" which may use a different protocol than the main repository.
By putting an SSH repository inside an HTTP repository, an attacker can theoretically get us to execute `hg` without overriding `ui.ssh`, then execute code via the SSH hostname attack.
As an immediate mitigation to this attack, specify `ui.ssh` unconditionally. Normally, this will have no effect (it will just be ignored). In the specific case of an SSH repo inside an HTTP repo, it will defuse the `ssh` protocol.
For good measure and consistency, do the same for Subversion and Git. However, we don't normally maintain working copies for either Subversion or Git so it's unlikely that similar attacks exist there.
Test Plan:
- Put an SSH subrepo with an attack URI inside an HTTP outer repo in Mercurial.
- Ran `hg up` with and without `ui.ssh` specified.
- Got dangerous badness without `ui.ssh` and safe `ssh` subprocesses with `ui.ssh`.
I'm not yet able to confirm that `hg pull -u -- <uri>` can actually trigger this, but this can't hurt and our SSH wrapper is safer than the native behavior for all Subversion, Git and Mercurial versions released prior to today.
Reviewers: chad
Reviewed By: chad
Subscribers: cspeckmim
Maniphest Tasks: T12961
Differential Revision: https://secure.phabricator.com/D18389
Summary:
Fixes T12893. See also PHI15. This is complicated but:
- In the documentation, we say "register your web devices with Almanac". We do this ourselves on `secure` and in the production Phacility cluster.
- We don't actually require you to do this, don't detect that you didn't, and there's no actual reason you need to.
- If you don't register your "web" devices, the only bad thing that really happens is that creating repositories skips version initialization, creating the bug in T12893. This process does not actually require the devices be registered, but the code currently just kind of fails silently if they aren't.
Instead, just move forward on these init/resync phases even if the device isn't registered. These steps are safe to run from unregistered hosts since they just wipe the whole table and don't affect specific devices.
If this sticks, I'll probably update the docs to not tell you to register `web` devices, or at least add "Optionally, ...". I don't think there's any future reason we'd need them to be registered.
Test Plan:
This is a bit tough to test without multiple hosts, but I added this piece of code to `AlmanacKeys` so we'd pretend to be a nameless "web" device when creating a repository:
```
if ($_REQUEST['__path__'] == '/diffusion/edit/form/default/') {
return null;
}
```
Then I created some Git repositories. Before the patch, they came up with `-` versions (no version information). After the patch, they came up with `0` versions (correctly initialized).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12893
Differential Revision: https://secure.phabricator.com/D18273
Summary: This spelling can definitely feel a little overplayed at times, but I still think it's a gold standard in spellings of "capabilities".
Test Plan: Felt old and uncool.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D18215
Summary:
Ref T12613. Currently, the SVNTEST and HGTEST repositories are improperly configured on `secure`. These repositories use VCS systems which do not support synchronization, so they can not be served from cluster services with multiple hosts.
However, I've incorrectly configured them the same way as all the Git repositories, which support synchronization. This causes about 50% of requests to randomly fail (when they reach the wrong host).
Detect this issue and warn the user that the configuration is not valid.
It should be exceptionally difficult for normal installs to run into this.
Test Plan:
- Mostly faked these conditions locally, verified that `secure` really has this configuration.
- I'll push this, verify that the issue is detected correctly in production, then fix the config which should resolve the intermittent issues with SVNTEST.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12613
Differential Revision: https://secure.phabricator.com/D17774
Summary: Fixes T12243. That error occured due to network flakiness with some mounted filesystems so I'm not sure how best to simulate it. But you can look and see that the PhutilProxyException does indeed expect an exception as its second arg.
Test Plan: Look at method signature... look at callsite... now back at the method. Smile and nod.
Reviewers: #blessed_reviewers, yelirekim, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley
Maniphest Tasks: T12243
Differential Revision: https://secure.phabricator.com/D17335
Summary:
Fixes T12087. When transitioning into a clustered configuration for the first time, the documentation recommends using a one-device cluster as a transitional step.
However, installs may not do this for whatever reason, and we aren't as clear as we could be in warning about clusterizing directly into a multi-device cluster.
Roughly, when you do this, we end up believing that working copies exist on several different devices, but have no information about which copy or copies are up to date. //Usually// they all were already synchronized and are all up to date, but we can't make this assumption safely without risking data.
Instead, we err on the side of caution, and require a human to tell us which copy we should consider to be up-to-date, using `bin/repository thaw --promote`.
Test Plan:
```
$ ./bin/repository clusterize rLOCKS --service repos001.phacility.net
Service "repos001.phacility.net" is actively bound to more than one device
(local002.local, local001.phacility.net).
If you clusterize a repository onto this service it will be unclear which
devices have up-to-date copies of the repository. This leader/follower
ambiguity will freeze the repository. You may need to manually promote a
device to unfreeze it. See "Ambiguous Leaders" in the documentation for
discussion.
Continue anyway? [y/N]
```
Read other changes.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12087
Differential Revision: https://secure.phabricator.com/D17169
Summary:
Fixes T11940. In 2.11.0, Git has made a change so that newly-pushed changes are held in a temporary area until the hook accepts or rejects them.
This magic temporary area is only readable if the appropriate `GIT_ENVIRONMENTAL_MAGIC` variables are available. When executing `git` commands, pass them through from the calling context.
We're intentionally conservative about which variables we pass, and with good reason (see "httpoxy" in T11359). I think this continues to be the correct default behavior.
Test Plan:
- Upgraded to Git 2.11.0.
- Tried to push over SSH, got a hook error.
- Applied patch.
- Pulled and pushed over SSH.
- Pulled and pushed over HTTP.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11940
Differential Revision: https://secure.phabricator.com/D16988
Summary:
Ref T11044. Few issues here:
- The `PhutilProxyException` is missing an argument (hit this while in read-only mode).
- The `$ref_key` is unused.
- When you add a new master to an existing cluster, we can incorrectly apply `.php` patches which we should not reapply. Instead, mark them as already-applied.
Test Plan:
- Poked this locally, but will initialize `secure004` as an empty master to be sure.
Reviewers: chad, avivey
Reviewed By: avivey
Maniphest Tasks: T11044
Differential Revision: https://secure.phabricator.com/D16916
Summary:
Fixes T11590. Currently, we incorrectly consider cluster repository versions that are (or were) on devices which are no longer part of the active cluster service when building this status screen.
Instead, ignore them. This is just a display bug; the actual `ClusterEngine` already had similar logic.
Test Plan:
- Added a bad leader record to `repository_workingcopyversion`.
- Before patch, got a bad "Partial (1w)" sync:
{F1802292}
- After patch, got a good "Sycnchronized":
{F1802293}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11590
Differential Revision: https://secure.phabricator.com/D16492
Summary:
I converted this call incorrectly in D16092. We should pass the `PhutilURI` object, not the string version of it.
Specifically, this resulted in hitting an error like this if a replica needed synchronization:
```
[2016-08-11 21:22:37] EXCEPTION: (InvalidArgumentException) Argument 1 passed to DiffusionCommandEngine::setURI() must be an instance of PhutilURI, string given, called in...
#0 PhutilErrorHandler::handleError(integer, string, string, integer, array) called at [<phabricator>/src/applications/diffusion/protocol/DiffusionCommandEngine.php:52]
#1 DiffusionCommandEngine::setURI(string) called at [<phabricator>/src/applications/diffusion/protocol/DiffusionRepositoryClusterEngine.php:601]
...
```
Test Plan: Clusterized an observed repository, demoted a node, ran `bin/repository update Rxxx` to update, saw no typehint fatal.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D16390
Summary: Ref T10227. When we perform `git` http operations (fetch, mirror) check if we should use a proxy; if we should, set `http_proxy` or `https_proxy` in the environment to make `git` have `curl` use it.
Test Plan:
- Configured a proxy extension to run stuff through a local instance of Charles.
- Ran `repository pull` and `repository mirror`.
- Saw `git` HTTP requests route through the proxy.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10227
Differential Revision: https://secure.phabricator.com/D16092
Summary:
Ref T4292. For hosted, clustered repositories we have a good way to increment the internal version of the repository: every time a user pushes something, we increment the version by 1.
We don't have a great way to do this for observed/remote repositories because when we `git fetch` we might get nothing, or we might get some changes, and we can't easily tell //what// changes we got.
For example, if we see that another node is at "version 97", and we do a fetch and see some changes, we don't know if we're in sync with them (i.e., also at "version 97") or ahead of them (at "version 98").
This implements a simple way to version an observed repository:
- Take the head of every branch/tag.
- Look them up.
- Pick the biggest internal ID number.
This will work //except// when branches are deleted, which could cause the version to go backward if the "biggest commit" is the one that was deleted. This should be OK, since it's rare and the effects are minor and the repository will "self-heal" on the next actual push.
Test Plan:
- Created an observed repository.
- Ran `bin/repository update` and observed a sensible version number appear in the version table.
- Pushed to the remote, did another update, saw a sensible update.
- Did an update with no push, saw no effect on version number.
- Toggled repository to hosted, saw the version reset.
- Simulated read traffic to out-of-sync node, saw it do a remote fetch.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15986
Summary:
Fixes T11020. I think this resolves things -- `$new_version` (set above) should be used, not `$new_log` directly.
Specifically, we would get into trouble if the initial push failed for some reason (working copy not initialized yet, commit hook rejected, etc).
Test Plan: Made a bad push to a new repository. Saw it freeze before the patch and succeed afterwards.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11020
Differential Revision: https://secure.phabricator.com/D15969
Summary: Fixes T10941. This avoids a confusing dead end when configuring Subversion hosting, where `svnserve` will fail to execute hooks if the CWD isn't readable by the vcs-user.
Test Plan:
- Updated and committed in a hosted SVN repository.
- Ran some git operations, too.
- @dpotter confirmed this locally in T10941.
Reviewers: chad
Reviewed By: chad
Subscribers: dpotter
Maniphest Tasks: T10941
Differential Revision: https://secure.phabricator.com/D15879
Summary:
Ref T10748. Ref T10366. Allows users to set credential for new URIs.
- Ref T7221. Our handling of the "git://" protocol is currently incorrect. This protocol is not authenticated, but is considered an SSH protocol. In the new UI, it is considered an anonymous/unauthenticated protocol instead.
- Ref T10241. This fixes the `PassphraseCredentialControl` so it doesn't silently edit the value if the current value is not visible to you and/or not valid.
Test Plan:
Performed a whole lot of credential edits, removals, and adjustments. I'll give this additional vetting before cutting over to it.
{F1253207}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T7221, T10241, T10366, T10748
Differential Revision: https://secure.phabricator.com/D15829
Summary:
Ref T10860. This allows us to recover if the connection to the database is lost during a push.
If we lose the connection to the master database during a push, we would previously freeze the repository. This is very safe, but not very operator-friendly since you have to go manually unfreeze it.
We don't need to be quite this aggressive about freezing things. The repository state is still consistent after we've "upgraded" the lock by setting `isWriting = 1`, so we're actually fine even if we lost the global lock.
Instead of just freezing the repository immediately, sit there in a loop waiting for the master to come back up for a few minutes. If it recovers, we can release the lock and everything will be OK again.
Basically, the changes are:
- If we can't release the lock at first, sit in a loop trying really hard to release it for a while.
- Add a unique lock identifier so we can be certain we're only releasing //our// lock no matter what else is going on.
- Do the version reads on the same connection holding the lock, so we can be sure we haven't lost the lock before we do that read.
Test Plan:
- Added a `sleep(10)` after accepting the write but before releasing the lock so I could run `mysqld stop` and force this issue to occur.
- Pushed like this:
```
$ echo D >> record && git commit -am D && git push
[master 707ecc3] D
1 file changed, 1 insertion(+)
# Push received by "local001.phacility.net", forwarding to cluster host.
# Waiting up to 120 second(s) for a cluster write lock...
# Acquired write lock immediately.
# Waiting up to 120 second(s) for a cluster read lock on "local001.phacility.net"...
# Acquired read lock immediately.
# Device "local001.phacility.net" is already a cluster leader and does not need to be synchronized.
# Ready to receive on cluster host "local001.phacility.net".
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 254 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
BEGIN SLEEP
```
- Here, I stopped `mysqld` from the CLI in another terminal window.
```
END SLEEP
# CRITICAL. Failed to release cluster write lock!
# The connection to the master database was lost while receiving the write.
# This process will spend 300 more second(s) attempting to recover, then give up.
```
- Here, I started `mysqld` again.
```
# RECOVERED. Link to master database was restored.
# Released cluster write lock.
To ssh://local@localvault.phacility.com/diffusion/26/locktopia.git
2cbf87c..707ecc3 master -> master
```
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10860
Differential Revision: https://secure.phabricator.com/D15792
Summary:
Ref T10860. At least in Git over SSH, we can freely echo a bunch of stuff to stderr and Git will print it to the console, so we can tell users what's going on.
This should make debugging, etc., easier. We could tone this down a little bit once things are more stable if it's a little too chatty.
Test Plan:
```
$ echo D >> record && git commit -am D && git push
[master ca5efff] D
1 file changed, 1 insertion(+)
# Push received by "local001.phacility.net", forwarding to cluster host.
# Waiting up to 120 second(s) for a cluster write lock...
# Acquired write lock immediately.
# Waiting up to 120 second(s) for a cluster read lock on "local001.phacility.net"...
# Acquired read lock immediately.
# Device "local001.phacility.net" is already a cluster leader and does not need to be synchronized.
# Ready to receive on cluster host "local001.phacility.net".
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 256 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
To ssh://local@localvault.phacility.com/diffusion/26/locktopia.git
8616189..ca5efff master -> master
```
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10860
Differential Revision: https://secure.phabricator.com/D15791
Summary: Ref T10860. This doesn't change anything, it just separates all this stuff out of `PhabricatorRepository` since I'm planning to add a bit more state to it and it's already pretty big and fairly separable.
Test Plan: Pulled, pushed, browsed Diffusion.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10860
Differential Revision: https://secure.phabricator.com/D15790
Summary:
Ref T4292. We currently synchronize hosted, clustered, Git repositories when we receive an SSH pull or push.
Additionally:
- Synchronize before HTTP reads and writes.
- Synchronize reads before Conduit requests.
We could relax Conduit eventually and allow Diffusion to say "it's OK to give me stale data".
We could also redirect some set of these actions to just go to the up-to-date host instead of connecting to a random host and synchronizing it. However, this potentially won't work as well at scale: if you have a larger number of servers, it sends all of the traffic to the leader immediately following a write. That can cause "thundering herd" issues, and isn't efficient if replicas are in different geographical regions and the write just went to the east coast but most clients are on the west coast. In large-scale cases, it's better to go to the local replica, wait for an update, then serve traffic from it -- particularly given that writes are relatively rare. But we can finesse this later once things are solid.
Test Plan:
- Pushed and pulled a Git repository over HTTP.
- Browsed a Git repository from the web UI.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15758
Summary:
Ref T4292. When you run `git fetch` and connect to, say, `repo001.west.company.com`, we'll look at the current version of the repository in other nodes in the cluster.
If `repo002.east.company.com` has a newer version of the repository, we'll fetch that version first, then respond to your request.
To do this, we need to run `git fetch repo002.east.company.com ...` and have that connect to the other host and be able to fetch data.
This change allows us to run `PHABRICATOR_AS_DEVICE=1 git fetch ...` to use device credentials to do this fetch. (Device credentials are already supported and used, they just always connect as a user right now, but these fetches should be doable without having a user. We will have a valid user when you run `git fetch` yourself, but we won't have one if the daemons notice that a repository is out of date and want to update it, so the update code should not depend on having a user.)
Test Plan:
```
$ PHABRICATOR_AS_DEVICE=1 ./bin/ssh-connect local.phacility.com
Warning: Permanently added 'local.phacility.com' (RSA) to the list of known hosts.
PTY allocation request failed on channel 0
phabricator-ssh-exec: Welcome to Phabricator.
You are logged in as device/daemon.phacility.net.
You haven't specified a command to run. This means you're requesting an interactive shell, but Phabricator does not provide an interactive shell over SSH.
Usually, you should run a command like `git clone` or `hg push` rather than connecting directly with SSH.
Supported commands are: conduit, git-lfs-authenticate, git-receive-pack, git-upload-pack, hg, svnserve.
Connection to local.phacility.com closed.
```
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15755
Summary:
Ref T4292. Ref T10366. Depends on D15751. Today, generating repository commands is purely a function of the repository, so they use protocols and credentials based on the repository configuration.
For example, a repository with an SSH "remote URI" always generate SSH "remote commands".
This needs to change in the future:
- After T10366, repositories won't necessarily just have one type of remote URI. They can only have one at a time still, but the repository itself won't change based on which one is currently active.
- For T4292, I need to generate intracluster commands, regardless of repository configuration. These will have different protocols and credentials.
Prepare for these cases by separating out command construction, so they'll be able to generate commands in a more flexible way.
Test Plan:
- Added unit tests.
- Browsed diffusion.
- Ran `bin/phd debug pull` to pull a bunch of repos.
- Ran daemons.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292, T10366
Differential Revision: https://secure.phabricator.com/D15752
Summary:
If Mercurial 3.4+ is used to host repositories in Phabricator, any clients using 3.5+ will receive an exception after the bundle is pushed up. Clients will also fail to update phases for changesets pushed up.
Before directly responding to mercurial clients with all capabilities, this change filters out the 'bundle2' capability so the client negotiates using a legacy bundle wire format instead.
Test Plan:
Server: Mercurial 3.5
Client: Mercurial 3.4
Test with both HTTP and SSH protocols:
1. Create a local commit on client
2. Push commit to server
3. Verify the client emits something like:
```
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
```
Closes T9450
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Maniphest Tasks: T9450
Differential Revision: https://secure.phabricator.com/D14241
Summary: All classes should extend from some other class. See D13275 for some explanation.
Test Plan: `arc unit`
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D13283
Summary: Applied some more linter fixes that I previously missed because my global `arc` install was out-of-date.
Test Plan: Will run `arc unit` on another host.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D9443
Summary: If you push a large binary and the data crosses multiple data frames, we can end up in a loop in the parser.
Test Plan:
After this change, I was able to push a 95MB binary in 7s, which seems reasonable:
>>> orbital ~/repos/INIS $ svn st
A large2.bin
>>> orbital ~/repos/INIS $ ls -alh
total 390648
drwxr-xr-x 6 epriestley admin 204B Dec 18 17:14 .
drwxr-xr-x 98 epriestley admin 3.3K Dec 16 11:19 ..
drwxr-xr-x 7 epriestley admin 238B Dec 18 17:14 .svn
-rw-r--r-- 1 epriestley admin 80B Dec 18 15:07 README
-rw-r--r-- 1 epriestley admin 95M Dec 18 16:53 large.bin
-rw-r--r-- 1 epriestley admin 95M Dec 18 17:14 large2.bin
>>> orbital ~/repos/INIS $ time svn commit -m 'another large binary'
Adding (bin) large2.bin
Transmitting file data .
Committed revision 25.
real 0m7.215s
user 0m5.327s
sys 0m0.407s
>>> orbital ~/repos/INIS $
There may be room to improve this by using `PhutilRope`.
Reviewers: wrotte, btrahan, wotte
Reviewed By: wotte
CC: aran
Differential Revision: https://secure.phabricator.com/D7798
Summary:
Ref T2230. The SVN protocol has a sensible protocol format with a good spec here:
http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra_svn/protocol
Particularly, compare this statement to the clown show that is the Mercurial wire protocol:
> It is possible to parse an item without knowing its type in advance.
WHAT A REASONABLE STATEMENT TO BE ABLE TO MAKE ABOUT A WIRE PROTOCOL
Although it makes substantially more sense than Mercurial, it's much heavier-weight than the Git or Mercurial protocols, since it isn't distributed.
It's also not possible to figure out if a request is a write request (or even which repository it is against) without proxying some of the protocol frames. Finally, several protocol commands embed repository URLs, and we need to reach into the protocol and translate them.
Test Plan: Ran various SVN commands over SSH (`svn log`, `svn up`, `svn commit`, etc).
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7556
Summary:
Ref T2230. This is substantially more complicated than Git, but mostly because Mercurial's protocol is a like 50 ad-hoc extensions cobbled together. Because we must decode protocol frames in order to determine if a request is read or write, 90% of this is implementing a stream parser for the protocol.
Mercurial's own parser is simpler, but relies on blocking reads. Since we don't even have methods for blocking reads right now and keeping the whole thing non-blocking is conceptually better, I made the parser nonblocking. It ends up being a lot of stuff. I made an effort to cover it reasonably well with unit tests, and to make sure we fail closed (i.e., reject requests) if there are any parts of the protocol I got wrong.
A lot of the complexity is sharable with the HTTP stuff, so it ends up being not-so-bad, just very hard to verify by inspection as clearly correct.
Test Plan:
- Ran `hg clone` over SSH.
- Ran `hg fetch` over SSH.
- Ran `hg push` over SSH, to a read-only repo (error) and a read-write repo (success).
Reviewers: btrahan, asherkin
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7553
Summary: Ref T2230. This is easily the worst thing I've had to write in a while. I'll leave some notes inline.
Test Plan: Ran `hg clone http://...` on a hosted repo. Ran `hg push` on the same. Changed sync'd both ways.
Reviewers: asherkin, btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7520