Summary: Ref T10860. This doesn't change anything, it just separates all this stuff out of `PhabricatorRepository` since I'm planning to add a bit more state to it and it's already pretty big and fairly separable.
Test Plan: Pulled, pushed, browsed Diffusion.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10860
Differential Revision: https://secure.phabricator.com/D15790
Summary:
Ref T4292. We currently synchronize hosted, clustered, Git repositories when we receive an SSH pull or push.
Additionally:
- Synchronize before HTTP reads and writes.
- Synchronize reads before Conduit requests.
We could relax Conduit eventually and allow Diffusion to say "it's OK to give me stale data".
We could also redirect some set of these actions to just go to the up-to-date host instead of connecting to a random host and synchronizing it. However, this potentially won't work as well at scale: if you have a larger number of servers, it sends all of the traffic to the leader immediately following a write. That can cause "thundering herd" issues, and isn't efficient if replicas are in different geographical regions and the write just went to the east coast but most clients are on the west coast. In large-scale cases, it's better to go to the local replica, wait for an update, then serve traffic from it -- particularly given that writes are relatively rare. But we can finesse this later once things are solid.
Test Plan:
- Pushed and pulled a Git repository over HTTP.
- Browsed a Git repository from the web UI.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15758
Summary: Fixes T10797. This seems to fix things on my local system.
Test Plan:
- Cloned with a username, got prompted for a password.
- Cloned with a username + password.
- Cloned with a username + bad password (error).
Reviewers: chad
Reviewed By: chad
Subscribers: Grimeh
Maniphest Tasks: T10797
Differential Revision: https://secure.phabricator.com/D15706
Summary:
Ref T10262. This removes one-time tokens and makes file data responses always-cacheable (for 30 days).
The URI will stop working once any attached object changes its view policy, or the file view policy itself changes.
Files with `canCDN` (totally public data like profile images, CSS, JS, etc) use "cache-control: public" so they can be CDN'd.
Files without `canCDN` use "cache-control: private" so they won't be cached by the CDN. They could still be cached by a misbehaving local cache, but if you don't want your users seeing one anothers' secret files you should configure your local network properly.
Our "Cache-Control" headers were also from 1999 or something, update them to be more modern/sane. I can't find any evidence that any browser has done the wrong thing with this simpler ruleset in the last ~10 years.
Test Plan:
- Configured alternate file domain.
- Viewed site: stuff worked.
- Accessed a file on primary domain, got redirected to alternate domain.
- Verified proper cache headers for `canCDN` (public) and non-`canCDN` (private) files.
- Uploaded a file to a task, edited task policy, verified it scrambled the old URI.
- Reloaded task, new URI generated transparently.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10262
Differential Revision: https://secure.phabricator.com/D15642
Summary: Ref T7789. Implement proper detection for read-only requests. Previously, we assumed every request was read/write and required lots of permissions, but we don't need "Can Push" permission if you're only cloning/fetching/pulling.
Test Plan:
- Set push policy to "no one".
- Fetched, got clean data out of LFS.
- Tried to push, got useful error.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T7789
Differential Revision: https://secure.phabricator.com/D15499
Summary:
Ref T7789. Ref T10604. This implements the `upload` action, which streams file data into Files.
This makes Git LFS actually work, at least roughly.
Test Plan:
- Tracked files in an LFS repository.
- Pushed LFS data (`git lfs track '*.png'; git add something.png; git commit -m ...; git push`).
- Pulled LFS data (`git checkout master^; rm -rf .git/lfs; git checkout master; open something.png`).
- Verified LFS refs show up in the gitlfsref table.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T7789, T10604
Differential Revision: https://secure.phabricator.com/D15492
Summary:
Ref T7789. This implements:
- A new table to store the `<objectHash, filePHID>` relationship between Git LFS files and Phabricator file objects.
- A basic response to `batch` commands, which return actions for a list of files.
Test Plan:
Ran `git lfs push origin master`, got a little further than previously:
```
epriestley@orbital ~/dev/scratch/poemslocal $ git lfs push origin master
Git LFS: (2 of 1 files) 174.24 KB / 87.12 KB
Git LFS operation "upload/b7e0aeb82a03d627c6aa5fc1bbfd454b6789d9d9affc8607d40168fa18cf6c69" is not supported by this server.
Git LFS operation "upload/b7e0aeb82a03d627c6aa5fc1bbfd454b6789d9d9affc8607d40168fa18cf6c69" is not supported by this server.
```
With `GIT_TRACE=1`, this shows the batch part of the API going through.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T7789
Differential Revision: https://secure.phabricator.com/D15489
Summary:
Ref T7789. This builds on top of `git-lfs-authenticate` to detect LFS requests, read LFS tokens, and route them to a handler which can do useful things.
This handler promptly drops them on the floor with an error message.
Test Plan:
Here's a transcript showing the parts working together so far:
- `git-lfs` connects to the server with SSH, and gets told how to connect with HTTP to do uploads.
- `git-lfs` uses HTTP, and authenticates with the tokens properly.
- But the server tells it to go away, and that it doesn't support anything, so the operation ultimately fails.
```
$ GIT_TRACE=1 git lfs push origin master
12:45:56.153913 git.c:558 trace: exec: 'git-lfs' 'push' 'origin' 'master'
12:45:56.154376 run-command.c:335 trace: run_command: 'git-lfs' 'push' 'origin' 'master'
trace git-lfs: Upload refs origin to remote [master]
trace git-lfs: run_command: git rev-list --objects master --not --remotes=origin
trace git-lfs: run_command: git cat-file --batch-check
trace git-lfs: run_command: git cat-file --batch
trace git-lfs: run_command: 'git' config -l
trace git-lfs: tq: starting 3 transfer workers
trace git-lfs: tq: running as batched queue, batch size of 100
trace git-lfs: prepare upload: b7e0aeb82a03d627c6aa5fc1bbfd454b6789d9d9affc8607d40168fa18cf6c69 lfs/dog1.jpg 1/1
trace git-lfs: tq: sending batch of size 1
trace git-lfs: ssh: local@localvault.phacility.com git-lfs-authenticate diffusion/18/poems.git upload
trace git-lfs: api: batch 1 files
trace git-lfs: HTTP: POST http://local.phacility.com/diffusion/POEMS/poems.git/info/lfs/objects/batch
trace git-lfs: HTTP: 404
trace git-lfs: HTTP: {"message":"Git LFS operation \"objects\/batch\" is not supported by this server."}
trace git-lfs: HTTP:
trace git-lfs: api: batch not implemented: 404
trace git-lfs: run_command: 'git' config lfs.batch false
trace git-lfs: tq: batch api not implemented, falling back to individual
trace git-lfs: ssh: local@localvault.phacility.com git-lfs-authenticate diffusion/18/poems.git upload b7e0aeb82a03d627c6aa5fc1bbfd454b6789d9d9affc8607d40168fa18cf6c69
trace git-lfs: api: uploading (b7e0aeb82a03d627c6aa5fc1bbfd454b6789d9d9affc8607d40168fa18cf6c69)
trace git-lfs: HTTP: POST http://local.phacility.com/diffusion/POEMS/poems.git/info/lfs/objects
trace git-lfs: HTTP: 404
trace git-lfs: HTTP: {"message":"Git LFS operation \"objects\" is not supported by this server."}
trace git-lfs: HTTP:
trace git-lfs: tq: retrying 1 failed transfers
trace git-lfs: ssh: local@localvault.phacility.com git-lfs-authenticate diffusion/18/poems.git upload b7e0aeb82a03d627c6aa5fc1bbfd454b6789d9d9affc8607d40168fa18cf6c69
trace git-lfs: api: uploading (b7e0aeb82a03d627c6aa5fc1bbfd454b6789d9d9affc8607d40168fa18cf6c69)
trace git-lfs: HTTP: POST http://local.phacility.com/diffusion/POEMS/poems.git/info/lfs/objects
trace git-lfs: HTTP: 404
trace git-lfs: HTTP: {"message":"Git LFS operation \"objects\" is not supported by this server."}
trace git-lfs: HTTP:
Git LFS: (0 of 1 files) 0 B / 87.12 KB
Git LFS operation "objects" is not supported by this server.
Git LFS operation "objects" is not supported by this server.
```
Reviewers: chad
Reviewed By: chad
Subscribers: eadler
Maniphest Tasks: T7789
Differential Revision: https://secure.phabricator.com/D15485
Summary: Ref T4245. Consolidates the URI parsing/rewriting logic so that repositories can be served from either `/diffusion/XYZ/` or `/diffusion/123/`, over both HTTP and SSH.
Test Plan:
- Pulled a Git repository by ID and callsign over HTTP and SSH.
- Pulled a Mercurial repository by ID and callsign over HTTP and SSH.
- Pulled a Subversion repository by ID and callsign over SSH (no HTTP support for SVN).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4245
Differential Revision: https://secure.phabricator.com/D15302
Summary:
Fixes T10264. I'm reasonably confident that this is the chain of events here:
First, prior to 8269fd6e, we would ignore "Content-Encoding" when reading inbound bodies. So if a request was gzipped, we would read a gzipped body, then give `git-http-backend` a gzipped body with "Content-Encoding: gzip". Everything matched normally, so that was fine, except in the cluster.
In the cluster, we'd accept "gzip + compressed body" and proxy it, but not tell cURL that it was already compressed. cURL would think it was raw data, so it would arrive on the repository host with a compressed body but no "Content-Encoding: gzip". Then we'd hand it to git in the same form. This caused the issue in 8269fd6e: handing it compressed data, but no "this is compressed" header.
To fix this, I made us decompress the encoding when we read the body, so the cluster now proxies raw data instead of proxying gzipped data. This fixed the issue in the cluster, but created a new issue on non-cluster hosts. The new issue is that we accept "gzip + compressed body" and decompress the body, but then pass the //original// header to `git-http-backend`. So now we have the opposite problem from what we originally had: a "gzip" header, but a raw body.
To fix //this//, we could do two things:
- Revert 8269fd6e, then change the proxy request to preserve "Content-Encoding" instead.
- Stop telling `git-http-backend` that we're handing it compressed data when we're handing it raw data.
I did the latter here because it's an easier change to make and test, we'll need to interact with the raw data later anyway, to implement repository virtualization in connection with T8238.
Test Plan: See T10264 for users confirming this fix.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10264
Differential Revision: https://secure.phabricator.com/D15258
Summary: Fixes T10259. There was no real reason to do this `ip2long()` stuff in the first place -- it's very slightly smaller, but won't work with ipv6 and the savings are miniscule.
Test Plan:
- Ran migration.
- Viewed logs in web UI.
- Pulled and pushed.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10259
Differential Revision: https://secure.phabricator.com/D15165
Summary: Ref T10228. Commands like `git-http-backend` can emit errors with raw bytes in the output. Sanitize these if present so we can log them in JSON format.
Test Plan: Edited this into production. >_> sneaky sneaky <_<
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10228
Differential Revision: https://secure.phabricator.com/D15144
Summary:
Ref T10228. This is currently quite limited:
- No UI.
- No SSH support.
My primary goal is to debug the issue in T10228. In the long run we can expand this to be a bit fancier.
Test Plan:
Made various valid and invalid clones, got sucess responses and not-so-successful responses, viewed the log table for general corresponding messages and broad sanity.
Ran GC via `bin/phd debug trigger`, no issues.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10228
Differential Revision: https://secure.phabricator.com/D15127
Summary: Ref T4245. This is the last of it, and covers the clone/push stuff.
Test Plan:
- Cloned git.
- Pushed git.
- Cloned mercurial.
- Pushed mercurial.
- Visited a `blah.git` URL in my browser just because; got redirected to a human-facing UI.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4245
Differential Revision: https://secure.phabricator.com/D14949
Summary:
If Mercurial 3.4+ is used to host repositories in Phabricator, any clients using 3.5+ will receive an exception after the bundle is pushed up. Clients will also fail to update phases for changesets pushed up.
Before directly responding to mercurial clients with all capabilities, this change filters out the 'bundle2' capability so the client negotiates using a legacy bundle wire format instead.
Test Plan:
Server: Mercurial 3.5
Client: Mercurial 3.4
Test with both HTTP and SSH protocols:
1. Create a local commit on client
2. Push commit to server
3. Verify the client emits something like:
```
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
```
Closes T9450
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin
Maniphest Tasks: T9450
Differential Revision: https://secure.phabricator.com/D14241
Summary:
Ref T7019. When we receive a `git clone https://` (or `git push` on HTTP/S), and the repository is not local, proxy the request to the appropriate service.
This has scalability limits, but they are not more severe than the existing limits (T4369) and are about as abstracted as we can get them.
This doesn't fully work in a Phacility context because the commit hook does not know which instance it is running in, but that problem is not unique to HTTP.
Test Plan:
- Pushed and pulled a Git repo via proxy.
- Pulled a Git repo normally.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7019
Differential Revision: https://secure.phabricator.com/D11494
Summary: Fixes T5646. Makes diffusion a much better user experience. Users now see a 404 exception page when they have a bad URI. Previously, they saw a developer-facing raw exception.
Test Plan: played around in diffusion a bunch. most of these changes were fairly mechanical at the end of the day.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin, epriestley
Maniphest Tasks: T5646
Differential Revision: https://secure.phabricator.com/D11299
Summary: Ran `arc lint --apply-patches --everything` over rP, mainly to change double quotes to single quotes where appropriate. These changes also validate that the `ArcanistXHPASTLinter::LINT_DOUBLE_QUOTE` rule is working as expected.
Test Plan: Eyeballed it.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin, hach-que
Differential Revision: https://secure.phabricator.com/D9431
Summary: Fixes T4443. Plug VCS passwords into the shared key stretching. They don't use any real stretching now (I anticipated doing something like T4443 eventually) so we can just migrate them into stretching all at once.
Test Plan:
- Viewed VCS settings.
- Used VCS password after migration.
- Set VCS password.
- Upgraded VCS password by using it.
- Used VCS password some more.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4443
Differential Revision: https://secure.phabricator.com/D8272
Summary:
Ref T4175. This allows these URIs to all be valid for Git and Mercurial:
/diffusion/X/
/diffusion/X/anything.git
/diffusion/X/anything/
This mostly already works, it just needed a few tweaks.
Test Plan: Cloned git and hg working copies using HTTP and SSH.
Reviewers: btrahan, chad
Reviewed By: chad
CC: aran
Maniphest Tasks: T4175
Differential Revision: https://secure.phabricator.com/D8098
Summary: Ref T4195. Stores remote address and protocol in the logs, where possible.
Test Plan: Pushed some stuff, looked at the log, saw data.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4195
Differential Revision: https://secure.phabricator.com/D7711
Summary: Ref T4189. Fixes T2066. Mercurial has a //lot// of hooks so I'm not 100% sure this is all we need to install (we may need separate hooks for tags/bookmarks) but it should cover most of what we're after at least.
Test Plan:
- `bin/repository pull`'d a Mercurial repo and got a hook install.
- Pushed to a Mercurial repository over SSH and HTTP, with good/bad hooks. Saw hooks fire.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2066, T4189
Differential Revision: https://secure.phabricator.com/D7685
Summary:
Ref T4189. T4189 describes most of the intent here:
- When updating hosted repositories, sync a pre-commit hook into them instead of doing a `git fetch`.
- The hook calls into Phabricator. The acting Phabricator user is sent via PHABRICATOR_USER in the environment. The active repository is sent via CLI.
- The hook doesn't do anything useful yet; it just veifies basic parameters, does a little parsing, and exits 0 to allow the commit.
Test Plan:
- Performed Git pushes and pulls over SSH and HTTP.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4189
Differential Revision: https://secure.phabricator.com/D7682
Summary:
Ref T2230. When fully set up, we have up to three users who all need to write into the repositories:
- The webserver needs to write for HTTP receives.
- The SSH user needs to write for SSH receives.
- The daemons need to write for "git fetch", "git clone", etc.
These three users don't need to be different, but in practice they are often not likely to all be the same user. If for no other reason, making them all the same user requires you to "git clone httpd@host.com", and installs are likely to prefer "git clone git@host.com".
Using three different users also allows better privilege separation. Particularly, the daemon user can be the //only// user with write access to the repositories. The webserver and SSH user can accomplish their writes through `sudo`, with a whitelisted set of commands. This means that even if you compromise the `ssh` user, you need to find a way to escallate from there to the daemon user in order to, e.g., write arbitrary stuff into the repository or bypass commit hooks.
This lays some of the groundwork for a highly-separated configuration where the SSH and HTTP users have the fewest privileges possible and use `sudo` to interact with repositories. Some future work which might make sense:
- Make `bin/phd` respect this (require start as the right user, or as root and drop privileges, if this configuration is set).
- Execute all `git/hg/svn` commands via sudo?
Users aren't expected to configure this yet so I haven't written any documentation.
Test Plan:
Added an SSH user ("dweller") and gave it sudo by adding this to `/etc/sudoers`:
dweller ALL=(epriestley) SETENV: NOPASSWD: /usr/bin/git-upload-pack, /usr/bin/git-receive-pack
Then I ran git pushes and pulls over SSH via "dweller@localhost". They successfully interacted with the repository on disk as the "epriestley" user.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7589
Summary:
Small step forward which improves existing stuff or lays groudwork for future stuff:
- Currently, to check for email verification, we have to single-query the email address on every page. Instead, denoramlize it into the user object.
- Migrate all the existing users.
- When the user verifies an email, mark them as `isEmailVerified` if the email is their primary email.
- Just make the checks look at the `isEmailVerified` field.
- Add a new check, `isUserActivated()`, to cover email-verified plus disabled. Currently, a non-verified-but-not-disabled user could theoretically use Conduit over SSH, if anyone deployed it. Tighten that up.
- Add an `isApproved` flag, which is always true for now. In a future diff, I want to add a default-on admin approval queue for new accounts, to prevent configuration mistakes. The way it will work is:
- When the queue is enabled, registering users are created with `isApproved = false`.
- Admins are sent an email, "[Phabricator] New User Approval (alincoln)", telling them that a new user is waiting for approval.
- They go to the web UI and approve the user.
- Manually-created accounts are auto-approved.
- The email will have instructions for disabling the queue.
I think this queue will be helpful for new installs and give them peace of mind, and when you go to disable it we have a better opportunity to warn you about exactly what that means.
Generally, I want to improve the default safety of registration, since if you just blindly coast through the path of least resistance right now your install ends up pretty open, and realistically few installs are on VPNs.
Test Plan:
- Ran migration, verified `isEmailVerified` populated correctly.
- Created a new user, checked DB for verified (not verified).
- Verified, checked DB (now verified).
- Used Conduit, People, Diffusion.
Reviewers: btrahan
Reviewed By: btrahan
CC: chad, aran
Differential Revision: https://secure.phabricator.com/D7572
Summary:
Ref T2230. As far as I can tell, getting SVN working over HTTP is incredibly complicated. It's all DAV-based and doesn't appear to have any kind of binary we can just execute and pass requests through to. Don't support it for now.
- Disable it in the UI.
- Make sure all the error messages are reasonable.
Test Plan: Tried to HTTP an SVN repo. Tried to clone a Git repo with SVN, got a good error message.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7562
Summary:
Ref T2230. This is substantially more complicated than Git, but mostly because Mercurial's protocol is a like 50 ad-hoc extensions cobbled together. Because we must decode protocol frames in order to determine if a request is read or write, 90% of this is implementing a stream parser for the protocol.
Mercurial's own parser is simpler, but relies on blocking reads. Since we don't even have methods for blocking reads right now and keeping the whole thing non-blocking is conceptually better, I made the parser nonblocking. It ends up being a lot of stuff. I made an effort to cover it reasonably well with unit tests, and to make sure we fail closed (i.e., reject requests) if there are any parts of the protocol I got wrong.
A lot of the complexity is sharable with the HTTP stuff, so it ends up being not-so-bad, just very hard to verify by inspection as clearly correct.
Test Plan:
- Ran `hg clone` over SSH.
- Ran `hg fetch` over SSH.
- Ran `hg push` over SSH, to a read-only repo (error) and a read-write repo (success).
Reviewers: btrahan, asherkin
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7553
Summary: Ref T2230. Fixes T4079. As it turns out, this is Git being weird. See comments for some detials about what's going on here.
Test Plan: Created shallow and deep Git clones.
Reviewers: hach-que, btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4079, T2230
Differential Revision: https://secure.phabricator.com/D7554
Summary: Ref T2230. This is easily the worst thing I've had to write in a while. I'll leave some notes inline.
Test Plan: Ran `hg clone http://...` on a hosted repo. Ran `hg push` on the same. Changed sync'd both ways.
Reviewers: asherkin, btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7520
Summary: This is starting to get a bit sizable and it turns out Mercurial is sort of a beast, so split the VCS serve stuff into a separate controller.
Test Plan: Pushed and pulled an authenticated Git repository.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran, hach-que
Differential Revision: https://secure.phabricator.com/D7494