Summary: If you push a large binary and the data crosses multiple data frames, we can end up in a loop in the parser.
Test Plan:
After this change, I was able to push a 95MB binary in 7s, which seems reasonable:
>>> orbital ~/repos/INIS $ svn st
A large2.bin
>>> orbital ~/repos/INIS $ ls -alh
total 390648
drwxr-xr-x 6 epriestley admin 204B Dec 18 17:14 .
drwxr-xr-x 98 epriestley admin 3.3K Dec 16 11:19 ..
drwxr-xr-x 7 epriestley admin 238B Dec 18 17:14 .svn
-rw-r--r-- 1 epriestley admin 80B Dec 18 15:07 README
-rw-r--r-- 1 epriestley admin 95M Dec 18 16:53 large.bin
-rw-r--r-- 1 epriestley admin 95M Dec 18 17:14 large2.bin
>>> orbital ~/repos/INIS $ time svn commit -m 'another large binary'
Adding (bin) large2.bin
Transmitting file data .
Committed revision 25.
real 0m7.215s
user 0m5.327s
sys 0m0.407s
>>> orbital ~/repos/INIS $
There may be room to improve this by using `PhutilRope`.
Reviewers: wrotte, btrahan, wotte
Reviewed By: wotte
CC: aran
Differential Revision: https://secure.phabricator.com/D7798
Summary:
Ref T2230. The SVN protocol has a sensible protocol format with a good spec here:
http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra_svn/protocol
Particularly, compare this statement to the clown show that is the Mercurial wire protocol:
> It is possible to parse an item without knowing its type in advance.
WHAT A REASONABLE STATEMENT TO BE ABLE TO MAKE ABOUT A WIRE PROTOCOL
Although it makes substantially more sense than Mercurial, it's much heavier-weight than the Git or Mercurial protocols, since it isn't distributed.
It's also not possible to figure out if a request is a write request (or even which repository it is against) without proxying some of the protocol frames. Finally, several protocol commands embed repository URLs, and we need to reach into the protocol and translate them.
Test Plan: Ran various SVN commands over SSH (`svn log`, `svn up`, `svn commit`, etc).
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7556
Summary:
Ref T2230. This is substantially more complicated than Git, but mostly because Mercurial's protocol is a like 50 ad-hoc extensions cobbled together. Because we must decode protocol frames in order to determine if a request is read or write, 90% of this is implementing a stream parser for the protocol.
Mercurial's own parser is simpler, but relies on blocking reads. Since we don't even have methods for blocking reads right now and keeping the whole thing non-blocking is conceptually better, I made the parser nonblocking. It ends up being a lot of stuff. I made an effort to cover it reasonably well with unit tests, and to make sure we fail closed (i.e., reject requests) if there are any parts of the protocol I got wrong.
A lot of the complexity is sharable with the HTTP stuff, so it ends up being not-so-bad, just very hard to verify by inspection as clearly correct.
Test Plan:
- Ran `hg clone` over SSH.
- Ran `hg fetch` over SSH.
- Ran `hg push` over SSH, to a read-only repo (error) and a read-write repo (success).
Reviewers: btrahan, asherkin
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7553
Summary: Ref T2230. This is easily the worst thing I've had to write in a while. I'll leave some notes inline.
Test Plan: Ran `hg clone http://...` on a hosted repo. Ran `hg push` on the same. Changed sync'd both ways.
Reviewers: asherkin, btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7520