Summary: Ref T4068. Partly, this moves discovery to the more unit-testable PhabricatorRepositoryDiscoveryEngine. It also fixes some issues, see inlines.
Test Plan: In a Mercurial repository, ran `bin/repository discover --repair`, verified commits came out topographically sorted. Ran without `--repair` and in various other contexts, like with no commits to discover and some-but-not-all commits to discover.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4068
Differential Revision: https://secure.phabricator.com/D7518
Summary: Ref T1493. Consolidate these a bit; they might need some more magic once we do `--noupdate` checkouts. Mostly just trying to clean up and centralize this code a bit.
Test Plan: Viewed and `bin/repository discover`'d Mercurial repos with and without any branches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1493
Differential Revision: https://secure.phabricator.com/D7480
Summary:
Hosted repositories only sometimes survive the pull/discover phases right now, due to issues like:
- Pull tries to `git clone`, but should `git init`.
- Mercurial doesn't handle empty repositories with on branches.
- SVN tries to connect to an invalid remote.
- None of them set the INIT repo flag correctly, so status doesn't get updated properly in the UI.
Fix all this stuff.
Test Plan:
- For each of Git, SVN and Mercurial:
- Created a new repository from the web UI in a deactivated state.
- Made it hosted.
- Manually ran pull/discover.
- Verified we end up with initialized, empty repositories in consistent states.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7474
Summary:
Ref T2350. Fixes T2231.
- Adds log flags around discovery.
- Adds message flags for "needs update". This is basically an out-of-band hint to the daemons that a repository should be pulled sooner than normal. We set the flag when users push a revision, and expose a Conduit method that `arc land` will be able to use.
Test Plan: See screenshots.
Reviewers: btrahan, chad
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2350, T2231
Differential Revision: https://secure.phabricator.com/D7467
Summary: Ref T2230. This cleans up D7442, by using `git for-each-ref` everywhere we can, in a basically reasonable way.
Test Plan:
In bare and non-bare repositories:
- Ran discovery with `bin/repository discover`;
- listed branches on `/diffusion/X/`;
- listed tags on `/diffusion/X/`;
- listed tags, branches and refs on `/diffusion/rXnnnn`.
Reviewers: btrahan, avivey
Reviewed By: avivey
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7447
Summary:
Ref T2230. Although all the non-bare commands //run// fine in bare repos, not all of them do exactly the same thing.
This could use further cleanup, but at least get it working again for now.
Test Plan: Ran `bin/repository pull`, `bin/repository discover`, viewed Diffusion (looked at branch table), viewed a commit (looked at "Branches"), for bare and non-bare git repos.
Reviewers: avive, btrahan, avivey
Reviewed By: avivey
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7442
Summary:
Fixes T3217. Ref T776. Ref T1493. Broadly, this introduces a mechanism which works like this:
- When a repository is created, we set an "importing" flag.
- After discovery completes, we check if a repository has no importing commits. Basically, this is the first time we catch up to HEAD.
- If we're caught up, clear the "importing" flag.
This flag lets us fix some issues:
- T3217. Currently, when you import a new repository and users have rules like "Email me on every commit ever" or "trigger an audit on every commit", we take a bunch of publish actions. Instead, implicitly disable publishing during import.
- An imported but un-pulled repository currently has an incomprehensible error on `/diffusion/X/`. Fix that.
- Show more cues in the UI about importing.
- Made some exceptions more specific.
Test Plan:
This is the new screen for a completely new repo, replacing a giant exception:
{F75443}
- Created a repository, saw it "importing".
- Pulled and discovered it.
- Processed its commits.
- Ran discovery again, saw import flag clear.
- Also this repository was empty, which hit some of the other code.
This is the new "parsed empty repository" UI, which isn't good, but is less broken:
{F75446}
Reviewers: btrahan
Reviewed By: btrahan
CC: aran, hach-que
Maniphest Tasks: T3607, T1493, T776, T3217
Differential Revision: https://secure.phabricator.com/D7429
Summary:
- Don't try to pull hosted repos.
- Also, fix the `--verbose` + `--trace` interaction for `bin/repository`.
- Also, fix a couple of unit tests which got tweaked earlier.
Test Plan:
$ ./bin/repository pull GTEST --verbose
Pulling 'GTEST'...
Repository "GTEST" is hosted, so Phabricator does not pull updates for it.
Done.
Reviewers: btrahan, hach-que
Reviewed By: hach-que
CC: aran
Maniphest Tasks: T2230
Differential Revision: https://secure.phabricator.com/D7427
Summary: Ref T603. Move to real Query classes.
Test Plan:
- Ran `phd debug pull X` (where `X` does not match a repository).
- Ran `phd debug pull Y` (where `Y` does match a repository).
- Ran `phd debug pull`.
- Ran `repository pull`.
- Ran `repository pull X`.
- Ran `repository pull Y`.
- Ran `repository discover`.
- Ran `repository delete`.
- Ran `grep`.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T603
Differential Revision: https://secure.phabricator.com/D7137
Summary: Ref T2569. When repository pulls fail, they don't necessarily list identifying information about which repository failed. Use a proxy exception to list that information.
Test Plan: {F51267}
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2569
Differential Revision: https://secure.phabricator.com/D6548
Summary: Ref T3377. MySQL ignores indexes if we hand it mismatched datatypes. This seems colossally dumb, but give it what it expects.
Test Plan: wat
Reviewers: wez, btrahan
Reviewed By: wez
CC: aran
Maniphest Tasks: T3377
Differential Revision: https://secure.phabricator.com/D6201
Summary:
Ref T2784. Begins pulling discovery into Engines and covering it with tests. In particular:
- Discovery is currently a one-shot process where we find all the new commits and write them to the database in one go. Split it apart so we find and return the new commits first, then write them to the database separately. This makes things simpler and more testable.
- This diff only brings SVN into an engine (and only the "find the commits" part), since it's simpler than Git or Mercurial.
- Creates a base Engine class and moves common functionality there.
- Restores the `--verbose` flag to `repository pull`.
Test Plan: Added unit tests. Ran `bin/repository discover`. Ran `bin/phd debug pulllocal`.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2784
Differential Revision: https://secure.phabricator.com/D5906
Summary:
Ref T2784. This moves us toward being able to test the background and Conduit pipelines for repositories. In particular:
- Separate the logic for pulling repositories (`git pull`, `hg pull`) out of `PhabricatorRepositoryPullLocalDaemon` and put it in `PhabricatorRepositoryPullEngine`. This allows repositories to be pulled directly without invoking the daemons.
- Add tests for the engine, including a future-looking base test case.
- Add basic `PhutilDirectoryFixture`-based repositories.
Next steps:
# Do the same for repo discovery.
# Then we can start writing tests against specific Conduit methods.
Test Plan: Ran unit tests. Ran `bin/repository pull` on SVN, Hg and Git repositories. Ran `bin/phd debug pulllocal`.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran, nh
Maniphest Tasks: T2784
Differential Revision: https://secure.phabricator.com/D5904
Summary: ref T2784. This one had a few fun spots where I had to move data around. Also, is there some common object (or should I add it?) that can do this toDictionary newFromConduit stuff? Also, this assumes D5803 is largely correct at the time of this diff.
Test Plan: browsed mercurial and git repository page. saw the branches i expected.
Reviewers: epriestley
Reviewed By: epriestley
CC: chad, aran, Korvin
Maniphest Tasks: T2784
Differential Revision: https://secure.phabricator.com/D5810
Summary:
This adds a configuration option for repositories to be marked as managed
by phabricator, which is then used by the pullLocal daemon to try to recover
from some errors it runs into.
Test Plan: Set a bogus uri for a repo, saw that the pullLocal daemon fixed it.
Reviewers: epriestley, vrana
Reviewed By: epriestley
CC: aran, Korvin
Differential Revision: https://secure.phabricator.com/D3680
Summary:
This commit doesn't change license of any file. It just makes the license implicit (inherited from LICENSE file in the root directory).
We are removing the headers for these reasons:
- It wastes space in editors, less code is visible in editor upon opening a file.
- It brings noise to diff of the first change of any file every year.
- It confuses Git file copy detection when creating small files.
- We don't have an explicit license header in other files (JS, CSS, images, documentation).
- Using license header in every file is not obligatory: http://www.apache.org/dev/apply-license.html#new.
This change is approved by Alma Chao (Lead Open Source and IP Counsel at Facebook).
Test Plan: Verified that the license survived only in LICENSE file and that it didn't modify externals.
Reviewers: epriestley, davidrecordon
Reviewed By: epriestley
CC: aran, Korvin
Maniphest Tasks: T2035
Differential Revision: https://secure.phabricator.com/D3886
Summary:
Currently, when taskmasters complete a task it is immediately deleted. This prevents us from doing some general things, like:
- Supporting the idea of permanent failure (e.g., after N failures just stop trying).
- Showing the user how fast taskmasters are completing tasks.
- Showing the user how long tasks took to complete.
Having better visibility into this is important to Drydock, which builds on the task system. Also, generally buff debug output for task execution.
Test Plan: Ran `bin/phd debug taskmaster`. Ran `bin/phd debug garbage`. Queued some tasks via various systems.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D3852
Summary:
See D3033, T1529. We currently transform "scp-style" `user@host:path` URIs into normal `ssh://user@host/path` URIs. This is undesirable for two reasons:
- The paths aren't always equivalent. They are for GitHub, which is why I missed this originally, but in the general case the ":path" is resolved relatively and the "/path" is resolved absolutely. So this transformation can break things.
- It confuses users, who do not think of "git@host:path" URIs as SSH URIs even though the SSH protocol is implied.
So stop using them, and just use the "git@host:path" URIs instead. This is a bit messy since we have some validation built up on top of URIs. Hopefully we can get rid of more of this in the future as we simplify repository management.
Test Plan: Unit tests cover this stuff pretty well. Made a new git repository with a "git@host:path" style URI and did pull/discover on it, verified the right URI was used.
Reviewers: btrahan, vrana
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1529
Differential Revision: https://secure.phabricator.com/D3036
Summary:
The locks held by read-only pullLocal daemons were causing our discovery instance
to not get the lock and fail at discovery. We don't need to hold the lock while
pulling (only while discovering), so this moves the lock to the appropriate place.
Test Plan: tested in production
Reviewers: jungejason, epriestley, vrana
Reviewed By: jungejason
CC: aran, Korvin
Differential Revision: https://secure.phabricator.com/D2890
Summary:
We have a race condition right now, where we may insert a commit without `seenOnBranches`. This means shouldAutocloseCommit will return false (since it's not on any autoclose branches) if the message parser runs fast enough, causing it to associate the commit but not close the revision. This happened for D2851.
Also prompt the user to repair broken repositories.
Test Plan: Ran discovery / repair. Ran discovery on new commits. Verified 'seenOnBranches' value. Deleted some data, verified "repair" error. Repaired repository.
Reviewers: jungejason, nh, vrana, btrahan
Reviewed By: btrahan
CC: aran
Differential Revision: https://secure.phabricator.com/D2858
Summary: Allow multiple daemons to run without contention.
Test Plan: Ran multiple daemons simultaneously in "debug" mode, observed them acquiring (and sometimes failing to acquire) locks.
Reviewers: btrahan, jungejason, nh
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1400
Differential Revision: https://secure.phabricator.com/D2877
Summary:
Improve performance of large discovery tasks in Git by using subprocess streaming, like we do for Mercurial.
Basically, we save the cost of running many `git log` commands by running one big `git log` command but only parsing as much of it as we need to. This is pretty complicated, but we more or less need it for mercurial (which has ~100ms of 'hg' overhead instead of ~5ms of 'git' overhead) so we're already committed to most of the complexity costs. The git implementation is much simpler than the hg implementation because we don't need to handle all the weird parent rules (git gives us to them easily).
Test Plan:
Before, `discover --repair` on Phabricator took 35s:
real 0m35.324s
user 0m13.364s
sys 0m21.088s
Now 7s:
real 0m7.236s
user 0m2.436s
sys 0m3.444s
Note that most of the time is spent inserting rows after discover, the actual speedup of the git discovery part is much larger (subjectively, it runs in less than a second now, from ~28 seconds before).
Also ran discover/pull on single new commits in normal cases to verify that nothing broke in the common case.
Reviewers: jungejason, nh, vrana
Reviewed By: vrana
CC: aran
Maniphest Tasks: T1401
Differential Revision: https://secure.phabricator.com/D2851
Summary:
If a repository is missing commits because they mysteriously vanished, there's no reasonable way to get them back right now. Provide a way to ignore the state in the database and rediscover the entire repository unconditionally.
We don't queue any reparses or anything, but when I move reparse into this script we can hook things up or something. This generally shouldn't be too important anyway.
Test Plan: Ran `repository discover --repair` on Phabricator.
Reviewers: jungejason, nh, vrana
Reviewed By: vrana
CC: aran
Differential Revision: https://secure.phabricator.com/D2850
Summary: Add verbose logging. This logging is activated by setting "phd.verbose" in the config, running "phd debug", or explicitly in scripts/repository/pull.php and scripst/repository/discover.php
Test Plan:
>>> orbital ~/devtools/phabricator $ ./scripts/repository/discover.php GTEST
Discovering 'GTEST'...
<VERB> PhabricatorRepositoryPullLocalDaemon Discovering commits in repository 'GTEST'...
<VERB> PhabricatorRepositoryPullLocalDaemon Examining branch '()_+abcd$100', at a37bc285a12efa7224fe19f3df54cd90fa2b897a.
<VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
<VERB> PhabricatorRepositoryPullLocalDaemon Examining branch '+abcd$100', at a37bc285a12efa7224fe19f3df54cd90fa2b897a.
<VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
<VERB> PhabricatorRepositoryPullLocalDaemon Examining branch '_+abcd$100', at a37bc285a12efa7224fe19f3df54cd90fa2b897a.
<VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
<VERB> PhabricatorRepositoryPullLocalDaemon Examining branch 'abcd$100', at a37bc285a12efa7224fe19f3df54cd90fa2b897a.
<VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
<VERB> PhabricatorRepositoryPullLocalDaemon Examining branch 'arcpatch', at 774c7737b2d560a291697126bf4513204ccf661a.
<VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
<VERB> PhabricatorRepositoryPullLocalDaemon Examining branch 'arcpatch-1', at dc97539bee07293f95990d71f4638335a2531d69.
<VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
<VERB> PhabricatorRepositoryPullLocalDaemon Examining branch 'arcpatch-2', at 1acfaec313c46dd3caa90448800181fb91b0270f.
Reviewers: jungejason
Reviewed By: jungejason
CC: aran
Differential Revision: https://secure.phabricator.com/D2843
Summary: it's calling pull daemon's failure. This is actually Nick's fix.
Test Plan: Nick already manually ran it on daemon machine.
Reviewers: vrana, nh
Reviewed By: vrana
CC: aran, epriestley
Differential Revision: https://secure.phabricator.com/D2828
Summary:
The autoclose logic is currently doing a little too much work. We want to parse each commit at most twice:
# When it first appears in the repository.
# When it first appears on an autoclose branch.
These two events might not be distinct (i.e., it might first appear on an autoclose branch).
Currently, to discover commits initially appearing on autoclose branches, we check each branch, determine if it's an autoclose branch or not, and determine if the HEAD is already a known commit on an autoclose branch. This is correct so far, and allows us to ignore branches which either haven't changed or have commits at HEAD which we've already examined.
However, if an autoclose branch has a new commit, we start working backward through it. Prior to this patch, we only stop when we hit commits that we've already discovered lie on this branch. If the branch is new, none of the commits will be discovered on it (they're discovered in general, and likely discovered on other autoclose branches, but not discovered on this branch), so we'll parse all the way back to the root.
Instead, we want to stop when we hit commits that we've already discovered on //any// autoclose branch.
So do that.
Test Plan: Pushed a new branch, then pushed a new commit at HEAD. Ran discovery, verified we rediscovered only 1 commit, not every commit in history.
Reviewers: vrana, jungejason, aurelijus
Reviewed By: vrana
CC: aran
Maniphest Tasks: T1389
Differential Revision: https://secure.phabricator.com/D2798
Summary: Implemented it how it was suggested in ticket comments
Test Plan: create a revision in a branch, push that branch up, verify it's visible in diffusion and also that revision is not closed, then merge and push to master, verify that revision closed
Reviewers: epriestley
Reviewed By: epriestley
CC: aran, Korvin, 20after4
Maniphest Tasks: T1210
Differential Revision: https://secure.phabricator.com/D2706
Summary:
- `kill_init.php` said "Moving 1000 files" - I hope that this is not some limit in `FileFinder`.
- [src/infrastructure/celerity] `git mv utils.php map.php; git mv api/utils.php api.php`
- Comment `phutil_libraries` in `.arcconfig` and run `arc liberate`.
NOTE: `arc diff` timed out so I'm pushing it without review.
Test Plan:
/D1234
Browsed around, especially in `applications/repository/worker/commitchangeparser` and `applications/` in general.
Auditors: epriestley
Maniphest Tasks: T1103
Summary: Prelminary, I want to test this a bit more when I'm better rested. Provide a "bulk" import mode for Mercurial so we can do initial discovery more quickly.
Test Plan: Imported the Mercurial repository in 2m45s, blocked on MySQL I/O rather than Mercurial I/O.
Reviewers: csilvers, btrahan
Reviewed By: csilvers
CC: aran
Differential Revision: https://secure.phabricator.com/D2457
Summary:
- Merge CommitTask daemon into PullLocal daemon. This is another artifact of past instability (and order-dependent parsers). We still publish to the timeline, although this was the last consumer. Long term we'll probably delete timeline and move to webhooks, since everyone who has asked about this stuff has been eager to trade away the durability and ordering of the timeline for the ease of use of webhooks. There's also no reason to timeline this anymore since parsing is no longer order-dependent.
- Add `phd start` to start all the daemons you need. Add `phd restart` to restart all the daemons you need. So cool~
- Simplify and improve phd and Diffusion daemon documentation.
Test Plan:
- Ran `phd start`.
- Ran `phd restart`.
- Generated/read documentation.
- Imported some stuff, got clean parses.
Reviewers: btrahan, csilvers
Reviewed By: csilvers
CC: aran, jungejason, nh
Differential Revision: https://secure.phabricator.com/D2433
Summary:
See D2418. This merges the commit discovery daemon into the same single daemon, and applies all the same rules to it.
There are relatively few implementation changes, but a few things did change:
- I simplified/improved Mercurial importing, by finding full branch tip hashes with "--debug branches" and using "parents --template {node}" so we don't need to do separate "--debug id" calls.
- Added a new "--not" flag to exclude repositories, since I switched to real arg parsing anyway.
- I removed a web UI notification that you need to restart the daemons, this is no longer true.
- I added a web UI notification that no pull daemon is running on the machine.
NOTE: @makinde, this doesn't change anything from your perspective, but it something breaks this is the likely cause.
This implicitly resolves T792, because discovery no longer runs before pulling.
Test Plan:
- Swapped databases to a fresh install.
- Ran "pulllocal" in debug mode. Verified it correctly does nothing (fixed a minor issue with min() on empty array).
- Added an SVN repository. Verified it cloned and discovered correctly.
- Added a Mercurial repository. Verified it cloned and discovered correctly.
- Added a Git repository. Verified it cloned and discovered correctly.
- Ran with arguments to verify behaviors: "--not MTEST --not STEST", "P --no-discovery", "P".
Reviewers: btrahan, csilvers, Makinde
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T792
Differential Revision: https://secure.phabricator.com/D2430
Summary:
Allow the pull daemon to take a list of repositories. By default, pull all repositories.
Make some effort to respect pull frequencies, although we'll necessarily suffer a bit if running with only one process.
NOTE: We still launch one discovery daemon per working copy, so this only cuts the daemon count in half.
Test Plan:
- Ran `phd debug pulllocal`, verified behavior.
- Ran `pull.php P MTEST SVNTEST --trace`, verified it pulled the repos and ran the right commands.
- Ran `phd repository-launch-master`, verified the right daemons launched, checked daemon console.
- Ran `phd repository-launch-readonly`, verified the right daemon launched, checked daemon console.
Reviewers: btrahan, csilvers, davidreuss
Reviewed By: csilvers
CC: aran
Differential Revision: https://secure.phabricator.com/D2418
Summary:
This is somewhat controversial but push date is usually more useful than commit date (which can be for example a month before other people can see the commit).
We can also store both dates.
Test Plan:
git log --pretty="%ct %at"
Reviewers: epriestley
Reviewed By: epriestley
CC: nh, aran, Koolvin
Differential Revision: https://secure.phabricator.com/D2319
Summary:
PHP arrays have an internal "current position" marker. (I think because foreach() wasn't introduced until PHP 4 and there was no way to get rid of it by then?)
A few functions affect the position of the marker, like reset(), end(), each(), next(), and prev(). A few functions read the position of the marker, like each(), next(), prev(), current() and key().
For the most part, no one uses any of this because foreach() is vastly easier and more natural. However, we sometimes want to select the first or last key from an array. Since key() returns the key //at the current position//, and you can't guarantee that no one will introduce some next() calls somewhere, the right way to do this is reset() + key(). This is cumbesome, so we introduced head_key() and last_key() (like head() and last()) in D2161.
Switch all the reset()/end() + key() (or omitted reset() since I was feeling like taking risks + key()) calls to head_key() or last_key().
Test Plan: Verified most of these by visiting the affected pages.
Reviewers: btrahan, vrana, jungejason, Koolvin
Reviewed By: jungejason
CC: aran
Differential Revision: https://secure.phabricator.com/D2169
Summary:
See <https://github.com/facebook/phabricator/issues/102>. Between Feb 1 and Mar 1, the hg released changed the exit code behavior of "hg pull". This broke us mildly (and a bunch of other applications more severely, which is why it was reverted).
Detect the common case of this (english) and don't fail.
Test Plan: @killermonk, can you try applying this? I'll try to do an upgrade to 2.1 and see if I can also do a proper test.
Reviewers: Makinde, btrahan, killermonk
Reviewed By: btrahan
CC: killermonk, aran, epriestley
Differential Revision: https://secure.phabricator.com/D1948
Summary: Last of the big final patches. Left a few debatable classes (12 out of about 400) that I'll deal with individually eventually.
Test Plan: Ran testEverythingImplemented.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran, epriestley
Maniphest Tasks: T795
Differential Revision: https://secure.phabricator.com/D1881
Summary: We've hit a couple of these in the wild, raise better error messages when the local repo is toast / broken / nonsense.
Test Plan: Broke my local repo in all of the different ways we test for, verified I got an error message in each case.
Reviewers: btrahan, abirchall
Reviewed By: btrahan
CC: aran, epriestley
Maniphest Tasks: T964, T924
Differential Revision: https://secure.phabricator.com/D1855
Summary: We'll keep deleted branches around right now because Git's behavior is not to remove them without --prune.
Test Plan: Ran "git fetch --all --prune" to make sure it at least ostensibly works.
Reviewers: btrahan, 20after4
Reviewed By: 20after4
CC: aran, epriestley
Differential Revision: https://secure.phabricator.com/D1833