1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-11-25 00:02:41 +01:00
Commit graph

53 commits

Author SHA1 Message Date
epriestley
7b50b2fbdc Use subprocess output streaming to improve performance of Git commit discovery
Summary:
Improve performance of large discovery tasks in Git by using subprocess streaming, like we do for Mercurial.

Basically, we save the cost of running many `git log` commands by running one big `git log` command but only parsing as much of it as we need to. This is pretty complicated, but we more or less need it for mercurial (which has ~100ms of 'hg' overhead instead of ~5ms of 'git' overhead) so we're already committed to most of the complexity costs. The git implementation is much simpler than the hg implementation because we don't need to handle all the weird parent rules (git gives us to them easily).

Test Plan:
Before, `discover --repair` on Phabricator took 35s:

  real	0m35.324s
  user	0m13.364s
  sys	0m21.088s

Now 7s:

  real	0m7.236s
  user	0m2.436s
  sys	0m3.444s

Note that most of the time is spent inserting rows after discover, the actual speedup of the git discovery part is much larger (subjectively, it runs in less than a second now, from ~28 seconds before).

Also ran discover/pull on single new commits in normal cases to verify that nothing broke in the common case.

Reviewers: jungejason, nh, vrana

Reviewed By: vrana

CC: aran

Maniphest Tasks: T1401

Differential Revision: https://secure.phabricator.com/D2851
2012-06-25 15:21:48 -07:00
epriestley
ca31e3e84b Add a --repair flag to "bin/repository discover"
Summary:
If a repository is missing commits because they mysteriously vanished, there's no reasonable way to get them back right now. Provide a way to ignore the state in the database and rediscover the entire repository unconditionally.

We don't queue any reparses or anything, but when I move reparse into this script we can hook things up or something. This generally shouldn't be too important anyway.

Test Plan: Ran `repository discover --repair` on Phabricator.

Reviewers: jungejason, nh, vrana

Reviewed By: vrana

CC: aran

Differential Revision: https://secure.phabricator.com/D2850
2012-06-25 12:35:47 -07:00
epriestley
a705f336a3 Add vebose logging to PhutilRepositoryPullDaemon
Summary: Add verbose logging. This logging is activated by setting "phd.verbose" in the config, running "phd debug", or explicitly in scripts/repository/pull.php and scripst/repository/discover.php

Test Plan:
  >>> orbital ~/devtools/phabricator $ ./scripts/repository/discover.php GTEST
  Discovering 'GTEST'...
  <VERB> PhabricatorRepositoryPullLocalDaemon Discovering commits in repository 'GTEST'...
  <VERB> PhabricatorRepositoryPullLocalDaemon Examining branch '()_+abcd$100', at a37bc285a12efa7224fe19f3df54cd90fa2b897a.
  <VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
  <VERB> PhabricatorRepositoryPullLocalDaemon Examining branch '+abcd$100', at a37bc285a12efa7224fe19f3df54cd90fa2b897a.
  <VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
  <VERB> PhabricatorRepositoryPullLocalDaemon Examining branch '_+abcd$100', at a37bc285a12efa7224fe19f3df54cd90fa2b897a.
  <VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
  <VERB> PhabricatorRepositoryPullLocalDaemon Examining branch 'abcd$100', at a37bc285a12efa7224fe19f3df54cd90fa2b897a.
  <VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
  <VERB> PhabricatorRepositoryPullLocalDaemon Examining branch 'arcpatch', at 774c7737b2d560a291697126bf4513204ccf661a.
  <VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
  <VERB> PhabricatorRepositoryPullLocalDaemon Examining branch 'arcpatch-1', at dc97539bee07293f95990d71f4638335a2531d69.
  <VERB> PhabricatorRepositoryPullLocalDaemon Skipping, HEAD is known.
  <VERB> PhabricatorRepositoryPullLocalDaemon Examining branch 'arcpatch-2', at 1acfaec313c46dd3caa90448800181fb91b0270f.

Reviewers: jungejason

Reviewed By: jungejason

CC: aran

Differential Revision: https://secure.phabricator.com/D2843
2012-06-24 15:06:40 -07:00
Jason Ge
de46a2550c Fix auto-close in a repo with missing commits in Phabricator db
Summary:
After adding more error logging: https://secure.phabricator.com/differential/diff/5295/, discover.php prints the output: https://secure.phabricator.com/P427

The problem is that there are gaps in the commits and revisions in Phabricator's DB for the repo. For some git commits, both the phabricator commits and the differential revision are missing (because of a mysterious DB data loss. nharper should be able to recover it); for some other git commits, the phabricator commits are missing (probably also because of the daba loss mentioned above). This can also happen if the git commits were too big to be parsed by phabricator.

Because of the gap, the pull daemon detects them as new autoclose commits on the branch
  (https://secure.phabricator.com/diffusion/P/browse/master/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php;382bafa271ec4962$267)
  (https://secure.phabricator.com/diffusion/P/browse/master/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php;382bafa271ec4962$592)

Then tries to update it
  (https://secure.phabricator.com/diffusion/P/browse/master/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php;382bafa271ec4962$616)
and will fail because the commit doesn't exist in the DB.
  (https://secure.phabricator.com/diffusion/P/browse/master/src/applications/repository/daemon/PhabricatorRepositoryPullLocalDaemon.php;382bafa271ec4962$341)
After the exception is thrown, all the commits just detected are not marked as 'seenOnBranches', so they won't be marked as closed by the CommitMessageParserWorker.

This also explains the weird behavior where the parsing of the commits is always paused for 15 minutes and then gets started again. The parsing pauses because the process listed at https://secure.phabricator.com/P427 takes quite some time to finish before it can record tha tasks. The parsing happens all the time because the process will be executed again and again.

A better fix might be to put all the missing commits in the repository_badcommit table and check against it.

Test Plan: will run 'discover.php FBCODE' to test it.

Reviewers: epriestley, vrana

Reviewed By: vrana

CC: nh, vrana, aran, Korvin

Differential Revision: https://secure.phabricator.com/D2846
2012-06-23 09:47:20 -07:00
Jason Ge
a75da4efac Revert "run record commit if its the first time seen"
Summary: This reverts commit 22e606d49e.

Test Plan: none

Reviewers: epriestley, nh

Reviewed By: nh

CC: nh, vrana, aran, Korvin

Differential Revision: https://secure.phabricator.com/D2833
2012-06-21 20:26:07 -07:00
Jason Ge
22e606d49e run record commit if its the first time seen
Test Plan: test live.

Reviewers: nh

Reviewed By: nh

CC: epriestley, aran

Differential Revision: https://secure.phabricator.com/D2832
2012-06-21 19:47:25 -07:00
Jason Ge
06c976b738 check commit before calling its method
Summary: it's calling pull daemon's failure. This is actually Nick's fix.

Test Plan: Nick already manually ran it on daemon machine.

Reviewers: vrana, nh

Reviewed By: vrana

CC: aran, epriestley

Differential Revision: https://secure.phabricator.com/D2828
2012-06-21 18:36:31 -07:00
epriestley
e48830eecb Don't rediscover an entire branch when a new commit appears at HEAD
Summary:
The autoclose logic is currently doing a little too much work. We want to parse each commit at most twice:

  # When it first appears in the repository.
  # When it first appears on an autoclose branch.

These two events might not be distinct (i.e., it might first appear on an autoclose branch).

Currently, to discover commits initially appearing on autoclose branches, we check each branch, determine if it's an autoclose branch or not, and determine if the HEAD is already a known commit on an autoclose branch. This is correct so far, and allows us to ignore branches which either haven't changed or have commits at HEAD which we've already examined.

However, if an autoclose branch has a new commit, we start working backward through it. Prior to this patch, we only stop when we hit commits that we've already discovered lie on this branch. If the branch is new, none of the commits will be discovered on it (they're discovered in general, and likely discovered on other autoclose branches, but not discovered on this branch), so we'll parse all the way back to the root.

Instead, we want to stop when we hit commits that we've already discovered on //any// autoclose branch.

So do that.

Test Plan: Pushed a new branch, then pushed a new commit at HEAD. Ran discovery, verified we rediscovered only 1 commit, not every commit in history.

Reviewers: vrana, jungejason, aurelijus

Reviewed By: vrana

CC: aran

Maniphest Tasks: T1389

Differential Revision: https://secure.phabricator.com/D2798
2012-06-19 13:47:01 -07:00
Ben Rogers
444c634b6c Allows branches to appear in diffusion with out triggering closing things from the commit message
Summary: Implemented it how it was suggested in ticket comments

Test Plan: create a revision in a branch, push that branch up, verify it's visible in diffusion and also that revision is not closed, then merge and push to master, verify that revision closed

Reviewers: epriestley

Reviewed By: epriestley

CC: aran, Korvin, 20after4

Maniphest Tasks: T1210

Differential Revision: https://secure.phabricator.com/D2706
2012-06-15 14:10:15 -07:00
epriestley
375c921bb0 Minor, fix issue with defaults: wildcard and repeat default to array and may not have a different default. 2012-06-12 07:05:57 -07:00
vrana
2e484e257d Fix lint errors found by Nemo
Summary:
See also:

- https://github.com/tpyo/amazon-s3-php-class/pull/33
- https://github.com/stripe/stripe-php/pull/13

Test Plan: Ran a script analyzing sources by HPHP.

Reviewers: btrahan, jungejason, epriestley

Reviewed By: epriestley

CC: aran, Korvin

Differential Revision: https://secure.phabricator.com/D2713
2012-06-11 19:09:42 -07:00
vrana
6cc196a2e5 Move files in Phabricator one level up
Summary:
- `kill_init.php` said "Moving 1000 files" - I hope that this is not some limit in `FileFinder`.
- [src/infrastructure/celerity] `git mv utils.php map.php; git mv api/utils.php api.php`
- Comment `phutil_libraries` in `.arcconfig` and run `arc liberate`.

NOTE: `arc diff` timed out so I'm pushing it without review.

Test Plan:
/D1234
Browsed around, especially in `applications/repository/worker/commitchangeparser` and `applications/` in general.

Auditors: epriestley

Maniphest Tasks: T1103
2012-06-01 12:32:44 -07:00
epriestley
09c8af4de0 Upgrade phabricator to libphutil v2
Summary: Mechanical changes from D2588. No "Class.php" moves yet.

Test Plan: See D2588.

Reviewers: vrana, btrahan, jungejason

Reviewed By: vrana

CC: aran

Maniphest Tasks: T1103

Differential Revision: https://secure.phabricator.com/D2591
2012-05-30 14:26:29 -07:00
epriestley
820a6d407a Improve performance of repository discovery under Mercurial
Summary: Prelminary, I want to test this a bit more when I'm better rested. Provide a "bulk" import mode for Mercurial so we can do initial discovery more quickly.

Test Plan: Imported the Mercurial repository in 2m45s, blocked on MySQL I/O rather than Mercurial I/O.

Reviewers: csilvers, btrahan

Reviewed By: csilvers

CC: aran

Differential Revision: https://secure.phabricator.com/D2457
2012-05-11 18:29:14 -07:00
epriestley
7e0c06716f Minor, fix obvious typo that I missed in testing somehow.
Auditors: csilvers
2012-05-09 12:43:57 -07:00
epriestley
b800df8c1b Simplify daemon management: "phd start"
Summary:
  - Merge CommitTask daemon into PullLocal daemon. This is another artifact of past instability (and order-dependent parsers). We still publish to the timeline, although this was the last consumer. Long term we'll probably delete timeline and move to webhooks, since everyone who has asked about this stuff has been eager to trade away the durability and ordering of the timeline for the ease of use of webhooks. There's also no reason to timeline this anymore since parsing is no longer order-dependent.
  - Add `phd start` to start all the daemons you need. Add `phd restart` to restart all the daemons you need. So cool~
  - Simplify and improve phd and Diffusion daemon documentation.

Test Plan:
  - Ran `phd start`.
  - Ran `phd restart`.
  - Generated/read documentation.
  - Imported some stuff, got clean parses.

Reviewers: btrahan, csilvers

Reviewed By: csilvers

CC: aran, jungejason, nh

Differential Revision: https://secure.phabricator.com/D2433
2012-05-09 10:29:37 -07:00
epriestley
d2b01aead0 Use one daemon to discover commits in all repositories, not one per repository
Summary:
See D2418. This merges the commit discovery daemon into the same single daemon, and applies all the same rules to it.

There are relatively few implementation changes, but a few things did change:

  - I simplified/improved Mercurial importing, by finding full branch tip hashes with "--debug branches" and using "parents --template {node}" so we don't need to do separate "--debug id" calls.
  - Added a new "--not" flag to exclude repositories, since I switched to real arg parsing anyway.
  - I removed a web UI notification that you need to restart the daemons, this is no longer true.
  - I added a web UI notification that no pull daemon is running on the machine.

NOTE: @makinde, this doesn't change anything from your perspective, but it something breaks this is the likely cause.

This implicitly resolves T792, because discovery no longer runs before pulling.

Test Plan:

  - Swapped databases to a fresh install.
  - Ran "pulllocal" in debug mode. Verified it correctly does nothing (fixed a minor issue with min() on empty array).
  - Added an SVN repository. Verified it cloned and discovered correctly.
  - Added a Mercurial repository. Verified it cloned and discovered correctly.
  - Added a Git repository. Verified it cloned and discovered correctly.
  - Ran with arguments to verify behaviors: "--not MTEST --not STEST", "P --no-discovery", "P".

Reviewers: btrahan, csilvers, Makinde

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T792

Differential Revision: https://secure.phabricator.com/D2430
2012-05-08 12:53:41 -07:00
epriestley
1c62a35710 Run one daemon to pull all working copies, not one daemon per working copy
Summary:
Allow the pull daemon to take a list of repositories. By default, pull all repositories.

Make some effort to respect pull frequencies, although we'll necessarily suffer a bit if running with only one process.

NOTE: We still launch one discovery daemon per working copy, so this only cuts the daemon count in half.

Test Plan:
  - Ran `phd debug pulllocal`, verified behavior.
  - Ran `pull.php P MTEST SVNTEST --trace`, verified it pulled the repos and ran the right commands.
  - Ran `phd repository-launch-master`, verified the right daemons launched, checked daemon console.
  - Ran `phd repository-launch-readonly`, verified the right daemon launched, checked daemon console.

Reviewers: btrahan, csilvers, davidreuss

Reviewed By: csilvers

CC: aran

Differential Revision: https://secure.phabricator.com/D2418
2012-05-07 15:01:10 -07:00
vrana
38ffe45f8e Use committer date instead of author date for Git epoch
Summary:
This is somewhat controversial but push date is usually more useful than commit date (which can be for example a month before other people can see the commit).
We can also store both dates.

Test Plan:
  git log --pretty="%ct %at"

Reviewers: epriestley

Reviewed By: epriestley

CC: nh, aran, Koolvin

Differential Revision: https://secure.phabricator.com/D2319
2012-04-26 16:25:56 -07:00
epriestley
a5903d2a53 Use head_key() and last_key() to explicitly communicate intent
Summary:
PHP arrays have an internal "current position" marker. (I think because foreach() wasn't introduced until PHP 4 and there was no way to get rid of it by then?)

A few functions affect the position of the marker, like reset(), end(), each(), next(), and prev(). A few functions read the position of the marker, like each(), next(), prev(), current() and key().

For the most part, no one uses any of this because foreach() is vastly easier and more natural. However, we sometimes want to select the first or last key from an array. Since key() returns the key //at the current position//, and you can't guarantee that no one will introduce some next() calls somewhere, the right way to do this is reset() + key(). This is cumbesome, so we introduced head_key() and last_key() (like head() and last()) in D2161.

Switch all the reset()/end() + key() (or omitted reset() since I was feeling like taking risks + key()) calls to head_key() or last_key().

Test Plan: Verified most of these by visiting the affected pages.

Reviewers: btrahan, vrana, jungejason, Koolvin

Reviewed By: jungejason

CC: aran

Differential Revision: https://secure.phabricator.com/D2169
2012-04-09 11:08:59 -07:00
epriestley
ec736f9c50 Handle "hg pull" return code change between 2.1 and 2.1.1 more gracefully
Summary:
See <https://github.com/facebook/phabricator/issues/102>. Between Feb 1 and Mar 1, the hg released changed the exit code behavior of "hg pull". This broke us mildly (and a bunch of other applications more severely, which is why it was reverted).

Detect the common case of this (english) and don't fail.

Test Plan: @killermonk, can you try applying this? I'll try to do an upgrade to 2.1 and see if I can also do a proper test.

Reviewers: Makinde, btrahan, killermonk

Reviewed By: btrahan

CC: killermonk, aran, epriestley

Differential Revision: https://secure.phabricator.com/D1948
2012-03-19 19:19:48 -07:00
epriestley
d0af617818 Add "final" to (almost) everything else
Summary: Last of the big final patches. Left a few debatable classes (12 out of about 400) that I'll deal with individually eventually.

Test Plan: Ran testEverythingImplemented.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran, epriestley

Maniphest Tasks: T795

Differential Revision: https://secure.phabricator.com/D1881
2012-03-13 16:21:04 -07:00
epriestley
def19bb8de Add additional protections against local repository misconfigurations
Summary: We've hit a couple of these in the wild, raise better error messages when the local repo is toast / broken / nonsense.

Test Plan: Broke my local repo in all of the different ways we test for, verified I got an error message in each case.

Reviewers: btrahan, abirchall

Reviewed By: btrahan

CC: aran, epriestley

Maniphest Tasks: T964, T924

Differential Revision: https://secure.phabricator.com/D1855
2012-03-12 10:34:37 -07:00
epriestley
89dac1cf19 When updating git repositories, use --prune to prune old branches
Summary: We'll keep deleted branches around right now because Git's behavior is not to remove them without --prune.

Test Plan: Ran "git fetch --all --prune" to make sure it at least ostensibly works.

Reviewers: btrahan, 20after4

Reviewed By: 20after4

CC: aran, epriestley

Differential Revision: https://secure.phabricator.com/D1833
2012-03-08 13:24:25 -08:00
Edward Speyer
17d801a50e GitFetch daemon: more verbose
Summary: A smidgen more messaging about what's going on.

Test Plan:
Ran it, saw this:

  2012-03-07 5:37:30 PM [STDO] >>> [0] <exec> $ /data/phabricator/bin/list_db_services --tier_name 'cdb.phabricator'
  <<< [0] <exec> 47,101 us
  ...
  >>> [9] <exec> $ mkdir -p '/var/repo'
  <<< [9] <exec> 41,374 us
  Creating new directory /var/repo/fbcode for repo FBCode
  >>> [10] <exec> $ git clone --origin origin 'ssh://fbcode.git.vip.facebook.com/data/gitrepos/fbcode.git' '/var/repo/fbcode'

Reviewers: epriestley

Reviewed By: epriestley

CC: aran, epriestley

Differential Revision: https://secure.phabricator.com/D1819
2012-03-07 19:24:06 -08:00
epriestley
b10fa8023f Support Git implicit file:// URIs
Summary: @s reported an issue with implicit file:// URIs in Git, see P270.
Recognize and handle URIs in this format. For URIs we don't understand, raise an
exception.

Test Plan:
  - Added failing tests.
  - Fixed code.
  - Tests pass.

Reviewers: btrahan, jungejason, s

Reviewed By: s

CC: aran, epriestley, s

Differential Revision: https://secure.phabricator.com/D1362
2012-01-11 09:00:18 -08:00
epriestley
99231ebba4 Allow Git repositories to track only some branches
Summary:
Some installs use Git as the backbone of a CI framework or use a Git remote to
share patches. The tracker scripts currently recognize associated revisions as
"Committed" when they appear in any branch, even if that branch is
"alincoln-personal-development_test_hack" or whatever.

To address the broadest need here, allow Git repositories to be configured to
track only certain branches instead of all branches.

This doesn't allow you to import a branch into Diffusion but ignore it in
Differential. Supporting that is somewhat technically complicated because the
parser currently goes like this:

  - Look at HEAD of all branches.
  - For any commits we haven't seen before, follow them back to something we
have seen (or the root).
  - "Discover" everything new.

Since this doesn't track <branch, commit> pairs, we currently don't have enough
information to tell when a commit appears in a branch for the first time, so we
don't have anywhere we can put a test for whether that branch is tracked and do
the Differential hook only if it is.

However, I think this cruder patch satisfies most of the need and is simple and
obvious in its implementation.

See also D1263.

Test Plan:
  - Updated a Git repository with various filters: "", "master, remote", "derp",
"  ,,, master   ,,,,,"
  - Edited SVN and Mercurial repositories to verify they didn't get caught in
the crossfire.
  - Ran daemon in debug mode on libphutil with filter "derp", got exception
about no tracked branches. Ran with filter "master", got tracking. Ran with no
filter, got tracking.
  - Looked at Diffusion with "derp" and "master", saw no branches and "master"
respectively.
  - Added unit tests to cover filtering logic.

Reviewers: btrahan, jungejason, nh, fratrik

Reviewed By: fratrik

CC: aran, fratrik, epriestley

Maniphest Tasks: T270

Differential Revision: https://secure.phabricator.com/D1290
2012-01-04 10:20:34 -08:00
epriestley
efb0fa739f Make tracked git repositories use an implicit 'origin' remote
Summary:
See T624. I originally wrote this to require an explicit remote, but this
creates an ugly "origin:" in all the URIs and makes T270 more difficult.

Treat all branch names as implying 'origin/'.

Test Plan:
  - Pulled and imported a fresh copy of libphutil without issues.
  - Browsed various git repositories.
  - Browsed Javelin's various branches.
  - Ran upgrade script, got a bunch of clean 'origin/master' -> 'master'
conversions.
  - Tried to specify an explicit remote in a default branch name.
  - Unit tests.

Reviewers: nh, jungejason, btrahan

Reviewed By: btrahan

CC: aran, btrahan

Maniphest Tasks: T624

Differential Revision: https://secure.phabricator.com/D1269
2011-12-29 08:35:32 -08:00
epriestley
abd8efc358 Further relax same origin checks
Summary: Allow paths to match even if they differ by trailing slashes and
".git".

Test Plan: Ran unit tests.

Reviewers: jungejason, btrahan

Reviewed By: jungejason

CC: aran, jungejason

Maniphest Tasks: T710

Differential Revision: https://secure.phabricator.com/D1286
2011-12-26 12:01:59 -08:00
epriestley
e786c44b6f Relax the same-origin check in the Git commit discovery daemon
Summary:
Git accepts either "git@x:/path" or "ssh://git@x.com/path" URIs to mean the
exact same thing, which is causing some false positives and confusion,
particularly because we sometimes mutate URIs.

Since this is just a sanity check, we don't really care about the username,
domain or credentials -- matching the paths is good enough. We're just trying to
make it hard to shoot yourself in the foot by copy-pasting the same local path
into two repositories and forgetting to change one, like I did. :P

Relax the check to only verify the paths are the same.

Test Plan:
  - Ran unit tests, which should fully cover things.
  - Ran commit discovery daemon in debug mode on incorrectly and correctly
configured repositories.

Reviewers: ajtrichards, jungejason, btrahan

Reviewed By: jungejason

CC: aran, jungejason

Maniphest Tasks: T710

Differential Revision: https://secure.phabricator.com/D1279
2011-12-24 08:54:44 -08:00
epriestley
3b65864ee1 Fix two minor bugs with recent patches:
- Use the computed remote URI (which may have an explicit 'ssh://' under Git in some cases).
  - Use '$id' correctly rather than casting the URI to an int in the message parser.
2011-12-22 06:57:04 -08:00
epriestley
92f9741779 Add a safety check to commit discovery daemon
Summary:
Although I couldn't repro the issue in T692, I did manage to point the "Diviner"
repository at the "Phabricator" working copy and screw some stuff up on
secure.phabricator.com.

Before discovering commits in a repository, ensure the 'origin' remote points at
the configured URI. This prevents issues where the working copy gets configured
to point at an existing (but incorrect) checkout.

Test Plan:
  - Ran gitcommitdiscovery daemon normally under "phd debug", saw it execute the
"remote show -n" command and then start working.
  - Intentionally botched the config, got an exception:

  (Exception) Working copy '/INSECURE/repos/phabricator' has origin URL
  'ssh://git@github.com/facebook/phabricator.git', but the configured URL
  'git://github.com/facebook/diviner.git' is expected. Refusing to proceed.

Reviewers: btrahan, jungejason

Reviewed By: jungejason

CC: aran, jungejason

Maniphest Tasks: T692

Differential Revision: 1253
2011-12-20 19:52:08 -08:00
epriestley
b5c9b9d059 Use remote credentials for 'git fetch' and 'hg pull' commands
Summary: These are "local" commands, but need remote credentials. If the daemon
runs as a user who does not have credentials, the initial clone will work but
subsequent updates will fail.

Test Plan:
  - Nuked a local copy of a Git repo.
  - Ran "phd debug fetch <phid>" as root (or any other user with no natural SSH
keys). Verified initial clone worked (since it passes credentials to the command
correctly).
  - Killed daemon, re-ran, verified "fetch" failed (no credentials passed).
  - Applied this patch.
  - Re-ran "phd debug fetch <phid>", verified it passed credentials and
succeeded.
  - Did all these steps for a Mercurial repo.

Reviewers: btrahan, jungejason

Reviewed By: btrahan

CC: aran, btrahan

Maniphest Tasks: T686

Differential Revision: 1236
2011-12-20 08:28:38 -08:00
Edward Speyer
6737ae4828 Run a commit discovery daemon only once
Summary:
Discover commits then return; useful when initializing new repositories
in unit tests.

By which I mean "when initializing a new repository in my unit test that
I'm working on".

Test Plan: Using this in a PhabricatorTestCase.

Reviewers: epriestley, aran

Reviewed By: epriestley

CC: aran, edward, epriestley

Differential Revision: 948
2011-10-25 14:25:40 -07:00
epriestley
07f4772d0b Make all parsers use credentials
Summary:
We need to issue all commands as $repository->junk() so we can pick up
credentials. Some of this stuff predates that change landing.

(I removed the "https" vs "svn+ssh" fallback code since it's specific to
Facebook, affected a tiny number of commits, is basically an SVN bug with UTF-8
handling and HTTP support, and doesn't make sense in the general case. The user
has the tools they need to force it via "reparse.php" if it's really an issue.)

Test Plan: Created new authenticated-remote mercurial and git repositories and
pulled/discovered them with credentials.

Reviewers: Makinde, jungejason, nh, tuomaspelkonen, aran

Reviewed By: Makinde

CC: aran, Makinde

Differential Revision: 970
2011-09-28 11:01:47 -07:00
epriestley
1c1f749eba Add an "arcanist.projectinfo" Conduit call
Summary:
We currently rely on "remote_hooks_enabled" in .arcconfig to determine whether
commands like "arc amend" and "arc merge" should imply "arc mark-committed".

However, this is a historical artifact that is now bad for a bunch of reasons:

  - The option name is confusing, it really means 'repository is tracked'.
  - The option is hard to discover and generally sucks.
  - We can empirically determine the right answer since we now know if a project
is in a tracked repository.

Add a call which arcanist can make on these workflows to figure out if it is
interacting with a project in a tracked repository or not.

Also added an "isTracked()" convenience method to reduce the number of magic
strings all over the place.

Test Plan: Ran "arcanist.projectinfo" for nonexistent, untracked and tracked
projects.

Reviewers: Makinde, jungejason, nh, tuomaspelkonen, aran

Reviewed By: Makinde

CC: aran, epriestley, Makinde

Differential Revision: 945
2011-09-21 14:19:14 -07:00
epriestley
93b3bc8e89 Add a Mercurial message parser
Summary:
See D943, this is the second parse stage. This will mark Differential revisions
as "Committed" among other things.

Almost all the logic here is shared between VCSes so the implementation itself
is straightforward.

Test Plan: Parsed all messages for the official Mercurial repository.

Reviewers: Makinde, jungejason, nh, tuomaspelkonen, aran

Reviewed By: Makinde

CC: aran, Makinde

Differential Revision: 944
2011-09-16 11:09:39 -07:00
epriestley
e0b86cc81b Add a Mercurial commit discovery daemon
Summary:
Repository import has three major steps:

  - Commit discovery (serial)
  - Message parsing (parallel, mostly VCS independent)
  - Change parsing (parallel, highly VCS dependent)

This implements commit discovery for Mercurial, similar to git's parsing:

  - List the heads of all the branches.
  - If we haven't already discovered them, follow them back to their roots (or
the first commit we have discovered).
  - Import all the newly discovered commits, oldest first.

This is a little complicated but it ensures we discover commits in depth order,
so the discovery process is robust against interruption/failure. If we just
inserted commits as we went, we might read the tip, insert it, and then crash.
When we ran again, we'd think we had already discovered commits older than HEAD.

This also allows later stages to rely on being able to find Phabricator commit
IDs which correspond to parent commits.

NOTE: This importer is fairly slow because "hg" has a large startup time
(compare "hg --version" to "git --version" and "svn --version"; on my machine,
hg has 60ms of overhead for any command) and we need to run many commands (see
the whole "hg id" mess). You can expect something like 10,000 per hour, which
means you may need to run overnight to discover a large repository (IIRC, the
svn/git discovery processes are both about an order of magnitude faster). We
could improve this with batching, but I want to keep it as simple as possible
for now.

Test Plan: Discovered all the commits in the main Mercurial repository,
http://selenic.com/repo/hg.

Reviewers: Makinde, jungejason, nh, tuomaspelkonen, aran

Reviewed By: Makinde

CC: aran, Makinde

Differential Revision: 943
2011-09-16 11:08:52 -07:00
epriestley
4da43b31a3 Add Mercurial repository configuration and local pull support
Summary: No actual parsing/import yet, but now you can define and pull Mercurial
repositories. I merged most of the local pull code so we can share it between
hg/git.

Test Plan:
  - Created a new Mercurial repository to track Codeigniter off Bitbucket
  - Edited / saved / etc.
  - Launched the mercurial pull daemon, it pulled the repo. Killed and
relaunched, it updated the repo.
  - Launched the git fetch deamon, it still works correctly.

Reviewers: Makinde, aran, jungejason, tuomaspelkonen

Reviewed By: Makinde

CC: aran, Makinde

Differential Revision: 793
2011-09-14 07:28:22 -07:00
Abdul Qabiz
6355b291ed - Added getRemoteCommandFuture(..) and getLocalCommand Future(..) methods to PhabricatorRepository
- Removed irrelevant csprintf(..)
  - Updated code to use $repository->getRemoteURI()
  - Updated code to use getRemoteCommandFuture(..) in Diffusion code
  - Updated code to use $repository->getRemoteURI()
2011-09-09 01:16:48 +05:30
epriestley
6cae153569 Allow CommitTask daemon to recover from deleted repositories
Summary: If a user partially discovers a repository and then deletes it, the
timeline will have events from the old repository which this daemon won't be
able to parse.

Test Plan: @ajtrichards, can you apply this locally and restart your daemons
(##phd stop##, then relaunch them) and let me know if it fixes the issue?

Reviewers: ajtrichards, jungejason, tuomaspelkonen, aran

Reviewed By: ajtrichards

CC: aran, epriestley, ajtrichards

Differential Revision: 845
2011-08-22 15:41:27 -07:00
Svemir Brkic
e4093e8013 SVN error message may also be "File not found" 2011-08-22 17:34:37 -04:00
epriestley
ed5c46681d Allow SVN repositories to import subdirectories instead of the entire repository
Summary:
See T325. While this is a touch hacky it ends up being fairly clean, and we can
now do initial imports much more quickly and this actually cleaned up some of
the code. I also made the repository edit interface a little less foreboding.

@tuomaspelkonen, did you get anywhere with that bug you were chasing down a
couple days ago? We can hold this if it throws a wrench into stuff you're
working on.

Test Plan:
  - Imported a subdirectory of a midsized SVN project (jQuery UI).
  - Commit discovery for ~3500/4500 commits took just a few seconds.
  - Commit discovery correctly ignored commits which didn't affect this
directory.
  - Commit discovery correctly stopped at commit 13.
  - Browse interface shows an incomplete listing, but that's fine, and
everything is otherwise functionally correct. We can add a note or something
later ("this is a view of commits affecting a subdirectory, some paths aren't
available"), but this behavior probably won't be too startling to users.
  - Edited Git and SVN repositories to test form logic.

Reviewed By: jungejason
Reviewers: tuomaspelkonen, jungejason, aran, Girish
Commenters: tuomaspelkonen
CC: jcleveley, aran, jungejason, tuomaspelkonen
Differential Revision: 696
2011-07-20 10:56:02 -07:00
jungejason
c728e4f7da Open database connection with 'w' instead of 'r' for writing
Summary:
there are several places we open an 'r' connection but use it
for writing. Fix them.

Test Plan:
ran parse_one_commit.php against one revision which executes
the code with problem. It used to throw exception. Now it works fine.

Reviewed By: Girish
Reviewers: tuomaspelkonen, Girish
Commenters: aran
CC: aran, Girish
Differential Revision: 213
2011-05-02 13:31:12 -07:00
epriestley
9630123d6a Use pull-frequency to determine pull frequency, not some sort of random
algorithm that backs off to 15m which is crazy.
2011-04-06 19:36:48 -07:00
epriestley
1f88e08761 Move 'phd parse-commit' to a dedicated script; allow message parsing tasks to
be executed in isolation, provide a script to requeue all message reparses,
stop parse-commit from inserting side-effect tasks.
2011-04-02 13:23:22 -07:00
epriestley
afe0079819 Improve awkward Diffusion query plans. 2011-03-20 17:46:02 -07:00
epriestley
0d292abbf0 Prevent daemons from spinning when they hit duplicate key exceptions. 2011-03-15 14:36:01 -07:00
epriestley
14a172d707 Make GitDiscovery terminate, maybe? 2011-03-15 14:25:47 -07:00
epriestley
383b3d740c Improve some diffusioney things.
Summary:

Test Plan:

Reviewers:

CC:
2011-03-12 22:51:40 -08:00