Summary:
Ref T13393. While doing a shard migration in the Phacility cluster, we'd like to stop writes to the migrating repository. It's safe to continue serving reads.
Add a simple maintenance mode for making repositories completely read-only during maintenance.
Test Plan: Put a repository into read-only mode, tried to write via HTTP + SSH. Viewed web UI. Took it back out of maintenance mode.
Maniphest Tasks: T13393
Differential Revision: https://secure.phabricator.com/D20748
Summary:
Ref T13369. See that task for discussion.
When the discovery daemon finds more than 64 commits to import, demote the worker queue priority of the resulting tasks.
Test Plan:
- Pushed one commit, ran `bin/repository discover --verbose --trace ...`, saw commit import with "at normal priority" message and priority 2500 ("PRIORITY_COMMIT").
- Pushed 3 commits, set threshold to 3, ran `bin/repository discover ...`, saw commist import with "at lower priority" message and priority 4000 ("PRIORITY_IMPORT").
Maniphest Tasks: T13369
Differential Revision: https://secure.phabricator.com/D20712
Summary:
Depends on D20464. Ref T13277. Broadly:
- Move all the "should publish X" and "why aren't we publishing X" stuff to a separate class (`PhabricatorRepositoryPublisher`).
- Rename things to be more consistent with modern terminology ("Publish", "Permanent Refs").
Test Plan:
This could use some trial-by-fire on `secure`, but:
- Grepped for all symbols.
- Viewed various commits.
- Reparsed commits.
- Here's a commit with an explanation:
{F6394569}
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13277
Differential Revision: https://secure.phabricator.com/D20465
Summary:
Ref T13280. In D20421, I changed our observe strategy to `git fetch <uri>` in all cases.
This doesn't work in an ancient, non-bare repository if `master` is checked out and `master` is also fetch: `git` refuses to overwrite the local ref unless we pass `--update-head-ok`. Pass this flag.
Also, remove some code which examines branches and tags in a special way for non-bare working copies. The old `git fetch <origin>` code without explicit revsets meant that `refs/remotes/orgin/heads/xyz` got updated instead of `refs/heads/xyz`. We now update our local refs in all cases (bare and non-bare) so we can throw away this special casing.
Test Plan:
- Replaced a modern bare working copy with a non-bare working copy by explicitly using `git clone` without `--bare`.
- Ran `bin/repository update`, hit the `--update-head-ok` error. Applied the patch, got a clean update.
- Used the "repository.branchquery" API method...
- ...with "contains" to trigger the "git branch" case. Got identical results after removing the special casing.
- ...without "contains" to trigger the "low level ref" case. Got identical results after removing the special casing.
- Grepped for `isWorkingCopyBare()`. The only remaining callsites deal with hook paths, and genuinely need to be specialized.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13280
Differential Revision: https://secure.phabricator.com/D20450
Summary:
Ref T13279. Charting changes alter how the "line-chart" behavior works, but the "Burnup Chart" still relies on the old behavior.
Although I'm intending to remove "Maniphest > Reports" once Facts is a minimally sufficient replacement, copy this behavior to keep it working until we're ready to pull the trigger.
Also fix a leftover typo from D20435.
Test Plan: Viewed a legacy Maniphest burnup rate report.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13279
Differential Revision: https://secure.phabricator.com/D20449
Summary:
Depends on D20434. Fixes T5963. Broadly, the issue here is that when:
- You create a new, empty repository.
- Then, you work on some branch other than `master`, without ever creating `master`.
...you get a warning on `git clone`:
> warning: remote HEAD refers to nonexistent ref, unable to checkout
To fix this, point the symbolic-ref HEAD at `refs/heads/<default-branch>` after installing commit hooks.
This fixes the warning, and also means that `git clone` will check out the repository default branch by default, which is nice.
There are a few caveats about this behavior (see T5963 for discussion) but nothing too substantial.
The only real issue is that Git prevents deletion of the default branch without a config setting. Just set that settting.
Test Plan:
See T5963.
In a repository, set `HEAD` to point somewhere invalid. Ran `bin/repository update ...`. Saw HEAD pointed back at the repository default branch.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T5963
Differential Revision: https://secure.phabricator.com/D20435
Summary:
Depends on D20427. Ref T13277. As an optimization, when we discover that a commit which was previously only on a non-permanent ref ("tmp-epriestley-123") is now reachable from a permanent ref ("master"), we currently queue only a new "message" parse step.
This is an optimization because these commits previously got the full treatment (feed, publish, audit, etc) as soon as they were discovered. Now, those steps only happen once a commit is reachable from a permanent ref, so we need to run everything.
Test Plan:
- Pushed local "tmp-123" branch to remote tag "tmp-123".
- Updated repository with "bin/repository update", saw commit import as a "not on any permanent ref" commit, with no herald/audit/etc.
- Merged "tmp-123" tag into "master".
- Pushed new "master".
- Updated repository with "bin/repository refs ... --trace --verbose", saw commit detected as now reachable from a permament ref.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13277
Differential Revision: https://secure.phabricator.com/D20428
Summary:
Depends on D20421. Ref T13277. I'd generally like to move away from "Track Only".
Some of the use cases for "Track Only" (or adjacent to "Track Only") are better resolved with "Fetch Rules" -- basically, rules to fetch only some subset of refs from the observed remote.
Add configurable "Fetch Rules" for Git repositories. For example, if you only want to fetch `master`, you can now speify:
```
refs/heads/master
```
If you only want to fetch branches and tags, you can use:
```
refs/heads/*
refs/tags/*
```
In theory, this is slightly less powerful in the general case than "Track Only", but gives us better behavior in some cases (e.g., when the remote has 50K random temporary branches). In practice, I think this and a better "Autoclose Only" will let us move away from "Track Only", get default behavior which is better aligned with what users actually expect, and dodge all the "track tags/refs" questions.
Test Plan: Configured repositories with "Fetch Refs" rules, used `bin/repository pull --verbose --trace ...` to run pulls, saw expected pull/fetch behavior.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13277
Differential Revision: https://secure.phabricator.com/D20422
Summary:
Depends on D20420. Ref T13277. We currently spend substantial effort trying to detect and correct the URL of the "origin" remote in Git repositories.
I believe this is unnecessary, and we can always `git fetch <url> ...` to get the desired result instead of `git muck-with-origin + git fetch origin ...`. We already do this in the more recent parts of the codebase (e.g., intracluster sync) and it works correctly in every case I'm aware of.
Test Plan:
- Grepped for `origin`, ` origin `.
- Ran `bin/repository update ...` to fetch a mirrored repository.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13277
Differential Revision: https://secure.phabricator.com/D20421
Summary: Depends on D19429. Depends on D19423. Ref T12164. This creates new columns `authorIdentityPHID` and `committerIdentityPHID` on commit objects and starts populating them. Also adds the ability to explicitly set an Identity's assignee to "unassigned()" to null out an incorrect auto-assign. Adds more search functionality to identities. Also creates a daemon task for handling users adding new email address and attempts to associate unclaimed identities.
Test Plan: Imported some repos, watched new columns get populated. Added a new email address for a previous commit, saw daemon job run and assign the identity to the new user. Searched for identities in various and sundry ways.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T12164
Differential Revision: https://secure.phabricator.com/D19443
Summary: Depends on D19423. Ref T12164. Adds controllers capable of listing and editing `PhabricatorRepositoryIdentity` objects. Starts creating those objects when commits are parsed.
Test Plan: Reparsed some revisions, observed objects getting created in the database. Altered some `Identity` objects using the controllers and observed effects in the database. No attempts made to validate behavior under "challenging" author/committer strings.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: Korvin, PHID-OPKG-gm6ozazyms6q6i22gyam
Maniphest Tasks: T12164
Differential Revision: https://secure.phabricator.com/D19429
Summary:
Ref T13096. Currently, we do a fair amount of clever digesting and string manipulation to build lock names which are less than 64 characters long while still being reasonably readable.
Instead, do more of this automatically. This will let lock acquisition become simpler and make it more possible to build a useful lock log.
Test Plan: Ran `bin/repository update`, saw a reasonable lock acquire and release.
Maniphest Tasks: T13096
Differential Revision: https://secure.phabricator.com/D19173
Summary:
Fixes T5965.
Fixes two issues:
- Observing an empty repository could write a warning to the log.
- Mirroring an empty repository to a remote could fail.
For observing:
If newly-created with `git init --bare`, `git ls-remote` will
return the empty string. Properly return an empty set of refs, rather
than attempting to parse the single "line" that is produced by
splitting that on newlines:
```
[2018-01-23 18:47:00] ERROR 8: Undefined offset: 1 at [/phab_path/phabricator/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php:405]
arcanist(head=master, ref.master=5634f8410176), phabricator(head=master, ref.master=12551a1055ce), phutil(head=master, ref.master=4755785517cf)
#0 PhabricatorRepositoryPullEngine::loadGitRemoteRefs(PhabricatorRepository) called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php:343]
#1 PhabricatorRepositoryPullEngine::executeGitUpdate() called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php:126]
#2 PhabricatorRepositoryPullEngine::pullRepositoryWithLock() called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php:40]
#3 PhabricatorRepositoryPullEngine::pullRepository() called at [<phabricator>/src/applications/repository/management/PhabricatorRepositoryManagementUpdateWorkflow.php:59]
...
```
For mirroring:
`git` treats `git push --mirror` specially when a repository is empty. Detect this case by seeing if `git for-each-ref --count 1` does anything. If the repository is empty, just bail.
Test Plan:
- Observed an empty and non-empty repository.
- Mirrored an empty and non-empty repository.
Reviewers: alexmv, amckinley
Reviewed By: alexmv
Subscribers: Korvin, epriestley
Maniphest Tasks: T5965
Differential Revision: https://secure.phabricator.com/D18920
Summary: Noticed a couple of typos in the docs, and then things got out of hand.
Test Plan:
- Stared at the words until my eyes watered and the letters began to swim on the screen.
- Consulted a dictionary.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D18693
Summary:
Ref T11823. This is the meaty part of the change, and updates `RefEngine` to use separate RefCursor (for names) and RefPosition (for actual commit positions) tables.
I'll hold this whole series until after the release cut so it has some time to bake on `secure` to look for issues. It's also not a huge problem if there are bugs here since these tables are just caches anyway, although they do feed into some other things, and obviously it's never good to have bugs.
Test Plan:
- This logic can be invoked directly with `bin/repository refs <repository> --trace --verbose`.
- Ran that on unchanged repositories, new branches, removed branches, and modified branches. Saw appropriate output and cursor positions.
- Ran on a mercurial repository to test the close/open logic, saw it correct open/closed state of incorrect positions.
- Browed around Diffusion in various repositories.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T11823
Differential Revision: https://secure.phabricator.com/D18614
Summary:
Ref T12961. Fixes T4416. Currently, for observed Mercurial repositories, we build a working copy with `pull -u` (for "update").
This should be unnecessary, and we don't do it for hosted Mercurial repositories. We also stopped doing it years ago for Git repositories. We also don't clone Mercurial repositories with a working copy.
It's possible something has slipped through the cracks here so I'll hold this until after the release cut, but I believe there are no actual technical blockers here.
Test Plan:
- Observed a public Mercurial repository on Bitbucket.
- Let it import.
- Browsed commits, branches, file content, etc., without any apparent issues.
Reviewers: chad
Reviewed By: chad
Subscribers: cspeckmim
Maniphest Tasks: T12961, T4416
Differential Revision: https://secure.phabricator.com/D18390
Summary:
Fixes T12942.
- Adds binary version and path information to {nav Config > Version Information}.
- Replaces old code all over the place with new consolidated code.
Test Plan:
{F5073531}
Also faked some cases of missing binaries, bad versions, etc.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12942
Differential Revision: https://secure.phabricator.com/D18306
Summary: Fixes T12945.
Test Plan:
Mostly faked this, got a censored error:
```
$ ./bin/repository update R38
[2017-07-31 19:40:13] EXCEPTION: (Exception) Working copy at "/Users/epriestley/dev/core/repo/local/38/" has a mismatched origin URI, "https://********@example.com/". The expected origin URI is "https://github.com/phacility/libphutil.git". Fix your configuration, or set the remote URI correctly. To avoid breaking anything, Phabricator will not automatically fix this. at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryEngine.php:186]
```
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12945
Differential Revision: https://secure.phabricator.com/D18304
Summary: Fixes T12416. See that task for discussion. Slightly older versions of `git` do not appear to support use of `--` to separate flags and arguments.
Test Plan:
- Ran `bin/repository update PHABX`.
- In T12416, had a user with Git 2.1.4 confirm that `git ls-remote X` worked while `git ls-remote -- X` failed.
- Read `git help ls-remote` to look for any kind of suspicious `--destroy-the-world` flags, didn't see any that made me uneasy.
Reviewers: chad, avivey
Reviewed By: avivey
Maniphest Tasks: T12416
Differential Revision: https://secure.phabricator.com/D17508
Summary:
Ref T12392. The logic currently goes like this:
- Try a fetch.
- If that fails, try repairing the origin URI.
- Then try again.
This is pretty complicated, and we can use this simpler logic instead:
- Set the origin URI to the right value.
- Try a fetch.
Setting the origin URI is very fast. This can normally only get us in any trouble in very obscure situations which haven't occurred for many years:
- Pretty much all of this is already covered by `verifyGitOrigin()`, which we run earlier.
- Origins could be configured to have multiple URIs for some reason, but shouldn't be.
- Years ago, you could configure Phabricator to point at a local repository it didn't own and that could conceivably have a different "origin" that you might not want us to delete. If you did this, the daemons have been spewing errors for 3-4 years without you fixing it. The cost of fixing the remote URI is very small even if anyone is affected by this (just set it back to the old value) and there's zero reason to do this and the scenario is ridiculous.
Test Plan: Ran `bin/repository update PHABX --trace --verbose`, saw fetches go through cleanly after URI adjustment.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12392
Differential Revision: https://secure.phabricator.com/D17498
Summary:
Ref T12296. Ref T12392. Currently, when we're observing a remote repository, we periodically run `git fetch ...`.
Instead, periodically run `git ls-remote` (to list refs in the remote) and `git for-each-ref` (to list local refs) and only continue if the two lists are different.
The motivations for this are:
- In T12296, it appears that doing this is //faster// than doing a no-op `git fetch`. This effect seems to reproduce locally in a clean environment (900ms for `ls-remote` + 100ms for `for-each-ref` vs about 1.4s for `fetch`). I don't have any explanation for why this is, but there it is. This isn't a huge change, although the time we're saving does appear to mostly be local CPU time, which is good for us.
- Because we control all writes, we could cache `git for-each-ref` in the future and do fewer disk operations. This doesn't necessarily seem too valuable, though.
- This allows us to tell if a fetch will do anything or not, and make better decisions around clustering (in particular, simplify how observed repository versioning works). With `git fetch`, we can't easily distinguish between "fetch, but nothing changed" and "legitimate fetch".
If a repository updates very regularly we end up doing slightly more work this way (that is, if `ls-remote` always comes back with changes, we do a little extra work), but this is normally very rare.
This might not get non-bare repositories quite right in some cases (i.e., incorrectly detect them as changed when they are unchanged) but we haven't created non-bare repositories for many years.
Test Plan: Ran `bin/repository update --trace --verbose PHABX`, saw sensible construction of local and remote maps and accurate detection of whether a fetch would do anything or not.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12392, T12296
Differential Revision: https://secure.phabricator.com/D17497
Summary:
Ref T12296. This cache is used to cache Git ref heads (branches, tags, etc). Reasonable repositories may have more than 2048 of these.
When we miss the cache, we need to single-get refs to check them, which is relatively expensive.
Increasing the size of the cache to 65535 should only require about 7.5MB of RAM.
Additionally, fill only as much of the cache as actually fits. The FIFO nature of the cache can get us into trouble otherwise.
If we insert "A, B, C, D" and then lookup A, B, C, D, but the cache has maximum size 3, we get this:
- Insert A, B, C, D: cache is now "B, C, D".
- Lookup A: miss, single get, insert, purge, cache is now "C, D, A".
- Lookup B: miss, singel get, insert, purge, cache is now "D, A, B".
Test Plan:
- Reduced cache size to 5, observed reasonable behavior on the `array_slice()` locally with `bin/repository update` + `var_dump()`.
- Used this script to estimate the size of 65535 cache entries as 7.5MB:
```
epriestley@orbital ~ $ cat size.php
<?php
$cache = array();
$mem_start = memory_get_usage();
for ($ii = 0; $ii < 65535; $ii++) {
$cache[sha1($ii)] = true;
}
echo number_format(memory_get_usage() - $mem_start)." bytes\n";
epriestley@orbital ~ $ php -f size.php
7,602,176 bytes
```
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12296
Differential Revision: https://secure.phabricator.com/D17409
Summary:
Fixes T12062. Like the commits from the year 3500, you can artificially build commits with no date information.
We could explicitly store these as `null` to fully respect the underlying datastore. However, I think it's very unlikely that these commits are intentional/meaningful or that this is valuable.
Additionally, "git show" interprets these commits as "Jan 1, 1970". Just store a `0` to mimic its behavior.
Test Plan:
- Following the process in T11537#192019, artificially created a commit with //no// date information (I deleted all date information from the message).
- Used `git show` / `git log --format ...` to inspect it: "Jan 1, 1970" on `git show`, no information at all on `%aD`, `%aT`, etc.
- Pushed it.
- Saw exception for trying to insert empty string into epoch colum from `bin/repository update`.
- Applied patch.
- Got a clean import.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12062
Differential Revision: https://secure.phabricator.com/D17136
Summary:
Fixes T11677. This makes two minor adjustments to the repository import daemons:
- The first step ("Message") now queues at a slightly-lower-than-default (for already-imported repositories) or very-low (for newly importing repositories) priority level.
- The other steps now queue at "default" priority level. This is actually what they already did, but without this change their behavior would be to inherit the priority level of their parents.
This has two effects:
- When adding new repositories to an existing install, they shouldn't block other things from happening anymore.
- The daemons will tend to start one commit and run through all of its steps before starting another commit. This makes progress through the queue more even and predictable.
- Before, they did ALL the message tasks, then ALL the change tasks, etc. This works fine but is confusing/uneven/less-predictable because each type of task takes a different amount of time.
Test Plan:
- Added a new repository.
- Saw all of its "message" steps queue at priority 4000.
- Saw followups queue at priority 2000.
- Saw progress generally "finish what you started" -- go through the queue one commit at a time, instead of one type of task at a time.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11677
Differential Revision: https://secure.phabricator.com/D16585
Summary:
Ref T11665. Currently, when a repository hits an error, we retry it after 15s. This is correct if the error was temporary/transient/config-related (e.g., bad network or administrator setting up credentials) but not so great if the error is long-lasting (completely bad authentication, invalid URI, etc), as it can pile up to a meaningful amount of unnecessary load over time.
Instead, record how many times in a row we've hit an error and adjust backoff behavior: first error is 15s, then 30s, 45s, etc.
Additionally, when computing the backoff for an empty repository, use the repository creation time as though it was the most recent commit. This is a good proxy which gives us reasonable backoff behavior.
This required removing the `CODE_WORKING` messages, since they would have reset the error count. We could restore them (as a different type of message), but I think they aren't particularly useful since cloning usually doesn't take too long and there's more status information avilable now than there was when this stuff was written.
Test Plan:
- Ran `bin/phd debug pull`.
- Saw sensible, increasing backoffs selected for repositories with errors.
- Saw sensible backoffs selected for empty repositories.
Reviewers: chad
Maniphest Tasks: T11665
Differential Revision: https://secure.phabricator.com/D16575
Summary:
Fixes T11537. See that task for discussion.
Although we could accommodate these faithfully, it requires a huge migration and affects one repository on one install which was written with buggy tools.
At least for now, just replace out-of-32-bit-range epoch values with the current time, which is often somewhat close to the real value.
Test Plan:
- Following the instructions in T11537, created commits in 40,000 AD.
- Tried to import them, reproducing the "epoch" database issue.
- Applied the patch.
- Successfully imported future-commits, with some liberties around commit dates. Note that author date (not stored in an `epoch` column) is still shown faithfully:
{F1789302}
Reviewers: chad, avivey
Reviewed By: avivey
Maniphest Tasks: T11537
Differential Revision: https://secure.phabricator.com/D16456
Summary:
Fixes T11269. The basic issue is that `git log` in an empty repository exits with an error message.
Prior to recent Git (2.6?), this message reads:
> fatal: bad default revision 'HEAD'
This message was somewhat recently changed by <ce11360467>. After that, it reads:
> fatal: your current branch 'master' does not have any commits yet
This change isn't //technically// a //complete// fix because you could still hit this issue like this:
- Create an empty repository.
- Push some stuff to `master`.
- Delete `master`.
However, this is very rare and even in this case the repository will fix itself once you push something again. We can try to fix that if any users ever actually hit it.
Test Plan:
- Created a new empty Git repository.
- Ran `bin/repository update Rxx`.
- Before patch: "git log" error because of the empty repository.
- After patch: clean update.
- Also ran `repository update` on a non-empty repository.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11269
Differential Revision: https://secure.phabricator.com/D16234
Summary:
Ref T11208. See that task for a more detailed description of revprops.
This allows revprop changes in a hosted Subversion repository if the repository has the "allow dangerous changes" flag set.
In the future, we could expand this into real Herald support, but the only use case we have for now is letting `svnsync` work.
Test Plan:
Edited revprops with `svn propset --revprop -r 2 propkey propvalue repositoryuri`:
- Tried before patch, got a "configure a commit hook" error.
- Tried after patch, got a "dangerous change" error.
- Allowed dangerous changes.
- Did a revprop edit.
- Prevented dangerous changes.
- Got an error again.
- Made a normal commit to an SVN repository.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11208
Differential Revision: https://secure.phabricator.com/D16174
Summary:
Fixes T11180. In Git, it's possible to tag a tag (????). When you do, we try to log the tag-object, which automatically resolves to the commit and fails.
Just skip these. If "A" points at "B" which points at "C", it's fine to ignore "A" and "B" since we'll get the same stuff when we process "C".
Test Plan:
- Tagged a tag.
- Pushed it.
- Discovered it.
- Before patch: got exception similar to the one in T11180.
- After patch: got tag-tag skipped. Also got slightly better error messages.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11180
Differential Revision: https://secure.phabricator.com/D16149
Summary:
Ref T9028. This allows us to detect when commits are unreachable:
- When a ref (tag, branch, etc) is moved or deleted, store the old thing it pointed at in a list.
- After discovery, go through the list and check if all the stuff on it is still reachable.
- If something isn't, try to follow its ancestors back until we find something that is reachable.
- Then, mark everything we found as unreachable.
- Finally, rebuild the repository summary table to correct the commit count.
Test Plan:
- Deleted a ref, ran `pull` + `refs`, saw oldref in database.
- Ran `discover`, saw it process the oldref, mark the unreachable commit, and update the summary table.
- Visited commit page, saw it properly marked.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9028
Differential Revision: https://secure.phabricator.com/D16133
Summary:
Ref T9028. This improves the daemon behavior for unreachable commits. There is still no way for commits to become marked unreachable on their own.
- When a daemon encounters an unreachable commit, fail permanently.
- When we revive a commit, queue new daemons to process it (since some of the daemons might have failed permanently the first time around).
- Before doing a step on a commit, check if the step has already been done and skip it if it has. This can't happen normally, but will soon be possible if a commit is repeatedly deleted and revived very quickly.
- Steps queued with `bin/repository reparse ...` still execute normally.
Test Plan:
- Used `bin/repository reparse` to run every step, verified they all mark the commit with the proper flag.
- Faked the `reparse` exception in the "skip step" code, used `repository reparse` to skip every step.
- Marked a commit as unreachable, ran `discover`, saw daemons queue for it.
- Ran daemons with `bin/worker execute --id ...`, saw them all skip + queue the next step.
- Marked a commit as unreachable, ran `bin/repository reparse` on it, got permanent failures immediately for each step.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9028
Differential Revision: https://secure.phabricator.com/D16131
Summary:
Ref T9028. This is the easy part of dealing with deleted commits:
- Add a flag for unreachable commits (nothing sets this flag yet).
- Ignore unreachable commits when querying for known commits during discovery, so we pretend they do not exist.
- When recording a commit, try just reviving an existing unreachable commit first. If that works, bail out.
Test Plan:
- Artificially marked a commit as unreachable with raw SQL.
- Verified it said "deleted: unreachable" in the UI.
- Ran `repository discover --trace --verbose`.
- Saw the discovery process ignore the commit when filling the cache.
- Saw the discovery process revive the commit instead of trying to record it again.
- Web UI now shows the commit as normal.
- Running `repository discover` again doesn't make any further changes.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9028
Differential Revision: https://secure.phabricator.com/D16130
Summary:
Ref T9028. Fixes T6878. Currently, we only fetch and discover branches. This is fine 99% of the time but sometimes commits are pushed to just a tag, e.g.:
```
git checkout <some hash>
nano file.c
git commit -am '...'
git tag wild-wild-west
git push origin wild-wild-west
```
Through a similar process, commits can also be pushed to some arbitrary named ref (we do this for staging areas).
With the current rules, we don't fetch tag refs and won't discover these commits.
Change the rules so:
- we fetch all refs; and
- we discover ancestors of all refs.
Autoclose rules for tags and arbitrary refs are just hard-coded for now. We might make these more flexible in the future, or we might do forks instead, or maybe we'll have to do both.
Test Plan:
Pushed a commit to a tag ONLY (`vegetable1`).
<cf508b8de6>
On `master`, prior to the change:
- Used `update` + `refs` + `discover`.
- Verified tag was not fetched with `git for-each-ref` in local working copy and the web UI.
- Verified commit was not discovered using the web UI.
With this patch applied:
- Used `update`, saw a `refs/*` fetch instead of a `refs/heads/*` fetch.
- Used `git for-each-ref` to verify that tag fetched.
- Used `repository refs`.
- Saw new tag appear in the tags list in the web UI.
- Saw new refcursor appear in refcursor table.
- Used `repository discover --verbose` and examine refs for sanity.
- Saw commit row appear in database.
- Saw commit skeleton appear in web UI.
- Ran `bin/phd debug task`.
- Saw commit fully parse.
{F1689319}
Reviewers: chad
Reviewed By: chad
Subscribers: avivey
Maniphest Tasks: T6878, T9028
Differential Revision: https://secure.phabricator.com/D16129
Summary:
Ref T4292. For hosted, clustered repositories we have a good way to increment the internal version of the repository: every time a user pushes something, we increment the version by 1.
We don't have a great way to do this for observed/remote repositories because when we `git fetch` we might get nothing, or we might get some changes, and we can't easily tell //what// changes we got.
For example, if we see that another node is at "version 97", and we do a fetch and see some changes, we don't know if we're in sync with them (i.e., also at "version 97") or ahead of them (at "version 98").
This implements a simple way to version an observed repository:
- Take the head of every branch/tag.
- Look them up.
- Pick the biggest internal ID number.
This will work //except// when branches are deleted, which could cause the version to go backward if the "biggest commit" is the one that was deleted. This should be OK, since it's rare and the effects are minor and the repository will "self-heal" on the next actual push.
Test Plan:
- Created an observed repository.
- Ran `bin/repository update` and observed a sensible version number appear in the version table.
- Pushed to the remote, did another update, saw a sensible update.
- Did an update with no push, saw no effect on version number.
- Toggled repository to hosted, saw the version reset.
- Simulated read traffic to out-of-sync node, saw it do a remote fetch.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15986
Summary:
Ref T4292. Currently, we hold one big lock around the whole `bin/repository update` workflow.
When running multiple daemons on different hosts, this lock can end up being contentious. In particular, we'll hold it during `git fetch` on every host globally, even though it's only useful to hold it locally per-device (that is, it's fine/good/expected if `repo001` and `repo002` happen to be fetching from a repository they are observing at the same time).
Instead, split it into two locks:
- One lock is scoped to the current device, and held during pull (usually `git fetch`). This just keeps multiple daemons accidentally running on the same host from making a mess when trying to initialize or update a working copy.
- One lock is scoped globally, and held during discovery. This makes sure daemons on different hosts don't step on each other when updating the database.
If we fail to acquire either lock, assume some other process is legitimately doing the work and bail more quietly instead of fataling. In approximately 100% of cases where users have hit this lock contention, that was the case: some other daemon was running somewhere doing the work and the error didn't actually represent an issue.
If there's an actual problem, we still raise a diagnostically useful message if you run `bin/repository update` manually, so there are still tools to figure out that something is hung or whatever.
Test Plan:
- Ran `bin/repository update`, `pull`, `discover`.
- Added `sleep(5)`, forced processes to contend, got lock exceptions and graceful exit with diagnostic message.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15903
Summary:
Ref T10923. Fixes T9554.
When hosting a repository, we currently have a heuristic that tries to detect when you're doing an initial import: if you push more than 7 commits to an empty repository, it counts as an import and we disable mail/feed/etc.
Do something similar for observed repositories: if the repository is empty and we discover more than 7 commits, switch to import mode until we catch up.
This should align behavior with user expectation more often when juggling hosted vs imported repositories.
Test Plan:
- Created a new hosted repository.
- Activated it and allowed it to fully import.
- Added an "Observe URI".
- Saw it automatically drop into "Importing" mode until the import completed.
- Swapped it back to hosted mode.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9554, T10923
Differential Revision: https://secure.phabricator.com/D15877
Summary:
Ref T10748. This migrates and swaps mirroring to `PhabricatorRepositoryURI`, obsoleting `PhabricatorRepositoryMirror`.
This prevents you from editing, adding or disabling mirrors unless you know a secret URI (until the UI cuts over fully), but existing mirroring is not affected.
Test Plan:
- Added a mirroring URI to an old repository.
- Verified it worked with `bin/repository mirror`.
- Migrated forward.
- Verified it still worked with `bin/repository mirror`.
- Wow, mirroring: https://github.com/epriestley/locktopia-mirror
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10748
Differential Revision: https://secure.phabricator.com/D15841
Summary:
Ref T4039. Long ago these were more freely editable and there were some security concerns around creating a repository, then setting its local path to point somewhere it shouldn't.
Local paths are no longer editable so there's no real reason we need to provide a uniqueness guarantee anymore, but you could still make a mistake with `bin/repository move-paths` by accident, and it's a little cleaner to pull them out into their own column with a key.
(We still don't -- and, largely can't -- guarantee that two paths aren't //equivalent// since one might be symlinked to the other, or symlinked only on some hosts, or whatever, but the primary value here is as a sanity check that you aren't goofing things up and pointing a bunch of repositories at the same working copy by mistake.)
Test Plan:
- Ran migrations.
- Grepped for `local-path`.
- Listed and moved paths with `bin/repository`.
- Created a new repository, verified its local path populated correctly.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4039
Differential Revision: https://secure.phabricator.com/D15837
Summary: Ref T10860. This doesn't change anything, it just separates all this stuff out of `PhabricatorRepository` since I'm planning to add a bit more state to it and it's already pretty big and fairly separable.
Test Plan: Pulled, pushed, browsed Diffusion.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10860
Differential Revision: https://secure.phabricator.com/D15790
Summary:
Ref T4292. Before we write or read a hosted, clustered Git repository over SSH, check if another version of the repository exists on another node that is more up-to-date.
If such a version does exist, fetch that version first. This allows reads and writes of any node to always act on the most up-to-date code.
Test Plan: Faked my way through this and got a fetch via `bin/repository update`; this is difficult to test locally and needs more work before we can put it in production.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15757
Summary:
Ref T4245. Two effects:
- First, let hooks work for future repositories without callsigns.
- Second, provide a better error when users push directly to hosted repositories.
Test Plan: Ran `bin/commit-hook PHID-REPO-xxx`.
Reviewers: chad, avivey
Reviewed By: avivey
Maniphest Tasks: T4245
Differential Revision: https://secure.phabricator.com/D15293
Summary: Fixes T9701. I don't want to try to autofix this because destroying the directory could destroy important files, but we can improve the error message.
Test Plan: Faked a failure, ran `repository update X`, got a more tailored error message.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9701
Differential Revision: https://secure.phabricator.com/D14971
Summary: Ref T4245. These are all descriptive or UI-facing.
Test Plan:
- Ran `bin/repository pull ...` with various identifiers.
- Ran `bin/repository mirror ...` with various identifiers.
- Ran `bin/repository discover ...` with various identifiers.
- Ran `bin/phd debug pull X Y --not Z` with various identifiers.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4245
Differential Revision: https://secure.phabricator.com/D14926
Summary: Fixes T9966. In this unusual, difficult-to-reach case, we throw `Exception` (which has no censoring) instead of `CommandException` (which has censoring). Throw `CommandException` instead.
Test Plan:
- Hacked up a bunch of stuff in order to hit this: disabled origin validation, origin correction, and pointed repository at a bad domain.
- Verified message is now censored correctly.
{F1022217}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9966
Differential Revision: https://secure.phabricator.com/D14745
Summary: As described in T7959, it looks like Diffusion does not provide Mercurial the required HTTP credentials when pulling from an external repository.
Test Plan: Add an external Mercurial repository to Diffusion, that requires HTTP authentication. A private BitBucket repository for example.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley
Projects: #mercurial, #diffusion
Differential Revision: https://secure.phabricator.com/D14092
Summary: Return `$this` from setter methods for consistency. I started writing a [[https://secure.phabricator.com/differential/diff/32506/ | linter rule]] to detect this, but I don't think it is trivial to do this properly.
Test Plan: Eyeball it.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D13422
Summary: All classes should extend from some other class. See D13275 for some explanation.
Test Plan: `arc unit`
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D13283