1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-05 13:16:14 +01:00
Commit graph

136 commits

Author SHA1 Message Date
epriestley
de70a4ff26 Improve consistency in use of "via", "objectPHID", and "containerPHID" parameters in repository workers
Summary:
Ref T13591. Improve how parameters are passed between commit worker tasks:

  - Always pass "via", to track where tasks came from.
  - Always provide "objectPHID" (with the commit PHID).
  - Always provide "containerPHID" (with the repository PHID).

Test Plan:
  - Pushed a new commit.
  - Ran `bin/repository pull` + `bin/repository discover`, saw commit with all parameters.
  - Ran `bin/worker execute ...`, saw a Change worker and then a Publish worker with appropriate parameters.
  - Ran `bin/repository reparse ... --background`, saw workers queue with appropriate parameters.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21532
2021-02-02 13:40:09 -08:00
epriestley
ed86c42b26 Improve performance of repository discovery in repositories with >65K refs
Summary:
Ref T13593. The commit cache in this Engine has a maximum fixed size (currently 65,535 entries).

If we execute discovery in a repository with more refs than this (e.g., 180K), we get fast lookups for the first 65,535 refs and slow lookups for the remaining refs.

Instead, divide the refs into chunks no larger than the cache size, and perform an explicit cache fill before each chunk is processed.

Test Plan:
  - Created a repository with 1K refs. Set cache size to 256. Ran discovery.
    - Before patch: saw one large cache fill and then ~750 single-gets.
    - After patch: saw four large cache fills.
  - Compared `bin/repository discover ... --verbose` output before and after patch for overall effect; saw no differences.

Maniphest Tasks: T13593

Differential Revision: https://secure.phabricator.com/D21521
2021-01-26 12:27:02 -08:00
epriestley
1da94dcf49 Correct some issues around IMPORTED_PERMANENT in RefEngine
Summary: Ref T13591. Fixes a few issues with the recent updates here discovered in more thorough testing.

Test Plan:
- Stopped the daemons.
- Created a new copy of Phabricator in Diffusion.
- Pulled it with `bin/repository pull ...`.
  - Got 17,278 commits on disk with `git log --all --format=%H`.
- Set permanent refs to "master".
- Discovered it with `bin/repository discover ...`.
  - This took 31.5s and inserted 17,278 tasks.
  - Verified that all tasks have priority 4,000 (PRIORITY_IMPORT).
  - Observed that 16,799 commits have IMPORTED_PERMANENT and 479 commits do not.
    - This matches `git log master --format=%H` exactly.
- Ran `bin/repository refs ...`. Expected no changes and saw no changes.
- Ran `bin/worker execute --active` for a minute or two. It processed all the impermanent changes first (since `bin/worker` is LIFO and these are supposed to process last).
  - Ran `bin/repository refs`. Expected no changes and saw no changes.
  - Marked all refs as permanent.
  - Starting state: 16,009 message tasks, all at priority 4000.
  - Ran `bin/repository refs`, expecting 479 new tasks at priority 4000.
  - Saw count rise to 16,488 as expected.
  - Saw all the new tasks have priority 4000 and all commits now have the IMPORTED_PERMANENT flag.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21518
2021-01-22 19:51:40 -08:00
epriestley
3cb543ef8f Lift logic for queueing commit import tasks into RepositoryEngine
Summary:
Ref T13591. There are currently two pathways to queue an import task for a commit: via repository discovery, or via a ref becoming permanent.

These pathways duplicate some logic and have behavioral differences: one does not set `objectPHID` properly, one does not set the priority correctly.

Unify these pathways, make them both set `objectPHID`, and make them both use the same priority logic.

Test Plan:
  - Discovered refs.
  - See later changes in this series for more complete test cases.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21516
2021-01-22 19:51:39 -08:00
epriestley
6716d4f6ae Separate "shouldPublishRef()" from "isPermanentRef()" and set "IMPORTED_PERMANENT" more narrowly
Summary:
Ref T13591. Currently, the "IMPORTED_PERMANENT" flag (previously "IMPORTED_CLOSEABLE", until D21514) flag is set by using the result of "shouldPublishRef()".

This method returns the wrong value for the flag when there is a repository-level reason not to publish the ref (most commonly, because the repository is currently importing).

Although it's correct that commits should not be published in an importing repository, that's already handled in the "PublishWorker" by testing "shouldPublishCommit()". The "IMPORTED_PERMANENT" flag should only reflect whether a commit is reachable from a permanent ref or not.

  - Move the relevant logic to a new method in Publisher.
  - Fill "IMPORTED_PERMANENT" narrowly from "isPermanentRef()", rather than broadly from "shouldPublishRef()".
  - Deduplicate some logic in "PhabricatorRepositoryRefEngine" which has the same intent as the logic in the Publisher.

Test Plan:
  - Ran discovery on a new repository, saw permanent commits marked as permanent from the beginning.
  - See later changes in this patch series for additional testing.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21515
2021-01-22 19:51:38 -08:00
epriestley
2d0e7c37e1 Rename "IMPORTED_CLOSEABLE" to "IMPORTED_PERMANENT" to clarify the meaning of the flag
Summary:
Ref T13591. This is an old flag with an old name, and there's an import bug because the outdated concept of "closable" is confusing two different behaviors.

This flag should mean only "is this commit reachable from a permanent ref?". Rename it to "IMPORTED_PERMANENT" to make that more clear.

Rename the "Unpublished" query to "Permanent" to make that more clear, as well.

Test Plan:
  - Grepped for all affected symbols.
  - Queried for all commmits, permament commits, and impermanent commits.
  - Ran repository discovery.
  - See also further changes in this change series for more extensive tests.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21514
2021-01-22 19:51:38 -08:00
epriestley
16a14af2bb Correct the behavior of "bin/repository discover --repair"
Summary:
Ref T13591. Since D8781, this flag does not function correctly in Git and Mercurial repositories, since ref discovery pre-fills the cache.

Move the "don't look at the database" behavior the flag enables into the cache lookup. D8781 should have been slightly more aggressive and done this, it was just overlooked.

Test Plan:
  - Ran `bin/repository discover --help` and read the updated help text.
  - Ran `bin/repository discover --repair` in a fully-discovered Git repository.
    - Before: no effect.
    - After: full rediscovery.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21513
2021-01-22 19:51:38 -08:00
epriestley
0e28105ff7 Further correct and disambigutate ref selectors passed to Git on the CLI
Summary:
Ref T13589. In D21510, not every ref selector got touched, and this isn't a valid construction in Git:

```
$ git ls-tree ... -- ''
```

Thus:

  - Disambiguate more (all?) ref selectors.
  - Correct the construction of "git ls-tree" when there is no path.
  - Clean some stuff up: make the construction of some flags and arguments more explicit, get rid of a needless "%C", prefer "%Ls" over acrobatics, etc.

Test Plan: Browsed/updated a local Git repository. (This change is somewhat difficult to test exhaustively, as evidenced by the "ls-tree" issue in D21510.)

Maniphest Tasks: T13589

Differential Revision: https://secure.phabricator.com/D21511
2021-01-20 12:07:14 -08:00
epriestley
6f78e2a91c When a commit is marked "closeable", clear the "published" flag
Summary:
Ref T13552. When a previously discovered commit becomes reachable from a permanent ref, we re-queue workers to update it. However, the commit may already be marked as "published", so the publish worker may do nothing.

It would perhaps be simpler to not mark the commit as published when it isn't reachable from a permanent ref, but this is tricky because the flag is also part of the "imported / all steps" state (see T13580).

Until that can be cleaned up, just clear the flag.

Test Plan:
  - Pushed a commit with "fixes X" to a non-permanent branch.
  - Pushed it to a permanent branch.
  - Before change: task failed to close.
  - After change: task closes properly.

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21460
2020-09-15 17:36:42 -07:00
epriestley
7d496f2c6d Collapse repository URI normalization code into Arcanist
Summary: Ref T13546. Companion change to D21372. Move URI normalization code to Arcanist to we can more-often resolve remote URIs correctly.

Test Plan: Grepped for affected symbols.

Maniphest Tasks: T13546

Differential Revision: https://secure.phabricator.com/D21373
2020-06-30 15:54:33 -07:00
epriestley
1e7cc72cd8 Improve performance when marking commits as unreachable after multiple ref deletions
Summary:
See PHI1688. If many refs with a large amount of shared ancestry are deleted from a repository, we can spend much longer than necessary marking their mutual ancestors as unreachable over and over again.

For example, if refs A, B and C all point near the head of an obsolete "develop" branch and have about 1K shared commits reachable from no other refs, deleting all three refs will lead to us performing 3,000 mark-as-unreachable operations (once for each "<ref, commit>" pair).

Instead, we can stop exploring history once we reach an already-unreachable commit.

Test Plan:
  - Destroyed 7 similar refs simultaneously.
  - Ran `bin/repository refs`, saw 7 entries appear in the `oldref` table.
  - Ran `bin/repository discover` with some debugging statements added, saw sensible-seeming behavior which didn't double-mark any newly-unreachable refs.

Differential Revision: https://secure.phabricator.com/D21056
2020-04-03 13:28:42 -07:00
epriestley
6ccb6a6463 Update "git rev-parse" invocation to work in Git 2.25.0
Summary:
Fixes T13479. The behavior of "git rev-parse --show-toplevel" has changed in Git 2.25.0, and it now fails in bare repositories.

Instead, use "git rev-parse --git-dir" to sanity-check the working copy. This appears to have more stable behavior across Git versions, although it's a little more complicated for our purposes.

Test Plan:
  - Ran `bin/repository update ...` on an observed, bare repository.
  - ...on an observed, non-bare ("legacy") repository.
  - ...on a hosted, bare repository.

Maniphest Tasks: T13479

Differential Revision: https://secure.phabricator.com/D20945
2020-01-16 11:39:23 -08:00
epriestley
1b40f7e540 Always initialize Git repositories with "git init", never with "git clone"
Summary:
Fixes T13448. We currently "git clone" to initialize repositories, but this will fetch too many refs if "Fetch Refs" is configured.

In modern Phabricator, there's no apparent reason to "git clone"; we can just "git init" instead. This workflow naturally falls through to an update, where we'll do a "git fetch" and pull in exactly the refs we want.

Test Plan:
  - Configured an observed repository with "Fetch Refs".
  - Destroyed the working copy.
  - Ran "bin/repository pull X --trace --verbose".
    - Before: saw "git clone" pull in the world.
    - After: saw "git init" create a bare empty working copy, then "git fetch" fill it surgically.

Both flows end up in the same place, this one is just simpler and does less work.

Maniphest Tasks: T13448

Differential Revision: https://secure.phabricator.com/D20894
2019-11-07 16:17:35 -08:00
epriestley
8dd57a1ed2 When fetching Git repositories, pass "--no-tags" to make explicit "Fetch Refs" operate more narrowly
Summary:
Ref T13448. The default behavior of "git fetch '+refs/heads/master:refs/heads/master'" is to follow and fetch associated tags.

We don't want this when we pass a narrow refspec because of "Fetch Refs" configuration. Tell Git that we only want the refs we explicitly passed.

Note that this doesn't prevent us from fetching tags (if the refspec specifies them), it just stops us from fetching extra tags that aren't part of the refspec.

Test Plan:
  - Ran "bin/repository pull X --trace --verbose" in a repository with a "Fetch Refs" configuration, saw only "master" get fetched (previously: "master" and reachable tags).
  - Ran "git fetch --no-tags '+refs/*:refs/*'", saw tags fetched normally ("--no-tags" does not disable fetching tags).

Maniphest Tasks: T13448

Differential Revision: https://secure.phabricator.com/D20893
2019-11-07 16:17:08 -08:00
epriestley
9dbde24dda Manually prune Git working copy refs instead of using "--prune", to improve "Fetch Refs" behavior
Summary:
Ref T13448. When "Fetch Refs" is configured:

  - We switch to a narrow mode when running "ls-remote" against the local working copy. This excludes surplus refs, so we'll incorrectly detect that the local and remote working copies are identical in cases where the local working copy really has surplus refs.
  - We rely on "--prune" to remove surplus local refs, but it only prunes refs matching the refspecs we pass "git fetch". Since these refspecs are very narrow under "Fetch Only", the pruning behavior is also very narrow.

Instead:

  - When listing local refs, always list everything. If we have too much stuff locally, we want to get rid of it.
  - When we identify surplus local refs, explicitly delete them instead of relying on "--prune". We can just do this in all cases so we don't have separate "--prune" and "manual" cases.

Test Plan:
  - Created a new repository, observed from a GitHub repository, with many tags/refs/branches. Pulled it.
  - Observed lots of refs in `git for-each-ref`.
  - Changed "Fetch Refs" to "refs/heads/master".
  - Ran `bin/repository pull X --trace --verbose`.

On deciding to do something:

  - Before: since "master" did not change, the pull declined to act.
  - After: the pull detected surplus refs and deleted them. Since the states then matched, it declined further action.

On pruning:

  - Before: if the pull was forced to act, it ran "fetch --prune" with a narrow refspec, which did not prune the working copy.
  - After: saw working copy pruned explicitly with "update-ref -d" commands.

Also, set "Fetch Refs" back to the default (empty) and pulled, saw everything pull.

Maniphest Tasks: T13448

Differential Revision: https://secure.phabricator.com/D20892
2019-11-07 16:15:46 -08:00
epriestley
7badec23a7 Use "git log ... --stdin" instead of "git log ... --not ..." to avoid oversized command lines
Summary:
Depends on D20853. See PHI1474. If the list of "--not" refs is sufficiently long, we may exceed the maximum size of a command.

Use "--stdin" instead, and swap "--not" for the slightly less readable but functionally equivalent "^hash", which has the advantage of actually working with "--stdin".

Test Plan: Ran `bin/repository refs ...` with nothing to be done, and with something to be done.

Differential Revision: https://secure.phabricator.com/D20854
2019-10-02 14:08:15 -07:00
epriestley
6d74736a7e When a large number of commits need to be marked as published, issue the lookup query in chunks
Summary: See PHI1474. This query can become large enough to exceed reasonable packet limits. Chunk the query so it is split up if we have too many identifiers.

Test Plan: Ran `bin/repository refs ...` on a repository with no new commits and a repository with some new commits.

Differential Revision: https://secure.phabricator.com/D20853
2019-10-02 14:06:07 -07:00
epriestley
06edcf2709 Fix an issue where ancestors of permanent refs might not be published during import or if a branch is later made permanent
Summary:
Fixes T13284. See that task for substantial discussion. There are currently two cases where we'll skip over commits which we should publish:

  - if a branch is not permanent, then later made permanent; or
  - in some cases, the first time we examine branches in a repository.

In both cases, this error is one-shot and things work correctly going forward. The root cause is conflation between the states "this ref currently permanent" and "this ref was permanent the last time we updated refs".

Separate these pieces of state and cover all these cases. Also introduce a "--rebuild" flag to fix the state of bad commits.

Test Plan:
See T13284 for the three major cases:

  - initial import;
  - push changes to a nonpermanent branch, update, then make it permanent;
  - push chanegs to a nonpermanent branch, update, push more changes, then make it permanent.

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13284

Differential Revision: https://secure.phabricator.com/D20829
2019-09-25 08:54:50 -07:00
epriestley
3c26e38487 Provide a simple read-only maintenance mode for repositories
Summary:
Ref T13393. While doing a shard migration in the Phacility cluster, we'd like to stop writes to the migrating repository. It's safe to continue serving reads.

Add a simple maintenance mode for making repositories completely read-only during maintenance.

Test Plan: Put a repository into read-only mode, tried to write via HTTP + SSH. Viewed web UI. Took it back out of maintenance mode.

Maniphest Tasks: T13393

Differential Revision: https://secure.phabricator.com/D20748
2019-08-29 15:23:10 -07:00
epriestley
82cf97ad65 When many commits are discovered at once, import them at lower priority
Summary:
Ref T13369. See that task for discussion.

When the discovery daemon finds more than 64 commits to import, demote the worker queue priority of the resulting tasks.

Test Plan:
  - Pushed one commit, ran `bin/repository discover --verbose --trace ...`, saw commit import with "at normal priority" message and priority 2500 ("PRIORITY_COMMIT").
  - Pushed 3 commits, set threshold to 3, ran `bin/repository discover ...`, saw commist import with "at lower priority" message and priority 4000 ("PRIORITY_IMPORT").

Maniphest Tasks: T13369

Differential Revision: https://secure.phabricator.com/D20712
2019-08-12 12:59:00 -07:00
epriestley
c0dc411d23 Update "phabricator/" for "topological" API changes
Summary: Ref T13325.

Test Plan:
  - Grepped for `topograph`.
  - Viewed a task graph since that's easy, looked fine.

Reviewers: amckinley

Reviewed By: amckinley

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13325

Differential Revision: https://secure.phabricator.com/D20599
2019-06-20 16:11:56 -07:00
epriestley
8f43c773b8 Remove nearly all remaining references to "Autoclose"
Summary:
Depends on D20464. Ref T13277. Broadly:

  - Move all the "should publish X" and "why aren't we publishing X" stuff to a separate class (`PhabricatorRepositoryPublisher`).
  - Rename things to be more consistent with modern terminology ("Publish", "Permanent Refs").

Test Plan:
This could use some trial-by-fire on `secure`, but:

  - Grepped for all symbols.
  - Viewed various commits.
  - Reparsed commits.
  - Here's a commit with an explanation:

{F6394569}

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13277

Differential Revision: https://secure.phabricator.com/D20465
2019-04-24 08:29:41 -07:00
epriestley
a82a2b8459 Simplify non-bare working copy rules for the new "git fetch" strategy
Summary:
Ref T13280. In D20421, I changed our observe strategy to `git fetch <uri>` in all cases.

This doesn't work in an ancient, non-bare repository if `master` is checked out and `master` is also fetch: `git` refuses to overwrite the local ref unless we pass `--update-head-ok`. Pass this flag.

Also, remove some code which examines branches and tags in a special way for non-bare working copies. The old `git fetch <origin>` code without explicit revsets meant that `refs/remotes/orgin/heads/xyz` got updated instead of `refs/heads/xyz`. We now update our local refs in all cases (bare and non-bare) so we can throw away this special casing.

Test Plan:
  - Replaced a modern bare working copy with a non-bare working copy by explicitly using `git clone` without `--bare`.
  - Ran `bin/repository update`, hit the `--update-head-ok` error. Applied the patch, got a clean update.
  - Used the "repository.branchquery" API method...
    - ...with "contains" to trigger the "git branch" case. Got identical results after removing the special casing.
    - ...without "contains" to trigger the "low level ref" case. Got identical results after removing the special casing.
  - Grepped for `isWorkingCopyBare()`. The only remaining callsites deal with hook paths, and genuinely need to be specialized.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13280

Differential Revision: https://secure.phabricator.com/D20450
2019-04-19 07:06:10 -07:00
epriestley
e1076528ef Copy the "line-chart" behavior to "line-chart-legacy" to keep "Maniphest > Reports" working
Summary:
Ref T13279. Charting changes alter how the "line-chart" behavior works, but the "Burnup Chart" still relies on the old behavior.

Although I'm intending to remove "Maniphest > Reports" once Facts is a minimally sufficient replacement, copy this behavior to keep it working until we're ready to pull the trigger.

Also fix a leftover typo from D20435.

Test Plan: Viewed a legacy Maniphest burnup rate report.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13279

Differential Revision: https://secure.phabricator.com/D20449
2019-04-19 07:05:37 -07:00
epriestley
faf0a311ec In Git repositories, use "git symbolic-ref HEAD ..." to select the default branch
Summary:
Depends on D20434. Fixes T5963. Broadly, the issue here is that when:

  - You create a new, empty repository.
  - Then, you work on some branch other than `master`, without ever creating `master`.

...you get a warning on `git clone`:

> warning: remote HEAD refers to nonexistent ref, unable to checkout

To fix this, point the symbolic-ref HEAD at `refs/heads/<default-branch>` after installing commit hooks.

This fixes the warning, and also means that `git clone` will check out the repository default branch by default, which is nice.

There are a few caveats about this behavior (see T5963 for discussion) but nothing too substantial.

The only real issue is that Git prevents deletion of the default branch without a config setting. Just set that settting.

Test Plan:
See T5963.

In a repository, set `HEAD` to point somewhere invalid. Ran `bin/repository update ...`. Saw HEAD pointed back at the repository default branch.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T5963

Differential Revision: https://secure.phabricator.com/D20435
2019-04-18 05:46:40 -07:00
epriestley
d1223ac577 When a commit appears as an ancestor of a permanent ref for the first time, run all import steps
Summary:
Depends on D20427. Ref T13277. As an optimization, when we discover that a commit which was previously only on a non-permanent ref ("tmp-epriestley-123") is now reachable from a permanent ref ("master"), we currently queue only a new "message" parse step.

This is an optimization because these commits previously got the full treatment (feed, publish, audit, etc) as soon as they were discovered. Now, those steps only happen once a commit is reachable from a permanent ref, so we need to run everything.

Test Plan:
  - Pushed local "tmp-123" branch to remote tag "tmp-123".
  - Updated repository with "bin/repository update", saw commit import as a "not on any permanent ref" commit, with no herald/audit/etc.
  - Merged "tmp-123" tag into "master".
  - Pushed new "master".
  - Updated repository with "bin/repository refs ... --trace --verbose", saw commit detected as now reachable from a permament ref.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13277

Differential Revision: https://secure.phabricator.com/D20428
2019-04-18 05:37:15 -07:00
epriestley
e910c76e65 Add "Fetch Rules" to observed Git repositories
Summary:
Depends on D20421. Ref T13277. I'd generally like to move away from "Track Only".

Some of the use cases for "Track Only" (or adjacent to "Track Only") are better resolved with "Fetch Rules" -- basically, rules to fetch only some subset of refs from the observed remote.

Add configurable "Fetch Rules" for Git repositories. For example, if you only want to fetch `master`, you can now speify:

```
refs/heads/master
```

If you only want to fetch branches and tags, you can use:

```
refs/heads/*
refs/tags/*
```

In theory, this is slightly less powerful in the general case than "Track Only", but gives us better behavior in some cases (e.g., when the remote has 50K random temporary branches). In practice, I think this and a better "Autoclose Only" will let us move away from "Track Only", get default behavior which is better aligned with what users actually expect, and dodge all the "track tags/refs" questions.

Test Plan: Configured repositories with "Fetch Refs" rules, used `bin/repository pull --verbose --trace ...` to run pulls, saw expected pull/fetch behavior.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13277

Differential Revision: https://secure.phabricator.com/D20422
2019-04-18 05:09:36 -07:00
epriestley
78aab6d4a5 When observing a repository in Git, just "fetch <url>" without worrying about the "origin" remote
Summary:
Depends on D20420. Ref T13277. We currently spend substantial effort trying to detect and correct the URL of the "origin" remote in Git repositories.

I believe this is unnecessary, and we can always `git fetch <url> ...` to get the desired result instead of `git muck-with-origin + git fetch origin ...`. We already do this in the more recent parts of the codebase (e.g., intracluster sync) and it works correctly in every case I'm aware of.

Test Plan:

  - Grepped for `origin`, ` origin `.
  - Ran `bin/repository update ...` to fetch a mirrored repository.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13277

Differential Revision: https://secure.phabricator.com/D20421
2019-04-18 05:05:43 -07:00
Austin McKinley
fe5fde5910 Assign RepositoryIdentity objects to commits
Summary: Depends on D19429. Depends on D19423. Ref T12164. This creates new columns `authorIdentityPHID` and `committerIdentityPHID` on commit objects and starts populating them. Also adds the ability to explicitly set an Identity's assignee to "unassigned()" to null out an incorrect auto-assign. Adds more search functionality to identities. Also creates a daemon task for handling users adding new email address and attempts to associate unclaimed identities.

Test Plan: Imported some repos, watched new columns get populated. Added a new email address for a previous commit, saw daemon job run and assign the identity to the new user. Searched for identities in various and sundry ways.

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin, PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T12164

Differential Revision: https://secure.phabricator.com/D19443
2018-05-31 07:28:23 -07:00
Austin McKinley
f191a66490 Add controllers/search/edit engine functionality to RepositoryIdentity
Summary: Depends on D19423. Ref T12164. Adds controllers capable of listing and editing `PhabricatorRepositoryIdentity` objects. Starts creating those objects when commits are parsed.

Test Plan: Reparsed some revisions, observed objects getting created in the database. Altered some `Identity` objects using the controllers and observed effects in the database. No attempts made to validate behavior under "challenging" author/committer strings.

Reviewers: epriestley

Reviewed By: epriestley

Subscribers: Korvin, PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T12164

Differential Revision: https://secure.phabricator.com/D19429
2018-05-31 07:03:25 -07:00
epriestley
fd367adbaf Parameterize PhabricatorGlobalLock
Summary:
Ref T13096. Currently, we do a fair amount of clever digesting and string manipulation to build lock names which are less than 64 characters long while still being reasonably readable.

Instead, do more of this automatically. This will let lock acquisition become simpler and make it more possible to build a useful lock log.

Test Plan: Ran `bin/repository update`, saw a reasonable lock acquire and release.

Maniphest Tasks: T13096

Differential Revision: https://secure.phabricator.com/D19173
2018-03-05 15:30:27 -08:00
epriestley
d28ddc21a5 Don't error when trying to mirror or observe an empty repository
Summary:
Fixes T5965.

Fixes two issues:

  - Observing an empty repository could write a warning to the log.
  - Mirroring an empty repository to a remote could fail.

For observing:

If newly-created with `git init --bare`, `git ls-remote` will
return the empty string.  Properly return an empty set of refs, rather
than attempting to parse the single "line" that is produced by
splitting that on newlines:

```
[2018-01-23 18:47:00] ERROR 8: Undefined offset: 1 at [/phab_path/phabricator/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php:405]
arcanist(head=master, ref.master=5634f8410176), phabricator(head=master, ref.master=12551a1055ce), phutil(head=master, ref.master=4755785517cf)
  #0 PhabricatorRepositoryPullEngine::loadGitRemoteRefs(PhabricatorRepository) called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php:343]
  #1 PhabricatorRepositoryPullEngine::executeGitUpdate() called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php:126]
  #2 PhabricatorRepositoryPullEngine::pullRepositoryWithLock() called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryPullEngine.php:40]
  #3 PhabricatorRepositoryPullEngine::pullRepository() called at [<phabricator>/src/applications/repository/management/PhabricatorRepositoryManagementUpdateWorkflow.php:59]
  ...
```

For mirroring:

`git` treats `git push --mirror` specially when a repository is empty. Detect this case by seeing if `git for-each-ref --count 1` does anything. If the repository is empty, just bail.

Test Plan:
  - Observed an empty and non-empty repository.
  - Mirrored an empty and non-empty repository.

Reviewers: alexmv, amckinley

Reviewed By: alexmv

Subscribers: Korvin, epriestley

Maniphest Tasks: T5965

Differential Revision: https://secure.phabricator.com/D18920
2018-01-24 15:50:30 -08:00
Dmitri Iouchtchenko
9bd6a37055 Fix spelling
Summary: Noticed a couple of typos in the docs, and then things got out of hand.

Test Plan:
  - Stared at the words until my eyes watered and the letters began to swim on the screen.
  - Consulted a dictionary.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam

Differential Revision: https://secure.phabricator.com/D18693
2017-10-09 10:48:04 -07:00
epriestley
8982e3e52d Update major RefCursor callsites to work properly with RefPosition
Summary:
Ref T11823. This is the meaty part of the change, and updates `RefEngine` to use separate RefCursor (for names) and RefPosition (for actual commit positions) tables.

I'll hold this whole series until after the release cut so it has some time to bake on `secure` to look for issues. It's also not a huge problem if there are bugs here since these tables are just caches anyway, although they do feed into some other things, and obviously it's never good to have bugs.

Test Plan:
  - This logic can be invoked directly with `bin/repository refs <repository> --trace --verbose`.
  - Ran that on unchanged repositories, new branches, removed branches, and modified branches. Saw appropriate output and cursor positions.
  - Ran on a mercurial repository to test the close/open logic, saw it correct open/closed state of incorrect positions.
  - Browed around Diffusion in various repositories.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T11823

Differential Revision: https://secure.phabricator.com/D18614
2017-09-15 10:21:32 -07:00
epriestley
2c150076b0 Stop populating or updating working copies in observed Mercurial repositories
Summary:
Ref T12961. Fixes T4416. Currently, for observed Mercurial repositories, we build a working copy with `pull -u` (for "update").

This should be unnecessary, and we don't do it for hosted Mercurial repositories. We also stopped doing it years ago for Git repositories. We also don't clone Mercurial repositories with a working copy.

It's possible something has slipped through the cracks here so I'll hold this until after the release cut, but I believe there are no actual technical blockers here.

Test Plan:
  - Observed a public Mercurial repository on Bitbucket.
  - Let it import.
  - Browsed commits, branches, file content, etc., without any apparent issues.

Reviewers: chad

Reviewed By: chad

Subscribers: cspeckmim

Maniphest Tasks: T12961, T4416

Differential Revision: https://secure.phabricator.com/D18390
2017-08-10 19:14:56 -07:00
epriestley
f48f2dae9f Move Phabricator to use PhutilBinaryAnalyzer and show binary versions
Summary:
Fixes T12942.

  - Adds binary version and path information to {nav Config > Version Information}.
  - Replaces old code all over the place with new consolidated code.

Test Plan:
{F5073531}

Also faked some cases of missing binaries, bad versions, etc.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12942

Differential Revision: https://secure.phabricator.com/D18306
2017-08-01 07:14:48 -07:00
epriestley
a546b029b0 Censor credentials possibly present in Git remote URIs when pulls fail because of a URI mismatch
Summary: Fixes T12945.

Test Plan:
Mostly faked this, got a censored error:

```
$ ./bin/repository update R38
[2017-07-31 19:40:13] EXCEPTION: (Exception) Working copy at "/Users/epriestley/dev/core/repo/local/38/" has a mismatched origin URI, "https://********@example.com/". The expected origin URI is "https://github.com/phacility/libphutil.git". Fix your configuration, or set the remote URI correctly. To avoid breaking anything, Phabricator will not automatically fix this. at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryEngine.php:186]
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12945

Differential Revision: https://secure.phabricator.com/D18304
2017-07-31 09:43:01 -07:00
epriestley
d19fc2335e Don't use "--" to separate flags and arguments in "git ls-remote"
Summary: Fixes T12416. See that task for discussion. Slightly older versions of `git` do not appear to support use of `--` to separate flags and arguments.

Test Plan:
  - Ran `bin/repository update PHABX`.
  - In T12416, had a user with Git 2.1.4 confirm that `git ls-remote X` worked while `git ls-remote -- X` failed.
  - Read `git help ls-remote` to look for any kind of suspicious `--destroy-the-world` flags, didn't see any that made me uneasy.

Reviewers: chad, avivey

Reviewed By: avivey

Maniphest Tasks: T12416

Differential Revision: https://secure.phabricator.com/D17508
2017-03-18 17:54:09 -07:00
epriestley
20892ae502 Simplify "git fetch" behavior in the Pull daemon
Summary:
Ref T12392. The logic currently goes like this:

  - Try a fetch.
  - If that fails, try repairing the origin URI.
  - Then try again.

This is pretty complicated, and we can use this simpler logic instead:

  - Set the origin URI to the right value.
  - Try a fetch.

Setting the origin URI is very fast. This can normally only get us in any trouble in very obscure situations which haven't occurred for many years:

  - Pretty much all of this is already covered by `verifyGitOrigin()`, which we run earlier.
  - Origins could be configured to have multiple URIs for some reason, but shouldn't be.
  - Years ago, you could configure Phabricator to point at a local repository it didn't own and that could conceivably have a different "origin" that you might not want us to delete. If you did this, the daemons have been spewing errors for 3-4 years without you fixing it. The cost of fixing the remote URI is very small even if anyone is affected by this (just set it back to the old value) and there's zero reason to do this and the scenario is ridiculous.

Test Plan: Ran `bin/repository update PHABX --trace --verbose`, saw fetches go through cleanly after URI adjustment.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12392

Differential Revision: https://secure.phabricator.com/D17498
2017-03-17 16:43:37 -07:00
epriestley
2b0ad243d1 Use "git ls-remote" to guess if "git fetch" is a no-op
Summary:
Ref T12296. Ref T12392. Currently, when we're observing a remote repository, we periodically run `git fetch ...`.

Instead, periodically run `git ls-remote` (to list refs in the remote) and `git for-each-ref` (to list local refs) and only continue if the two lists are different.

The motivations for this are:

  - In T12296, it appears that doing this is //faster// than doing a no-op `git fetch`. This effect seems to reproduce locally in a clean environment (900ms for `ls-remote` + 100ms for `for-each-ref` vs about 1.4s for `fetch`). I don't have any explanation for why this is, but there it is. This isn't a huge change, although the time we're saving does appear to mostly be local CPU time, which is good for us.
  - Because we control all writes, we could cache `git for-each-ref` in the future and do fewer disk operations. This doesn't necessarily seem too valuable, though.
  - This allows us to tell if a fetch will do anything or not, and make better decisions around clustering (in particular, simplify how observed repository versioning works). With `git fetch`, we can't easily distinguish between "fetch, but nothing changed" and "legitimate fetch".

If a repository updates very regularly we end up doing slightly more work this way (that is, if `ls-remote` always comes back with changes, we do a little extra work), but this is normally very rare.

This might not get non-bare repositories quite right in some cases (i.e., incorrectly detect them as changed when they are unchanged) but we haven't created non-bare repositories for many years.

Test Plan: Ran `bin/repository update --trace --verbose PHABX`, saw sensible construction of local and remote maps and accurate detection of whether a fetch would do anything or not.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12392, T12296

Differential Revision: https://secure.phabricator.com/D17497
2017-03-17 16:43:04 -07:00
epriestley
4270649abe Increase the size of the Diffusion commit cache
Summary:
Ref T12296. This cache is used to cache Git ref heads (branches, tags, etc). Reasonable repositories may have more than 2048 of these.

When we miss the cache, we need to single-get refs to check them, which is relatively expensive.

Increasing the size of the cache to 65535 should only require about 7.5MB of RAM.

Additionally, fill only as much of the cache as actually fits. The FIFO nature of the cache can get us into trouble otherwise.

If we insert "A, B, C, D" and then lookup A, B, C, D, but the cache has maximum size 3, we get this:

  - Insert A, B, C, D: cache is now "B, C, D".
  - Lookup A: miss, single get, insert, purge, cache is now "C, D, A".
  - Lookup B: miss, singel get, insert, purge, cache is now "D, A, B".

Test Plan:
  - Reduced cache size to 5, observed reasonable behavior on the `array_slice()` locally with `bin/repository update` + `var_dump()`.
  - Used this script to estimate the size of 65535 cache entries as 7.5MB:

```
epriestley@orbital ~ $ cat size.php
<?php

$cache = array();

$mem_start = memory_get_usage();
for ($ii = 0; $ii < 65535; $ii++) {
  $cache[sha1($ii)] = true;
}

echo number_format(memory_get_usage() - $mem_start)." bytes\n";
epriestley@orbital ~ $ php -f size.php
7,602,176 bytes
```

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12296

Differential Revision: https://secure.phabricator.com/D17409
2017-02-24 10:54:19 -08:00
Jakub Vrana
9f3cde4db7 Fix errors found by PHPStan
Test Plan: None.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley

Differential Revision: https://secure.phabricator.com/D17377
2017-02-18 09:24:56 +00:00
epriestley
4516109495 Survive hand-crafted Git commits which are missing timestamp information
Summary:
Fixes T12062. Like the commits from the year 3500, you can artificially build commits with no date information.

We could explicitly store these as `null` to fully respect the underlying datastore. However, I think it's very unlikely that these commits are intentional/meaningful or that this is valuable.

Additionally, "git show" interprets these commits as "Jan 1, 1970". Just store a `0` to mimic its behavior.

Test Plan:
  - Following the process in T11537#192019, artificially created a commit with //no// date information (I deleted all date information from the message).
  - Used `git show` / `git log --format ...` to inspect it: "Jan 1, 1970" on `git show`, no information at all on `%aD`, `%aT`, etc.
  - Pushed it.
  - Saw exception for trying to insert empty string into epoch colum from `bin/repository update`.
  - Applied patch.
  - Got a clean import.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12062

Differential Revision: https://secure.phabricator.com/D17136
2017-01-04 09:07:46 -08:00
epriestley
db2425b300 Do initial repository imports at a lower priority and finish importing commits before starting new ones
Summary:
Fixes T11677. This makes two minor adjustments to the repository import daemons:

  - The first step ("Message") now queues at a slightly-lower-than-default (for already-imported repositories) or very-low (for newly importing repositories) priority level.
  - The other steps now queue at "default" priority level. This is actually what they already did, but without this change their behavior would be to inherit the priority level of their parents.

This has two effects:

  - When adding new repositories to an existing install, they shouldn't block other things from happening anymore.
  - The daemons will tend to start one commit and run through all of its steps before starting another commit. This makes progress through the queue more even and predictable.
    - Before, they did ALL the message tasks, then ALL the change tasks, etc. This works fine but is confusing/uneven/less-predictable because each type of task takes a different amount of time.

Test Plan:
  - Added a new repository.
  - Saw all of its "message" steps queue at priority 4000.
  - Saw followups queue at priority 2000.
  - Saw progress generally "finish what you started" -- go through the queue one commit at a time, instead of one type of task at a time.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11677

Differential Revision: https://secure.phabricator.com/D16585
2016-09-21 16:41:01 -07:00
epriestley
d3280c406d When repositories hit pull errors, stop updating them as frequently
Summary:
Ref T11665. Currently, when a repository hits an error, we retry it after 15s. This is correct if the error was temporary/transient/config-related (e.g., bad network or administrator setting up credentials) but not so great if the error is long-lasting (completely bad authentication, invalid URI, etc), as it can pile up to a meaningful amount of unnecessary load over time.

Instead, record how many times in a row we've hit an error and adjust backoff behavior: first error is 15s, then 30s, 45s, etc.

Additionally, when computing the backoff for an empty repository, use the repository creation time as though it was the most recent commit. This is a good proxy which gives us reasonable backoff behavior.

This required removing the `CODE_WORKING` messages, since they would have reset the error count. We could restore them (as a different type of message), but I think they aren't particularly useful since cloning usually doesn't take too long and there's more status information avilable now than there was when this stuff was written.

Test Plan:
  - Ran `bin/phd debug pull`.
  - Saw sensible, increasing backoffs selected for repositories with errors.
  - Saw sensible backoffs selected for empty repositories.

Reviewers: chad

Maniphest Tasks: T11665

Differential Revision: https://secure.phabricator.com/D16575
2016-09-19 17:29:56 -07:00
epriestley
d952dd5912 When importing Git repositories, treat out-of-range timestamps as the current time
Summary:
Fixes T11537. See that task for discussion.

Although we could accommodate these faithfully, it requires a huge migration and affects one repository on one install which was written with buggy tools.

At least for now, just replace out-of-32-bit-range epoch values with the current time, which is often somewhat close to the real value.

Test Plan:
  - Following the instructions in T11537, created commits in 40,000 AD.
  - Tried to import them, reproducing the "epoch" database issue.
  - Applied the patch.
  - Successfully imported future-commits, with some liberties around commit dates. Note that author date (not stored in an `epoch` column) is still shown faithfully:

{F1789302}

Reviewers: chad, avivey

Reviewed By: avivey

Maniphest Tasks: T11537

Differential Revision: https://secure.phabricator.com/D16456
2016-08-26 07:38:53 -07:00
Aviv Eyal
c56a4fce66 Only load refs that are actual commits
Summary: Fix T11301. Git is git.

Test Plan: tagged a file! run discover. no crash.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: Korvin

Maniphest Tasks: T11301

Differential Revision: https://secure.phabricator.com/D16261
2016-07-08 22:34:25 +00:00
epriestley
5ffdb73273 Don't try to prune unreachable commits from repositories with no outdated refs
Summary:
Fixes T11269. The basic issue is that `git log` in an empty repository exits with an error message.

Prior to recent Git (2.6?), this message reads:

> fatal: bad default revision 'HEAD'

This message was somewhat recently changed by <ce11360467>. After that, it reads:

> fatal: your current branch 'master' does not have any commits yet

This change isn't //technically// a //complete// fix because you could still hit this issue like this:

  - Create an empty repository.
  - Push some stuff to `master`.
  - Delete `master`.

However, this is very rare and even in this case the repository will fix itself once you push something again. We can try to fix that if any users ever actually hit it.

Test Plan:
  - Created a new empty Git repository.
  - Ran `bin/repository update Rxx`.
  - Before patch: "git log" error because of the empty repository.
  - After patch: clean update.
  - Also ran `repository update` on a non-empty repository.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11269

Differential Revision: https://secure.phabricator.com/D16234
2016-07-05 09:09:46 -07:00
epriestley
89f9f97159 Provide basic support for Subversion revprops
Summary:
Ref T11208. See that task for a more detailed description of revprops.

This allows revprop changes in a hosted Subversion repository if the repository has the "allow dangerous changes" flag set.

In the future, we could expand this into real Herald support, but the only use case we have for now is letting `svnsync` work.

Test Plan:
Edited revprops with `svn propset --revprop -r 2 propkey propvalue repositoryuri`:

  - Tried before patch, got a "configure a commit hook" error.
  - Tried after patch, got a "dangerous change" error.
  - Allowed dangerous changes.
  - Did a revprop edit.
  - Prevented dangerous changes.
  - Got an error again.
  - Made a normal commit to an SVN repository.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11208

Differential Revision: https://secure.phabricator.com/D16174
2016-06-24 13:43:32 -07:00
epriestley
9a2c2505a0 Handle tag tags properly in discovery
Summary:
Fixes T11180. In Git, it's possible to tag a tag (????). When you do, we try to log the tag-object, which automatically resolves to the commit and fails.

Just skip these. If "A" points at "B" which points at "C", it's fine to ignore "A" and "B" since we'll get the same stuff when we process "C".

Test Plan:
  - Tagged a tag.
  - Pushed it.
  - Discovered it.
  - Before patch: got exception similar to the one in T11180.
  - After patch: got tag-tag skipped. Also got slightly better error messages.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11180

Differential Revision: https://secure.phabricator.com/D16149
2016-06-20 11:10:02 -07:00