1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2025-01-08 13:51:02 +01:00
Commit graph

1162 commits

Author SHA1 Message Date
epriestley
42c0c0e3d2 Remove or correct various "phabricator/" references to "libphutil"
Summary:
Ref T13395. "libphutil/" was stripped for parts, but some documentation still references it. This is mostly minor corrections, but:

  - Removes "Javelin at Facebook", long obsolete.
  - Removes "php FPM warmup", which was always a prototype and is obsoleted by PHP preloading in recent PHP.

Test Plan: `grep` / reading

Maniphest Tasks: T13395

Differential Revision: https://secure.phabricator.com/D21624
2021-03-16 10:28:07 -07:00
epriestley
ac2f5a1046 Modernize and clean up "PhabricatorAuditStatusConstants"
Summary:
Ref T13631. Move "PhabricatorAuditStatusConstants" to a more modern object ("PhabricatorAuditRequestStatus").

Expose the status value via Conduit.

Test Plan:
  - Ran `bin/audit delete`.
  - Viewed a commit with auditors in the web UI.
  - Grepped for affected symbols.
  - Called Conduit with the "auditors" attachment, saw auditor statuses.

Maniphest Tasks: T13631

Differential Revision: https://secure.phabricator.com/D21599
2021-03-10 09:21:55 -08:00
epriestley
2636d84d0c Remove very old Audit status constants and AuditRequest data
Summary:
Ref T13631. See that task for discussion.

  - "NONE": Probably never used?
  - "CC": Obsoleted by subscribers.
  - "AUDIT_NOT_REQUIRED": For Owners packages, obsoleted by edges.
  - "CLOSED": For "Close Audit", obsoleted by "Request Verification".

Test Plan:
  - Grepped for constants, browsed Diffusion.

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13631

Differential Revision: https://secure.phabricator.com/D21598
2021-03-10 09:21:54 -08:00
epriestley
55532b3f74 Add a very basic "auditors" attachment to "differential.commit.search"
Summary: Ref T13631. For now, this only shows the auditor PHID. The current status constants could use some cleanup before they're exposed.

Test Plan: Queried with "auditors" attachment, saw basic auditor information.

Maniphest Tasks: T13631

Differential Revision: https://secure.phabricator.com/D21597
2021-03-10 09:21:54 -08:00
epriestley
f970b350ea Correct behavior of "writable" Almanac service binding for repository services
Summary: Ref T13611. This property worked correctly when implemented in D19357. The behavior was broken by D20775, which tested node-level routing but did not specifically re-test the "writable" property. This was difficult to spot because ref query outcomes weren't observable in the UI, and the ref itself had the correct property value.

Test Plan:
See D21575. After this change, the UI shows the correct state, rather than showing a read-only service ref as writable:

{F8465865}

Maniphest Tasks: T13611

Differential Revision: https://secure.phabricator.com/D21576
2021-02-25 12:29:17 -08:00
epriestley
e9804bb7e5 Provide hovercards for generic edge stories, and include more message information in commit hovercards
Summary:
Ref T13620.

  - Make generic edge stories render links with hovercards. Other story types (like subscriptions) already do this so I'm fairly certain this is just old code from before hovercards.
  - Include a longer commit message snippet in hovercards.

Test Plan: {F8465645}

Maniphest Tasks: T13620

Differential Revision: https://secure.phabricator.com/D21574
2021-02-25 10:29:58 -08:00
epriestley
00cf93548b Add an "--ignore-locality" flag to "bin/repository pull"
Summary:
Ref T13600. When migrating observed repositories between cluster services, impact can be better controlled by fetching a copy of the repository on the target host before clusterizing it.

In particular, in the Phacility cluster, migrations are generally from one shared shard to one dedicated shard. It's helpful to perform these migrations synchronously without waiting for the cluster to sync in the background (helpful in the sense that there are fewer steps and fewer commands to run).

This supports an "--observe" mode to the internal "bin/services load-repository" workflow, which transfers repository data by refetching it from the remote rather than by getting it from the older host. This fetch occurs before cluster configuration is adjusted.

Test Plan: Ran locally as a sanity check, will apply in production.

Maniphest Tasks: T13600

Differential Revision: https://secure.phabricator.com/D21544
2021-02-04 11:33:14 -08:00
epriestley
faf3f7b787 Internally, align commit processing tasks around PHIDs, not IDs
Summary: Ref T13591. This is a minor consistency change to use PHIDs instead of IDs in the commit import processing pipeline. PHIDs are generally more powerful in more contexts and it would be unusual for a modern worker to use an ID here.

Test Plan:
  - Made the "accept either ID or PHID" part of the change only.
  - Pushed a commit, parsed and reparsed it step by step (this tests that "commitID" tasks can still process normally).
  - Made the "write PHIDs" part of the change.
  - Pushed a commit, parsed and reparsed it step by step.
  - Looked at the task row in the database, saw PHID data.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21533
2021-02-02 13:40:10 -08:00
epriestley
de70a4ff26 Improve consistency in use of "via", "objectPHID", and "containerPHID" parameters in repository workers
Summary:
Ref T13591. Improve how parameters are passed between commit worker tasks:

  - Always pass "via", to track where tasks came from.
  - Always provide "objectPHID" (with the commit PHID).
  - Always provide "containerPHID" (with the repository PHID).

Test Plan:
  - Pushed a new commit.
  - Ran `bin/repository pull` + `bin/repository discover`, saw commit with all parameters.
  - Ran `bin/worker execute ...`, saw a Change worker and then a Publish worker with appropriate parameters.
  - Ran `bin/repository reparse ... --background`, saw workers queue with appropriate parameters.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21532
2021-02-02 13:40:09 -08:00
epriestley
3a74701555 Return Git HTTP error messages in an HTTP header
Summary:
Ref T13590. Currently, when you encounter a HTTP error in Git, there is no apparent way to make the client show any additional useful information. In particular, the response body is ignored.

We can partially get around this by putting the information in an "X-Phabricator-Message: ..." HTTP header, which is visible with "GIT_CURL_VERBOSE=1 git ...". Users won't normally know to look here, but it's still better than nothing.

Test Plan:
  - Ran "GIT_CURL_VERBOSE=1 git fetch" against a Phabricator HTTP URI that returned a HTTP/500 error.
    - Before: no clue what happened on the client.
    - After: client shows useful message in the "X-Phabricator-Message" header in debug output.

Maniphest Tasks: T13590

Differential Revision: https://secure.phabricator.com/D21523
2021-01-26 16:14:03 -08:00
epriestley
ed86c42b26 Improve performance of repository discovery in repositories with >65K refs
Summary:
Ref T13593. The commit cache in this Engine has a maximum fixed size (currently 65,535 entries).

If we execute discovery in a repository with more refs than this (e.g., 180K), we get fast lookups for the first 65,535 refs and slow lookups for the remaining refs.

Instead, divide the refs into chunks no larger than the cache size, and perform an explicit cache fill before each chunk is processed.

Test Plan:
  - Created a repository with 1K refs. Set cache size to 256. Ran discovery.
    - Before patch: saw one large cache fill and then ~750 single-gets.
    - After patch: saw four large cache fills.
  - Compared `bin/repository discover ... --verbose` output before and after patch for overall effect; saw no differences.

Maniphest Tasks: T13593

Differential Revision: https://secure.phabricator.com/D21521
2021-01-26 12:27:02 -08:00
epriestley
1da94dcf49 Correct some issues around IMPORTED_PERMANENT in RefEngine
Summary: Ref T13591. Fixes a few issues with the recent updates here discovered in more thorough testing.

Test Plan:
- Stopped the daemons.
- Created a new copy of Phabricator in Diffusion.
- Pulled it with `bin/repository pull ...`.
  - Got 17,278 commits on disk with `git log --all --format=%H`.
- Set permanent refs to "master".
- Discovered it with `bin/repository discover ...`.
  - This took 31.5s and inserted 17,278 tasks.
  - Verified that all tasks have priority 4,000 (PRIORITY_IMPORT).
  - Observed that 16,799 commits have IMPORTED_PERMANENT and 479 commits do not.
    - This matches `git log master --format=%H` exactly.
- Ran `bin/repository refs ...`. Expected no changes and saw no changes.
- Ran `bin/worker execute --active` for a minute or two. It processed all the impermanent changes first (since `bin/worker` is LIFO and these are supposed to process last).
  - Ran `bin/repository refs`. Expected no changes and saw no changes.
  - Marked all refs as permanent.
  - Starting state: 16,009 message tasks, all at priority 4000.
  - Ran `bin/repository refs`, expecting 479 new tasks at priority 4000.
  - Saw count rise to 16,488 as expected.
  - Saw all the new tasks have priority 4000 and all commits now have the IMPORTED_PERMANENT flag.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21518
2021-01-22 19:51:40 -08:00
epriestley
3cb543ef8f Lift logic for queueing commit import tasks into RepositoryEngine
Summary:
Ref T13591. There are currently two pathways to queue an import task for a commit: via repository discovery, or via a ref becoming permanent.

These pathways duplicate some logic and have behavioral differences: one does not set `objectPHID` properly, one does not set the priority correctly.

Unify these pathways, make them both set `objectPHID`, and make them both use the same priority logic.

Test Plan:
  - Discovered refs.
  - See later changes in this series for more complete test cases.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21516
2021-01-22 19:51:39 -08:00
epriestley
6716d4f6ae Separate "shouldPublishRef()" from "isPermanentRef()" and set "IMPORTED_PERMANENT" more narrowly
Summary:
Ref T13591. Currently, the "IMPORTED_PERMANENT" flag (previously "IMPORTED_CLOSEABLE", until D21514) flag is set by using the result of "shouldPublishRef()".

This method returns the wrong value for the flag when there is a repository-level reason not to publish the ref (most commonly, because the repository is currently importing).

Although it's correct that commits should not be published in an importing repository, that's already handled in the "PublishWorker" by testing "shouldPublishCommit()". The "IMPORTED_PERMANENT" flag should only reflect whether a commit is reachable from a permanent ref or not.

  - Move the relevant logic to a new method in Publisher.
  - Fill "IMPORTED_PERMANENT" narrowly from "isPermanentRef()", rather than broadly from "shouldPublishRef()".
  - Deduplicate some logic in "PhabricatorRepositoryRefEngine" which has the same intent as the logic in the Publisher.

Test Plan:
  - Ran discovery on a new repository, saw permanent commits marked as permanent from the beginning.
  - See later changes in this patch series for additional testing.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21515
2021-01-22 19:51:38 -08:00
epriestley
2d0e7c37e1 Rename "IMPORTED_CLOSEABLE" to "IMPORTED_PERMANENT" to clarify the meaning of the flag
Summary:
Ref T13591. This is an old flag with an old name, and there's an import bug because the outdated concept of "closable" is confusing two different behaviors.

This flag should mean only "is this commit reachable from a permanent ref?". Rename it to "IMPORTED_PERMANENT" to make that more clear.

Rename the "Unpublished" query to "Permanent" to make that more clear, as well.

Test Plan:
  - Grepped for all affected symbols.
  - Queried for all commmits, permament commits, and impermanent commits.
  - Ran repository discovery.
  - See also further changes in this change series for more extensive tests.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21514
2021-01-22 19:51:38 -08:00
epriestley
16a14af2bb Correct the behavior of "bin/repository discover --repair"
Summary:
Ref T13591. Since D8781, this flag does not function correctly in Git and Mercurial repositories, since ref discovery pre-fills the cache.

Move the "don't look at the database" behavior the flag enables into the cache lookup. D8781 should have been slightly more aggressive and done this, it was just overlooked.

Test Plan:
  - Ran `bin/repository discover --help` and read the updated help text.
  - Ran `bin/repository discover --repair` in a fully-discovered Git repository.
    - Before: no effect.
    - After: full rediscovery.

Maniphest Tasks: T13591

Differential Revision: https://secure.phabricator.com/D21513
2021-01-22 19:51:38 -08:00
epriestley
0e28105ff7 Further correct and disambigutate ref selectors passed to Git on the CLI
Summary:
Ref T13589. In D21510, not every ref selector got touched, and this isn't a valid construction in Git:

```
$ git ls-tree ... -- ''
```

Thus:

  - Disambiguate more (all?) ref selectors.
  - Correct the construction of "git ls-tree" when there is no path.
  - Clean some stuff up: make the construction of some flags and arguments more explicit, get rid of a needless "%C", prefer "%Ls" over acrobatics, etc.

Test Plan: Browsed/updated a local Git repository. (This change is somewhat difficult to test exhaustively, as evidenced by the "ls-tree" issue in D21510.)

Maniphest Tasks: T13589

Differential Revision: https://secure.phabricator.com/D21511
2021-01-20 12:07:14 -08:00
epriestley
f21a00a315 Fix an out-of-order issue in the new update-during-publish behavior
Summary:
Ref T13552. The Herald field "Accepted Differential revision" (and similar fields) depend on the task/revision update steps running before Herald executes.

Herald currently executes first, so it never sees associated revisions. Swap this order.

Test Plan: Published a commit, got a clean parse/import. Will test with production rules ("Cowboy Commits").

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21468
2020-09-17 13:40:45 -07:00
epriestley
6f78e2a91c When a commit is marked "closeable", clear the "published" flag
Summary:
Ref T13552. When a previously discovered commit becomes reachable from a permanent ref, we re-queue workers to update it. However, the commit may already be marked as "published", so the publish worker may do nothing.

It would perhaps be simpler to not mark the commit as published when it isn't reachable from a permanent ref, but this is tricky because the flag is also part of the "imported / all steps" state (see T13580).

Until that can be cleaned up, just clear the flag.

Test Plan:
  - Pushed a commit with "fixes X" to a non-permanent branch.
  - Pushed it to a permanent branch.
  - Before change: task failed to close.
  - After change: task closes properly.

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21460
2020-09-15 17:36:42 -07:00
epriestley
a39c590442 Move task and revision closure to the "publishing" step of the commit import pipeline
Summary:
Ref T13552. Now that these steps can build their own "CommitRef" object from storage on the "CommitData" object, move them from the "Message" step to the "Publishing" step.

This should resolve the root issue in T13552, where a commit moved from a non-permanent branch to a permanent branch does not publish closures properly.

Test Plan: Used "bin/repository reparse --publish ..." to republish changes.

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21450
2020-09-15 17:36:40 -07:00
epriestley
cebde34425 Make "CommitData" wrap and persist a "CommitRef" record
Summary:
Ref T13552. Turn "CommitData" into an application-level layer on top of the repository-level "CommitRef" object.

For older commits which will not have a "CommitRef" record on disk, build a synthetic one at runtime. This could eventually be migrated.

Test Plan: Ran "bin/repository reparse --message", browsed Diffusion.

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21449
2020-09-15 17:36:40 -07:00
epriestley
e454c3dafe Wrap all direct access to author/committer properties on "CommitData"
Summary: Ref T13552. Currently, various callers read raw properties off "CommitData" directly. Wrap these in accessors to support storage changes which persist "CommitRef" information instead.

Test Plan:
- Ran "diffusion.querycommits", saw the same data before and after.
- Looked at a commit, saw authorship information and date.
- Viewed tags in a repository, saw author information.
- Ran "rebuild-identities", saw no net effect.
- Grepped for callers to "getCommitDetail(...)".

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21448
2020-09-15 17:36:39 -07:00
epriestley
3a80efa440 Build "DiffusionCommitRef" objects from "internal.commit.search", not "diffusion.querycommits", in the message parser worker
Summary: Ref T13552. Swap the call we're using to build "CommitRef" objects here to the recently-introduced "internal.commit.search" method.

Test Plan: Used "bin/repository reparse --message ..." to reparse commits, added "var_dump()" to inspect results. Saw sensible CommitRef and CommitData objects get built.

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21446
2020-09-15 17:36:39 -07:00
epriestley
f6238f9d9b Remove "bin/repository lookup-users" workflow
Summary:
Ref T13552. This is one of two callsites to "diffusion.querycommits". It's an old debugging workflow which I haven't used in years and which is likely obsoleted by identities and other changes.

I believe the root problem here was also ultimately user error (a user has misconfigured their local Git author email as another user).

Test Plan: Grepped for "lookup-users", got no hits.

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21444
2020-09-15 17:36:38 -07:00
epriestley
a9506097ea Add "internal.commit.search" to replace the cache bypass mode of "diffusion.querycommits"
Summary:
Ref T13552. Commit parsers currently invoke a special mode of "diffusion.querycommits", which is an older frozen method.

The replacement, "diffusion.commit.search", is not really appropriate for low-level access. This mode of having a single method which operates in "cache" or "non-cache" modes also ends up in a lot of unnecessary field shuffling.

Provide "internal.commit.search" as a modern equivalent that returns a "DiffusionCommitRef"-compatible structure.

Test Plan: Executed "internal.commit.search", got sensible low-level commit results.

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21443
2020-09-15 17:36:38 -07:00
epriestley
a745055813 Lift Diffusion Conduit call proxying to the root level of Conduit
Summary:
Ref T13552. Some Diffusion conduit calls may only be served by a node which hosts a working copy on disk, so they're proxied if received by a different node.

This capability is currently bound tightly to "DiffusionRequest", which is a bundle of context parameters used by some Diffusion calls. However, call proxying is not fundamentally a Diffusion behavior.

I want to perform proxying on a "*.search" call which does not use the "DiffusionRequest" parameter bundle. Lift proxying to the root level of Conduit.

Test Plan: Browsed diffusion in a clusterized repsository.

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21442
2020-09-15 17:36:37 -07:00
epriestley
367cd28927 Delete some commit dead parsing code
Summary: Ref T13552. Neither "$hashes" or "$user" are used, and constructing them has no side effects.

Test Plan: Searched for these symbols.

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21441
2020-09-15 17:36:37 -07:00
epriestley
c6de7c66a3 Remove the "Graph" view as a dedicated repository view
Summary:
Ref T13552. Currently, Diffusion has two effectively identical history views, the "Graph" view and the "History" view.

These arose out of product uncertainty about the importance of the graph, but I think we can just put the graph on the "object item list" view and merge these views.

Test Plan: Looked at repositories in Diffusion, no longer saw a "Graph" tab. Grepped for "graph"-related symbols.

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21409
2020-08-12 08:59:16 -07:00
epriestley
7fd6bf26a9 Modernize "Author" and "Committer" rendering for commits
Summary:
Ref T13552. Give "Commit" objects a more modern, identity-aware way to render author and committer information.

This uses handles in a more modern way and gives us a single read callsite for raw author and committer names.

Test Plan:
  - Grepped for callers to the old methods, found none. (There are a lot of "renderAuthor()" callers in transactions, but this call takes no parameters.)
  - Viewed some commits, saw sensible lists of authors and committers.

Maniphest Tasks: T13552

Differential Revision: https://secure.phabricator.com/D21405
2020-08-12 08:58:43 -07:00
epriestley
7d496f2c6d Collapse repository URI normalization code into Arcanist
Summary: Ref T13546. Companion change to D21372. Move URI normalization code to Arcanist to we can more-often resolve remote URIs correctly.

Test Plan: Grepped for affected symbols.

Maniphest Tasks: T13546

Differential Revision: https://secure.phabricator.com/D21373
2020-06-30 15:54:33 -07:00
epriestley
0c51885cf7 Survive importing Git commits with no commit message and/or no author
Summary:
Ref T13538. See PHI1739. Synthetic Git commits with no author and/or no commit message currently extract `null` and then fail to parse.

Ideally, we would carefully distinguish between `null` and empty string. In practice, that requires significant schema changes (these columns are non-nullable and have indexing requirements) and these cases are degenerate. These commits are challenging to build and can not normally be constructed with `git commit`.

At least for now, merge the `null` cases into the empty string cases so we can survive import.

Test Plan:
  - Constructed a commit with no author and no commit message using the approach described in T13538; pushed and parsed it.
  - Before: fatals during identity selection and storing the commit message (both roughly NULL inserts into non-null columns).
  - After: clean import.

This produces a less-than-ideal UI in Diffusion, but it doesn't break anything:

{F7492094}

Maniphest Tasks: T13538

Differential Revision: https://secure.phabricator.com/D21266
2020-05-18 20:02:21 -07:00
epriestley
1e7cc72cd8 Improve performance when marking commits as unreachable after multiple ref deletions
Summary:
See PHI1688. If many refs with a large amount of shared ancestry are deleted from a repository, we can spend much longer than necessary marking their mutual ancestors as unreachable over and over again.

For example, if refs A, B and C all point near the head of an obsolete "develop" branch and have about 1K shared commits reachable from no other refs, deleting all three refs will lead to us performing 3,000 mark-as-unreachable operations (once for each "<ref, commit>" pair).

Instead, we can stop exploring history once we reach an already-unreachable commit.

Test Plan:
  - Destroyed 7 similar refs simultaneously.
  - Ran `bin/repository refs`, saw 7 entries appear in the `oldref` table.
  - Ran `bin/repository discover` with some debugging statements added, saw sensible-seeming behavior which didn't double-mark any newly-unreachable refs.

Differential Revision: https://secure.phabricator.com/D21056
2020-04-03 13:28:42 -07:00
epriestley
1a59cae743 Update some Phabricator behaviors for changes to Futures
Summary:
Depends on D21053. Ref T11968. Three things have changed:

  - Overseers can no longer use FutureIterator to continue execution of an arbitrary list of futures from any state. Use FuturePool instead.
  - Same with repository daemons.
  - Probably (?) fix an API change in the Harbormaster exec future.

Test Plan:
  - Ran "bin/phd debug task" and "bin/phd debug pull", no longer saw Future-management related errors.
  - The Harbormaster future is easiest to test by just seeing if production works once this change is deployed there.

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T11968

Differential Revision: https://secure.phabricator.com/D21054
2020-04-03 12:28:16 -07:00
epriestley
6ccb6a6463 Update "git rev-parse" invocation to work in Git 2.25.0
Summary:
Fixes T13479. The behavior of "git rev-parse --show-toplevel" has changed in Git 2.25.0, and it now fails in bare repositories.

Instead, use "git rev-parse --git-dir" to sanity-check the working copy. This appears to have more stable behavior across Git versions, although it's a little more complicated for our purposes.

Test Plan:
  - Ran `bin/repository update ...` on an observed, bare repository.
  - ...on an observed, non-bare ("legacy") repository.
  - ...on a hosted, bare repository.

Maniphest Tasks: T13479

Differential Revision: https://secure.phabricator.com/D20945
2020-01-16 11:39:23 -08:00
epriestley
374f8b10b3 Add a "--dry-run" flag to "bin/repository rebuild-identities"
Summary: Ref T13444. Allow the effects of performing an identity rebuild to be previewed without committing to any changes.

Test Plan: Ran "bin/repository rebuild-identities --all-identities" with and without "--dry-run".

Maniphest Tasks: T13444

Differential Revision: https://secure.phabricator.com/D20922
2019-11-19 12:38:20 -08:00
epriestley
63d84e0b44 Improve use of keys when iterating over commits in "bin/audit delete" and "bin/repository rebuild-identities"
Summary:
Fixes T13457. Ref T13444. When we iterate over commits in a particular repository, the default iteration strategy can't effectively use the keys on the table.

Tweak the ordering so the "<repositoryID, epoch, [id]>" key can be used.

Test Plan:
  - Ran `bin/audit delete --repository X` and `bin/repository rebuild-identities --repository X` before and after changes.
    - With just the key changes, performance was slightly better. My local data isn't large enough to really emphasize the key changes.
    - With the page size changes, performance was a bit better (~30%, but on 1-3 second run durations).
  - Used `--trace` and ran `EXPLAIN ...` on the new queries, saw them select the "<repositoryID, epoch, [id]>" key and report a bare "Using index condition" in the "Extra" column.

Maniphest Tasks: T13457, T13444

Differential Revision: https://secure.phabricator.com/D20921
2019-11-19 10:18:55 -08:00
epriestley
a7aca500bc Update repository identities after all mutations to users and email addresses
Summary:
Ref T13444. Currently, many mutations to users and email addresses (particularly: user creation; and user and address destruction) do not propagate properly to repository identities.

Add hooks to all mutation workflows so repository identities get rebuilt properly when users are created, email addresses are removed, users or email addresses are destroyed, or email addresses are reassigned.

Test Plan:
- Added random email address to account, removed it.
- Added unassociated email address to account, saw identity update (and associate).
  - Removed it, saw identity update (and disassociate).
- Registered an account with an unassociated email address, saw identity update (and associate).
  - Destroyed the account, saw identity update (and disassociate).
- Added address X to account A, unverified.
  - Invited address X.
  - Clicked invite link as account B.
  - Confirmed desire to steal address.
  - Saw identity update and reassociate.

Maniphest Tasks: T13444

Differential Revision: https://secure.phabricator.com/D20914
2019-11-19 09:41:59 -08:00
epriestley
18da346972 Add additional flags to "bin/repository rebuild-identities" to improve flexibility
Summary:
Ref T13444. Repository identities have, at a minimum, some bugs where they do not update relationships properly after many types of email address changes.

It is currently very difficult to fix this once the damage is done since there's no good way to inspect or rebuild them.

Take some steps toward improving observability and providing repair tools: allow `bin/repository rebuild-identities` to effect more repairs and operate on identities more surgically.

Test Plan: Ran `bin/repository rebuild-identities` with all new flags, saw what looked like reasonable rebuilds occur.

Maniphest Tasks: T13444

Differential Revision: https://secure.phabricator.com/D20911
2019-11-19 09:39:48 -08:00
epriestley
0014d0404c Consolidate repository identity resolution and detection code
Summary: Ref T13444. Send all repository identity/detection through a new "DiffusionRepositoryIdentityEngine" which handles resolution and detection updates in one place.

Test Plan:
  - Ran `bin/repository reparse --message ...`, saw author/committer identity updates.
  - Added "goose@example.com" to my email addresses, ran daemons, saw the identity relationship get picked up.
  - Ran `bin/repository rebuild-identities ...`, saw sensible rebuilds.

Maniphest Tasks: T13444

Differential Revision: https://secure.phabricator.com/D20910
2019-11-19 09:39:11 -08:00
epriestley
6afbb6102d Remove "PhabricatorEventType::TYPE_DIFFUSION_LOOKUPUSER" event
Summary: Ref T13444. This is an ancient event and part of the old event system. It is not likely to be in use anymore, and repository identities should generally replace it nowadays anyway.

Test Plan: Grepped for constant and related methods, no longer found any hits.

Maniphest Tasks: T13444

Differential Revision: https://secure.phabricator.com/D20909
2019-11-19 09:38:03 -08:00
epriestley
a2b2c391a1 Distinguish between "Assigned" and "Effective" identity PHIDs more clearly and consistently
Summary:
Ref T13444. You can currently explicitly unassign an identity (useful if the matching algorithm is misfiring). However, this populates the "currentEffectiveUserPHID" with the "unassigned()" token, which mostly makes things more difficult.

When an identity is explicitly unassigned, convert that into an explicit `null` in the effective user PHID.

Then, realign "assigned" / "effective" language a bit. Previously, `withAssigneePHIDs(...)` actualy queried effective users, which was misleading. Finally, bulk up the list view a little bit to make testing slightly easier.

Test Plan:
  - Unassigned an identity, ran migration, saw `currentEffectiveUserPHID` become `NULL` for the identity.
  - Unassigned a fresh identity, saw NULL.
  - Queried for various identities under the modified constraints.

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13444

Differential Revision: https://secure.phabricator.com/D20908
2019-11-19 09:37:44 -08:00
epriestley
df0f5c6cee Make repository identity email address association case-insensitive
Summary:
Ref T13444. Currently, identities for a particular email address are queried with "LIKE" against a binary column, which makes the query case-sensitive.

  - Extract the email address into a separate "sort255" column.
  - Add a key for it.
  - Make the query a standard "IN (%Ls)" query.
  - Deal with weird cases where an email address is 10000 bytes long or full of binary junk.

Test Plan:
  - Ran migration, inspected database for general sanity.
  - Ran query script in T13444, saw it return the same hits for "git@" and "GIT@".

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13444

Differential Revision: https://secure.phabricator.com/D20907
2019-11-19 09:37:26 -08:00
epriestley
1b40f7e540 Always initialize Git repositories with "git init", never with "git clone"
Summary:
Fixes T13448. We currently "git clone" to initialize repositories, but this will fetch too many refs if "Fetch Refs" is configured.

In modern Phabricator, there's no apparent reason to "git clone"; we can just "git init" instead. This workflow naturally falls through to an update, where we'll do a "git fetch" and pull in exactly the refs we want.

Test Plan:
  - Configured an observed repository with "Fetch Refs".
  - Destroyed the working copy.
  - Ran "bin/repository pull X --trace --verbose".
    - Before: saw "git clone" pull in the world.
    - After: saw "git init" create a bare empty working copy, then "git fetch" fill it surgically.

Both flows end up in the same place, this one is just simpler and does less work.

Maniphest Tasks: T13448

Differential Revision: https://secure.phabricator.com/D20894
2019-11-07 16:17:35 -08:00
epriestley
8dd57a1ed2 When fetching Git repositories, pass "--no-tags" to make explicit "Fetch Refs" operate more narrowly
Summary:
Ref T13448. The default behavior of "git fetch '+refs/heads/master:refs/heads/master'" is to follow and fetch associated tags.

We don't want this when we pass a narrow refspec because of "Fetch Refs" configuration. Tell Git that we only want the refs we explicitly passed.

Note that this doesn't prevent us from fetching tags (if the refspec specifies them), it just stops us from fetching extra tags that aren't part of the refspec.

Test Plan:
  - Ran "bin/repository pull X --trace --verbose" in a repository with a "Fetch Refs" configuration, saw only "master" get fetched (previously: "master" and reachable tags).
  - Ran "git fetch --no-tags '+refs/*:refs/*'", saw tags fetched normally ("--no-tags" does not disable fetching tags).

Maniphest Tasks: T13448

Differential Revision: https://secure.phabricator.com/D20893
2019-11-07 16:17:08 -08:00
epriestley
9dbde24dda Manually prune Git working copy refs instead of using "--prune", to improve "Fetch Refs" behavior
Summary:
Ref T13448. When "Fetch Refs" is configured:

  - We switch to a narrow mode when running "ls-remote" against the local working copy. This excludes surplus refs, so we'll incorrectly detect that the local and remote working copies are identical in cases where the local working copy really has surplus refs.
  - We rely on "--prune" to remove surplus local refs, but it only prunes refs matching the refspecs we pass "git fetch". Since these refspecs are very narrow under "Fetch Only", the pruning behavior is also very narrow.

Instead:

  - When listing local refs, always list everything. If we have too much stuff locally, we want to get rid of it.
  - When we identify surplus local refs, explicitly delete them instead of relying on "--prune". We can just do this in all cases so we don't have separate "--prune" and "manual" cases.

Test Plan:
  - Created a new repository, observed from a GitHub repository, with many tags/refs/branches. Pulled it.
  - Observed lots of refs in `git for-each-ref`.
  - Changed "Fetch Refs" to "refs/heads/master".
  - Ran `bin/repository pull X --trace --verbose`.

On deciding to do something:

  - Before: since "master" did not change, the pull declined to act.
  - After: the pull detected surplus refs and deleted them. Since the states then matched, it declined further action.

On pruning:

  - Before: if the pull was forced to act, it ran "fetch --prune" with a narrow refspec, which did not prune the working copy.
  - After: saw working copy pruned explicitly with "update-ref -d" commands.

Also, set "Fetch Refs" back to the default (empty) and pulled, saw everything pull.

Maniphest Tasks: T13448

Differential Revision: https://secure.phabricator.com/D20892
2019-11-07 16:15:46 -08:00
epriestley
b0d9f89c95 Remove "State Icons" from handles
Summary: Ref T13440. This feature is used in only one interface which I'm about to rewrite, so throw it away.

Test Plan: Grepped for all affected symbols, didn't find any hits anywhere.

Maniphest Tasks: T13440

Differential Revision: https://secure.phabricator.com/D20882
2019-10-31 12:04:43 -07:00
epriestley
5d8457a07e In the repository URI index, store Phabricator's own URIs as tokens
Summary:
Fixes T13435. If you move Phabricator or copy data from one environment to another, the repository URI index currently still references the old URI, since it writes the URI as a plain string. This may make "arc which" and similar workflows have difficulty identifying repositories.

Instead, store the "phabricator.base-uri" domain and the "diffusion.ssh-host" domain as tokens, so lookups continue to work correctly even after these values change.

Test Plan:
  - Added unit tests to cover the normalization.
  - Ran migration, ran daemons, inspected `repository_uriindex` table, saw a mixture of sensible tokens (for local domains) and static domains (like "github.com").
  - Ran this thing:

```
$ echo '{"remoteURIs": ["ssh://git@local.phacility.com/diffusion/P"]}' | ./bin/conduit call --method repository.query --trace --input -
Reading input from stdin...
>>> [2] (+0) <conduit> repository.query()
>>> [3] (+3) <connect> local_repository
<<< [3] (+3) <connect> 555 us
>>> [4] (+5) <query> SELECT `r`.* FROM `repository` `r` LEFT JOIN `local_repository`.`repository_uriindex` uri ON r.phid = uri.repositoryPHID WHERE (uri.repositoryURI IN ('<base-uri>/diffusion/P')) GROUP BY `r`.phid ORDER BY `r`.`id` DESC LIMIT 101
<<< [4] (+5) <query> 596 us
<<< [2] (+6) <conduit> 6,108 us
{
  "result": [
    {
      "id": "1",
      "name": "Phabricator",
      "phid": "PHID-REPO-2psrynlauicce7d3q7g2",
      "callsign": "P",
      "monogram": "rP",
      "vcs": "git",
      "uri": "http://local.phacility.com/source/phabricator/",
      "remoteURI": "https://github.com/phacility/phabricator.git",
      "description": "asdf",
      "isActive": true,
      "isHosted": false,
      "isImporting": false,
      "encoding": "UTF-8",
      "staging": {
        "supported": true,
        "prefix": "phabricator",
        "uri": null
      }
    }
  ]
}
```

Note the `WHERE` clause in the query normalizes the URI into "<base-uri>", and the lookup succeeds.

Subscribers: PHID-OPKG-gm6ozazyms6q6i22gyam

Maniphest Tasks: T13435

Differential Revision: https://secure.phabricator.com/D20872
2019-10-28 14:44:06 -07:00
epriestley
5b36f0c97a Add default branch, description, and metrics (commit count, recent commit) to "diffusion.repository.search"
Summary: Fixes T13430. Provide more information about repositories in "diffusion.repository.search".

Test Plan: Used API console to call method (with new "metrics" attachment), reviewed output. Saw new fields returned.

Maniphest Tasks: T13430

Differential Revision: https://secure.phabricator.com/D20862
2019-10-24 17:16:58 -07:00
epriestley
7badec23a7 Use "git log ... --stdin" instead of "git log ... --not ..." to avoid oversized command lines
Summary:
Depends on D20853. See PHI1474. If the list of "--not" refs is sufficiently long, we may exceed the maximum size of a command.

Use "--stdin" instead, and swap "--not" for the slightly less readable but functionally equivalent "^hash", which has the advantage of actually working with "--stdin".

Test Plan: Ran `bin/repository refs ...` with nothing to be done, and with something to be done.

Differential Revision: https://secure.phabricator.com/D20854
2019-10-02 14:08:15 -07:00
epriestley
6d74736a7e When a large number of commits need to be marked as published, issue the lookup query in chunks
Summary: See PHI1474. This query can become large enough to exceed reasonable packet limits. Chunk the query so it is split up if we have too many identifiers.

Test Plan: Ran `bin/repository refs ...` on a repository with no new commits and a repository with some new commits.

Differential Revision: https://secure.phabricator.com/D20853
2019-10-02 14:06:07 -07:00