Summary:
Ref T13151. Ref T12164. Two small tweaks:
- If we aren't actually going to change anything, just skip the writes. This makes re-running/resuming a lot faster (~20x, locally).
- Print when we touch a commit so there's some kind of visible status.
This is just a small quality-of-life tweak that I wrote anyway while investigating T13152, and will make finishing off db024, db025 and db010 manually a little easier.
Test Plan:
- Set `authorIdentityPHID` + `committerIdentityPHID` to `NULL`.
- Ran `rebuild-identities`, saw status information.
- Ran `rebuild-identiites` again, saw it go faster with status information.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13151, T12164
Differential Revision: https://secure.phabricator.com/D19484
Summary:
Depends on D19443. Creates a workflow for populating the new identity table by iterating over commits, either one repo at a time or all at once. Locally caches identities to avoid fetching them `inf` times. An actual migration that invokes this workflow will come in another revision that won't land until at least next week.
Performance is ~2k commits in 4.9s on my local machine.
Test Plan: Ran locally a few times with a few different states of the `repository_identity` table.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: jcox, Korvin, PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D19446
Summary:
Ref T13114. See PHI514. This makes some attempt to undo the damage caused by incorrectly publishing a repository.
Don't run this.
Test Plan: Yikes.
Maniphest Tasks: T13114
Differential Revision: https://secure.phabricator.com/D19271
Summary: Noticed a couple of typos in the docs, and then things got out of hand.
Test Plan:
- Stared at the words until my eyes watered and the letters began to swim on the screen.
- Consulted a dictionary.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam
Differential Revision: https://secure.phabricator.com/D18693
Summary:
Ref T11823. I think this is the last callsite which relies on the old data format: `bin/repository parents` rebuilds a cache which we don't currently use very heavily.
Update it to work with the new data.
Test Plan: Ran `bin/repository parents <repository> --trace`, saw successful script execution and reasonable-looking output.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T11823
Differential Revision: https://secure.phabricator.com/D18615
Summary:
Fixes T12087. When transitioning into a clustered configuration for the first time, the documentation recommends using a one-device cluster as a transitional step.
However, installs may not do this for whatever reason, and we aren't as clear as we could be in warning about clusterizing directly into a multi-device cluster.
Roughly, when you do this, we end up believing that working copies exist on several different devices, but have no information about which copy or copies are up to date. //Usually// they all were already synchronized and are all up to date, but we can't make this assumption safely without risking data.
Instead, we err on the side of caution, and require a human to tell us which copy we should consider to be up-to-date, using `bin/repository thaw --promote`.
Test Plan:
```
$ ./bin/repository clusterize rLOCKS --service repos001.phacility.net
Service "repos001.phacility.net" is actively bound to more than one device
(local002.local, local001.phacility.net).
If you clusterize a repository onto this service it will be unclear which
devices have up-to-date copies of the repository. This leader/follower
ambiguity will freeze the repository. You may need to manually promote a
device to unfreeze it. See "Ambiguous Leaders" in the documentation for
discussion.
Continue anyway? [y/N]
```
Read other changes.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12087
Differential Revision: https://secure.phabricator.com/D17169
Summary:
Ref T11522. This provides storage for tracking rewritten commits (new feature) and unreadable commits (existing feature, but really hacky).
This doesn't do anything yet, just adds a table and a CLI tool for updating it. I'll document the tool once it works. You just pipe in some JSON, but I need to document the format.
Test Plan:
- Piped JSON for "none", "rewritten" and "unreadable" hints into `bin/repository hint`.
- Examined the database to see that the table was written properly.
- Tried to pipe bad JSON in, invalid hint types, etc. Got reasonable human-readable error messages.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11522
Differential Revision: https://secure.phabricator.com/D16434
Summary:
Ref T7148. The automated export process runs this via daemon, which can't answer "Y" to this prompt. Let it "--force" instead.
(Some of my test instances didn't have any repositories, which is why I didn't catch this sooner.)
Test Plan: Ran `bin/repository move-paths --force ...`, saw change applied without a prompt.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T7148
Differential Revision: https://secure.phabricator.com/D16426
Summary:
Fixes T11309. When checking if a repository was fully imported, we incorrectly allow unreachable, un-imported commits to prevent the repository from moving to "Imported".
This can happen if you delete branches from a repository while it is importing.
Instead, ignore unreachable commits when checking for remaining imports, and when reporting status via `bin/repository importing`.
Test Plan:
- Stopped daemons.
- Created a new repository and activated it.
- Ran `bin/repository update Rxx`.
- Deleted a branch in the repository.
- Ran `bin/repository update Rxx`.
- Ran daemons to flush queue.
Now:
- Ran `bin/repository importing`. Old behavior: showed unreachable commits as importing. New behavior: does not show unreachable commits.
- Ran `bin/repository update`. Old behavior: failed to move repository to "imported" status. New behavior: correctly moves repository to "imported" status.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T11309
Differential Revision: https://secure.phabricator.com/D16269
Summary:
Ref T9028. This corrects the reachability of existing commits in a repository.
In particular, it can be used to mark deleted commits as unreachable.
Test Plan:
- Ran it on a bad repository, with bad args, etc.
- Ran it on a clean repo, got no changes.
- Marked a reachable commit as unreachable, ran script, got it marked reachable.
- Started deleting tags and branches from the local working copy while running the script, saw greater parts of the repository get marked unreachable.
- Pulled repository again, everything automatically revived.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T9028
Differential Revision: https://secure.phabricator.com/D16132
Summary:
Ref T4292. Currently, we hold one big lock around the whole `bin/repository update` workflow.
When running multiple daemons on different hosts, this lock can end up being contentious. In particular, we'll hold it during `git fetch` on every host globally, even though it's only useful to hold it locally per-device (that is, it's fine/good/expected if `repo001` and `repo002` happen to be fetching from a repository they are observing at the same time).
Instead, split it into two locks:
- One lock is scoped to the current device, and held during pull (usually `git fetch`). This just keeps multiple daemons accidentally running on the same host from making a mess when trying to initialize or update a working copy.
- One lock is scoped globally, and held during discovery. This makes sure daemons on different hosts don't step on each other when updating the database.
If we fail to acquire either lock, assume some other process is legitimately doing the work and bail more quietly instead of fataling. In approximately 100% of cases where users have hit this lock contention, that was the case: some other daemon was running somewhere doing the work and the error didn't actually represent an issue.
If there's an actual problem, we still raise a diagnostically useful message if you run `bin/repository update` manually, so there are still tools to figure out that something is hung or whatever.
Test Plan:
- Ran `bin/repository update`, `pull`, `discover`.
- Added `sleep(5)`, forced processes to contend, got lock exceptions and graceful exit with diagnostic message.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15903
Summary:
Fixes T10940. Two issues currently:
First, `PullLocal` deamon refuses to update non-cluster repositories on cluster devices. However, this is surprising/confusing/bad because as soon as you enroll a repository host in the cluster, most of the repositories on it stop working until you `clusterize` them. This is especially confusing because the documentation gives you a very nice, gradual walkthrough about going through things slowly and being able to check your work at every step, but we really drop you off a bit of a cliff here. The workflow implied by the documentation is a desirable one.
This operation is generally only unsafe/problematic if the daemon would be creating a //new// working copy. If a working copy already exists, we can reasonably guess that it's almost certainly because you've enrolled a previously un-clustered host into a new cluster. This allows the nice, gradual workflow the documentation describes to proceed as expected, without any weird surprises.
Instead of refusing to update these repositories, only refuse to update them if updating would create a new working copy. This should make transitioning much smoother without any meaningful reduction in safety.
Second, the lower-level `bin/repository update`, `refs`, `mirror`, etc., commands don't apply this same check. However, these commands are potentially just as dangerous. Use the same code to do a similar check there, making sure we only operate on repositories that are either expected to be on the current device, or which already exist here.
Test Plan:
- Ran `bin/phd debug pull`, saw diagnostic information choose to update most repositories (including some non-cluster repositories) but properly skip non-cluster repositories that do not exist locally.
- Ran `bin/repository update`, etc., saw the command apply consistent rules to the rules applied by `PullLocal` and refuse to update non-local repositories it would need to create.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10940
Differential Revision: https://secure.phabricator.com/D15902
Summary:
Ref T10748. This needs more extensive testing and is sure to have some rough edges, but seems to basically work so far.
Throwing this up so I can work through it more deliberately and make notes.
Test Plan:
- Ran migration.
- Used `bin/repository list` to list existing repositories.
- Used `bin/repository update <repository>` to update various repositories.
- Updated a migrated, hosted Git repository.
- Updated a migrated, observed Git repository.
- Converted an observed repository into a hosted repository by toggling the I/O mode of the URI.
- Conveted a hosted repository into an observed repository by toggling it back.
- Created and activated a new empty hosted Git repository.
- Created and activated an observed Git repository.
- Updated a mirrored repository.
- Cloned and pushed over HTTP.
- Tried to HTTP push a read-only repository.
- Cloned and pushed over SSH.
- Tried to SSH push a read-only repository.
- Updated several Mercurial repositories.
- Updated several Subversion repositories.
- Created and edited repositories via the API.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10748
Differential Revision: https://secure.phabricator.com/D15842
Summary:
Ref T10748. This migrates and swaps mirroring to `PhabricatorRepositoryURI`, obsoleting `PhabricatorRepositoryMirror`.
This prevents you from editing, adding or disabling mirrors unless you know a secret URI (until the UI cuts over fully), but existing mirroring is not affected.
Test Plan:
- Added a mirroring URI to an old repository.
- Verified it worked with `bin/repository mirror`.
- Migrated forward.
- Verified it still worked with `bin/repository mirror`.
- Wow, mirroring: https://github.com/epriestley/locktopia-mirror
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10748
Differential Revision: https://secure.phabricator.com/D15841
Summary:
Ref T10748. In D14250#158181, I accepted this conditional on removing it once Conduit could handle it.
Conduit can now handle it, or at least will be able to as soon as T10748 cuts over.
Test Plan: Grepped for `repository edit`.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10748
Differential Revision: https://secure.phabricator.com/D15839
Summary:
Ref T4039. Long ago these were more freely editable and there were some security concerns around creating a repository, then setting its local path to point somewhere it shouldn't.
Local paths are no longer editable so there's no real reason we need to provide a uniqueness guarantee anymore, but you could still make a mistake with `bin/repository move-paths` by accident, and it's a little cleaner to pull them out into their own column with a key.
(We still don't -- and, largely can't -- guarantee that two paths aren't //equivalent// since one might be symlinked to the other, or symlinked only on some hosts, or whatever, but the primary value here is as a sanity check that you aren't goofing things up and pointing a bunch of repositories at the same working copy by mistake.)
Test Plan:
- Ran migrations.
- Grepped for `local-path`.
- Listed and moved paths with `bin/repository`.
- Created a new repository, verified its local path populated correctly.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4039
Differential Revision: https://secure.phabricator.com/D15837
Summary: Ref T4292. This provides at least some sort of hint about how to set up cluster repositories.
Test Plan:
- Read documentation.
- Ran `bin/repository clusterize` to add + remove clusters.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15798
Summary: This gets over-escaped instead of bolded right now, but I only ever hit it when exporting/importing and never both cleaning it up.
Test Plan: Ran `bin/repository move-paths`, saw bolded "Move" instead of ANSI escape sequences.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D15797
Summary:
Ref T4292. When the daemons make a query for repository information, we need to make sure the working copy on disk is up to date before we serve the response, since we might not have the inforamtion we need to respond otherwise.
We do this automatically for almost all Diffusion methods, but this particular method is a little unusual and does not get this check for free. Add this check.
Test Plan:
- Made this code throw.
- Ran `bin/repository reparse --message ...`, saw the code get hit.
- Ran `bin/repository lookup-user ...`, saw this code get hit.
- Made this code not throw.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15783
Summary:
Ref T10751. Add support tooling for manually prying your way out of trouble if disaster strikes.
Refine documentation, try to refer to devices as "devices" more consistently instead of sometimes calling them "nodes".
Test Plan: Promoted and demoted repository devices with `bin/repository thaw`.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10751
Differential Revision: https://secure.phabricator.com/D15768
Summary:
Ref T10537. For Nuance, I want to introduce new sources (like "GitHub" or "GitHub via Nuance" or something) but this needs to modularize eventually.
Split ContentSource apart so applications can add new content sources.
Test Plan:
This change has huge surface area, so I'll hold it until post-release. I think it's fairly safe (and if it does break anything, the breaks should be fatals, not anything subtle or difficult to fix), there's just no reason not to hold it for a few hours.
- Viewed new module page.
- Grepped for all removed functions/constants.
- Viewed some transactions.
- Hovered over timestamps to get content source details.
- Added a comment via Conduit.
- Added a comment via web.
- Ran `bin/storage upgrade --namespace XXXXX --no-quickstart -f` to re-run all historic migrations.
- Generated some objects with `bin/lipsum`.
- Ran a bulk job on some tasks.
- Ran unit tests.
{F1190182}
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T10537
Differential Revision: https://secure.phabricator.com/D15521
Summary:
Ref T4245. Prepare these scripts for a callsign-free world. This also makes them more flexible and easier to use.
The following are now valid ways to identify a repository for these scripts: ID (`3`), PHID (`PHID-REPO-wxyz`), R<ID> (`R3`), r<CALLSIGN> (`rSKYNET`), CALLSIGN (`SKYNET`).
In the future, a human-readable label (`skynet`) may also become valid.
Test Plan:
- Ran `bin/repository reparse --all ...` with `rX`, `X`, `3`, `R3`.
- Ran `bin/repository reparse --change ...` with `rXaaa`, including short versions.
- Ran `bin/repository update ...` with `rX`, `X`, `3`, `R3`.
- Ran `bin/repository refs ...` with various identifiers.
- Ran `bin/repository pull ...` with various identifiers.
- Ran `bin/repository mirror ...` with various identifiers.
- Ran `bin/repository mark-imported ...` with various identifiers.
- Ran `bin/repository list`.
- Ran `bin/repository importing ...` with various identifiers and examined output.
- Ran `bin/repository edit ...` with various identifiers.
- Ran `bin/repository discover ...` with various identifiers.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4245
Differential Revision: https://secure.phabricator.com/D14924
Summary:
Exposes the serve-over-http and serve-over-ssh options for a repository
to the `bin/repository edit` endpoint.
Test Plan: Ran `bin/repository` with the new options over several hundred repos
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin, chasemp, 20after4, epriestley
Differential Revision: https://secure.phabricator.com/D14250
Summary: Ref T7149. These formalize the local path adjustment step for cluster imports, rather than relying on an ad-hoc script.
Test Plan: Used `list-paths` and `move-paths` to list and move repositories.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7149
Differential Revision: https://secure.phabricator.com/D13621
Summary: Fixes T6839. Sometimes, worker tasks go astray for whatever reason. This automates the step of `bin/repository importing | xargs | mangle mangle | bin/repostiory reparse`.
Test Plan: Ran various flavors of the command, got good looking results.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T6839
Differential Revision: https://secure.phabricator.com/D13362
Summary: These format strings use `%d` instead of `%s`.
Test Plan: Eyeball it.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin, epriestley
Differential Revision: https://secure.phabricator.com/D12996
Summary:
Fixes T6958. Ref T7484.
- When we collide on a lock in `bin/repository update`, explain what that means.
- GlobalLock currently uses a "lock name" which is different from the lock's actual name. Don't do this. There's a small chance this fixes T7484, but I don't have high hopes.
Test Plan: Ran `bin/repository update X` in two windows really fast, got the new message in one of them.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T6958, T7484
Differential Revision: https://secure.phabricator.com/D12574
Summary:
Fixes T7484. There's a bunch of spooky mystery here but the current behavior can probably cause problems in at least some situations.
Also moves a couple callsigns to monograms (see T4245).
Test Plan:
- Faked a short lock length to hit the exception.
- Updated normally.
- Grepped for other use sites, none seemed suspicious or likely to overflow the lock length.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7484
Differential Revision: https://secure.phabricator.com/D12263
Summary: Fixes T7484. If the lock failed, we'd still try to unlock it, which is incorrect.
Test Plan: Ran two `bin/repository update X` in different windows, got proper LockException instead of indirect symptomatic "not locked by this process" exception.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7484
Differential Revision: https://secure.phabricator.com/D12253
Summary:
Fixes T7310. We have a whole mechanism for surfacing update errors, but only surface actual update errors, not pull errors.
Instead, surface pull errors too.
Then format them a little more nicely.
Test Plan: {F309769}
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T7310
Differential Revision: https://secure.phabricator.com/D11821
Summary: Ref T6822.
Test Plan: `grep`. This method is only called from within `PhutilArgumentWorkflow::__construct`.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin, epriestley
Maniphest Tasks: T6822
Differential Revision: https://secure.phabricator.com/D11415
Summary:
Fixes T5966. Accomplishes a few things
- see title
- adds a force-autoclose flag and the plumbing for it
- removes references to some HarborMaster thing that used to key off commits and seems long dead, but forgotten :/
Test Plan:
ran a few commands. These first three had great success:
`./repository reparse --all FIRSTREPO --message --change --herald --owners`
`./repository reparse --all FIRSTREPO --message --change --herald --owners --min-date yesterday`
`./repository reparse --all FIRSTREPO --message --change --herald --owners --min-date yesterday --force-autoclose`
...and these next two showed me some errors as expected:
`./repository reparse --all FIRSTREPO --message --change --herald --owners --min-date garbagedata`
`./repository reparse --all GARBAGEREPO --message --change --herald --owners`
Also, made a diff in a repository with autoclose disabled and commited the diff. Later, reparse the diff with force-autoclose. Verified the diff closed and that the reason "why" had the proper message text.
Reviewers: epriestley
Reviewed By: epriestley
Subscribers: joshuaspence, epriestley, Korvin
Maniphest Tasks: T5966
Differential Revision: https://secure.phabricator.com/D10492
Summary:
Ref T2783.
This updates PhabricatorRepositoryManagementLookupUsersWorkflow to use ConduitCall to retrieve information about the commit.
Test Plan:
Ran `bin/repository lookup-users rTESTe9683b64d3283f0b2d355fdbf231bc918b5ac0ab --trace` and saw the information returned (by making a request to `diffusion.querycommits` as the omnipotent user, signed with the device key).
Mucked with `cluster.addresses` and saw requests rejected.
Reviewers: hach-que, btrahan
Reviewed By: btrahan
Subscribers: Krenair, epriestley, Korvin
Maniphest Tasks: T2783
Differential Revision: https://secure.phabricator.com/D10403
Summary:
Fixes T5839. If a repository has been force pushed and garbage collected, we might have a ref cursor in the database which still points at the old commit (which no longer exists).
We'll then run a command like `git log <new hash> --not <old hash>` to figure out which commits are newly pushed, and this will bomb out because `<old hash>` is invalid.
Instead, validate all the `<old hash>` values before we try to make use of them.
Test Plan:
- Forced a repository into a bad state by mucking with the datbase, generating a reproducible failure similar to the one in T5839.
- Applied patch.
- `bin/repository update <callsign> --trace` filtered the bad commit and put the repository into the right state.
- Saw new commits recognized correctly.
- Ran `bin/repository update <callsign>` for a Mercurial and SVN repo as a sanity check.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T5839
Differential Revision: https://secure.phabricator.com/D10226
Summary: Ran `arc lint --apply-patches --everything` over rP, mainly to change double quotes to single quotes where appropriate. These changes also validate that the `ArcanistXHPASTLinter::LINT_DOUBLE_QUOTE` rule is working as expected.
Test Plan: Eyeballed it.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin, hach-que
Differential Revision: https://secure.phabricator.com/D9431
Summary: Fixes T5255. Currently the `./bin/repository parents` workflow is quite slow. Batching up the SQL operations should make the workflow //seem// much faster.
Test Plan: Not yet tested.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T5255
Differential Revision: https://secure.phabricator.com/D9361
Summary: Currently, repositories can be deleted using `./bin/repository delete`. It makes sense to expose this operate to the `./bin/remove` script as well, for consistency.
Test Plan: Deleted a repository with `./bin/remove rTEST`.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: epriestley, Korvin
Differential Revision: https://secure.phabricator.com/D9350
Summary: Fixes T5195. Currently, the `./bin/repository parents` workflow doesn't respect tracked branches and will attempt to build parents caches for all branches.
Test Plan: For at least one of our repositories, this patch fixes the `Unknown commit` exception. Unfortunately, it doesn't seem to completely solve this problem though, but I suspect that this is due to commits that were overwritten with a `git push --force` or similar.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T5195
Differential Revision: https://secure.phabricator.com/D9322
Summary:
Ref T2683. This is a refinement and simplification of D5257. In particular:
- D5257 only cached the commit chain, not path changes. This meant that we had to go issue an awkward query (which was slow on Facebook's install) periodically while reading the cache. This was reasonable locally but killed performance at FB scale. Instead, we can include path information in the cache. It is very rare that this is large except in Subversion, and we do not need to use this cache in Subversion. In other VCSes, the scale of this data is quite small (a handful of bytes per commit on average).
- D5257 required a large, slow offline computation step. This relies on D9044 to populate parent data so we can build the cache online at will, and let it expire with normal LRU/LFU/whatever semantics. We need this parent data for other reasons anyway.
- D5257 separated graph chunks per-repository. This change assumes we'll be able to pull stuff from APC most of the time and that the cost of switching chunks is not very large, so we can just build one chunk cache across all repositories. This allows the cache to be simpler.
- D5257 needed an offline cache, and used a unique cache structure. Since this one can be built online it can mostly use normal cache code.
- This also supports online appends to the cache.
- Finally, this has a timeout to guarantee a ceiling on the worst case: the worst case is something like a query for a file that has never existed, in a repository which receives exactly 1 commit every time other repositories receive 4095 commits, on a cold cache. If we hit cases like this we can bail after warming the cache up a bit and fall back to asking the VCS for an answer.
This cache isn't perfect, but I believe it will give us substantial gains in the average case. It can often satisfy "average-looking" queries in 4-8ms, and pathological-ish queries in 20ms on my machine; `hg` usually can't even start up in less than 100ms. The major thing that's attractive about this approach is that it does not require anything external or complicated, and will "just work", even producing reasonble improvements for users without APC.
In followups, I'll modify queries to use this cache and see if it holds up in more realistic workloads.
Test Plan:
- Used `bin/repository cache` to examine the behavior of this cache.
- Did some profiling/testing from the web UI using `debug.php`.
- This //appears// to provide a reasonable fast way to issue this query very quickly in the average case, without the various issues that plagued D5257.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley, jhurwitz
Maniphest Tasks: T2683
Differential Revision: https://secure.phabricator.com/D9045
Summary:
Ref T4455. This adds a `repository_parents` table which stores `<childCommitID, parentCommitID>` relationships.
For new commits, it is populated when commits are discovered.
For older commits, there's a `bin/repository parents` script to rebuild the data.
Right now, there's no UI suggestion that you should run the script. I haven't come up with a super clean way to do this, and this table will only improve performance for now, so it's not important that we get everyone to run the script right away. I'm just leaving it for the moment, and we can figure out how to tell admins to run it later.
The ultimate goal is to solve T2683, but solving T4455 gets us some stuff anyway (for example, we can serve `diffusion.commitparentsquery` faster out of this cache).
Test Plan:
- Used `bin/repository discover` to discover new commits in Git, SVN and Mercurial repositories.
- Used `bin/repository parents` to rebuild Git and Mercurial repositories (SVN repos just exit with a message).
- Verified that the table appears to be sensible.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: jhurwitz, epriestley
Maniphest Tasks: T4455
Differential Revision: https://secure.phabricator.com/D9044
Summary:
Ref T4605. Currently, the PullLocal daemon is responsible for two relatively distinct things:
- scheduling repository updates; and
- actually updating repositories.
Move the "actually updating" part into a new `bin/repository update` command, which basically runs the pull, discover, refs and mirror commands. This will let the parent process focus on scheduling in a more understandable way and update multiple repositories at once. It also makes it easier to debug and understand update behavior since the non-scheduling pipeline can be run separately.
Test Plan:
- Ran `update --trace` on SVN, Mercurial and Git repos.
- Ran PullLocal daemon for a while without issues.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T4605
Differential Revision: https://secure.phabricator.com/D8780
Summary:
Ref T4338. Currently, there's no diagnostic command to execute mirroring (so I can't give users an easy command to run), and it's roughly the last piece of real logic left in the PullLocal daemon.
Separate mirroring out, and provide `bin/repository mirror`.
Test Plan:
- Ran `bin/repository mirror` to mirror a repository.
- Ran PullLocalDaemon and verified it also continued mirroring normally.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4338
Differential Revision: https://secure.phabricator.com/D8066
Summary: After adding the CLOSEABLE flag, `bin/repository importing` reports CLOSEABLE commits with no information. Just don't report these, for consistency with the old behavior. Adding flags to show them might be nice at some point if we run into issues where that output would be useful.
Test Plan: Ran `bin/repositroy importing` on a repository with CLOSEABLE commits.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Differential Revision: https://secure.phabricator.com/D8024
Summary:
Ref T4327. This moves the last pieces of discovery responsibility out of the PullLocal daemon and into the DiscoveryEngine.
(This makes it easier to discover repositories in unit tests in the future, since we don't need to build a PullLocal daemon and can just build a DiscoveryEngine.)
Test Plan: Ran `phd debug pulllocal`. Ran `repostory discover`.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4327
Differential Revision: https://secure.phabricator.com/D7997
Summary:
Ref T4327. I want to make change parsing testable; one thing which is blocking this is that the Git discovery process is still part of `PullLocal` daemon instead of being part of `DiscoveryEngine`. The unit test stuff which I want to use for change parsing relies on `DiscoveryEngine` to discover repositories during unit tests.
The major reason git discovery isn't part of `DiscoveryEngine` is that it relies on the messy "autoclose" logic, which we never implemented for Mercurial. Generally, I don't like how autoclose was implemented: it's complicated and gross and too hard to figure out and extend.
Instead, I want to do something more similar to what we do for pushes, which is cleaner overall. Basically this means remembering the old branch heads from the last time we parsed a repository, and figuring out what's new by comparing the old and new branch heads. This should give us several advantages:
- It should be simpler to understand than the autoclose stuff, which is pretty mind-numbing, at least for me.
- It will let us satisfy branch and tag queries cheaply (from the database) instead of having to go to the repository. We could also satisfy some ref-resolve queries from the database.
- It should be easier to extend to Mercurial.
This implements the basics -- pretty much a table to store the cursors, which we update only for Git for now.
Test Plan:
- Ran migration.
- Ran `bin/repository discover X --trace --verbose` on various repositories with branches and tags, before and after modifying pushes.
- Pushed commits to a git repo.
- Looked at database tables.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4327
Differential Revision: https://secure.phabricator.com/D7982