Summary: Fixes T5195. Currently, the `./bin/repository parents` workflow doesn't respect tracked branches and will attempt to build parents caches for all branches.
Test Plan: For at least one of our repositories, this patch fixes the `Unknown commit` exception. Unfortunately, it doesn't seem to completely solve this problem though, but I suspect that this is due to commits that were overwritten with a `git push --force` or similar.
Reviewers: #blessed_reviewers, epriestley
Reviewed By: #blessed_reviewers, epriestley
Subscribers: epriestley, Korvin
Maniphest Tasks: T5195
Differential Revision: https://secure.phabricator.com/D9322
Summary:
Ref T2683. This is a refinement and simplification of D5257. In particular:
- D5257 only cached the commit chain, not path changes. This meant that we had to go issue an awkward query (which was slow on Facebook's install) periodically while reading the cache. This was reasonable locally but killed performance at FB scale. Instead, we can include path information in the cache. It is very rare that this is large except in Subversion, and we do not need to use this cache in Subversion. In other VCSes, the scale of this data is quite small (a handful of bytes per commit on average).
- D5257 required a large, slow offline computation step. This relies on D9044 to populate parent data so we can build the cache online at will, and let it expire with normal LRU/LFU/whatever semantics. We need this parent data for other reasons anyway.
- D5257 separated graph chunks per-repository. This change assumes we'll be able to pull stuff from APC most of the time and that the cost of switching chunks is not very large, so we can just build one chunk cache across all repositories. This allows the cache to be simpler.
- D5257 needed an offline cache, and used a unique cache structure. Since this one can be built online it can mostly use normal cache code.
- This also supports online appends to the cache.
- Finally, this has a timeout to guarantee a ceiling on the worst case: the worst case is something like a query for a file that has never existed, in a repository which receives exactly 1 commit every time other repositories receive 4095 commits, on a cold cache. If we hit cases like this we can bail after warming the cache up a bit and fall back to asking the VCS for an answer.
This cache isn't perfect, but I believe it will give us substantial gains in the average case. It can often satisfy "average-looking" queries in 4-8ms, and pathological-ish queries in 20ms on my machine; `hg` usually can't even start up in less than 100ms. The major thing that's attractive about this approach is that it does not require anything external or complicated, and will "just work", even producing reasonble improvements for users without APC.
In followups, I'll modify queries to use this cache and see if it holds up in more realistic workloads.
Test Plan:
- Used `bin/repository cache` to examine the behavior of this cache.
- Did some profiling/testing from the web UI using `debug.php`.
- This //appears// to provide a reasonable fast way to issue this query very quickly in the average case, without the various issues that plagued D5257.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley, jhurwitz
Maniphest Tasks: T2683
Differential Revision: https://secure.phabricator.com/D9045
Summary:
Ref T4455. This adds a `repository_parents` table which stores `<childCommitID, parentCommitID>` relationships.
For new commits, it is populated when commits are discovered.
For older commits, there's a `bin/repository parents` script to rebuild the data.
Right now, there's no UI suggestion that you should run the script. I haven't come up with a super clean way to do this, and this table will only improve performance for now, so it's not important that we get everyone to run the script right away. I'm just leaving it for the moment, and we can figure out how to tell admins to run it later.
The ultimate goal is to solve T2683, but solving T4455 gets us some stuff anyway (for example, we can serve `diffusion.commitparentsquery` faster out of this cache).
Test Plan:
- Used `bin/repository discover` to discover new commits in Git, SVN and Mercurial repositories.
- Used `bin/repository parents` to rebuild Git and Mercurial repositories (SVN repos just exit with a message).
- Verified that the table appears to be sensible.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: jhurwitz, epriestley
Maniphest Tasks: T4455
Differential Revision: https://secure.phabricator.com/D9044
Summary:
Ref T4605. Currently, the PullLocal daemon is responsible for two relatively distinct things:
- scheduling repository updates; and
- actually updating repositories.
Move the "actually updating" part into a new `bin/repository update` command, which basically runs the pull, discover, refs and mirror commands. This will let the parent process focus on scheduling in a more understandable way and update multiple repositories at once. It also makes it easier to debug and understand update behavior since the non-scheduling pipeline can be run separately.
Test Plan:
- Ran `update --trace` on SVN, Mercurial and Git repos.
- Ran PullLocal daemon for a while without issues.
Reviewers: btrahan
Reviewed By: btrahan
Subscribers: epriestley
Maniphest Tasks: T4605
Differential Revision: https://secure.phabricator.com/D8780
Summary:
Ref T4338. Currently, there's no diagnostic command to execute mirroring (so I can't give users an easy command to run), and it's roughly the last piece of real logic left in the PullLocal daemon.
Separate mirroring out, and provide `bin/repository mirror`.
Test Plan:
- Ran `bin/repository mirror` to mirror a repository.
- Ran PullLocalDaemon and verified it also continued mirroring normally.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4338
Differential Revision: https://secure.phabricator.com/D8066
Summary: After adding the CLOSEABLE flag, `bin/repository importing` reports CLOSEABLE commits with no information. Just don't report these, for consistency with the old behavior. Adding flags to show them might be nice at some point if we run into issues where that output would be useful.
Test Plan: Ran `bin/repositroy importing` on a repository with CLOSEABLE commits.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Differential Revision: https://secure.phabricator.com/D8024
Summary:
Ref T4327. This moves the last pieces of discovery responsibility out of the PullLocal daemon and into the DiscoveryEngine.
(This makes it easier to discover repositories in unit tests in the future, since we don't need to build a PullLocal daemon and can just build a DiscoveryEngine.)
Test Plan: Ran `phd debug pulllocal`. Ran `repostory discover`.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4327
Differential Revision: https://secure.phabricator.com/D7997
Summary:
Ref T4327. I want to make change parsing testable; one thing which is blocking this is that the Git discovery process is still part of `PullLocal` daemon instead of being part of `DiscoveryEngine`. The unit test stuff which I want to use for change parsing relies on `DiscoveryEngine` to discover repositories during unit tests.
The major reason git discovery isn't part of `DiscoveryEngine` is that it relies on the messy "autoclose" logic, which we never implemented for Mercurial. Generally, I don't like how autoclose was implemented: it's complicated and gross and too hard to figure out and extend.
Instead, I want to do something more similar to what we do for pushes, which is cleaner overall. Basically this means remembering the old branch heads from the last time we parsed a repository, and figuring out what's new by comparing the old and new branch heads. This should give us several advantages:
- It should be simpler to understand than the autoclose stuff, which is pretty mind-numbing, at least for me.
- It will let us satisfy branch and tag queries cheaply (from the database) instead of having to go to the repository. We could also satisfy some ref-resolve queries from the database.
- It should be easier to extend to Mercurial.
This implements the basics -- pretty much a table to store the cursors, which we update only for Git for now.
Test Plan:
- Ran migration.
- Ran `bin/repository discover X --trace --verbose` on various repositories with branches and tags, before and after modifying pushes.
- Pushed commits to a git repo.
- Looked at database tables.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4327
Differential Revision: https://secure.phabricator.com/D7982
Summary:
Ref T2015. Not directly related to Drydock, but I've wanted to do this for a bit.
Introduce a common base class for all the workflows in the scripts in `bin/*`. This slightly reduces code duplication by moving `isExecutable()` to the base, but also provides `getViewer()`. This is a little nicer than `PhabricatorUser::getOmnipotentUser()` and gives us a layer of indirection if we ever want to introduce more general viewer mechanisms in scripts.
Test Plan: Lint; ran some of the scripts.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2015
Differential Revision: https://secure.phabricator.com/D7838
Summary:
Ref T4195. To implement the "Author" and "Committer" rules, I need to resolve author/committer strings into Phabricator users.
The code to do this is currently buried in the daemons. Extract it into a standalone query.
I also added `bin/repository lookup-users <commit>` to test this query, both to improve confidence I'm getting this right and to provide a diagnostic command for users, since there's occasionally some confusion over how author/committer strings resolve into valid users.
Test Plan:
I tested this using `bin/repository lookup-users` and `reparse.php --message` on Git, Mercurial and SVN commits. Here's the `lookup-users` output:
>>> orbital ~/devtools/phabricator $ ./bin/repository lookup-users rINIS3
Examining commit rINIS3...
Raw author string: epriestley
Phabricator user: epriestley (Evan Priestley )
Raw committer string: null
>>> orbital ~/devtools/phabricator $ ./bin/repository lookup-users rPOEMS165b6c54f487c8
Examining commit rPOEMS165b6c54f487...
Raw author string: epriestley <git@epriestley.com>
Phabricator user: epriestley (Evan Priestley )
Raw committer string: epriestley <git@epriestley.com>
Phabricator user: epriestley (Evan Priestley )
>>> orbital ~/devtools/phabricator $ ./bin/repository lookup-users rINIH6d24c1aee7741e
Examining commit rINIH6d24c1aee774...
Raw author string: epriestley <hg@yghe.net>
Phabricator user: epriestley (Evan Priestley )
Raw committer string: null
>>> orbital ~/devtools/phabricator $
The `reparse.php` output was similar, and all VCSes resolved authors correctly.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1731, T4195
Differential Revision: https://secure.phabricator.com/D7801
Summary:
Ref T4039. This is mostly to deal with that, to prevent the security issues associated with mutable local paths. The next diff will lock them in the web UI.
I also added a confirmation prompt to `bin/repository delete`, which was a little scary without one.
See one comment inline about the `--as` flag. I don't love this, but when I started adding all the stuff we'd need to let this transaction show up as "Administrator" it quickly got pretty big.
Test Plan: Ran `bin/repository edit ...`, saw an edit with a transaction show up on the web UI.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4039
Differential Revision: https://secure.phabricator.com/D7579
Summary: Ref T4068. Adds a command to list all commits in an "importing" status. This will allow users to use `reparse.php` to diagnose and repair issues.
Test Plan:
- Ran `bin/repository importing P`, etc.
- Used `reparse.php` to reparse some commit stages and saw status update correctly.
- Ran on a repo with no importing commits.
- Ran with `... --simple | xargs`, which saves us having to put an `awk` or something in there for users.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T4068
Differential Revision: https://secure.phabricator.com/D7515
Summary:
Ref T4068. In some cases like that one, I anticipate a repository not fully importing when a handful of random commits are broken. In the long run we should just deal with that properly, but in the meantime provide an administrative escape hatch so you can mark the repository as imported and get it running normally.
The major reason to do this is that Herald, Feed, Harbormaster, etc., won't activate until a repository is "imported".
Test Plan:
- Tried to mark an imported repository as imported, got an "already imported" message.
- Same for not-imported.
- Marked a repository not-imported.
- Marked a repository imported.
- Marked a repository not-imported, then waited for the daemons to mark it imported again automatically.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran, kbrownlees
Maniphest Tasks: T4068
Differential Revision: https://secure.phabricator.com/D7514
Summary: Ref T603. Move to real Query classes.
Test Plan:
- Ran `phd debug pull X` (where `X` does not match a repository).
- Ran `phd debug pull Y` (where `Y` does match a repository).
- Ran `phd debug pull`.
- Ran `repository pull`.
- Ran `repository pull X`.
- Ran `repository pull Y`.
- Ran `repository discover`.
- Ran `repository delete`.
- Ran `grep`.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T603
Differential Revision: https://secure.phabricator.com/D7137
Summary: Ref T603. This swaps almost all queries against the repository table over to be policy aware.
Test Plan:
- Made an audit comment on a commit.
- Ran `save_lint.php`.
- Looked up a commit with `diffusion.getcommits`.
- Looked up lint messages with `diffusion.getlintmessages`.
- Clicked an external/submodule in Diffusion.
- Viewed main lint and repository lint in Diffusion.
- Completed and validated Owners paths in Owners.
- Executed dry runs via Herald.
- Queried for package owners with `owners.query`.
- Viewed Owners package.
- Edited Owners package.
- Viewed Owners package list.
- Executed `repository.query`.
- Viewed "Repository" tool repository list.
- Edited Arcanist project.
- Hit "Delete" on repository (this just tells you to use the CLI).
- Created a repository.
- Edited a repository.
- Ran `bin/repository list`.
- Ran `bin/search index rGTESTff45d13dffcfb3ea85b03aac8cc36251cacdf01c`
- Pushed and parsed a commit.
- Skipped all the Drydock stuff, as it it's hard to test and isn't normally reachable.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T603
Differential Revision: https://secure.phabricator.com/D7132
Summary:
Ref T2784. Begins pulling discovery into Engines and covering it with tests. In particular:
- Discovery is currently a one-shot process where we find all the new commits and write them to the database in one go. Split it apart so we find and return the new commits first, then write them to the database separately. This makes things simpler and more testable.
- This diff only brings SVN into an engine (and only the "find the commits" part), since it's simpler than Git or Mercurial.
- Creates a base Engine class and moves common functionality there.
- Restores the `--verbose` flag to `repository pull`.
Test Plan: Added unit tests. Ran `bin/repository discover`. Ran `bin/phd debug pulllocal`.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T2784
Differential Revision: https://secure.phabricator.com/D5906
Summary:
Ref T2784. This moves us toward being able to test the background and Conduit pipelines for repositories. In particular:
- Separate the logic for pulling repositories (`git pull`, `hg pull`) out of `PhabricatorRepositoryPullLocalDaemon` and put it in `PhabricatorRepositoryPullEngine`. This allows repositories to be pulled directly without invoking the daemons.
- Add tests for the engine, including a future-looking base test case.
- Add basic `PhutilDirectoryFixture`-based repositories.
Next steps:
# Do the same for repo discovery.
# Then we can start writing tests against specific Conduit methods.
Test Plan: Ran unit tests. Ran `bin/repository pull` on SVN, Hg and Git repositories. Ran `bin/phd debug pulllocal`.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran, nh
Maniphest Tasks: T2784
Differential Revision: https://secure.phabricator.com/D5904
Summary:
This commit doesn't change license of any file. It just makes the license implicit (inherited from LICENSE file in the root directory).
We are removing the headers for these reasons:
- It wastes space in editors, less code is visible in editor upon opening a file.
- It brings noise to diff of the first change of any file every year.
- It confuses Git file copy detection when creating small files.
- We don't have an explicit license header in other files (JS, CSS, images, documentation).
- Using license header in every file is not obligatory: http://www.apache.org/dev/apply-license.html#new.
This change is approved by Alma Chao (Lead Open Source and IP Counsel at Facebook).
Test Plan: Verified that the license survived only in LICENSE file and that it didn't modify externals.
Reviewers: epriestley, davidrecordon
Reviewed By: epriestley
CC: aran, Korvin
Maniphest Tasks: T2035
Differential Revision: https://secure.phabricator.com/D3886
Summary: fancy title. really just make the delete() method aware of related objects and build a quick workflow which calls delete(). also make commit delete savvy about audit requests.
Test Plan: deleted a repository per the instructions given to me in the web UI
Reviewers: epriestley
Reviewed By: epriestley
CC: aran, Korvin
Maniphest Tasks: T1416, T1958, T1372
Differential Revision: https://secure.phabricator.com/D3822
Summary:
If a repository is missing commits because they mysteriously vanished, there's no reasonable way to get them back right now. Provide a way to ignore the state in the database and rediscover the entire repository unconditionally.
We don't queue any reparses or anything, but when I move reparse into this script we can hook things up or something. This generally shouldn't be too important anyway.
Test Plan: Ran `repository discover --repair` on Phabricator.
Reviewers: jungejason, nh, vrana
Reviewed By: vrana
CC: aran
Differential Revision: https://secure.phabricator.com/D2850
Summary:
Nothing new or exciting here yet, just moving the random scripts/repositories/ things to bin/repository. Also add `repository list`.
(Console stuff comes from D2841.)
Test Plan: Ran `repository list`, `repository pull`, `repository discover`, `repository discover --verbose`, `repository help`.
Reviewers: jungejason, vrana
Reviewed By: vrana
CC: aran
Differential Revision: https://secure.phabricator.com/D2849