1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-11 08:06:13 +01:00
Commit graph

71 commits

Author SHA1 Message Date
Joshua Spence
2f668493a0 Don't attempt to discover parents commits for untracked branchs.
Summary: Fixes T5195. Currently, the `./bin/repository parents` workflow doesn't respect tracked branches and will attempt to build parents caches for all branches.

Test Plan: For at least one of our repositories, this patch fixes the `Unknown commit` exception. Unfortunately, it doesn't seem to completely solve this problem though, but I suspect that this is due to commits that were overwritten with a `git push --force` or similar.

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin

Maniphest Tasks: T5195

Differential Revision: https://secure.phabricator.com/D9322
2014-05-29 12:02:37 -07:00
epriestley
e4ea092f60 Implement a chunked, APC-backed graph cache
Summary:
Ref T2683. This is a refinement and simplification of D5257. In particular:

  - D5257 only cached the commit chain, not path changes. This meant that we had to go issue an awkward query (which was slow on Facebook's install) periodically while reading the cache. This was reasonable locally but killed performance at FB scale. Instead, we can include path information in the cache. It is very rare that this is large except in Subversion, and we do not need to use this cache in Subversion. In other VCSes, the scale of this data is quite small (a handful of bytes per commit on average).
  - D5257 required a large, slow offline computation step. This relies on D9044 to populate parent data so we can build the cache online at will, and let it expire with normal LRU/LFU/whatever semantics. We need this parent data for other reasons anyway.
  - D5257 separated graph chunks per-repository. This change assumes we'll be able to pull stuff from APC most of the time and that the cost of switching chunks is not very large, so we can just build one chunk cache across all repositories. This allows the cache to be simpler.
  - D5257 needed an offline cache, and used a unique cache structure. Since this one can be built online it can mostly use normal cache code.
  - This also supports online appends to the cache.
  - Finally, this has a timeout to guarantee a ceiling on the worst case: the worst case is something like a query for a file that has never existed, in a repository which receives exactly 1 commit every time other repositories receive 4095 commits, on a cold cache. If we hit cases like this we can bail after warming the cache up a bit and fall back to asking the VCS for an answer.

This cache isn't perfect, but I believe it will give us substantial gains in the average case. It can often satisfy "average-looking" queries in 4-8ms, and pathological-ish queries in 20ms on my machine; `hg` usually can't even start up in less than 100ms. The major thing that's attractive about this approach is that it does not require anything external or complicated, and will "just work", even producing reasonble improvements for users without APC.

In followups, I'll modify queries to use this cache and see if it holds up in more realistic workloads.

Test Plan:
  - Used `bin/repository cache` to examine the behavior of this cache.
  - Did some profiling/testing from the web UI using `debug.php`.
  - This //appears// to provide a reasonable fast way to issue this query very quickly in the average case, without the various issues that plagued D5257.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley, jhurwitz

Maniphest Tasks: T2683

Differential Revision: https://secure.phabricator.com/D9045
2014-05-12 11:47:23 -07:00
epriestley
95eab2f3b0 Record parent relationships when discovering commits
Summary:
Ref T4455. This adds a `repository_parents` table which stores `<childCommitID, parentCommitID>` relationships.

For new commits, it is populated when commits are discovered.

For older commits, there's a `bin/repository parents` script to rebuild the data.

Right now, there's no UI suggestion that you should run the script. I haven't come up with a super clean way to do this, and this table will only improve performance for now, so it's not important that we get everyone to run the script right away. I'm just leaving it for the moment, and we can figure out how to tell admins to run it later.

The ultimate goal is to solve T2683, but solving T4455 gets us some stuff anyway (for example, we can serve `diffusion.commitparentsquery` faster out of this cache).

Test Plan:
  - Used `bin/repository discover` to discover new commits in Git, SVN and Mercurial repositories.
  - Used `bin/repository parents` to rebuild Git and Mercurial repositories (SVN repos just exit with a message).
  - Verified that the table appears to be sensible.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: jhurwitz, epriestley

Maniphest Tasks: T4455

Differential Revision: https://secure.phabricator.com/D9044
2014-05-12 11:47:22 -07:00
epriestley
118c696f72 Separate repository updates from the pull daemon
Summary:
Ref T4605. Currently, the PullLocal daemon is responsible for two relatively distinct things:

  - scheduling repository updates; and
  - actually updating repositories.

Move the "actually updating" part into a new `bin/repository update` command, which basically runs the pull, discover, refs and mirror commands. This will let the parent process focus on scheduling in a more understandable way and update multiple repositories at once. It also makes it easier to debug and understand update behavior since the non-scheduling pipeline can be run separately.

Test Plan:
  - Ran `update --trace` on SVN, Mercurial and Git repos.
  - Ran PullLocal daemon for a while without issues.

Reviewers: btrahan

Reviewed By: btrahan

Subscribers: epriestley

Maniphest Tasks: T4605

Differential Revision: https://secure.phabricator.com/D8780
2014-04-16 13:00:29 -07:00
epriestley
dd944f7d83 Separate repository mirroring into an Engine and provide bin/repository mirror
Summary:
Ref T4338. Currently, there's no diagnostic command to execute mirroring (so I can't give users an easy command to run), and it's roughly the last piece of real logic left in the PullLocal daemon.

Separate mirroring out, and provide `bin/repository mirror`.

Test Plan:
  - Ran `bin/repository mirror` to mirror a repository.
  - Ran PullLocalDaemon and verified it also continued mirroring normally.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T4338

Differential Revision: https://secure.phabricator.com/D8066
2014-01-25 14:01:58 -08:00
epriestley
be59578794 Fix bin/repository importing for CLOSEABLE flag
Summary: After adding the CLOSEABLE flag, `bin/repository importing` reports CLOSEABLE commits with no information. Just don't report these, for consistency with the old behavior. Adding flags to show them might be nice at some point if we run into issues where that output would be useful.

Test Plan: Ran `bin/repositroy importing` on a repository with CLOSEABLE commits.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Differential Revision: https://secure.phabricator.com/D8024
2014-01-21 14:09:51 -08:00
epriestley
8520d9e070 Move more discovery responsibilities into DiscoveryEngine
Summary:
Ref T4327. This moves the last pieces of discovery responsibility out of the PullLocal daemon and into the DiscoveryEngine.

(This makes it easier to discover repositories in unit tests in the future, since we don't need to build a PullLocal daemon and can just build a DiscoveryEngine.)

Test Plan: Ran `phd debug pulllocal`. Ran `repostory discover`.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T4327

Differential Revision: https://secure.phabricator.com/D7997
2014-01-17 16:09:24 -08:00
epriestley
f4b9efe256 Introduce ref cursors for repository parsing
Summary:
Ref T4327. I want to make change parsing testable; one thing which is blocking this is that the Git discovery process is still part of `PullLocal` daemon instead of being part of `DiscoveryEngine`. The unit test stuff which I want to use for change parsing relies on `DiscoveryEngine` to discover repositories during unit tests.

The major reason git discovery isn't part of `DiscoveryEngine` is that it relies on the messy "autoclose" logic, which we never implemented for Mercurial. Generally, I don't like how autoclose was implemented: it's complicated and gross and too hard to figure out and extend.

Instead, I want to do something more similar to what we do for pushes, which is cleaner overall. Basically this means remembering the old branch heads from the last time we parsed a repository, and figuring out what's new by comparing the old and new branch heads. This should give us several advantages:

  - It should be simpler to understand than the autoclose stuff, which is pretty mind-numbing, at least for me.
  - It will let us satisfy branch and tag queries cheaply (from the database) instead of having to go to the repository. We could also satisfy some ref-resolve queries from the database.
  - It should be easier to extend to Mercurial.

This implements the basics -- pretty much a table to store the cursors, which we update only for Git for now.

Test Plan:
  - Ran migration.
  - Ran `bin/repository discover X --trace --verbose` on various repositories with branches and tags, before and after modifying pushes.
  - Pushed commits to a git repo.
  - Looked at database tables.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T4327

Differential Revision: https://secure.phabricator.com/D7982
2014-01-17 11:48:53 -08:00
epriestley
e397103bf2 Extend all "ManagementWorkflow" classes from a base class
Summary:
Ref T2015. Not directly related to Drydock, but I've wanted to do this for a bit.

Introduce a common base class for all the workflows in the scripts in `bin/*`. This slightly reduces code duplication by moving `isExecutable()` to the base, but also provides `getViewer()`. This is a little nicer than `PhabricatorUser::getOmnipotentUser()` and gives us a layer of indirection if we ever want to introduce more general viewer mechanisms in scripts.

Test Plan: Lint; ran some of the scripts.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2015

Differential Revision: https://secure.phabricator.com/D7838
2013-12-27 13:15:40 -08:00
epriestley
d667b12206 Provide a standalone query for resolution of commit author/committer into Phabricator users
Summary:
Ref T4195. To implement the "Author" and "Committer" rules, I need to resolve author/committer strings into Phabricator users.

The code to do this is currently buried in the daemons. Extract it into a standalone query.

I also added `bin/repository lookup-users <commit>` to test this query, both to improve confidence I'm getting this right and to provide a diagnostic command for users, since there's occasionally some confusion over how author/committer strings resolve into valid users.

Test Plan:
I tested this using `bin/repository lookup-users` and `reparse.php --message` on Git, Mercurial and SVN commits. Here's the `lookup-users` output:

  >>> orbital ~/devtools/phabricator $ ./bin/repository lookup-users rINIS3
  Examining commit rINIS3...
  Raw author string: epriestley
  Phabricator user: epriestley (Evan Priestley   )
  Raw committer string: null
  >>> orbital ~/devtools/phabricator $ ./bin/repository lookup-users rPOEMS165b6c54f487c8
  Examining commit rPOEMS165b6c54f487...
  Raw author string: epriestley <git@epriestley.com>
  Phabricator user: epriestley (Evan Priestley   )
  Raw committer string: epriestley <git@epriestley.com>
  Phabricator user: epriestley (Evan Priestley   )
  >>> orbital ~/devtools/phabricator $ ./bin/repository lookup-users rINIH6d24c1aee7741e
  Examining commit rINIH6d24c1aee774...
  Raw author string: epriestley <hg@yghe.net>
  Phabricator user: epriestley (Evan Priestley   )
  Raw committer string: null
  >>> orbital ~/devtools/phabricator $

The `reparse.php` output was similar, and all VCSes resolved authors correctly.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T1731, T4195

Differential Revision: https://secure.phabricator.com/D7801
2013-12-19 11:05:17 -08:00
epriestley
f5ca647d2c Add bin/repository edit for CLI repository editing
Summary:
Ref T4039. This is mostly to deal with that, to prevent the security issues associated with mutable local paths. The next diff will lock them in the web UI.

I also added a confirmation prompt to `bin/repository delete`, which was a little scary without one.

See one comment inline about the `--as` flag. I don't love this, but when I started adding all the stuff we'd need to let this transaction show up as "Administrator" it quickly got pretty big.

Test Plan: Ran `bin/repository edit ...`, saw an edit with a transaction show up on the web UI.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T4039

Differential Revision: https://secure.phabricator.com/D7579
2013-11-13 11:26:05 -08:00
epriestley
bd29784a32 Add an administrative bin/repository importing command to list importing commits
Summary: Ref T4068. Adds a command to list all commits in an "importing" status. This will allow users to use `reparse.php` to diagnose and repair issues.

Test Plan:
  - Ran `bin/repository importing P`, etc.
  - Used `reparse.php` to reparse some commit stages and saw status update correctly.
  - Ran on a repo with no importing commits.
  - Ran with `... --simple | xargs`, which saves us having to put an `awk` or something in there for users.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T4068

Differential Revision: https://secure.phabricator.com/D7515
2013-11-06 11:26:41 -08:00
epriestley
e3a5ab1f8c Add an administrative bin/repository mark-imported command
Summary:
Ref T4068. In some cases like that one, I anticipate a repository not fully importing when a handful of random commits are broken. In the long run we should just deal with that properly, but in the meantime provide an administrative escape hatch so you can mark the repository as imported and get it running normally.

The major reason to do this is that Herald, Feed, Harbormaster, etc., won't activate until a repository is "imported".

Test Plan:
  - Tried to mark an imported repository as imported, got an "already imported" message.
  - Same for not-imported.
  - Marked a repository not-imported.
  - Marked a repository imported.
  - Marked a repository not-imported, then waited for the daemons to mark it imported again automatically.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran, kbrownlees

Maniphest Tasks: T4068

Differential Revision: https://secure.phabricator.com/D7514
2013-11-06 11:26:24 -08:00
epriestley
79abe6653e Remove PhabricatorRepository::loadAllByPHIDOrCallsign()
Summary: Ref T603. Move to real Query classes.

Test Plan:
  - Ran `phd debug pull X` (where `X` does not match a repository).
  - Ran `phd debug pull Y` (where `Y` does match a repository).
  - Ran `phd debug pull`.
  - Ran `repository pull`.
  - Ran `repository pull X`.
  - Ran `repository pull Y`.
  - Ran `repository discover`.
  - Ran `repository delete`.
  - Ran `grep`.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T603

Differential Revision: https://secure.phabricator.com/D7137
2013-09-26 12:36:24 -07:00
epriestley
c467cc464f Make most repository reads policy-aware
Summary: Ref T603. This swaps almost all queries against the repository table over to be policy aware.

Test Plan:
  - Made an audit comment on a commit.
  - Ran `save_lint.php`.
  - Looked up a commit with `diffusion.getcommits`.
  - Looked up lint messages with `diffusion.getlintmessages`.
  - Clicked an external/submodule in Diffusion.
  - Viewed main lint and repository lint in Diffusion.
  - Completed and validated Owners paths in Owners.
  - Executed dry runs via Herald.
  - Queried for package owners with `owners.query`.
  - Viewed Owners package.
  - Edited Owners package.
  - Viewed Owners package list.
  - Executed `repository.query`.
  - Viewed "Repository" tool repository list.
  - Edited Arcanist project.
  - Hit "Delete" on repository (this just tells you to use the CLI).
  - Created a repository.
  - Edited a repository.
  - Ran `bin/repository list`.
  - Ran `bin/search index rGTESTff45d13dffcfb3ea85b03aac8cc36251cacdf01c`
  - Pushed and parsed a commit.
  - Skipped all the Drydock stuff, as it it's hard to test and isn't normally reachable.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T603

Differential Revision: https://secure.phabricator.com/D7132
2013-09-25 16:54:48 -07:00
epriestley
e7fde9a77c Make repository discovery partially testable
Summary:
Ref T2784. Begins pulling discovery into Engines and covering it with tests. In particular:

  - Discovery is currently a one-shot process where we find all the new commits and write them to the database in one go. Split it apart so we find and return the new commits first, then write them to the database separately. This makes things simpler and more testable.
  - This diff only brings SVN into an engine (and only the "find the commits" part), since it's simpler than Git or Mercurial.
  - Creates a base Engine class and moves common functionality there.
  - Restores the `--verbose` flag to `repository pull`.

Test Plan: Added unit tests. Ran `bin/repository discover`. Ran `bin/phd debug pulllocal`.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T2784

Differential Revision: https://secure.phabricator.com/D5906
2013-05-12 19:08:37 -07:00
epriestley
105dd1899c Make repository pulls testable
Summary:
Ref T2784. This moves us toward being able to test the background and Conduit pipelines for repositories. In particular:

  - Separate the logic for pulling repositories (`git pull`, `hg pull`) out of `PhabricatorRepositoryPullLocalDaemon` and put it in `PhabricatorRepositoryPullEngine`. This allows repositories to be pulled directly without invoking the daemons.
  - Add tests for the engine, including a future-looking base test case.
  - Add basic `PhutilDirectoryFixture`-based repositories.

Next steps:

  # Do the same for repo discovery.
  # Then we can start writing tests against specific Conduit methods.

Test Plan: Ran unit tests. Ran `bin/repository pull` on SVN, Hg and Git repositories. Ran `bin/phd debug pulllocal`.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran, nh

Maniphest Tasks: T2784

Differential Revision: https://secure.phabricator.com/D5904
2013-05-12 19:05:52 -07:00
vrana
ef85f49adc Delete license headers from files
Summary:
This commit doesn't change license of any file. It just makes the license implicit (inherited from LICENSE file in the root directory).

We are removing the headers for these reasons:

- It wastes space in editors, less code is visible in editor upon opening a file.
- It brings noise to diff of the first change of any file every year.
- It confuses Git file copy detection when creating small files.
- We don't have an explicit license header in other files (JS, CSS, images, documentation).
- Using license header in every file is not obligatory: http://www.apache.org/dev/apply-license.html#new.

This change is approved by Alma Chao (Lead Open Source and IP Counsel at Facebook).

Test Plan: Verified that the license survived only in LICENSE file and that it didn't modify externals.

Reviewers: epriestley, davidrecordon

Reviewed By: epriestley

CC: aran, Korvin

Maniphest Tasks: T2035

Differential Revision: https://secure.phabricator.com/D3886
2012-11-05 11:16:51 -08:00
Bob Trahan
731a6900bd upgrade repository delete function to full-blown workflow
Summary: fancy title. really just make the delete() method aware of related objects and build a quick workflow which calls delete(). also make commit delete savvy about audit requests.

Test Plan: deleted a repository per the instructions given to me in the web UI

Reviewers: epriestley

Reviewed By: epriestley

CC: aran, Korvin

Maniphest Tasks: T1416, T1958, T1372

Differential Revision: https://secure.phabricator.com/D3822
2012-10-25 16:23:41 -07:00
epriestley
ca31e3e84b Add a --repair flag to "bin/repository discover"
Summary:
If a repository is missing commits because they mysteriously vanished, there's no reasonable way to get them back right now. Provide a way to ignore the state in the database and rediscover the entire repository unconditionally.

We don't queue any reparses or anything, but when I move reparse into this script we can hook things up or something. This generally shouldn't be too important anyway.

Test Plan: Ran `repository discover --repair` on Phabricator.

Reviewers: jungejason, nh, vrana

Reviewed By: vrana

CC: aran

Differential Revision: https://secure.phabricator.com/D2850
2012-06-25 12:35:47 -07:00
epriestley
13d96e6377 Introduce "bin/repository" for repository management
Summary:
Nothing new or exciting here yet, just moving the random scripts/repositories/ things to bin/repository. Also add `repository list`.

(Console stuff comes from D2841.)

Test Plan: Ran `repository list`, `repository pull`, `repository discover`, `repository discover --verbose`, `repository help`.

Reviewers: jungejason, vrana

Reviewed By: vrana

CC: aran

Differential Revision: https://secure.phabricator.com/D2849
2012-06-25 12:35:37 -07:00