@title Diffusion User Guide
@group userguide

Guide to Diffusion, the Phabricator repository browser.

= Overview =

Diffusion is a repository browser which allows you to explore source code in a
Git or SVN repository, similar to software like Trac and GitWeb.

Diffusion provides a very high-performance SVN browser and a moderately
high-performance Git browser. It achieves this performance by denormalizing
large amounts of data about repository history into a database and using that
data as a cache, so it can avoid querying the repository directly. This data is
generated by daemons which track repositories, discover new commits, and parse
and import them.

Diffusion is integrated with the other tools in the Phabricator suite. For
instance:

  - when you commit Differential revisions to a tracked repository, they are
    automatically updated and linked to the corresponding commits;
  - you can add Herald rules to notify you about commits that match certain
    conditions;
  - the Owners tool uses Diffusion to map repositories; and
  - in all the tools, commit names are automatically linked.

= Repository Callsigns and Commit Names =

Each repository is identified by a "callsign", which is a short uppercase string
like "P" (for Phabricator) or "ARC" (for Arcanist).

Callsigns must be unique within an install but do not need to be globally
unique, so you are free to use single-letter callsigns for brevity. For example,
Facebook uses "E" for the Engineering repository, "O" for the Ops repository,
"Y" for a Yum package repository, and so on, while Phabricator uses "P", "ARC",
"PHU" for libphutil, and "J" for Javelin. Keeping callsigns brief makes them
easier to use. One-character callsigns are recommended if they are reasonably
evocative and you have no more than 26 tracked repositories.

The primary goal of callsigns is to namespace commits to SVN repositories: if
you use multiple SVN repositories, each repository has a revision 1, revision 2,
etc., so referring to them by number alone is ambiguous. However, even for Git,
callsigns give human readers additional information and let parsers detect with
high probability that something is a commit name.

Diffusion uses this callsign and information about the commit itself to generate
a commit name, like "rE12345" or "rP28146171ce1278f2375e3646a1e1ea3fd56fc5a3".
The "r" stands for "revision". It is followed by the repository callsign, and
then a VCS-specific commit identifier (for SVN, the commit number; for Git, the
commit hash). When writing the name of a Git commit you may abbreviate the hash,
but note that hash collisions are probable for short prefix lengths. See this
post on the LKML for a historical explanation of Git's occasional internal use
of 7-character hashes:

https://lkml.org/lkml/2010/10/28/287

Because 7-character hashes are likely to collide for even moderately large
repositories, Diffusion generally uses either a 16-character prefix (which makes
collisions very unlikely) or the full 40-character SHA1 (which makes collisions
astronomically unlikely).

= Adding Repositories =

Repository administration is accomplished through the "Repository" tool, which
is primarily a set of administrative interfaces for Diffusion. To add a
repository to Diffusion, you need to:

  - create a new repository in the Repository tool; and
  - start the daemons that will track and import the repository.

To create a new repository (or edit or delete an existing repository), **you
must be an administrator** (see
@{article:Configuring Accounts and Registration} for instructions on making an
existing account an administrator account). As an administrator, go to the
Repository tool and you'll have the options to create or edit repositories.

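
To make the commit name format from the callsign section concrete, here is a minimal Python sketch (not Diffusion's own parser) that splits a commit name into its callsign and identifier, and estimates how likely abbreviated Git hashes are to collide using the standard birthday approximation. The function names are hypothetical. The sketch assumes callsigns contain only uppercase ASCII letters and that identifiers are either decimal SVN revision numbers or lowercase hexadecimal Git hash prefixes.

```python
import math
import re
from typing import NamedTuple

# Assumption: "r", then an uppercase callsign, then a decimal SVN revision
# or a lowercase hexadecimal Git hash (possibly abbreviated).
COMMIT_NAME = re.compile(r"r([A-Z]+)([0-9a-f]+)")


class CommitName(NamedTuple):
    callsign: str
    identifier: str


def parse_commit_name(name: str) -> CommitName:
    """Split a name like "rE12345" into its callsign and commit identifier."""
    match = COMMIT_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a commit name: {name!r}")
    return CommitName(match.group(1), match.group(2))


def collision_probability(commits: int, prefix_length: int) -> float:
    """Birthday estimate that any two of `commits` hashes share a hex prefix."""
    space = 16 ** prefix_length           # number of distinct prefixes
    pairs = commits * (commits - 1) / 2   # number of commit pairs
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    print(parse_commit_name("rE12345"))
    print(parse_commit_name("rP28146171ce1278f2375e3646a1e1ea3fd56fc5a3"))
    for length in (7, 16, 40):
        print(length, collision_probability(350_000, length))
```

Under this approximation, a 350,000-commit repository is almost certain to contain at least one pair of commits sharing a 7-character prefix, while the chance for a 16-character prefix is on the order of one in a billion. Note that the name alone does not say whether "12345" is an SVN revision or a short Git hash; the repository's VCS type, looked up by callsign, resolves that.
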
When you create a new repository, you need to specify a human-readable name, a
permanent "Callsign" (see previous section), and the underlying VCS type. Once
you have created a repository, you can go to the "Tracking" tab and set up
tracking in Diffusion.

Most of the options in the **Tracking** tab are either self-explanatory or safe
to leave at their defaults. In broad strokes, Diffusion tracks SVN repositories
by periodically issuing an "svn log" command against the remote to look for new
commits. It tracks Git repositories by cloning a local copy and periodically
issuing "git fetch".

Once you've configured everything (and made sure **Tracking** is set to
"Enabled"), you can launch the daemons to begin actually tracking the
repository.

= Running Diffusion Daemons =

For an introduction to Phabricator daemons, see
@{article:Managing Daemons with phd}.

To actually track repositories, you need to:

  - run ##phd repository-launch-master## on one machine;
  - run at least one @{class:PhabricatorTaskmasterDaemon} with
    ##phd launch taskmaster##. You should probably launch a few of these
    somewhere. They are generic workers which run many different kinds of
    background tasks, so if you already have some running you don't need to
    launch more. However, if you are importing a very large repository, import
    rate will primarily be a function of how many taskmasters you are running,
    so you may want to launch a bunch of them; and
  - if you have multiple web frontends and have tracked Git repositories, run
    ##phd repository-launch-readonly## on each web frontend.

You can use the Daemon Console to monitor the daemons and their progress
importing the repository. Small repositories should import quickly, while larger
repositories may take some time (it takes about 10 minutes to begin discovering
commits in Facebook's 350,000-commit primary repository, and about 18 hours to
import it all with 64 taskmasters on modern hardware).
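
For rough planning, the single data point above can be turned into a back-of-the-envelope estimate. This Python sketch assumes import time scales linearly with the number of commits and inversely with the number of taskmasters, which the guide only suggests ("import rate will primarily be a function of how many taskmasters you are running"). The function name is hypothetical, and real import times depend on hardware, commit size, and database load.

```python
# Reference point quoted in this guide: 350,000 commits imported in about
# 18 hours with 64 taskmasters.
REF_COMMITS = 350_000
REF_TASKMASTERS = 64
REF_HOURS = 18.0


def estimate_import_hours(commits: int, taskmasters: int) -> float:
    """Rough import time, assuming linear scaling from the reference point."""
    if commits < 0 or taskmasters < 1:
        raise ValueError("need commits >= 0 and taskmasters >= 1")
    # Taskmaster-hours spent per commit in the reference import.
    cost_per_commit = REF_HOURS * REF_TASKMASTERS / REF_COMMITS
    return commits * cost_per_commit / taskmasters


if __name__ == "__main__":
    for workers in (4, 16, 64):
        hours = estimate_import_hours(50_000, workers)
        print(f"{workers} taskmasters: ~{hours:.1f} hours")
```

Treat the result as an order-of-magnitude guide only; it ignores the initial discovery phase and any other background tasks the same taskmasters are running.
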
Commits should begin appearing in Diffusion within a few minutes for all but the
largest repositories.

In detail, Diffusion uses several daemons to track, parse and import
repositories:

  - **PhabricatorRepositoryGitFetchDaemon**: periodically runs "git fetch" to
    keep Git repositories up to date
  - **PhabricatorRepositoryGitCommitDiscoveryDaemon**: periodically looks for
    new commits and imports them
  - **PhabricatorRepositorySvnCommitDiscoveryDaemon**: periodically runs
    "svn log" to look for new commits and imports them
  - **PhabricatorRepositoryCommitTaskDaemon**: creates tasks to parse and
    import newly discovered commits

The ##repository-launch-master## command just chooses the right daemons to
launch based on which repositories you've configured to be tracked. If you add
new repositories in the future, you should stop all the daemons and rerun
##repository-launch-master##.

If you run Phabricator with multiple web frontends, have your deployment script
run ##phd stop## and then ##phd repository-launch-readonly## when it deploys. It
is very unlikely you are impacted by this unless you are one of the largest
installs in the world.

= Building New Parsers =

You can add new classes which will extend or enhance Diffusion's ability to
parse commit messages.

TODO: This is an advanced feature which doesn't currently have documentation and
isn't terribly stable.