1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2025-01-23 13:08:18 +01:00

Add/update some documentation on branches and reviewable code

Summary:
  - The most common workflow complaint I've seen recently is something like "how do I use Differential with a branch full of random code that me and several other developers all commit to"? There are some okay answers ("commandeer") but I think the best answer is "don't do that". Add a document explaining how development works at Facebook (and many other companies) without the use of feature branching, why it's better, and how you can lay the technical groundwork you need to to stop doing this.
  - Add a general "smaller commits are better" and "your commit messsage should provide context" document.
  - Minor updates to other stuff as my understanding of Mercurial has been refined.

Test Plan: Generated and read documentation.

Reviewers: btrahan, vrana, schrockn

Reviewed By: btrahan

CC: aran

Differential Revision: https://secure.phabricator.com/D3025
This commit is contained in:
epriestley 2012-07-23 10:33:53 -07:00
parent 4dd707b189
commit 49205f977c
4 changed files with 383 additions and 24 deletions

View file

@ -0,0 +1,188 @@
@title Recommendations on Branching
@group flavortext
Project recommendations on how to organize branches.
This document discusses organizing branches in your remote/origin for feature
development and release management, not the use of local branches in Git or
queues or bookmarks in Mercurial.
This document is purely advisory. Phabricator works with a variety of branching
strategies, and diverging from the recommendations in this document
will not impact your ability to use it for code review and source management.
= Overview =
This document describes a branching strategy used by Facebook and Phabricator to
develop software. It scales well and removes the pain associated with most
branching strategies. This strategy is most applicable to web applications, and
may be less applicable to other types of software. The basics are:
- Never put feature branches in the remote/origin/trunk.
- Control access to new features with runtime configuration, not branching.
The next sections describe these points in more detail, explaining why you
should consider abandoning feature branches and how to build runtime access
controls for features.
= Feature Branches =
Suppose you are developing a new feature, like a way for users to "poke" each
other. A traditional strategy is to create a branch for this feature in the
remote (say, "poke_branch"), develop the feature on the branch over some period
of time (say, a week or a month), and then merge the entire branch back into
master/default/trunk when the feature is complete.
This strategy has some drawbacks:
- You have to merge. Merging can be painful and error prone, especially if the
feature takes a long time to develop. Reducing merge pain means spending
time merging master into the branch regularly. As branches fall further
out of sync, merge pain/risk tends to increase.
- This strategy generally aggregates risk into a single high-risk merge event
at the end of development. It does this both explicitly (all the code lands
at once) and more subtly: since commits on the branch aren't going live any
time soon, it's easy to hold them to a lower bar of quality.
- When you have multiple feature branches, it's impossible to test
interactions between the features until they are merged.
- You generally can't A/B test code in feature branches, can't roll it out to
a small percentage of users, and can't easily turn it on for just employees
since it's in a separate branch.
Of course, it also has some advantages:
- If the new feature replaces an older feature, the changes can delete the
older feature outright, or at least transition from the old feature to the
new feature fairly rapidly.
- The chance that this code will impact production before the merge is nearly
zero (it normally requires substantial human error). This is the major
reason to do it at all.
Instead, consider abandoning all use of feature branching. The advantages are
straightforward:
- You don't have to do merges.
- Risk is generally spread out more evenly into a large number of very small
risks created as each commit lands.
- You can test interactions between features in development easily.
- You can A/B test and do controlled rollouts easily.
But it has some tradeoffs:
- If a new feature replaces an older feature, both have to exist in the same
codebase for a while. But even with feature branching, you generally have to
do almost all this work anyway to avoid situations where you flip a switch
and can't undo it.
- You need an effective way to control access to features so they don't launch
before they're ready. Even with this, there is a small risk a feature may
launch or leak because of a smaller human error than would be required with
feature branching. However, for most applications, this isn't a big deal.
This second point is a must-have, but entirely tractable. The next section
describes how to build it, so you can stop doing feature branching and never
deal with the pain and risk of merging again.
= Controlling Access to Features =
Controlling access to features is straightforward: build some kind of runtime
configuration which defines which features are visible, based on the tier
(e.g., development, testing or production?) code is deployed on, the logged in
user, global configuration, random buckets, A/B test groups, or whatever else.
Your code should end up looking something like this:
if (is_feature_launched('poke')) {
show_poke();
}
Behind that is some code which knows about the 'poke' feature and can go lookup
configuration to determine if it should be visible or not. Facebook has a very
sophisticated system for this (called GateKeeper) which also integrates with A/B
tests, allows you to define complicated business rules, etc.
You don't need this in the beginning. Before GateKeeper, Facebook used a much
simpler system (called Sitevars) to handle this. Here are some resources
describing similar systems:
- There's a high-level overview of Facebook's system in this 2011 tech talk:
[[http://techcrunch.com/2011/05/30/facebook-source-code/ | Facebook Push
Tech Talk]].
- Flickr described their similar system in a 2009 blog post here:
[[http://code.flickr.com/blog/2009/12/02/flipping-out/ | Flickr Feature
Flags and Feature Flippers]].
- Disqus described their similar system in a 2010 blog post here:
[[http://blog.disqus.com/post/789540337/partial-deployment-with-feature-switches |
Disqus Feature Switches]].
- Forrst describes their similar system in a 2010 blog post here:
[[http://blog.forrst.com/post/782356699/how-we-deploy-new-features-on-forrst |
Forrst Buckets]].
- Martin Fowler discusses these systems in a 2010 blog post here:
[[http://martinfowler.com/bliki/FeatureToggle.html |
Martin Fowler's FeatureToggle]].
- Phabricator just adds config options but defaults them to off. When
developing, we turn them on locally. Once a feature is ready, we default it
on. We have a vastly less context to deal with than most projects, however,
and sometimes get away with simply not linking new features in the UI until
they mature (it's open source anyway so there's no value in hiding things).
When building this system there are a few things to avoid, mostly related to not
letting the complexity of this system grow too wildly:
- Facebook initially made it very easy to turn things on to everyone by
accident in GateKeeper. Don't do this. The UI should make it obvious when
you're turning something on or off, and default to off.
- Since GateKeeper is essentially a runtime business rules engine, it was
heavily abused to effectively execute code living in a database. Avoid this
through simpler design or a policy of sanity.
- Facebook allowed GateKeeper rules to depend on other GateKeeper rules
(for example, 'new_profile_tour' is launched if 'new_profile' is launched)
but did not perform cycle detection, and then sat on a bug describing
how to introduce a cycle and bring the site down for a very long time,
until someone introduced a cycle and brought the site down. If you implement
dependencies, implement cycle detection.
- Facebook implemented some very expensive GateKeeper conditions and was
spending 100+ ms per page running complex rulesets to do launch checks for a
number of months in 2009. Keep an eye on how expensive your checks are.
That said, not all complexity is bad:
- Allowing features to have states like "3%" instead of just "on" or "off"
allows you to roll out features gradually and watch for trouble (e.g.,
services collapsing from load).
- Building a control panel where you hit "Save" and all production servers
immediately reflect the change allows you to quickly turn things off if
there are problems.
- If you perform A/B testing, integrating A/B tests with feature rollouts
is probably a natural fit.
- Allowing new features to be launched to all employees before they're
launched to the world is a great way to get feedback and keep everyone
in the loop.
Adopting runtime feature controls increases the risk of features leaking (or
even launching) before they're ready. This is generally a small risk which is
probably reasonable for most projects to accept, although it might be
unacceptable for some projects. There are some ways you can mitigate this
risk:
- Essentially every launch/leak at Facebook was because someone turned on
a feature by accident when they didn't mean to. The control panel made this
very easy: features defaulted to "on", and if you tried to do something
common like remove yourself from the test group for a feature you could
easily launch it to the whole world. Design the UI defensively so that it
is hard to turn features on to everyone and/or obvious when a feature is
launching and this shouldn't be a problem.
- The rest were through CSS or JS changes that mentioned new features being
shipped to the client as part of static resource packaging or because
the code was just added to existing files. If this is a risk you're
concerned about, consider integration with static resource management.
In general, you can start with a very simple system and expand it as it makes
sense. Even a simple system can let you move away from feature branches.
= Next Steps =
Continue by:
- reading recommendations on structuring revision control with
@{article:Recommendations on Revision Control}; or
- reading recommendations on structuring changes with
@{article:Writing Reviewable Code}.

View file

@ -5,7 +5,7 @@ Project recommendations on how to organize revision control.
This document is purely advisory. Phabricator works with a variety of revision
control strategies, and diverging from the recommendations in this document
will not impact your ability to use it for code review and management.
will not impact your ability to use it for code review and source management.
This is my (epriestley's) personal take on the issue and not necessarily
representative of the views of the Phabricator team as a whole.
@ -27,17 +27,19 @@ exactly one commit (in some form), not a sequence of checkpoint commits.
- In SVN, this means don't ##commit## until after an idea has been completely
written. All reasonable SVN workflows naturally enforce this.
- In Git, this means squashing checkpoint commits as you go (with ##git commit
--amend##) or before pushing (with ##git rebase -i##), or having a strict
policy where your master/trunk contains only merge commits and each is a
merge between the old master and a branch which represents a single idea.
Although this preserves the checkpoint commits along the branches, you can
view master alone as a series of single-idea commits.
- In Mercurial, it is not generally possible to avoid having ideas spread
across multiple commits. Mercurial does not support liberal mutability
doctrines (so you can't squash commits) and does not let you build a default
--amend##) or before pushing (with ##git rebase -i## or `git merge
--squash`), or having a strict policy where your master/trunk contains only
merge commits and each is a merge between the old master and a branch which
represents a single idea. Although this preserves the checkpoint commits
along the branches, you can view master alone as a series of single-idea
commits.
- In Mercurial, you can use the "queues" extension before 2.2 or `--amend`
after Mercurial 2.2, or wait to commit until a change is complete (like
SVN), although the latter is not recommended. Without extensions, older
versions of Mercurial do not support liberal mutability doctrines (so you
can't ever combine checkpoint commits) and do not let you build a default
out of only merge commits, so it is not possible to have an authoritative
repository where one commit represents one idea in any real sense. Thus,
Mercurial is not recommended for projects which may ever need to scale.
repository where one commit represents one idea in any real sense.
= Why This Matters =
@ -71,12 +73,20 @@ particular:
commit for every keystroke. This VCS would obviously be unusable. But many
checkpoint commits aren't much different, and conceptually represent some
relatively arbitrary point in the sequence of keystrokes that went into
writing a larger idea. Get rid of them
or create an abstraction layer (merge commits) which allows you to ignore
them when you are trying to understand the repository in terms of ideas
(which is almost always).
writing a larger idea. Get rid of them or create an abstraction layer (merge
commits) which allows you to ignore them when you are trying to understand
the repository in terms of ideas (which is almost always).
All of these become problems only at scale. Facebook pushes dozens of ideas
every day and thousands on a weekly basis, and could not do this (at least, not
without more people or more errors) without choosing a repository strategy where
**one idea is one commit**.
= Next Steps =
Continue by:
- reading recommendations on structuring branches with
@{article:Recommendations on Branching}; or
- reading recommendations on structuring changes with
@{article:Writing Reviewable Code}.

View file

@ -0,0 +1,160 @@
@title Writing Reviewable Code
@group flavortext
Project recommendations on how to structure changes.
This document is purely advisory. Phabricator works with a variety of revision
control strategies, and diverging from the recommendations in this document
will not impact your ability to use it for code review and source management.
= Overview =
This document describes a strategy for structuring changes used successfully at
Facebook and in Phabricator. In essence:
- Each commit should be as small as possible, but no smaller.
- The smallest a commit can be is a single cohesive idea: don't make commits
so small that they are meaningless on their own.
- There should be a one-to-one mapping between ideas and commits:
each commit should build one idea, and each idea should be implemented by
one commit.
- Turn large commits into small commits by dividing large problems into
smaller problems and solving the small problems one at a time.
- Write sensible commit messages.
= Many Small Commits =
Small, simple commits are generally better than large, complex commits. They are
easier to understand, easier to test, and easier to review. The complexity of
understanding, testing and reviewing a change often increases faster than its
size: ten 200-line changes each doing one thing are often far easier to
understand than one 2,000 line change doing ten things. Splitting a change which
does many things into smaller changes which each do only one thing can decrease
the total complexity associated with accomplishing the same goal.
Each commit should do one thing. Generally, this means that you should separate
distinct changes into different commits when developing. For example, if you're
developing a feature and run into a preexisting bug, stash or checkpoint your
change, check out a clean HEAD/tip, fix the bug in one change, and then
merge/rebase your new feature on top of your bugfix so that you have two
changes, each with one idea ("add feature x", "fix a bug in y"), not one change
with two ideas ("add feature x and fix a bug in y").
(In Git, you can do this easily with local feature branches and commands like
`git rebase`, `git rebase -i`, and `git stash`, or with merges. In Mercurial,
you can use bookmarks or the queues extension. In SVN, there are few builtin
tools, but you can use multiple working copies or treat Differential like a
stash you access with `arc patch`.)
Even changes like fixing style problems should ideally be separated: they're
accomplishing a different goal. And it is far easier to review one 300-line
change which "converts tabs to spaces" plus one 30-line change which "implements
feature z" than one 330-line change which "implements feature z and also
converts a bunch of tabs to spaces".
Similarly, break related but complex changes into smaller, simpler components.
Here's a ridiculous analogy: if you're adding a new house, don't make one
5,000-line change which adds the whole house in one fell sweep. Split it apart
into smaller steps which are each easy to understand: start with the foundation,
then build the frame, etc. If you decided to dig the foundation with a shovel or
build the frame out of cardboard, it's both easier to miss and harder to correct
if the decisions are buried in 5,000 lines of interior design and landscaping.
Do it one piece at a time, providing enough context that the larger problem
can be understood but accomplishing no more with each step than you need to in
order for it to stand on its own.
The minimum size of a change should be a complete implementation of the simplest
subproblem which works on its own and expresses an entire idea, not just part
of an idea. You could mechanically split a 1,000-line change into ten 100-line
changes by choosing lines at random, but none of the individual changes would
make any sense and you would increase the collective complexity. The real goal
is for each change to have minimal complexity, line size is just a proxy that is
often well-correlated with complexity.
We generally follow these practices in Phabricator. The median change size for
Phabricator is 35 lines.
= Write Sensible Commit Messages =
There are lots of resources for this on the internet. All of the say pretty much
the same thing; this one does too.
The single most important thing is: **commit messages should explain //why// you
are making the change**.
Differential attempts to encourage the construction of sensible commit messages,
but can only enforce structure, not content. Structurally, commit messages
should probably:
- Have a title, briefly describing the change in one line.
- Have a summary, describing the change in more detail.
- Maybe have some other fields.
The content is far more important than the structure. In particular, the summary
should explain //why// you're making the change and //why// you're choosing the
implementation you're choosing. The //what// of the change is generally
well-explained by the change itself. For example, this is obviously an awful
commit message:
COUNTEREXAMPLE
fix a bug
But this one is almost as bad:
COUNTEREXAMPLE
Allow dots in usernames
Change the regexps so usernames can have dots in them.
This is better than nothing but just summarizes information which can be
inferred from the text of the diff. Instead, you should provide context and
explain why you're making the change you're making, and why it's the right one:
lang=txt
Allow dots in usernames to support Google and LDAP auth
To prevent nonsense, usernames are currently restricted to A-Z0-9. Now that
we have Google and LDAP auth, a couple of installs want to allow "." too
since they have schemes like "abraham.lincoln@mycompany.com" (see Tnnn). There
are no technical reasons not to do this, so I opened up the regexps a bit.
We could mostly open this up more but I figured I'd wait until someone asks
before allowing "ke$ha", etc., because I personally find such names
distasteful and offensive.
This information can not be extracted from the change itself, and is much more
useful for the reviewer and for anyone trying to understand the change after the
fact.
An easy way to explain //why// is to reference other objects
(bugs/issues/revisions) which motivate the change.
Differential also includes a "Test Plan" field which is required by default.
There is a detailed description of this field in @{article:Differential User
Guide: Test Plans}. You can make it optional or disable it in the configuration,
but consider adopting it. Having this information can be particularly helpful
for reviewers.
Some things that people sometimes feel strongly about but which are probably not
really all that important in commit messages include:
- If/where text is wrapped.
- Maximum length of the title.
- Whether there should be a period or not in the title.
- Use of voice/tense, e.g. "fix"/"add" vs "fixes"/"adds".
- Other sorts of pedantry not related to getting the context and
reasons //why// a change is happening into the commit message.
- Although maybe the spelling and grammar shouldn't be egregiously bad?
Phabricator does not have guidelines for this stuff. You can obviously set
guidelines at your organization if you prefer, but getting the //why// into the
message is the most important part.
= Next Steps =
Continue by:
- reading recommendations on structuring revision control with
@{article:Recommendations on Revision Control}; or
- reading recommendations on structuring branches with
@{article:Recommendations on Branching}.

View file

@ -6,12 +6,12 @@ plan.
= Overview =
When you send a revision for review in Differential you must include a test
plan. A test plan is a repeatable list of steps which document what you have
done to verify the behavior of a change. A good test plan convinces a reviewer
that you have been thorough in making sure your change works as intended and
has enough detail to allow someone unfamiliar with your change to verify its
behavior.
When you send a revision for review in Differential you must include a test plan
(this can be disabled or made optional in the config). A test plan is a
repeatable list of steps which document what you have done to verify the
behavior of a change. A good test plan convinces a reviewer that you have been
thorough in making sure your change works as intended and has enough detail to
allow someone unfamiliar with your change to verify its behavior.
This document has some common things to think about when developing or reviewing
a test plan. Some of these suggestions might not be applicable to the software
@ -28,8 +28,9 @@ you are writing; they are adapted from Facebook's internal documentation.
thrift, or databases? Are you adding new cachekeys or queries? Will
this change add a lot of load to services?
- **Performance:** How does your change affect performance? **NOTE**: If
your change is a performance-motivated change, you should include profiles
in your test plan proving that you have improved performance.
your change is a performance-motivated change, you should include
measurements, profiles or other data in your test plan proving that you have
improved performance.
- **Unit Tests:** Is your change adequately covered by unit tests? Could you
improve test coverage? If you're fixing a bug, did you add a test to prevent
it from happening again? Are the unit tests testing just the code in
@ -41,7 +42,7 @@ you are writing; they are adapted from Facebook's internal documentation.
order of some function from ##f(a, b)## to ##f(b, a)## and a new callsite is
introduced in the meantime, could it go unnoticed? How bad would that be?
(Because of this risk, you should almost never make parameter order
changes.)
changes in weakly typed languages like PHP and Javascript.)
- **Revert Plan:** If your change needs to be reverted and you aren't around,
are any special steps or considerations that the reverter needs to know
about? If there are, make sure they're adequately described in the "Revert