Documentation updates.

2025-01-27 15:08:20 +01:00 · 2011-05-15 08:12:56 -07:00 · 2011-05-15 08:12:56 -07:00 · 8a096ce2dd
commit 8a096ce2dd
parent 43405c354d
10 changed files with 976 additions and 2 deletions
--- a/.divinerconfig
+++ b/.divinerconfig
@ -6,6 +6,7 @@
    "config" : "Configuration",
    "contrib" : "Contributing",
    "userguide" : "Application User Guides",
+    "flavortext" : "Flavor Text",
    "developer" : "Phabricator Developer Guides",
    "differential" : "Differential (Code Review)",
    "diffusion" : "Diffusion (Repository Browser)",
--- a/src/docs/contrib_intro.diviner
+++ b/src/docs/contrib_intro.diviner
@ -30,8 +30,9 @@ introduce new contributors to the codebase.
 You should read the relevant coding convention documents before you submit a
 change and make sure you're following the project guidelines:

-  - @{article:General Coding Conventions} (for all languages)
-  - @{article:PHP Coding Conventions} (for PHP)
+  - @{article:General Coding Standards} (for all languages)
+  - @{article:PHP Coding Standards} (for PHP)
+  - @{article:Javascript Coding Standards} (for Javascript)

 = Philosophy =

--- a/src/docs/flavortext/about_flavor_text.diviner
+++ b/src/docs/flavortext/about_flavor_text.diviner
@ -0,0 +1,9 @@
+@title About Flavor Text
+@group flavortext
+
+Explains what's going on here.
+
+= Overview =
+
+Flavor Text is a collection of short articles which pertain to software
+development in general, not necessarily to Phabricator specifically.
--- a/src/docs/flavortext/javascript_object_array.diviner
+++ b/src/docs/flavortext/javascript_object_array.diviner
@ -0,0 +1,153 @@
+@title Javascript Object and Array
+@group flavortext
+
+This document describes the behaviors of Object and Array in Javascript, and
+a specific approach to their use which produces basically reasonable language
+behavior.
+
+= Primitives =
+
+Javascript has two native datatype primitives, Object and Array. Both are
+classes, so you can use ##new## to instantiate new objects and arrays:
+
+  COUNTEREXAMPLE
+  var a = new Array();    //  Not preferred.
+  var o = new Object();
+
+However, **you should prefer the shorthand notation** because it's more concise:
+
+  lang=js
+  var a = [];             // Preferred.
+  var o = {};
+
+(A possible exception to this rule is if you want to use the allocation behavior
+of the Array constructor, but you almost certainly don't.)
+
+The language relationship between Object and Array is somewhat tricky. Object
+and Array are both classes, but "object" is also a primitive type. Object is
+//also// the base class of all classes.
+
+  lang=js
+  typeof Object;        // "function"
+  typeof Array;         // "function"
+  typeof {};            // "object"
+  typeof [];            // "object"
+
+  var a = [], o = {};
+  o instanceof Object;  // true
+  o instanceof Array;   // false
+  a instanceof Object;  // true
+  a instanceof Array;   // true
+
+
+= Objects are Maps, Arrays are Lists =
+
+PHP has a single ##array## datatype which behaves like as both map and a list,
+and a common mistake is to treat Javascript arrays (or objects) in the same way.
+**Don't do this.** It sort of works until it doesn't. Instead, learn how
+Javascript's native datatypes work and use them properly.
+
+In Javascript, you should think of Objects as maps ("dictionaries") and Arrays
+as lists ("vectors").
+
+You store keys-value pairs in a map, and store ordered values in a list. So,
+store key-value pairs in Objects.
+
+  var o = {               //  Good, an object is a map.
+    name: 'Hubert',
+    species: 'zebra'
+  };
+
+ console.log(o.name);
+
+...and store ordered values in Arrays.
+
+  var a = [1, 2, 3];      //  Good, an array is a list.
+  a.push(4);
+
+Don't store key-value pairs in Arrays and don't expect Objects to be ordered.
+
+  COUNTEREXAMPLE
+  var a = [];
+  a['name'] = 'Hubert';   //  No! Don't do this!
+
+This technically works because Arrays are Objects and you think everything is
+fine and dandy, but it won't do what you want and will burn you.
+
+= Iterating over Maps and Lists =
+
+Iterate over a map like this:
+
+  lang=js
+  for (var k in object) {
+    f(object[k]);
+  }
+
+NOTE: There's some hasOwnProperty nonsense being omitted here, see below.
+
+Iterate over a list like this:
+
+  lang=js
+  for (var ii = 0; ii < list.length; ii++) {
+    f(list[ii]);
+  }
+
+NOTE: There's some sparse array nonsense being omitted here, see below.
+
+If you try to use ##for (var k in ...)## syntax to iterate over an Array, you'll
+pick up a  whole pile of keys you didn't intend to and it won't work. If you try
+to use ##for (var ii = 0; ...)## syntax to iterate over an Object, it won't work
+at all.
+
+If you consistently treat Arrays as lists and Objects as maps and use the
+corresponding iterators, everything will pretty much always work in a reasonable
+way.
+
+= hasOwnProperty() =
+
+An issue with this model is that if you write stuff to Object.prototype, it will
+show up every time you use enumeration ##for##:
+
+  COUNTEREXAMPLE
+  var o = {};
+  Object.prototype.duck = "quack";
+  for (var k in o) {
+    console.log(o[k]); // Logs "quack"
+  }
+
+There are two ways to avoid this:
+
+  - test that ##k## exists on ##o## by calling ##o.hasOwnProperty(k)## in every
+    single loop everywhere in your program and only use libraries which also do
+    this and never forget to do it ever; or
+  - don't write to Object.prototype.
+
+Of these, the first option is terrible garbage. Go with the second option.
+
+= Sparse Arrays =
+
+Another wrench in this mess is that Arrays aren't precisely like lists, because
+they do have indexes and may be sparse:
+
+  var a = [];
+  a[2] = 1;
+  console.log(a);       // [undefined, undefined, 1]
+
+The correct way to deal with this is:
+
+  for (var ii = 0; ii < list.length; ii++) {
+   if (list[ii] == undefined) {
+     continue;
+   }
+   f(list[ii]);
+ }
+
+Avoid sparse arrays if possible.
+
+= Ordered Maps =
+
+If you need an ordered map, you need to have a map for key-value associations
+and a list for key order. Don't try to build an ordered map using one Object or
+one Array. This generally applies for other complicated datatypes, as well; you
+need to build them out of more than one primitive.
+
--- a/src/docs/flavortext/javascript_pitfalls.diviner
+++ b/src/docs/flavortext/javascript_pitfalls.diviner
@ -0,0 +1,88 @@
+@title Javascript Pitfalls
+@group flavortext
+
+This document discusses pitfalls and flaws in the Javascript language, and how
+to avoid, work around, or at least understand them.
+
+= Implicit Semicolons =
+
+Javascript tries to insert semicolons if you forgot them. This is a pretty
+horrible idea. Notably, it can mask syntax errors by transforming subexpressions
+on their own lines into statements with no effect:
+
+  lang=js
+  string = "Here is a fairly long string that does not fit on one "
+           "line. Note that I forgot the string concatenation operators "
+           "so this will compile and execute with the wrong behavior. ";
+
+Here's what ECMA262 says about this:
+
+  When, as the program is parsed ..., a token ... is encountered that is not
+  allowed by any production of the grammar, then a semicolon is automatically
+  inserted before the offending token if one or more of the following conditions
+  is true: ...
+
+To protect yourself against this "feature", don't use it. Always explicitly
+insert semicolons after each statement. You should also prefer to break lines in
+places where insertion of a semicolon would not make the unparseable parseable,
+usually after operators.
+
+= ##with## is Bad News =
+
+##with## is a pretty bad feature, for this reason among others:
+
+  with (object) {
+    property = 3; // Might be on object, might be on window: who knows.
+  }
+
+Avoid ##with##.
+
+= ##arguments## is not an Array =
+
+You can convert ##arguments## to an array using JX.$A() or similar. Note that
+you can pass  ##arguments## to Function.prototype.apply() without converting it.
+
+= Object, Array, and iteration are needlessly hard =
+
+There is essentially only one reasonable, consistent way to use these primitives
+but it is not obvious. Navigate these troubled waters with
+@{article:Javascript Object and Array}.
+
+= typeof null == "object" =
+
+This statement is true in Javascript:
+
+  typeof null == 'object'
+
+This is pretty much a bug in the language that can never be fixed now.
+
+= Number, String, and Boolean objects =
+
+Like Java, Javascript has primitive versions of number, string, and boolean,
+and object versions. In Java, there's some argument for this distinction. In
+Javascript, it's pretty much completely worthless and the behavior of these
+objects is wrong. String and Boolean in particular are essentially unusable:
+
+  lang=js
+  "pancake" == "pancake";                         // true
+  new String("pancake") == new String("pancake"); // false
+
+  var b = new Boolean(false);
+  b;        // Shows 'false' in console.
+  !b;       // ALSO shows 'false' in console.
+  !b == b;  // So this is true!
+  !!b == !b // Negate both sides and it's false! FUCK!
+
+  if (b) {
+    // Better fucking believe this will get executed.
+  }
+
+There is no advantage to using the object forms (the primitive forms behave like
+objects and can have methods and properties, and inherit from Array.prototype,
+Number.prototype, etc.) and their logical behavior is at best absurd and at
+worst strictly wrong.
+
+**Never use** ##new Number()##, ##new String()## or ##new Boolean()## unless
+your Javascript is God Tier and you are absolutely sure you know what you are
+doing.
+
--- a/src/docs/flavortext/php_pitfalls.diviner
+++ b/src/docs/flavortext/php_pitfalls.diviner
@ -0,0 +1,301 @@
+@title PHP Pitfalls
+@group flavortext
+
+This document discusses difficult traps and pitfalls in PHP, and how to avoid,
+work around, or at least understand them.
+
+= array_merge() in Incredibly Slow =
+
+If you merge arrays like this:
+
+  COUNTEREXAMPLE
+  $result = array();
+  foreach ($list_of_lists as $one_list) {
+    $result = array_merge($result, $one_list);
+  }
+
+...your program now has a huge runtime runtime because it generates a large
+number of intermediate arrays and copies every element it has previously seen
+each time you iterate.
+
+In a libphutil environment, you can use ##array_mergev($list_of_lists)##
+instead.
+
+= var_export() Hates Baby Animals =
+
+If you try to var_export() an object that contains recursive references, your
+program will terminate. You have no chance to intercept or react to this or
+otherwise stop it from happening. Avoid var_export() unless you are certain
+you have only simple data. You can use print_r() or var_dump() to display
+complex variables safely.
+
+= isset(), empty() and Truthiness =
+
+A value is "truthy" if it evaluates to true in an ##if## clause:
+
+  $value = something();
+  if ($value) {
+    // Value is truthy.
+  }
+
+If a value is not truthy, it is "falsey". These values are falsey in PHP:
+
+  null      // null
+  0         // integer
+  0.0       // float
+  "0"       // string
+  ""        // empty string
+  false     // boolean
+  array()   // empty array
+
+Disregarding some bizarre edge cases, all other values are truthy. Note that
+because "0" is falsey, this sort of thing (intended to prevent users from making
+empty comments) is wrong in PHP:
+
+  COUNTEREXAMPLE
+  if ($comment_text) {
+    make_comment($comment_text);
+  }
+
+This is wrong because it prevents users from making the comment "0". //THIS
+COMMENT IS TOTALLY AWESOME AND I MAKE IT ALL THE TIME SO YOU HAD BETTER NOT
+BREAK IT!!!// A better test is probably strlen().
+
+In addition to truth tests with ##if##, PHP has two special truthiness operators
+which look like functions but aren't: empty() and isset(). These operators help
+deal with undeclared variables.
+
+In PHP, there are two major cases where you get undeclared variables -- either
+you directly use a variable without declaring it:
+
+  COUNTEREXAMPLE
+  function f() {
+    if ($not_declared) {
+      // ...
+    }
+  }
+
+...or you index into an array with an index which may not exist:
+
+  COUNTEREXAMPLE
+  function f(array $mystery) {
+    if ($mystery['stuff']) {
+      // ...
+    }
+  }
+
+When you do either of these, PHP issues a warning. Avoid these warnings by using
+empty() and isset() to do tests that are safe to apply to undeclared variables.
+
+empty() evaluates truthiness exactly opposite of if(). isset() returns true for
+everything except null. This is the truth table:
+
+  VALUE             if()        empty()     isset()
+
+  null              false       true        false
+  0                 false       true        true
+  0.0               false       true        true
+  "0"               false       true        true
+  ""                false       true        true
+  false             false       true        true
+  array()           false       true        true
+  EVERYTHING ELSE   true        false       true
+
+The value of these operators is that they accept undeclared variables and do not
+issue a warning. Specifically, if you try to do this you get a warning:
+
+  COUNTEREXAMPLE
+  if ($not_previously_declared) {         // PHP Notice:  Undefined variable!
+    // ...
+  }
+
+But these are fine:
+
+  if (empty($not_previously_declared)) {  // No notice, returns true.
+    // ...
+  }
+  if (isset($not_previously_declared)) {  // No notice, returns false.
+    // ...
+  }
+
+So, isset() really means is_declared_and_is_set_to_something_other_than_null().
+empty() really means is_falsey_or_is_not_declared(). Thus:
+
+  - If a variable is known to exist, test falsiness with if (!$v), not empty().
+    In particular, test for empty arrays with if (!$array). There is no reason
+    to ever use empty() on a declared variable.
+  - When you use isset() on an array key, like isset($array['key']), it will
+    evaluate to "false" if the key exists but has the value null! Test for index
+    existence with array_key_exists().
+
+Put another way, use isset() if you want to type "if ($value !== null)" but are
+testing something that may not be declared. Use empty() if you want to type
+"if (!$value)" but you are testing something that may not be declared.
+
+= usort(), uksort(), and uasort() are Slow =
+
+This family of functions is often extremely slow for large datasets. You should
+avoid them if at all possible. Instead, build an array which contains surrogate
+keys that are naturally sortable with a function that uses native comparison
+(e.g., sort(), asort(), ksort(), or natcasesort()). Sort this array instead, and
+use it to reorder the original array.
+
+In a libphutil environment, you can often do this easily with isort() or
+msort().
+
+= array_intersect() and array_diff() are Also Slow =
+
+These functions are much slower for even moderately large inputs than
+array_intersect_key() and array_diff_key(), because they can not make the
+assumption that their inputs are unique scalars as the ##key## varieties can.
+Strongly prefer the ##key## varieties.
+
+= array_uintersect() and array_udiff() are Definitely Slow Too =
+
+These functions have the problems of both the ##usort()## family and the
+##array_diff()## family. Avoid them.
+
+= foreach() Does Not Create Scope =
+
+Variables survive outside of the scope of foreach(). More problematically,
+references survive outside of the scope of foreach(). This code mutates
+##$array## because the reference leaks from the first loop to the second:
+
+  COUNTEREXAMPLE
+  $array = range(1, 3);
+  echo implode(',', $array); // Outputs '1,2,3'
+  foreach ($array as &$value) {}
+  echo implode(',', $array); // Outputs '1,2,3'
+  foreach ($array as $value) {}
+  echo implode(',', $array); // Outputs '1,2,2'
+
+The easiest way to avoid this is to avoid using foreach-by-reference. If you do
+use it, unset the reference after the loop:
+
+  foreach ($array as &$value) {
+    // ...
+  }
+  unset($value);
+
+= unserialize() is Incredibly Slow on Large Datasets =
+
+The performance of unserialize() is nonlinear in the number of zvals you
+unserialize, roughly O(N^2).
+
+  zvals       approximate time
+  10000       5ms
+  100000      85ms
+  1000000     8,000ms
+  10000000    72 billion years
+
+
+= call_user_func() Breaks References =
+
+If you use call_use_func() to invoke a function which takes parameters by
+reference, the variables you pass in will have their references broken and will
+emerge unmodified. That is, if you have a function that takes references:
+
+  function add_one(&$v) {
+    $v++;
+  }
+
+...and you call it with call_user_func():
+
+  COUNTEREXAMPLE
+  $x = 41;
+  call_user_func('add_one', $x);
+
+...##$x## will not be modified. The solution is to use call_user_func_array()
+and wrap the reference in an array:
+
+  $x = 41;
+  call_user_func_array(
+    'add_one',
+    array(&$x)); // Note '&$x'!
+
+This will work as expected.
+
+= You Can't Throw From __toString() =
+
+If you throw from __toString(), your program will terminate uselessly and you
+won't get the exception.
+
+= An Object Can Have Any Scalar as a Property =
+
+Object properties are not limited to legal variable names:
+
+  $property = '!@#$%^&*()';
+  $obj->$property = 'zebra';
+  echo $obj->$property;       // Outputs 'zebra'.
+
+So, don't make assumptions about property names.
+
+= There is an (object) Cast =
+
+You can cast a dictionary into an object.
+
+  $obj = (object)array('flavor' => 'coconut');
+  echo $obj->flavor;      // Outputs 'coconut'.
+  echo get_class($obj);   // Outputs 'stdClass'.
+
+This is occasionally useful, mostly to force an object to become a Javascript
+dictionary (vs a list) when passed to json_encode().
+
+= Invoking "new" With an Argument Vector is Really Hard =
+
+If you have some ##$class_name## and some ##$argv## of constructor
+arguments and you want to do this:
+
+  new $class_name($argv[0], $argv[1], ...);
+
+...you'll probably invent a very interesting, very novel solution that is very
+wrong. In a libphutil environment, solve this problem with newv(). Elsewhere,
+copy newv()'s implementation.
+
+= Equality is not Transitive =
+
+This isn't terribly surprising since equality isn't transitive in a lot of
+languages, but the == operator is not transitive:
+
+  $a = ''; $b = 0; $c = '0a';
+  $a == $b; // true
+  $b == $c; // true
+  $c == $a; // false!
+
+When either operand is an integer, the other operand is cast to an integer
+before comparison. Avoid this and similar pitfalls by using the === operator,
+which is transitive.
+
+= All 676 Letters in the Alphabet =
+
+This doesn't do what you'd expect it to do in C:
+
+  for ($c = 'a'; $c <= 'z'; $c++) {
+    // ...
+  }
+
+This is because the successor to 'z' is 'aa', which is "less than" 'z'. The
+loop will run for ~700 iterations until it reaches 'zz' and terminates. That is,
+##$c<## will take on these values:
+
+  a
+  b
+  ...
+  y
+  z
+  aa // loop continues because 'aa' <= 'z'
+  ab
+  ...
+  mf
+  mg
+  ...
+  zw
+  zx
+  zy
+  zz // loop now terminates because 'zz' > 'z'
+
+Instead, use this loop:
+
+  foreach (range('a', 'z') as $c) {
+    // ...
+  }
--- a/src/docs/flavortext/things_you_should_do_now.diviner
+++ b/src/docs/flavortext/things_you_should_do_now.diviner
@ -0,0 +1,140 @@
+@title Things You Should Do Now
+@group flavortext
+
+Describes things you should do now when building software, because the cost to
+do them increases over time and eventually becomes prohibitive or impossible.
+
+
+= Overview =
+
+If you're building a hot new web startup, there are a lot of decisions to make
+about what to focus on. Most things you'll build will take about the same amount
+of time to build regardless of what order you build them in, but there are a few
+technical things which become vastly more expensive to fix later.
+
+If you don't do these things early in development, they'll become very hard or
+impossible to do later. This is basically a list of things that would have saved
+Facebook huge amounts of time and effort down the road if someone had spent
+a tiny amount of time on them earlier in the development process.
+
+See also @{article:Things You Should Do Soon} for things that scale less
+drastically over time.
+
+
+= Start IDs At a Gigantic Number =
+
+If you're using integer IDs to identify data or objects, **don't** start your
+IDs at 1. Start them at a huge number (e.g., 2^33) so that no object ID will
+ever appear in any other role in your application (like a count, a natural
+index, a byte size, a timestamp, etc). This takes about 5 seconds if you do it
+before you launch and rules out a huge class of nasty bugs for all time. It
+becomes incredibly difficult as soon as you have production data.
+
+The kind of bug that this causes is accidental use of some other value as an ID:
+
+  COUNTEREXAMPLE
+  // Load the user's friends, returns a map of friend_id => true
+  $friend_ids = user_get_friends($user_id);
+
+  // Get the first 8 friends.
+  $first_few_friends = array_slice($friend_ids, 0, 8);
+
+  // Render those friends.
+  render_user_friends($user_id, array_keys($first_few_friends));
+
+Because array_slice() in PHP discards array indices and renumbers them, this
+doesn't render the user's first 8 friends but the users with IDs 0 through 7,
+e.g. Mark Zuckerberg (ID 4) and Dustin Moskovitz (ID 6). If you have IDs in this
+range, sooner or later something that isn't an ID will get treated like an ID
+and the operation will be valid and cause unexpected behavior. This is
+completely avoidable if you start your IDs at a gigantic number.
+
+
+= Only Store Valid UTF-8 =
+
+For the most part, you can ignore UTF-8 and unicode until later. However, there
+is one aspect of unicode you should address now: store only valid UTF-8 strings.
+
+Assuming you're storing data internally as UTF-8 (this is almost certainly the
+right choice and definitely the right choice if you have no idea how unicode
+works), you just need to sanitize all the data coming into your application and
+make sure it's valid UTF-8.
+
+If your application emits invalid UTF-8, other systems (like browsers) will
+break in unexpected and interesting ways. You will eventually be forced to
+ensure you emit only valid UTF-8 to avoid these problems. If you haven't
+sanitized your data, you'll basically have two options:
+
+  - do a huge migration on literally all of your data to sanitize it; or
+  - forever sanitize all data on its way out on the read pathways.
+
+As of 2011 Facebook is in the second group, and spends several milliseconds of
+CPU time sanitizing every display string on its way to the browser, which
+multiplies out to hundreds of servers worth of CPUs sitting in a datacenter
+paying the price for the invalid UTF-8 in the databases.
+
+You can likely learn enough about unicode to be confident in an implementation
+which addresses this problem within a few hours. You don't need to learn
+everything, just the basics. Your language probably already has a function which
+does the sanitizing for you.
+
+
+= Never Design a Blacklist-Based Security System =
+
+When you have an alternative, don't design security systems which are default
+permit, blacklist-based, or otherwise attempt to enumerate badness. When
+Facebook launched Platform, it launched with a blacklist-based CSS filter, which
+basically tried to enumerate all the "bad" parts of CSS and filter them out.
+This was a poor design choice and lead to basically infinite security holes for
+all time.
+
+It is very difficult to enumerate badness in a complex system and badness is
+often a moving target. Instead of trying to do this, design whitelist-based
+security systems where you list allowed things and reject anything you don't
+understand. Assume things are bad until you verify that they're OK.
+
+It's tempting to design blacklist-based systems because they're easier to write
+and accept more inputs. In the case of the CSS filter, the product goal was for
+users to just be able to use CSS normally and feel like this system was no
+different from systems they were familiar with. A whitelist-based system would
+reject some valid, safe inputs and create product friction.
+
+But this is a much better world than the alternative, where the blacklist-based
+system fails to reject some dangerous inputs and creates //security holes//. It
+//also// creates product friction because when you fix those holes you break
+existing uses, and that backward-compatibility friction makes it very difficult
+to move the system from a blacklist to a whitelist. So you're basically in
+trouble no matter what you do, and have a bunch of security holes you need to
+unbreak immediately, so you won't even have time to feel sorry for yourself.
+
+Designing blacklist-based security is one of the worst now-vs-future tradeoffs
+you can make. See also "The Six Dumbest Ideas in Computer Security":
+
+http://www.ranum.com/security/computer_security/
+
+
+= Fail Very Loudly when SQL Syntax Errors Occur in Production =
+
+This doesn't apply if you aren't using SQL, but if you are: detect when a query
+fails because of a syntax error (in MySQL, it is error 1064). If the failure
+happened in production, fail in the loudest way possible. (I implemented this in
+2008 at Facebook and had it just email me and a few other people directly. The
+system was eventually refined.)
+
+This basically creates a high-signal stream that tells you where you have SQL
+injection holes in your application. It will have some false positives and could
+theoretically have false negatives, but at Facebook it was pretty high signal
+considering how important the signal is.
+
+Of course, the real solution here is to not have SQL injection holes in your
+application, ever. As far as I'm aware, this system correctly detected the one
+SQL injection hole we had from mid-2008 until I left in 2011, which was in a
+hackathon project on an underisolated semi-production tier and didn't use the
+query escaping system the rest of the application does.
+
+Hopefully, whatever language you're writing in has good query libraries that
+can handle escaping for you. If so, use them. If you're using PHP and don't have
+a solution in place yet, the Phabricator implementation of qsprintf() is similar
+to Facebook's system and was successful there.
+
+
--- a/src/docs/flavortext/things_you_should_do_soon.diviner
+++ b/src/docs/flavortext/things_you_should_do_soon.diviner
@ -0,0 +1,136 @@
+@title Things You Should Do Soon
+@group flavortext
+
+Describes things you should start thinking about soon, because scaling will
+be easier if you put a plan in place.
+
+= Overview =
+
+Stop! Don't do any of this yet. Go do @{article:Things You Should Do Now} first.
+
+Then you can come back and read about these things. These are basically some
+problems you'll probably eventually encounter when building a web application
+that might be good to start thinking about now.
+
+= Static Resources: JS and CSS =
+
+Over time, you'll write more JS and CSS and eventually need to put systems in
+place to manage it.
+
+
+== Manage Dependencies Automatically ==
+
+The naive way to add static resources to a page is to include them at the top
+of the page, before rendering begins, by enumerating filenames. Facebook used to
+work like that:
+
+  COUNTEREXAMPLE
+  <?php
+
+  require_js('js/base.js');
+  require_js('js/utils.js');
+  require_js('js/ajax.js');
+  require_js('js/dialog.js');
+  // ...
+
+This was okay for a while but had become unmanageable by 2007. Because
+dependencies were managed completely manually and you had to explicitly list
+every file you needed in the right order, everyone copy-pasted a giant block
+of this stuff into every page. The major problem this created was that each page
+pulled in way too much JS, which slowed down frontend performance.
+
+We moved to a system (called //Haste//) which declared JS dependencies in the
+files using a docblock-like header:
+
+  /**
+   * @provides dialog
+   * @requires utils ajax base
+   */
+
+We annotated files manually, although theoretically you could use static
+analysis instead (we couldn't realistically do that, our JS was pretty
+unstructured). This allowed us to pull in the entire dependency chain of
+component with one call:
+
+  require_static('dialog');
+
+...instead of copy-pasting every dependency.
+
+
+== Include When Used ==
+
+The other part of this problem was that all the resources were required at the
+top of the page instead of when they were actually used. This meant two things:
+
+  - you needed to include every resource that //could ever// appear on a page;
+  - if you were adding something new to 2+ pages, you had a strong incentive to
+    put it in base.js.
+
+So every page pulled in a bunch of silly stuff like the CAPTCHA code (because
+there was one obscure workflow involving unverified users which could
+theoretically show any user a CAPTCHA on any page) and every random thing anyone
+had stuck in base.js.
+
+We moved to a system where JS and CSS tags were output **after** page rendering
+had run instead (they still appeared at the top of the page, they were just
+prepended rather than appended before being output to the browser -- there are
+some complexities here, but they are beyond the immediate scope), so
+require_static() could appear anywhere in the code. Then we moved all the
+require_static() calls to be proximate to their use sites (so dialog rendering
+code would pull in dialog-related CSS and JS, for example, not any page which
+might need a dialog), and split base.js into a bunch of smaller files.
+
+
+== Packaging ==
+
+The biggest frontend performance killer in most cases is the raw number of HTTP
+requests, and the biggest hammer for addressing it is to package related JS
+and CSS into larger files, so you send down all the core JS code in one big file
+instead of a lot of smaller ones. Once the other groundwork is in place, this is
+a relatively easy change. We started with manual package definitions and
+eventually moved to automatic generation based on production data.
+
+
+== Caches and Serving Content ==
+
+In the simplest implementation of static resources, you write out a raw JS tag
+with something like ##src="/js/base.js"##. This will break disastrously as you
+scale, because clients will be running with stale versions of resources. There
+are bunch of subtle problems (especially once you have a CDN), but the big one
+is that if a user is browsing your site as you push/deploy, their client will
+not make requests for the resources they already have in cache, so even if your
+servers respond correctly to If-None-Match (ETags) and If-Modified-Since
+(Expires) the site will appear completely broken to everyone who was using it
+when you push a breaking change to static resources.
+
+The best way to solve this problem is to version your resources in the URI,
+so each version of a resource has a unique URI:
+
+  rsrc/af04d14/js/base.js
+
+When you push, users will receive pages which reference the new URI so their
+browsers will retrieve it.
+
+**But**, there's a big problem, once you have a bunch of web frontends:
+
+While you're pushing, a user may make a request which is handled by a server
+running the new version of the code, which delivers a page with a new resource
+URI. Their browser then makes a request for the new resource, but that request
+is routed to a server which has not been pushed yet, which delivers an old
+version of the resource. They now have a poisoned cache: old resource data for
+a new resource URI.
+
+You can do a lot of clever things to solve this, but the solution we chose at
+Facebook was to serve resources out of a database instead of off disk. Before a
+push begins, new resources are written to the database so that every server is
+able to satisfy both old and new resource requests.
+
+This also made it relatively easy to do processing steps (like stripping
+comments and whitespace) in one place, and just insert a minified/processed
+version of CSS and JS into the database.
+
+== Reference Implementation: Celerity ==
+
+Some of the ideas discussed here are implemented in Phabricator's //Celerity//
+system, which is essentially a simplified version of the //Haste// system used
+by Facebook.
--- a/src/docs/javascript_coding_standards.diviner
+++ b/src/docs/javascript_coding_standards.diviner
@ -0,0 +1,140 @@
+@title Javascript Coding Standards
+@group contrib
+
+This document describes Javascript coding standards for Phabricator and Javelin.
+
+= Overview =
+
+This document outlines technical and style guidelines which are followed in
+Phabricator and Javelin. Contributors should also follow these guidelines. Many
+of these guidelines are automatically enforced by lint.
+
+These guidelines are essentially identical to the Facebook guidelines, since I
+basically copy-pasted them. If you are already familiar with the Facebook
+guidelines, you can probably get away with skimming this document.
+
+
+= Spaces, Linebreaks and Indentation =
+
+  - Use two spaces for indentation. Don't use literal tab characters.
+  - Use Unix linebreaks ("\n"), not MSDOS ("\r\n") or OS9 ("\r").
+  - Use K&R style braces and spacing.
+  - Put a space after control keywords like ##if## and ##for##.
+  - Put a space after commas in argument lists.
+  - Put space around operators like ##=##, ##<##, etc.
+  - Don't put spaces after function names.
+  - Parentheses should hug their contents.
+  - Generally, prefer to wrap code at 80 columns.
+
+= Case and Capitalization =
+
+The Javascript language unambiguously dictates casing/naming rules; follow those
+rules.
+
+  - Name variables using ##lowercase_with_underscores##.
+  - Name classes using ##UpperCamelCase##.
+  - Name methods and properties using ##lowerCamelCase##.
+  - Name global functions using ##lowerCamelCase##. Avoid defining global
+    functions.
+  - Name constants using ##UPPERCASE##.
+  - Write ##true##, ##false##, and ##null## in lowercase.
+  - "Internal" methods and properties should be prefixed with an underscore.
+    For more information about what "internal" means, see
+    **Leading Underscores**, below.
+
+= Comments =
+
+  - Strongly prefer ##//## comments for making comments inside the bodies of
+    functions and methods (this lets someone easily comment out a block of code
+    while debugging later).
+
+= Javascript Language =
+
+  - Use ##[]## and ##{}##, not ##new Array## and ##new Object##.
+  - When creating an object literal, do not quote keys unless required.
+
+= Examples =
+
+**if/else:**
+
+  lang=js
+  if (x > 3) {
+    // ...
+  } else if (x === null) {
+    // ...
+  } else {
+    // ...
+  }
+
+You should always put braces around the body of an if clause, even if it is only
+one line. Note that operators like ##>## and ##===## are also surrounded by
+spaces.
+
+**for (iteration):**
+
+  lang=js
+  for (var ii = 0; ii < 10; ii++) {
+    // ...
+  }
+
+Prefer ii, jj, kk, etc., as iterators, since they're easier to pick out
+visually and react better to "Find Next..." in editors.
+
+**for (enumeration):**
+
+  lang=js
+  for (var k in obj) {
+    // ...
+  }
+
+Make sure you use enumeration only on Objects, not on Arrays. For more details,
+see @{article:Javascript Object and Array}.
+
+**switch:**
+
+  lang=js
+  switch (x) {
+    case 1:
+      // ...
+      break;
+    case 2:
+      if (flag) {
+        break;
+      }
+      break;
+    default:
+      // ...
+      break;
+  }
+
+##break## statements should be indented to block level. If you don't push them
+in, you end up with an inconsistent rule for conditional ##break## statements,
+as in the ##2## case.
+
+If you insist on having a "fall through" case that does not end with ##break##,
+make it clear in a comment that you wrote this intentionally. For instance:
+
+  lang=js
+  switch (x) {
+    case 1:
+      // ...
+      // Fall through...
+    case 2:
+      //...
+      break;
+  }
+
+= Leading Underscores =
+
+By convention, methods names which start with a leading underscore are
+considered "internal", which (roughly) means "private". The critical difference
+is that this is treated as a signal to Javascript processing scripts that a
+symbol is safe to rename since it is not referenced outside the current file.
+
+The upshot here is:
+
+  - name internal methods which shouldn't be called outside of a file's scope
+    with a leading underscore; and
+  - **never** call an internal method from another file.
+
+If you treat them as though they were "private", you won't run into problems.
--- a/src/docs/php_coding_standards.diviner
+++ b/src/docs/php_coding_standards.diviner
@ -9,6 +9,11 @@ This document outlines technical and style guidelines which are followed in
 libphutil. Contributors should also follow these guidelines. Many of these
 guidelines are automatically enforced by lint.

+These guidelines are essentially identical to the Facebook guidelines, since I
+basically copy-pasted them. If you are already familiar with the Facebook
+guidelines, you probably don't need to read this super thoroughly.
+
+
 = Spaces, Linebreaks and Indentation =

  - Use two spaces for indentation. Don't use tab literal characters.