mirror of
https://we.phorge.it/source/phorge.git
synced 2024-12-22 21:40:55 +01:00
Documentation updates.
parent
43405c354d
commit
8a096ce2dd
10 changed files with 976 additions and 2 deletions
|
@@ -6,6 +6,7 @@
|
|||
"config" : "Configuration",
|
||||
"contrib" : "Contributing",
|
||||
"userguide" : "Application User Guides",
|
||||
"flavortext" : "Flavor Text",
|
||||
"developer" : "Phabricator Developer Guides",
|
||||
"differential" : "Differential (Code Review)",
|
||||
"diffusion" : "Diffusion (Repository Browser)",
|
||||
|
|
|
@@ -30,8 +30,9 @@ introduce new contributors to the codebase.
|
|||
You should read the relevant coding convention documents before you submit a
|
||||
change and make sure you're following the project guidelines:
|
||||
|
||||
- @{article:General Coding Conventions} (for all languages)
|
||||
- @{article:PHP Coding Conventions} (for PHP)
|
||||
- @{article:General Coding Standards} (for all languages)
|
||||
- @{article:PHP Coding Standards} (for PHP)
|
||||
- @{article:Javascript Coding Standards} (for Javascript)
|
||||
|
||||
= Philosophy =
|
||||
|
||||
|
|
9
src/docs/flavortext/about_flavor_text.diviner
Normal file
|
@@ -0,0 +1,9 @@
|
|||
@title About Flavor Text
|
||||
@group flavortext
|
||||
|
||||
Explains what's going on here.
|
||||
|
||||
= Overview =
|
||||
|
||||
Flavor Text is a collection of short articles which pertain to software
|
||||
development in general, not necessarily to Phabricator specifically.
|
153
src/docs/flavortext/javascript_object_array.diviner
Normal file
|
@@ -0,0 +1,153 @@
|
|||
@title Javascript Object and Array
|
||||
@group flavortext
|
||||
|
||||
This document describes the behaviors of Object and Array in Javascript, and
|
||||
a specific approach to their use which produces basically reasonable language
|
||||
behavior.
|
||||
|
||||
= Primitives =
|
||||
|
||||
Javascript has two native datatype primitives, Object and Array. Both are
|
||||
classes, so you can use ##new## to instantiate new objects and arrays:
|
||||
|
||||
COUNTEREXAMPLE
|
||||
var a = new Array(); // Not preferred.
|
||||
var o = new Object();
|
||||
|
||||
However, **you should prefer the shorthand notation** because it's more concise:
|
||||
|
||||
lang=js
|
||||
var a = []; // Preferred.
|
||||
var o = {};
|
||||
|
||||
(A possible exception to this rule is if you want to use the allocation behavior
|
||||
of the Array constructor, but you almost certainly don't.)
|
||||
|
||||
The language relationship between Object and Array is somewhat tricky. Object
|
||||
and Array are both classes, but "object" is also a primitive type. Object is
|
||||
//also// the base class of all classes.
|
||||
|
||||
lang=js
|
||||
typeof Object; // "function"
|
||||
typeof Array; // "function"
|
||||
typeof {}; // "object"
|
||||
typeof []; // "object"
|
||||
|
||||
var a = [], o = {};
|
||||
o instanceof Object; // true
|
||||
o instanceof Array; // false
|
||||
a instanceof Object; // true
|
||||
a instanceof Array; // true
|
||||
|
||||
|
||||
= Objects are Maps, Arrays are Lists =
|
||||
|
||||
PHP has a single ##array## datatype which behaves as both a map and a list,
|
||||
and a common mistake is to treat Javascript arrays (or objects) in the same way.
|
||||
**Don't do this.** It sort of works until it doesn't. Instead, learn how
|
||||
Javascript's native datatypes work and use them properly.
|
||||
|
||||
In Javascript, you should think of Objects as maps ("dictionaries") and Arrays
|
||||
as lists ("vectors").
|
||||
|
||||
You store key-value pairs in a map, and store ordered values in a list. So,
|
||||
store key-value pairs in Objects.
|
||||
|
||||
var o = { // Good, an object is a map.
|
||||
name: 'Hubert',
|
||||
species: 'zebra'
|
||||
};
|
||||
|
||||
console.log(o.name);
|
||||
|
||||
...and store ordered values in Arrays.
|
||||
|
||||
var a = [1, 2, 3]; // Good, an array is a list.
|
||||
a.push(4);
|
||||
|
||||
Don't store key-value pairs in Arrays and don't expect Objects to be ordered.
|
||||
|
||||
COUNTEREXAMPLE
|
||||
var a = [];
|
||||
a['name'] = 'Hubert'; // No! Don't do this!
|
||||
|
||||
This technically works because Arrays are Objects and you think everything is
|
||||
fine and dandy, but it won't do what you want and will burn you.
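
A minimal sketch of how this goes wrong, using only standard language behavior (the variable names are illustrative). The named key becomes an ordinary property of the Array, so ##length## and JSON serialization never see it:

```javascript
// A list that has been misused as a map.
var misused = [];
misused['name'] = 'Hubert';

// The named key is an ordinary property, so list behavior ignores it.
var misusedLength = misused.length;        // Still 0.
var misusedJSON = JSON.stringify(misused); // The key is dropped entirely.

// The same data in an Object behaves like a map.
var animal = {name: 'Hubert'};
var animalJSON = JSON.stringify(animal);   // The key survives.
```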
|
||||
|
||||
= Iterating over Maps and Lists =
|
||||
|
||||
Iterate over a map like this:
|
||||
|
||||
lang=js
|
||||
for (var k in object) {
|
||||
f(object[k]);
|
||||
}
|
||||
|
||||
NOTE: There's some hasOwnProperty nonsense being omitted here, see below.
|
||||
|
||||
Iterate over a list like this:
|
||||
|
||||
lang=js
|
||||
for (var ii = 0; ii < list.length; ii++) {
|
||||
f(list[ii]);
|
||||
}
|
||||
|
||||
NOTE: There's some sparse array nonsense being omitted here, see below.
|
||||
|
||||
If you try to use ##for (var k in ...)## syntax to iterate over an Array, you'll
|
||||
pick up a whole pile of keys you didn't intend to and it won't work. If you try
|
||||
to use ##for (var ii = 0; ...)## syntax to iterate over an Object, it won't work
|
||||
at all.
|
||||
|
||||
If you consistently treat Arrays as lists and Objects as maps and use the
|
||||
corresponding iterators, everything will pretty much always work in a reasonable
|
||||
way.
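
As a rough illustration of the difference (the ##note## property is a made-up stand-in for anything that ends up attached to a list), enumeration walks string keys while index iteration walks elements:

```javascript
var list = [10, 20, 30];
list.note = 'not an element'; // Hypothetical stray property on the list.

// Enumeration walks keys (as strings), including the stray property.
var keysSeen = [];
for (var k in list) {
  keysSeen.push(k);
}

// Index iteration visits only the list elements.
var valuesSeen = [];
for (var ii = 0; ii < list.length; ii++) {
  valuesSeen.push(list[ii]);
}
```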
|
||||
|
||||
= hasOwnProperty() =
|
||||
|
||||
An issue with this model is that if you write stuff to Object.prototype, it will
|
||||
show up every time you use enumeration ##for##:
|
||||
|
||||
COUNTEREXAMPLE
|
||||
var o = {};
|
||||
Object.prototype.duck = "quack";
|
||||
for (var k in o) {
|
||||
console.log(o[k]); // Logs "quack"
|
||||
}
|
||||
|
||||
There are two ways to avoid this:
|
||||
|
||||
- test that ##k## exists on ##o## by calling ##o.hasOwnProperty(k)## in every
|
||||
single loop everywhere in your program and only use libraries which also do
|
||||
this and never forget to do it ever; or
|
||||
- don't write to Object.prototype.
|
||||
|
||||
Of these, the first option is terrible garbage. Go with the second option.
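
For comparison, here is a sketch of what the first option costs. ##ownValues()## is a hypothetical helper, not part of any library; every enumeration in the program would need an equivalent guard:

```javascript
// Collect only the values stored directly on the map itself.
function ownValues(map) {
  var values = [];
  for (var k in map) {
    if (Object.prototype.hasOwnProperty.call(map, k)) {
      values.push(map[k]);
    }
  }
  return values;
}

Object.prototype.duck = 'quack'; // Simulates a badly behaved library.

var o = {name: 'Hubert'};
var unguarded = [];
for (var k in o) {
  unguarded.push(o[k]); // Picks up "quack" as well.
}
var guarded = ownValues(o);

delete Object.prototype.duck; // Undo the damage.
```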
|
||||
|
||||
= Sparse Arrays =
|
||||
|
||||
Another wrench in this mess is that Arrays aren't precisely like lists, because
|
||||
they do have indexes and may be sparse:
|
||||
|
||||
var a = [];
|
||||
a[2] = 1;
|
||||
console.log(a); // [undefined, undefined, 1]
|
||||
|
||||
The correct way to deal with this is:
|
||||
|
||||
for (var ii = 0; ii < list.length; ii++) {
|
||||
if (list[ii] == undefined) {
|
||||
continue;
|
||||
}
|
||||
f(list[ii]);
|
||||
}
|
||||
|
||||
Avoid sparse arrays if possible.
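
If you do end up with one, the ##in## operator tells a hole apart from an index that actually holds a value. A small sketch using only standard behavior (variable names are illustrative):

```javascript
var sparse = [];
sparse[2] = 1;

var sparseLength = sparse.length; // The length counts the holes.
var hasIndex0 = 0 in sparse;      // false: index 0 is a hole.
var hasIndex2 = 2 in sparse;      // true: index 2 holds a value.

// Array.prototype.forEach() skips holes on its own.
var visited = [];
sparse.forEach(function(value, index) {
  visited.push(index);
});
```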
|
||||
|
||||
= Ordered Maps =
|
||||
|
||||
If you need an ordered map, you need to have a map for key-value associations
|
||||
and a list for key order. Don't try to build an ordered map using one Object or
|
||||
one Array. This generally applies for other complicated datatypes, as well; you
|
||||
need to build them out of more than one primitive.
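
One possible shape for this, as a sketch only: ##OrderedMap## and its methods are hypothetical names invented for this example, pairing one Object (for associations) with one Array (for key order).

```javascript
// Hypothetical ordered map: an Object for values plus an Array for key order.
function OrderedMap() {
  this.values = Object.create(null); // key => value, with no inherited keys.
  this.order = [];                   // Keys, in first-insertion order.
}

OrderedMap.prototype.set = function(key, value) {
  if (!(key in this.values)) {
    this.order.push(key); // Remember the position of new keys only.
  }
  this.values[key] = value;
  return this;
};

OrderedMap.prototype.get = function(key) {
  return this.values[key];
};

OrderedMap.prototype.keys = function() {
  return this.order.slice(); // Copy, so callers can't reorder the map.
};

var zoo = new OrderedMap();
zoo.set('zebra', 1).set('aardvark', 2).set('zebra', 3);
var zooKeys = zoo.keys();
var zebraValue = zoo.get('zebra');
```

Overwriting an existing key keeps its original position, which is one reasonable choice; a different policy would change ##set()## only.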
|
||||
|
88
src/docs/flavortext/javascript_pitfalls.diviner
Normal file
|
@@ -0,0 +1,88 @@
|
|||
@title Javascript Pitfalls
|
||||
@group flavortext
|
||||
|
||||
This document discusses pitfalls and flaws in the Javascript language, and how
|
||||
to avoid, work around, or at least understand them.
|
||||
|
||||
= Implicit Semicolons =
|
||||
|
||||
Javascript tries to insert semicolons if you forgot them. This is a pretty
|
||||
horrible idea. Notably, it can mask syntax errors by transforming subexpressions
|
||||
on their own lines into statements with no effect:
|
||||
|
||||
lang=js
|
||||
string = "Here is a fairly long string that does not fit on one "
|
||||
"line. Note that I forgot the string concatenation operators "
|
||||
"so this will compile and execute with the wrong behavior. ";
|
||||
|
||||
Here's what ECMA-262 says about this:
|
||||
|
||||
When, as the program is parsed ..., a token ... is encountered that is not
|
||||
allowed by any production of the grammar, then a semicolon is automatically
|
||||
inserted before the offending token if one or more of the following conditions
|
||||
is true: ...
|
||||
|
||||
To protect yourself against this "feature", don't use it. Always explicitly
|
||||
insert semicolons after each statement. You should also prefer to break lines in
|
||||
places where insertion of a semicolon would not make the unparseable parseable,
|
||||
usually after operators.
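
The classic illustration is a line break after ##return##. This sketch (function names are made up for the example) contrasts a break that invites an inserted semicolon with breaks that leave the statement visibly incomplete:

```javascript
// A line break after "return" ends the statement, so 42 is never returned.
function brokenAnswer() {
  return
    42;
}

// Breaking after an opening parenthesis or an operator is safe, because the
// statement is visibly incomplete at the end of the line.
function answer() {
  return (
    42
  );
}

var message = 'Breaking after the operator ' +
  'keeps this one statement.';
```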
|
||||
|
||||
= ##with## is Bad News =
|
||||
|
||||
##with## is a pretty bad feature, for this reason among others:
|
||||
|
||||
with (object) {
|
||||
property = 3; // Might be on object, might be on window: who knows.
|
||||
}
|
||||
|
||||
Avoid ##with##.
|
||||
|
||||
= ##arguments## is not an Array =
|
||||
|
||||
You can convert ##arguments## to an array using JX.$A() or similar. Note that
|
||||
you can pass ##arguments## to Function.prototype.apply() without converting it.
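
Outside of Javelin, a common way to do the conversion is ##Array.prototype.slice##. A short sketch with hypothetical function names:

```javascript
// Convert "arguments" to a real Array before using list methods on it.
function collect() {
  var isArray = Array.isArray(arguments);           // false
  var args = Array.prototype.slice.call(arguments); // A real Array.
  return {isArray: isArray, args: args};
}

// apply() accepts "arguments" directly, with no conversion.
function maxOf() {
  return Math.max.apply(null, arguments);
}
```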
|
||||
|
||||
= Object, Array, and iteration are needlessly hard =
|
||||
|
||||
There is essentially only one reasonable, consistent way to use these primitives
|
||||
but it is not obvious. Navigate these troubled waters with
|
||||
@{article:Javascript Object and Array}.
|
||||
|
||||
= typeof null == "object" =
|
||||
|
||||
This statement is true in Javascript:
|
||||
|
||||
typeof null == 'object'
|
||||
|
||||
This is pretty much a bug in the language that can never be fixed now.
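
In practice this means a ##typeof## check needs an explicit null test alongside it. A minimal sketch (##isRealObject()## is a hypothetical helper name):

```javascript
// True for objects and arrays, false for null and for primitives.
function isRealObject(value) {
  return value !== null && typeof value === 'object';
}
```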
|
||||
|
||||
= Number, String, and Boolean objects =
|
||||
|
||||
Like Java, Javascript has primitive versions of number, string, and boolean,
|
||||
and object versions. In Java, there's some argument for this distinction. In
|
||||
Javascript, it's pretty much completely worthless and the behavior of these
|
||||
objects is wrong. String and Boolean in particular are essentially unusable:
|
||||
|
||||
lang=js
|
||||
"pancake" == "pancake"; // true
|
||||
new String("pancake") == new String("pancake"); // false
|
||||
|
||||
var b = new Boolean(false);
|
||||
b; // Shows 'false' in console.
|
||||
!b; // ALSO shows 'false' in console.
|
||||
!b == b; // So this is true!
|
||||
!!b == !b // Negate both sides and it's false! FUCK!
|
||||
|
||||
if (b) {
|
||||
// Better fucking believe this will get executed.
|
||||
}
|
||||
|
||||
There is no advantage to using the object forms (the primitive forms behave like
|
||||
objects and can have methods and properties, and inherit from String.prototype,
|
||||
Number.prototype, etc.) and their logical behavior is at best absurd and at
|
||||
worst strictly wrong.
|
||||
|
||||
**Never use** ##new Number()##, ##new String()## or ##new Boolean()## unless
|
||||
your Javascript is God Tier and you are absolutely sure you know what you are
|
||||
doing.
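
A brief sketch of the alternative, using only standard behavior: primitives already have methods, and calling ##Boolean()## or ##String()## without ##new## converts to a primitive rather than building an object.

```javascript
var primitive = 'pancake';
var shouted = primitive.toUpperCase(); // Primitives have methods.

// typeof distinguishes the primitive form from the object form.
var kinds = [typeof primitive, typeof new String('pancake')];

// Without "new", Boolean() is a conversion that returns a primitive.
var converted = Boolean(0);
var convertedKind = typeof converted;
```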
|
||||
|
301
src/docs/flavortext/php_pitfalls.diviner
Normal file
|
@@ -0,0 +1,301 @@
|
|||
@title PHP Pitfalls
|
||||
@group flavortext
|
||||
|
||||
This document discusses difficult traps and pitfalls in PHP, and how to avoid,
|
||||
work around, or at least understand them.
|
||||
|
||||
= array_merge() is Incredibly Slow =
|
||||
|
||||
If you merge arrays like this:
|
||||
|
||||
COUNTEREXAMPLE
|
||||
$result = array();
|
||||
foreach ($list_of_lists as $one_list) {
|
||||
$result = array_merge($result, $one_list);
|
||||
}
|
||||
|
||||
...your program now has a huge runtime because it generates a large
|
||||
number of intermediate arrays and copies every element it has previously seen
|
||||
each time you iterate.
|
||||
|
||||
In a libphutil environment, you can use ##array_mergev($list_of_lists)##
|
||||
instead.
|
||||
|
||||
= var_export() Hates Baby Animals =
|
||||
|
||||
If you try to var_export() an object that contains recursive references, your
|
||||
program will terminate. You have no chance to intercept or react to this or
|
||||
otherwise stop it from happening. Avoid var_export() unless you are certain
|
||||
you have only simple data. You can use print_r() or var_dump() to display
|
||||
complex variables safely.
|
||||
|
||||
= isset(), empty() and Truthiness =
|
||||
|
||||
A value is "truthy" if it evaluates to true in an ##if## clause:
|
||||
|
||||
$value = something();
|
||||
if ($value) {
|
||||
// Value is truthy.
|
||||
}
|
||||
|
||||
If a value is not truthy, it is "falsey". These values are falsey in PHP:
|
||||
|
||||
null // null
|
||||
0 // integer
|
||||
0.0 // float
|
||||
"0" // string
|
||||
"" // empty string
|
||||
false // boolean
|
||||
array() // empty array
|
||||
|
||||
Disregarding some bizarre edge cases, all other values are truthy. Note that
|
||||
because "0" is falsey, this sort of thing (intended to prevent users from making
|
||||
empty comments) is wrong in PHP:
|
||||
|
||||
COUNTEREXAMPLE
|
||||
if ($comment_text) {
|
||||
make_comment($comment_text);
|
||||
}
|
||||
|
||||
This is wrong because it prevents users from making the comment "0". //THIS
|
||||
COMMENT IS TOTALLY AWESOME AND I MAKE IT ALL THE TIME SO YOU HAD BETTER NOT
|
||||
BREAK IT!!!// A better test is probably strlen().
|
||||
|
||||
In addition to truth tests with ##if##, PHP has two special truthiness operators
|
||||
which look like functions but aren't: empty() and isset(). These operators help
|
||||
deal with undeclared variables.
|
||||
|
||||
In PHP, there are two major cases where you get undeclared variables -- either
|
||||
you directly use a variable without declaring it:
|
||||
|
||||
COUNTEREXAMPLE
|
||||
function f() {
|
||||
if ($not_declared) {
|
||||
// ...
|
||||
}
|
||||
}
|
||||
|
||||
...or you index into an array with an index which may not exist:
|
||||
|
||||
COUNTEREXAMPLE
|
||||
function f(array $mystery) {
|
||||
if ($mystery['stuff']) {
|
||||
// ...
|
||||
}
|
||||
}
|
||||
|
||||
When you do either of these, PHP issues a warning. Avoid these warnings by using
|
||||
empty() and isset() to do tests that are safe to apply to undeclared variables.
|
||||
|
||||
empty() evaluates truthiness exactly opposite of if(). isset() returns true for
|
||||
everything except null. This is the truth table:
|
||||
|
||||
VALUE             if()     empty()   isset()

null              false    true      false
0                 false    true      true
0.0               false    true      true
"0"               false    true      true
""                false    true      true
false             false    true      true
array()           false    true      true
EVERYTHING ELSE   true     false     true
|
||||
|
||||
The value of these operators is that they accept undeclared variables and do not
|
||||
issue a warning. Specifically, if you try to do this you get a warning:
|
||||
|
||||
COUNTEREXAMPLE
|
||||
if ($not_previously_declared) { // PHP Notice: Undefined variable!
|
||||
// ...
|
||||
}
|
||||
|
||||
But these are fine:
|
||||
|
||||
if (empty($not_previously_declared)) { // No notice, returns true.
|
||||
// ...
|
||||
}
|
||||
if (isset($not_previously_declared)) { // No notice, returns false.
|
||||
// ...
|
||||
}
|
||||
|
||||
So, isset() really means is_declared_and_is_set_to_something_other_than_null().
|
||||
empty() really means is_falsey_or_is_not_declared(). Thus:
|
||||
|
||||
- If a variable is known to exist, test falsiness with if (!$v), not empty().
|
||||
In particular, test for empty arrays with if (!$array). There is no reason
|
||||
to ever use empty() on a declared variable.
|
||||
- When you use isset() on an array key, like isset($array['key']), it will
|
||||
evaluate to "false" if the key exists but has the value null! Test for index
|
||||
existence with array_key_exists().
|
||||
|
||||
Put another way, use isset() if you want to type "if ($value !== null)" but are
|
||||
testing something that may not be declared. Use empty() if you want to type
|
||||
"if (!$value)" but you are testing something that may not be declared.
|
||||
|
||||
= usort(), uksort(), and uasort() are Slow =
|
||||
|
||||
This family of functions is often extremely slow for large datasets. You should
|
||||
avoid them if at all possible. Instead, build an array which contains surrogate
|
||||
keys that are naturally sortable with a function that uses native comparison
|
||||
(e.g., sort(), asort(), ksort(), or natcasesort()). Sort this array instead, and
|
||||
use it to reorder the original array.
|
||||
|
||||
In a libphutil environment, you can often do this easily with isort() or
|
||||
msort().
|
||||
|
||||
= array_intersect() and array_diff() are Also Slow =
|
||||
|
||||
These functions are much slower for even moderately large inputs than
|
||||
array_intersect_key() and array_diff_key(), because they can not make the
|
||||
assumption that their inputs are unique scalars as the ##key## varieties can.
|
||||
Strongly prefer the ##key## varieties.
|
||||
|
||||
= array_uintersect() and array_udiff() are Definitely Slow Too =
|
||||
|
||||
These functions have the problems of both the ##usort()## family and the
|
||||
##array_diff()## family. Avoid them.
|
||||
|
||||
= foreach() Does Not Create Scope =
|
||||
|
||||
Variables survive outside of the scope of foreach(). More problematically,
|
||||
references survive outside of the scope of foreach(). This code mutates
|
||||
##$array## because the reference leaks from the first loop to the second:
|
||||
|
||||
COUNTEREXAMPLE
|
||||
$array = range(1, 3);
|
||||
echo implode(',', $array); // Outputs '1,2,3'
|
||||
foreach ($array as &$value) {}
|
||||
echo implode(',', $array); // Outputs '1,2,3'
|
||||
foreach ($array as $value) {}
|
||||
echo implode(',', $array); // Outputs '1,2,2'
|
||||
|
||||
The easiest way to avoid this is to avoid using foreach-by-reference. If you do
|
||||
use it, unset the reference after the loop:
|
||||
|
||||
foreach ($array as &$value) {
|
||||
// ...
|
||||
}
|
||||
unset($value);
|
||||
|
||||
= unserialize() is Incredibly Slow on Large Datasets =
|
||||
|
||||
The performance of unserialize() is nonlinear in the number of zvals you
|
||||
unserialize, roughly O(N^2).
|
||||
|
||||
zvals       approximate time
10000       5ms
100000      85ms
1000000     8,000ms
10000000    72 billion years
|
||||
|
||||
|
||||
= call_user_func() Breaks References =
|
||||
|
||||
If you use call_user_func() to invoke a function which takes parameters by
|
||||
reference, the variables you pass in will have their references broken and will
|
||||
emerge unmodified. That is, if you have a function that takes references:
|
||||
|
||||
function add_one(&$v) {
|
||||
$v++;
|
||||
}
|
||||
|
||||
...and you call it with call_user_func():
|
||||
|
||||
COUNTEREXAMPLE
|
||||
$x = 41;
|
||||
call_user_func('add_one', $x);
|
||||
|
||||
...##$x## will not be modified. The solution is to use call_user_func_array()
|
||||
and wrap the reference in an array:
|
||||
|
||||
$x = 41;
|
||||
call_user_func_array(
|
||||
'add_one',
|
||||
array(&$x)); // Note '&$x'!
|
||||
|
||||
This will work as expected.
|
||||
|
||||
= You Can't Throw From __toString() =
|
||||
|
||||
If you throw from __toString(), your program will terminate uselessly and you
|
||||
won't get the exception.
|
||||
|
||||
= An Object Can Have Any Scalar as a Property =
|
||||
|
||||
Object properties are not limited to legal variable names:
|
||||
|
||||
$property = '!@#$%^&*()';
|
||||
$obj->$property = 'zebra';
|
||||
echo $obj->$property; // Outputs 'zebra'.
|
||||
|
||||
So, don't make assumptions about property names.
|
||||
|
||||
= There is an (object) Cast =
|
||||
|
||||
You can cast a dictionary into an object.
|
||||
|
||||
$obj = (object)array('flavor' => 'coconut');
|
||||
echo $obj->flavor; // Outputs 'coconut'.
|
||||
echo get_class($obj); // Outputs 'stdClass'.
|
||||
|
||||
This is occasionally useful, mostly to force an array to be encoded as a
Javascript dictionary (vs a list) when passed to json_encode().
|
||||
|
||||
= Invoking "new" With an Argument Vector is Really Hard =
|
||||
|
||||
If you have some ##$class_name## and some ##$argv## of constructor
|
||||
arguments and you want to do this:
|
||||
|
||||
new $class_name($argv[0], $argv[1], ...);
|
||||
|
||||
...you'll probably invent a very interesting, very novel solution that is very
|
||||
wrong. In a libphutil environment, solve this problem with newv(). Elsewhere,
|
||||
copy newv()'s implementation.
|
||||
|
||||
= Equality is not Transitive =
|
||||
|
||||
This isn't terribly surprising since equality isn't transitive in a lot of
|
||||
languages, but the == operator is not transitive:
|
||||
|
||||
$a = ''; $b = 0; $c = '0a';
|
||||
$a == $b; // true
|
||||
$b == $c; // true
|
||||
$c == $a; // false!
|
||||
|
||||
When either operand is an integer, the other operand is cast to an integer
|
||||
before comparison. Avoid this and similar pitfalls by using the === operator,
|
||||
which is transitive.
|
||||
|
||||
= All 676 Letters in the Alphabet =
|
||||
|
||||
This doesn't do what you'd expect it to do in C:
|
||||
|
||||
for ($c = 'a'; $c <= 'z'; $c++) {
|
||||
// ...
|
||||
}
|
||||
|
||||
This is because the successor to 'z' is 'aa', which is "less than" 'z'. The
loop will run for 676 iterations until it reaches 'za' and terminates. That is,
##$c## will take on these values:

a
b
...
y
z
aa // loop continues because 'aa' <= 'z'
ab
...
mf
mg
...
yx
yy
yz
za // loop now terminates because 'za' > 'z'
|
||||
|
||||
Instead, use this loop:
|
||||
|
||||
foreach (range('a', 'z') as $c) {
|
||||
// ...
|
||||
}
|
140
src/docs/flavortext/things_you_should_do_now.diviner
Normal file
|
@@ -0,0 +1,140 @@
|
|||
@title Things You Should Do Now
|
||||
@group flavortext
|
||||
|
||||
Describes things you should do now when building software, because the cost to
|
||||
do them increases over time and eventually becomes prohibitive or impossible.
|
||||
|
||||
|
||||
= Overview =
|
||||
|
||||
If you're building a hot new web startup, there are a lot of decisions to make
|
||||
about what to focus on. Most things you'll build will take about the same amount
|
||||
of time to build regardless of what order you build them in, but there are a few
|
||||
technical things which become vastly more expensive to fix later.
|
||||
|
||||
If you don't do these things early in development, they'll become very hard or
|
||||
impossible to do later. This is basically a list of things that would have saved
|
||||
Facebook huge amounts of time and effort down the road if someone had spent
|
||||
a tiny amount of time on them earlier in the development process.
|
||||
|
||||
See also @{article:Things You Should Do Soon} for things that scale less
|
||||
drastically over time.
|
||||
|
||||
|
||||
= Start IDs At a Gigantic Number =
|
||||
|
||||
If you're using integer IDs to identify data or objects, **don't** start your
|
||||
IDs at 1. Start them at a huge number (e.g., 2^33) so that no object ID will
|
||||
ever appear in any other role in your application (like a count, a natural
|
||||
index, a byte size, a timestamp, etc). This takes about 5 seconds if you do it
|
||||
before you launch and rules out a huge class of nasty bugs for all time. It
|
||||
becomes incredibly difficult as soon as you have production data.
|
||||
|
||||
The kind of bug that this causes is accidental use of some other value as an ID:
|
||||
|
||||
COUNTEREXAMPLE
|
||||
// Load the user's friends, returns a map of friend_id => true
|
||||
$friend_ids = user_get_friends($user_id);
|
||||
|
||||
// Get the first 8 friends.
|
||||
$first_few_friends = array_slice($friend_ids, 0, 8);
|
||||
|
||||
// Render those friends.
|
||||
render_user_friends($user_id, array_keys($first_few_friends));
|
||||
|
||||
Because array_slice() in PHP discards array indices and renumbers them, this
|
||||
doesn't render the user's first 8 friends but the users with IDs 0 through 7,
|
||||
e.g. Mark Zuckerberg (ID 4) and Dustin Moskovitz (ID 6). If you have IDs in this
|
||||
range, sooner or later something that isn't an ID will get treated like an ID
|
||||
and the operation will be valid and cause unexpected behavior. This is
|
||||
completely avoidable if you start your IDs at a gigantic number.
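
Once IDs start high, a stray index like 4 simply fails to load anything. As a sketch of an optional extra step (written in Javascript; ##FIRST_ID##, ##looksLikeID()## and ##assertID()## are hypothetical names, not part of any library), code that accepts IDs can also reject small numbers outright:

```javascript
// Hypothetical first ID; anything below it cannot be a real object ID.
var FIRST_ID = Math.pow(2, 33);

function looksLikeID(value) {
  return Number.isInteger(value) && value >= FIRST_ID;
}

// Hypothetical guard for code paths that accept IDs.
function assertID(value) {
  if (!looksLikeID(value)) {
    throw new Error('Expected an object ID, got: ' + value);
  }
  return value;
}
```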
|
||||
|
||||
|
||||
= Only Store Valid UTF-8 =
|
||||
|
||||
For the most part, you can ignore UTF-8 and unicode until later. However, there
|
||||
is one aspect of unicode you should address now: store only valid UTF-8 strings.
|
||||
|
||||
Assuming you're storing data internally as UTF-8 (this is almost certainly the
|
||||
right choice and definitely the right choice if you have no idea how unicode
|
||||
works), you just need to sanitize all the data coming into your application and
|
||||
make sure it's valid UTF-8.
|
||||
|
||||
If your application emits invalid UTF-8, other systems (like browsers) will
|
||||
break in unexpected and interesting ways. You will eventually be forced to
|
||||
ensure you emit only valid UTF-8 to avoid these problems. If you haven't
|
||||
sanitized your data, you'll basically have two options:
|
||||
|
||||
- do a huge migration on literally all of your data to sanitize it; or
|
||||
- forever sanitize all data on its way out on the read pathways.
|
||||
|
||||
As of 2011 Facebook is in the second group, and spends several milliseconds of
|
||||
CPU time sanitizing every display string on its way to the browser, which
|
||||
multiplies out to hundreds of servers worth of CPUs sitting in a datacenter
|
||||
paying the price for the invalid UTF-8 in the databases.
|
||||
|
||||
You can likely learn enough about unicode to be confident in an implementation
|
||||
which addresses this problem within a few hours. You don't need to learn
|
||||
everything, just the basics. Your language probably already has a function which
|
||||
does the sanitizing for you.
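
As one example of such a function, here is a sketch in Javascript. It assumes a runtime that provides the standard ##TextDecoder## (modern browsers and Node.js do); ##isValidUTF8()## and ##sanitizeUTF8()## are hypothetical helper names:

```javascript
// Returns true if the bytes are a valid UTF-8 sequence.
function isValidUTF8(bytes) {
  try {
    // With "fatal", decoding throws on invalid input instead of repairing it.
    new TextDecoder('utf-8', {fatal: true}).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// Decodes the bytes, replacing each invalid sequence with U+FFFD.
function sanitizeUTF8(bytes) {
  return new TextDecoder('utf-8').decode(bytes);
}
```

Whether to reject invalid input or repair it on the way in is a product decision; either way, only valid UTF-8 reaches storage.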
|
||||
|
||||
|
||||
= Never Design a Blacklist-Based Security System =
|
||||
|
||||
When you have an alternative, don't design security systems which are default
|
||||
permit, blacklist-based, or otherwise attempt to enumerate badness. When
|
||||
Facebook launched Platform, it launched with a blacklist-based CSS filter, which
|
||||
basically tried to enumerate all the "bad" parts of CSS and filter them out.
|
||||
This was a poor design choice and led to basically infinite security holes for
|
||||
all time.
|
||||
|
||||
It is very difficult to enumerate badness in a complex system and badness is
|
||||
often a moving target. Instead of trying to do this, design whitelist-based
|
||||
security systems where you list allowed things and reject anything you don't
|
||||
understand. Assume things are bad until you verify that they're OK.
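
A deliberately tiny sketch of the default-deny shape, written in Javascript. The allowed property list and the declaration format are assumptions made up for this example, and a real filter would also have to validate values, not just property names:

```javascript
// Hypothetical whitelist: only these CSS properties are ever allowed through.
var ALLOWED_PROPERTIES = {
  'color': true,
  'font-weight': true,
  'text-align': true
};

// Keep declarations whose property is listed; reject everything else,
// including anything the filter has never heard of.
function filterDeclarations(declarations) {
  var safe = [];
  for (var ii = 0; ii < declarations.length; ii++) {
    var name = declarations[ii].property.toLowerCase();
    if (Object.prototype.hasOwnProperty.call(ALLOWED_PROPERTIES, name)) {
      safe.push(declarations[ii]);
    }
  }
  return safe;
}
```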
|
||||
|
||||
It's tempting to design blacklist-based systems because they're easier to write
|
||||
and accept more inputs. In the case of the CSS filter, the product goal was for
|
||||
users to just be able to use CSS normally and feel like this system was no
|
||||
different from systems they were familiar with. A whitelist-based system would
|
||||
reject some valid, safe inputs and create product friction.
|
||||
|
||||
But this is a much better world than the alternative, where the blacklist-based
|
||||
system fails to reject some dangerous inputs and creates //security holes//. It
|
||||
//also// creates product friction because when you fix those holes you break
|
||||
existing uses, and that backward-compatibility friction makes it very difficult
|
||||
to move the system from a blacklist to a whitelist. So you're basically in
|
||||
trouble no matter what you do, and have a bunch of security holes you need to
|
||||
unbreak immediately, so you won't even have time to feel sorry for yourself.
|
||||
|
||||
Designing blacklist-based security is one of the worst now-vs-future tradeoffs
|
||||
you can make. See also "The Six Dumbest Ideas in Computer Security":
|
||||
|
||||
http://www.ranum.com/security/computer_security/
|
||||
|
||||
|
||||
= Fail Very Loudly when SQL Syntax Errors Occur in Production =
|
||||
|
||||
This doesn't apply if you aren't using SQL, but if you are: detect when a query
|
||||
fails because of a syntax error (in MySQL, it is error 1064). If the failure
|
||||
happened in production, fail in the loudest way possible. (I implemented this in
|
||||
2008 at Facebook and had it just email me and a few other people directly. The
|
||||
system was eventually refined.)
|
||||
|
||||
This basically creates a high-signal stream that tells you where you have SQL
|
||||
injection holes in your application. It will have some false positives and could
|
||||
theoretically have false negatives, but at Facebook it was pretty high signal
|
||||
considering how important the signal is.
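
The check itself is small. This sketch is in Javascript and is not the Facebook implementation: it assumes the database driver exposes MySQL's error number as ##error.errno##, and ##alertLoudly## stands in for whatever paging or email mechanism you have.

```javascript
var MYSQL_SYNTAX_ERROR = 1064;

// Hypothetical failure hook. Returns true if the failure was a syntax error,
// and raises a loud alert when that happens in production.
function reportQueryFailure(error, isProduction, alertLoudly) {
  var isSyntaxError = (error.errno === MYSQL_SYNTAX_ERROR);
  if (isSyntaxError && isProduction) {
    alertLoudly('SQL syntax error in production: ' + error.message);
  }
  return isSyntaxError;
}
```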
|
||||
|
||||
Of course, the real solution here is to not have SQL injection holes in your
|
||||
application, ever. As far as I'm aware, this system correctly detected the one
|
||||
SQL injection hole we had from mid-2008 until I left in 2011, which was in a
|
||||
hackathon project on an underisolated semi-production tier and didn't use the
|
||||
query escaping system the rest of the application does.
|
||||
|
||||
Hopefully, whatever language you're writing in has good query libraries that
|
||||
can handle escaping for you. If so, use them. If you're using PHP and don't have
|
||||
a solution in place yet, the Phabricator implementation of qsprintf() is similar
|
||||
to Facebook's system and was successful there.
|
||||
|
||||
|
136
src/docs/flavortext/things_you_should_do_soon.diviner
Normal file
|
@@ -0,0 +1,136 @@
|
|||
@title Things You Should Do Soon
|
||||
@group flavortext
|
||||
|
||||
Describes things you should start thinking about soon, because scaling will
|
||||
be easier if you put a plan in place.
|
||||
|
||||
= Overview =
|
||||
|
||||
Stop! Don't do any of this yet. Go do @{article:Things You Should Do Now} first.
|
||||
|
||||
Then you can come back and read about these things. These are basically some
|
||||
problems you'll probably eventually encounter when building a web application
|
||||
that might be good to start thinking about now.
|
||||
|
||||
= Static Resources: JS and CSS =
|
||||
|
||||
Over time, you'll write more JS and CSS and eventually need to put systems in
|
||||
place to manage it.
|
||||
|
||||
|
||||
== Manage Dependencies Automatically ==
|
||||
|
||||
The naive way to add static resources to a page is to include them at the top
|
||||
of the page, before rendering begins, by enumerating filenames. Facebook used to
|
||||
work like that:
|
||||
|
||||
COUNTEREXAMPLE
|
||||
<?php
|
||||
|
||||
require_js('js/base.js');
|
||||
require_js('js/utils.js');
|
||||
require_js('js/ajax.js');
|
||||
require_js('js/dialog.js');
|
||||
// ...
|
||||
|
||||
This was okay for a while but had become unmanageable by 2007. Because
|
||||
dependencies were managed completely manually and you had to explicitly list
|
||||
every file you needed in the right order, everyone copy-pasted a giant block
|
||||
of this stuff into every page. The major problem this created was that each page
|
||||
pulled in way too much JS, which slowed down frontend performance.
|
||||
|
||||
We moved to a system (called //Haste//) which declared JS dependencies in the
|
||||
files using a docblock-like header:
|
||||
|
||||
/**
|
||||
* @provides dialog
|
||||
* @requires utils ajax base
|
||||
*/
|
||||
|
||||
We annotated files manually, although theoretically you could use static
|
||||
analysis instead (we couldn't realistically do that, our JS was pretty
|
||||
unstructured). This allowed us to pull in the entire dependency chain of
|
||||
a component with one call:
|
||||
|
||||
require_static('dialog');
|
||||
|
||||
...instead of copy-pasting every dependency.
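
To make the idea concrete, here is a rough sketch of header parsing and dependency resolution in Javascript. It is not Haste's actual implementation; ##parseHeader()##, ##resolve()## and the graph format are assumptions invented for this example.

```javascript
// Read the @provides and @requires annotations out of a file's header.
function parseHeader(source) {
  var provides = /@provides\s+(\S+)/.exec(source);
  var requires = /@requires\s+([^\n*]+)/.exec(source);
  return {
    provides: provides ? provides[1] : null,
    requires: requires ? requires[1].trim().split(/\s+/) : []
  };
}

// Depth-first walk that lists every dependency before the thing needing it.
// "graph" maps a component name to the list of names it requires.
function resolve(name, graph, seen, order) {
  seen = seen || Object.create(null);
  order = order || [];
  if (seen[name]) {
    return order;
  }
  seen[name] = true;
  var has = Object.prototype.hasOwnProperty.call(graph, name);
  var deps = has ? graph[name] : [];
  for (var ii = 0; ii < deps.length; ii++) {
    resolve(deps[ii], graph, seen, order);
  }
  order.push(name);
  return order;
}
```

Marking a name as seen before visiting its dependencies also keeps the walk from looping forever if two files accidentally require each other.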
|
||||
|
||||
|
||||
== Include When Used ==
|
||||
|
||||
The other part of this problem was that all the resources were required at the
|
||||
top of the page instead of when they were actually used. This meant two things:
|
||||
|
||||
- you needed to include every resource that //could ever// appear on a page;
|
||||
- if you were adding something new to 2+ pages, you had a strong incentive to
|
||||
put it in base.js.
|
||||
|
||||
So every page pulled in a bunch of silly stuff like the CAPTCHA code (because
|
||||
there was one obscure workflow involving unverified users which could
|
||||
theoretically show any user a CAPTCHA on any page) and every random thing anyone
|
||||
had stuck in base.js.
|
||||
|
||||
We moved to a system where JS and CSS tags were output **after** page rendering
|
||||
had run instead (they still appeared at the top of the page, they were just
|
||||
prepended rather than appended before being output to the browser -- there are
|
||||
some complexities here, but they are beyond the immediate scope), so
|
||||
require_static() could appear anywhere in the code. Then we moved all the
|
||||
require_static() calls to be proximate to their use sites (so dialog rendering
|
||||
code would pull in dialog-related CSS and JS, for example, not any page which
|
||||
might need a dialog), and split base.js into a bunch of smaller files.
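
The idea can be sketched roughly as follows. This is an illustration only, written in Javascript for consistency with the rest of these documents; ##StaticResourceCollector##, ##renderDialog()## and ##renderPage()## are hypothetical names, not the Haste or Celerity API. Rendering code records what it needs as it runs, and the tags are generated only after the body has rendered:

```javascript
// Hypothetical collector which remembers what rendering code asked for.
function StaticResourceCollector() {
  this._required = [];
  this._seen = {};
}

// May be called from anywhere during rendering; duplicates are ignored.
StaticResourceCollector.prototype.requireStatic = function(name) {
  if (this._seen.hasOwnProperty(name)) {
    return;
  }
  this._seen[name] = true;
  this._required.push(name);
};

StaticResourceCollector.prototype.renderTags = function() {
  var tags = [];
  for (var ii = 0; ii < this._required.length; ii++) {
    var src = '/js/' + this._required[ii] + '.js';
    tags.push('<script src="' + src + '"></script>');
  }
  return tags.join('\n');
};

// The component pulls in its own resources next to the code that uses them.
function renderDialog(collector) {
  collector.requireStatic('dialog');
  return '<div class="dialog">...</div>';
}

function renderPage(collector) {
  // Render the body first, so every requireStatic() call has happened...
  var body = renderDialog(collector) + renderDialog(collector);
  // ...then build the tags and prepend them to the output.
  var head = '<head>\n' + collector.renderTags() + '\n</head>';
  return head + '\n<body>' + body + '</body>';
}
```

A page which renders two dialogs still emits a single script tag, and a page which renders no dialog emits none.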
|
||||
|
||||
|
||||
== Packaging ==
|
||||
|
||||
The biggest frontend performance killer in most cases is the raw number of HTTP
|
||||
requests, and the biggest hammer for addressing it is to package related JS
|
||||
and CSS into larger files, so you send down all the core JS code in one big file
|
||||
instead of a lot of smaller ones. Once the other groundwork is in place, this is
|
||||
a relatively easy change. We started with manual package definitions and
|
||||
eventually moved to automatic generation based on production data.
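
A minimal sketch of manual package definitions, assuming a hypothetical ##package_map## which lists the resources each package contains (the file names here are made up for illustration). Any required resource that belongs to a package is replaced by the package file, so several requests collapse into one:

```javascript
// Hypothetical manual package definition.
var package_map = {
  'core.pkg.js': ['base', 'utils', 'ajax']
};

// Maps a list of required resources to the list of files to request.
function packageResources(packages, resources) {
  // Invert the definition: resource name => owning package.
  var owner = {};
  for (var pkg in packages) {
    var members = packages[pkg];
    for (var ii = 0; ii < members.length; ii++) {
      owner[members[ii]] = pkg;
    }
  }

  var files = [];
  var emitted = {};
  for (var jj = 0; jj < resources.length; jj++) {
    var name = resources[jj];
    var file = name + '.js';
    if (owner.hasOwnProperty(name)) {
      file = owner[name];
    }
    // Emit each file once, even if several resources map to it.
    if (!emitted.hasOwnProperty(file)) {
      emitted[file] = true;
      files.push(file);
    }
  }
  return files;
}
```

Here a page which needs base, utils, ajax and dialog makes two requests (##core.pkg.js## and ##dialog.js##) instead of four.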
|
||||
|
||||
|
||||
== Caches and Serving Content ==
|
||||
|
||||
In the simplest implementation of static resources, you write out a raw JS tag
|
||||
with something like ##src="/js/base.js"##. This will break disastrously as you
|
||||
scale, because clients will be running with stale versions of resources. There
|
||||
are a bunch of subtle problems (especially once you have a CDN), but the big one
|
||||
is that if a user is browsing your site as you push/deploy, their client will
|
||||
not make requests for the resources they already have in cache, so even if your
|
||||
servers respond correctly to If-None-Match (ETags) and If-Modified-Since
|
||||
(Last-Modified), the site will appear completely broken to everyone who was using it
|
||||
when you push a breaking change to static resources.
|
||||
|
||||
The best way to solve this problem is to version your resources in the URI,
|
||||
so each version of a resource has a unique URI:
|
||||
|
||||
  rsrc/af04d14/js/base.js
|
||||
|
||||
When you push, users will receive pages which reference the new URI so their
|
||||
browsers will retrieve it.
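
One way to get a unique URI per version is to derive the version component from the file's content. The sketch below assumes that approach; the text above does not say how the ##af04d14## component is actually computed, and ##getResourceURI()## is a hypothetical name. It uses the ##crypto## module which ships with Node.js:

```javascript
var crypto = require('crypto');

// Builds a URI whose version component changes whenever the content
// changes. Assumption: the version is a truncated hash of the content.
function getResourceURI(path, content) {
  var hash = crypto.createHash('md5').update(content).digest('hex');
  return 'rsrc/' + hash.slice(0, 7) + '/' + path;
}
```

Because the URI is a function of the content, an unchanged file keeps its URI across pushes (so it stays cached), while a changed file gets a new one.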
|
||||
|
||||
**But** there's a big problem once you have a bunch of web frontends:
|
||||
|
||||
While you're pushing, a user may make a request which is handled by a server
|
||||
running the new version of the code, which delivers a page with a new resource
|
||||
URI. Their browser then makes a request for the new resource, but that request
|
||||
is routed to a server which has not been pushed yet, which delivers an old
|
||||
version of the resource. They now have a poisoned cache: old resource data for
|
||||
a new resource URI.
|
||||
|
||||
You can do a lot of clever things to solve this, but the solution we chose at
|
||||
Facebook was to serve resources out of a database instead of off disk. Before a
|
||||
push begins, new resources are written to the database so that every server is
|
||||
able to satisfy both old and new resource requests.
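
The shape of this can be sketched with an in-memory stand-in for the database. ##ResourceStore## and its methods are hypothetical, and a real system would use a shared database rather than a per-process object. The point is only that resources are keyed by their versioned URI and that new versions are published before any server starts referencing them:

```javascript
// In-memory stand-in for a shared resource table, keyed by versioned URI.
function ResourceStore() {
  this._rows = {};
}

// Called before the push begins, for every new resource version.
ResourceStore.prototype.publish = function(uri, content) {
  this._rows[uri] = content;
};

// Called by any web frontend, whether or not it has been pushed yet.
ResourceStore.prototype.serve = function(uri) {
  if (!this._rows.hasOwnProperty(uri)) {
    return null;
  }
  return this._rows[uri];
};

var store = new ResourceStore();
store.publish('rsrc/aaaaaaa/js/base.js', '/* old version */');
store.publish('rsrc/bbbbbbb/js/base.js', '/* new version */');
```

Since old rows are never overwritten, a request for either URI returns the matching content no matter which server handles it, so a new URI can never be answered with old data.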
|
||||
|
||||
This also made it relatively easy to do processing steps (like stripping
|
||||
comments and whitespace) in one place, and just insert a minified/processed
|
||||
version of CSS and JS into the database.
|
||||
|
||||
== Reference Implementation: Celerity ==
|
||||
|
||||
Some of the ideas discussed here are implemented in Phabricator's //Celerity//
|
||||
system, which is essentially a simplified version of the //Haste// system used
|
||||
by Facebook.
|
140
src/docs/javascript_coding_standards.diviner
Normal file
|
@ -0,0 +1,140 @@
|
|||
@title Javascript Coding Standards
|
||||
@group contrib
|
||||
|
||||
This document describes Javascript coding standards for Phabricator and Javelin.
|
||||
|
||||
= Overview =
|
||||
|
||||
This document outlines technical and style guidelines which are followed in
|
||||
Phabricator and Javelin. Contributors should also follow these guidelines. Many
|
||||
of these guidelines are automatically enforced by lint.
|
||||
|
||||
These guidelines are essentially identical to the Facebook guidelines, since I
|
||||
basically copy-pasted them. If you are already familiar with the Facebook
|
||||
guidelines, you can probably get away with skimming this document.
|
||||
|
||||
|
||||
= Spaces, Linebreaks and Indentation =
|
||||
|
||||
- Use two spaces for indentation. Don't use literal tab characters.
|
||||
- Use Unix linebreaks ("\n"), not MSDOS ("\r\n") or OS9 ("\r").
|
||||
- Use K&R style braces and spacing.
|
||||
- Put a space after control keywords like ##if## and ##for##.
|
||||
- Put a space after commas in argument lists.
|
||||
- Put space around operators like ##=##, ##<##, etc.
|
||||
- Don't put spaces after function names.
|
||||
- Parentheses should hug their contents.
|
||||
- Generally, prefer to wrap code at 80 columns.
|
||||
|
||||
= Case and Capitalization =
|
||||
|
||||
The Javascript language unambiguously dictates casing/naming rules; follow those
|
||||
rules.
|
||||
|
||||
- Name variables using ##lowercase_with_underscores##.
|
||||
- Name classes using ##UpperCamelCase##.
|
||||
- Name methods and properties using ##lowerCamelCase##.
|
||||
- Name global functions using ##lowerCamelCase##. Avoid defining global
|
||||
functions.
|
||||
- Name constants using ##UPPERCASE##.
|
||||
- Write ##true##, ##false##, and ##null## in lowercase.
|
||||
- "Internal" methods and properties should be prefixed with an underscore.
|
||||
For more information about what "internal" means, see
|
||||
**Leading Underscores**, below.
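
A short illustration of these naming rules together (##RetryCounter## and its members are made-up names for this example):

```javascript
// Constant: UPPERCASE.
var MAX_ATTEMPTS = 3;

// Class: UpperCamelCase.
function RetryCounter() {
  // Internal property: leading underscore, lowerCamelCase.
  this._attemptCount = 0;
}

// Method: lowerCamelCase.
RetryCounter.prototype.recordAttempt = function() {
  // Local variable: lowercase_with_underscores.
  var attempts_left = MAX_ATTEMPTS - this._attemptCount;
  if (attempts_left > 0) {
    this._attemptCount++;
  }
  return attempts_left > 0;
};
```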
|
||||
|
||||
= Comments =
|
||||
|
||||
- Strongly prefer ##//## comments for making comments inside the bodies of
|
||||
functions and methods (this lets someone easily comment out a block of code
|
||||
while debugging later).
|
||||
|
||||
= Javascript Language =
|
||||
|
||||
- Use ##[]## and ##{}##, not ##new Array## and ##new Object##.
|
||||
- When creating an object literal, do not quote keys unless required.
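
For example (the variable names are arbitrary), the literal forms are shorter and avoid the surprising single-argument behavior of the ##Array## constructor:

```javascript
// Literals, not constructors.
var list = [];
var map = {};

// Quote a key only when it is not a valid identifier.
var config = {
  width: 300,
  title: 'Dialog',
  'data-id': 7
};

// One reason to avoid the constructor: a single numeric argument is
// treated as a length, not as an element.
var sized = new Array(3);
var literal = [3];
```

Here ##sized## has length 3 and no elements, while ##literal## has length 1 and contains the number 3.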
|
||||
|
||||
= Examples =
|
||||
|
||||
**if/else:**
|
||||
|
||||
  lang=js
  if (x > 3) {
    // ...
  } else if (x === null) {
    // ...
  } else {
    // ...
  }
|
||||
|
||||
You should always put braces around the body of an if clause, even if it is only
|
||||
one line. Note that operators like ##>## and ##===## are also surrounded by
|
||||
spaces.
|
||||
|
||||
**for (iteration):**
|
||||
|
||||
  lang=js
  for (var ii = 0; ii < 10; ii++) {
    // ...
  }
|
||||
|
||||
Prefer ii, jj, kk, etc., as iterators, since they're easier to pick out
|
||||
visually and react better to "Find Next..." in editors.
|
||||
|
||||
**for (enumeration):**
|
||||
|
||||
  lang=js
  for (var k in obj) {
    // ...
  }
|
||||
|
||||
Make sure you use enumeration only on Objects, not on Arrays. For more details,
|
||||
see @{article:Javascript Object and Array}.
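
A small demonstration of why, using made-up variable names: enumeration visits every enumerable property of an Array, not just its numeric indexes, and yields the keys as strings.

```javascript
var list = ['a', 'b'];
// Arrays are objects, so nothing stops code from adding a property.
list.note = 'extra';

// Enumeration picks up the extra property along with the indexes.
var enumerated_keys = [];
for (var k in list) {
  enumerated_keys.push(k);
}

// Iteration by index visits only the elements.
var iterated_values = [];
for (var ii = 0; ii < list.length; ii++) {
  iterated_values.push(list[ii]);
}
```

After this runs, ##enumerated_keys## contains the strings "0", "1" and "note", while ##iterated_values## contains only "a" and "b".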
|
||||
|
||||
**switch:**
|
||||
|
||||
  lang=js
  switch (x) {
    case 1:
      // ...
      break;
    case 2:
      if (flag) {
        break;
      }
      break;
    default:
      // ...
      break;
  }
|
||||
|
||||
##break## statements should be indented to block level. If you don't push them
|
||||
in, you end up with an inconsistent rule for conditional ##break## statements,
|
||||
as in the ##2## case.
|
||||
|
||||
If you insist on having a "fall through" case that does not end with ##break##,
|
||||
make it clear in a comment that you wrote this intentionally. For instance:
|
||||
|
||||
  lang=js
  switch (x) {
    case 1:
      // ...
      // Fall through...
    case 2:
      // ...
      break;
  }
|
||||
|
||||
= Leading Underscores =
|
||||
|
||||
By convention, method names which start with a leading underscore are
|
||||
considered "internal", which (roughly) means "private". The critical difference
|
||||
is that this is treated as a signal to Javascript processing scripts that a
|
||||
symbol is safe to rename since it is not referenced outside the current file.
|
||||
|
||||
The upshot here is:
|
||||
|
||||
- name internal methods which shouldn't be called outside of a file's scope
|
||||
with a leading underscore; and
|
||||
- **never** call an internal method from another file.
|
||||
|
||||
If you treat them as though they were "private", you won't run into problems.
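
For instance, in this sketch (##Dialog## and its methods are hypothetical), ##_escape()## is only ever called from within the file that defines it, so a processing script could rename it without breaking anything:

```javascript
function Dialog(title) {
  this._title = title;
}

// Public method: other files may call this.
Dialog.prototype.render = function() {
  return '<h1>' + this._escape(this._title) + '</h1>';
};

// Internal method: only called from within this file.
Dialog.prototype._escape = function(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
};
```

Code in another file should call ##render()## and never reach for ##_escape()## directly.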
|
|
@ -9,6 +9,11 @@ This document outlines technical and style guidelines which are followed in
|
|||
libphutil. Contributors should also follow these guidelines. Many of these
|
||||
guidelines are automatically enforced by lint.
|
||||
|
||||
These guidelines are essentially identical to the Facebook guidelines, since I
|
||||
basically copy-pasted them. If you are already familiar with the Facebook
|
||||
guidelines, you probably don't need to read this super thoroughly.
|
||||
|
||||
|
||||
= Spaces, Linebreaks and Indentation =
|
||||
|
||||
- Use two spaces for indentation. Don't use tab literal characters.
|
||||
|
|