2011-05-15 17:12:56 +02:00
|
|
|
@title PHP Pitfalls
|
2014-03-05 22:00:24 +01:00
|
|
|
@group php
|
This document discusses difficult traps and pitfalls in PHP, and how to avoid,
work around, or at least understand them.

= array_merge() is Incredibly Slow When Merging a List of Arrays =

If you merge a list of arrays like this:

  COUNTEREXAMPLE
  $result = array();
  foreach ($list_of_lists as $one_list) {
    $result = array_merge($result, $one_list);
  }

...your program now has a huge runtime because it generates a large number of
intermediate arrays and copies every element it has previously seen each time
you iterate. The total work grows roughly quadratically with the number of
lists.

In a libphutil environment, you can use @{function@libphutil:array_mergev}
instead.

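Outside libphutil, one way to avoid the repeated copying is to hand every list to a single array_merge() call. This is a minimal sketch, not libphutil's implementation: the helper name `merge_list_of_lists()` is hypothetical, and it assumes the outer array is a plain list of arrays.

```php
<?php

// Hypothetical helper (not part of PHP or libphutil): merges a list of
// arrays with one array_merge() call instead of one call per list.
function merge_list_of_lists(array $list_of_lists) {
  if (!$list_of_lists) {
    // array_merge() with no arguments is an error in older PHP versions.
    return array();
  }
  // array_values() drops outer keys so they are never treated as
  // parameter names.
  return call_user_func_array('array_merge', array_values($list_of_lists));
}

$merged = merge_list_of_lists(array(array(1, 2), array(3), array(4, 5)));
echo implode(',', $merged)."\n"; // Outputs '1,2,3,4,5'
```

Each element is copied once, so the cost stays linear in the total number of elements.
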
= var_export() Hates Baby Animals =

If you try to var_export() an object that contains recursive references, your
program will terminate. You have no chance to intercept or react to this or
otherwise stop it from happening. Avoid var_export() unless you are certain
you have only simple data. You can use print_r() or var_dump() to display
complex variables safely.

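As a small illustration of the safe alternative, this sketch builds an object that refers to itself and renders it with print_r(). The variable names are made up for the example.

```php
<?php

// An object whose property points back at the object itself.
$node = new stdClass();
$node->self = $node;

// print_r() marks the cycle instead of following it forever. Passing true
// as the second argument returns the text instead of printing it.
$dump = print_r($node, true);
echo $dump;
```

The output contains the marker `*RECURSION*` where the cycle was detected.
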
= isset(), empty() and Truthiness =

A value is "truthy" if it evaluates to true in an ##if## clause:

  $value = something();
  if ($value) {
    // Value is truthy.
  }

If a value is not truthy, it is "falsey". These values are falsey in PHP:

  null          // null
  0             // integer
  0.0           // float
  "0"           // string
  ""            // empty string
  false         // boolean
  array()       // empty array

Disregarding some bizarre edge cases, all other values are truthy. Note that
because "0" is falsey, this sort of thing (intended to prevent users from making
empty comments) is wrong in PHP:

  COUNTEREXAMPLE
  if ($comment_text) {
    make_comment($comment_text);
  }

This is wrong because it prevents users from making the comment "0". //THIS
COMMENT IS TOTALLY AWESOME AND I MAKE IT ALL THE TIME SO YOU HAD BETTER NOT
BREAK IT!!!// A better test is probably strlen().

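A strlen()-based check might look like the sketch below. The function `should_make_comment()` is hypothetical, and the sketch assumes the comment text always arrives as a string.

```php
<?php

// Hypothetical helper: decides whether a comment is non-empty by length
// rather than by truthiness, so the comment "0" is accepted.
function should_make_comment($comment_text) {
  return strlen($comment_text) > 0;
}

var_dump(should_make_comment('0'));  // bool(true)
var_dump(should_make_comment(''));   // bool(false)
var_dump((bool)'0');                 // bool(false): the truthiness trap
```
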
In addition to truth tests with ##if##, PHP has two special truthiness operators
which look like functions but aren't: empty() and isset(). These operators help
deal with undeclared variables.

In PHP, there are two major cases where you get undeclared variables -- either
you directly use a variable without declaring it:

  COUNTEREXAMPLE
  function f() {
    if ($not_declared) {
      // ...
    }
  }

...or you index into an array with an index which may not exist:

  COUNTEREXAMPLE
  function f(array $mystery) {
    if ($mystery['stuff']) {
      // ...
    }
  }

When you do either of these, PHP issues a notice (a warning in PHP 8 and
later). Avoid these messages by using empty() and isset() to do tests that are
safe to apply to undeclared variables.

empty() evaluates truthiness exactly opposite of if(). isset() returns true for
everything except null. This is the truth table:

  VALUE             if()        empty()     isset()

  null              false       true        false
  0                 false       true        true
  0.0               false       true        true
  "0"               false       true        true
  ""                false       true        true
  false             false       true        true
  array()           false       true        true
  EVERYTHING ELSE   true        false       true

The value of these operators is that they accept undeclared variables and do not
issue a warning. Specifically, if you try to do this you get a warning:

  COUNTEREXAMPLE
  if ($not_previously_declared) {         // PHP Notice:  Undefined variable!
    // ...
  }

But these are fine:

  if (empty($not_previously_declared)) {  // No notice, returns true.
    // ...
  }
  if (isset($not_previously_declared)) {  // No notice, returns false.
    // ...
  }

So, isset() really means is_declared_and_is_set_to_something_other_than_null().
empty() really means is_falsey_or_is_not_declared(). Thus:

  - If a variable is known to exist, test falsiness with if (!$v), not empty().
    In particular, test for empty arrays with if (!$array). There is no reason
    to ever use empty() on a declared variable.
  - When you use isset() on an array key, like isset($array['key']), it will
    evaluate to "false" if the key exists but has the value null! Test for index
    existence with array_key_exists().

Put another way, use isset() if you want to type "if ($value !== null)" but are
testing something that may not be declared. Use empty() if you want to type
"if (!$value)" but you are testing something that may not be declared.

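The difference between isset() and array_key_exists() on a null value can be seen in a short sketch. The `$settings` array and its keys are invented for the example.

```php
<?php

// 'timeout' exists but holds null; 'retries' does not exist at all.
$settings = array('timeout' => null);

$isset_result   = isset($settings['timeout']);
$exists_result  = array_key_exists('timeout', $settings);
$missing_result = array_key_exists('retries', $settings);
$empty_result   = empty($settings['retries']);

var_dump($isset_result);    // bool(false): the key exists, but is null
var_dump($exists_result);   // bool(true): the key exists
var_dump($missing_result);  // bool(false): the key does not exist
var_dump($empty_result);    // bool(true): no notice for the missing key
```
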
= usort(), uksort(), and uasort() are Slow =

This family of functions is often extremely slow for large datasets. You should
avoid them if at all possible. Instead, build an array which contains surrogate
keys that are naturally sortable with a function that uses native comparison
(e.g., sort(), asort(), ksort(), or natcasesort()). Sort this array instead, and
use it to reorder the original array.

In a libphutil environment, you can often do this easily with
@{function@libphutil:isort} or @{function@libphutil:msort}.

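The surrogate-key approach can be sketched without libphutil as follows. The `$users` data and variable names are assumptions made for the example; the sketch sorts records by their `name` field.

```php
<?php

$users = array(
  array('name' => 'carol', 'id' => 3),
  array('name' => 'alice', 'id' => 1),
  array('name' => 'bob',   'id' => 2),
);

// Build a surrogate array: same indexes as $users, but holding only the
// value we want to sort by.
$sort_keys = array();
foreach ($users as $index => $user) {
  $sort_keys[$index] = $user['name'];
}

// asort() uses native comparison and keeps each value attached to its index.
asort($sort_keys, SORT_STRING);

// Reorder the original records by walking the sorted surrogate array.
$sorted = array();
foreach ($sort_keys as $index => $ignored) {
  $sorted[] = $users[$index];
}

foreach ($sorted as $user) {
  echo $user['name']."\n"; // Outputs alice, bob, carol (one per line)
}
```
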
= array_intersect() and array_diff() are Also Slow =

These functions are much slower for even moderately large inputs than
array_intersect_key() and array_diff_key(), because they cannot make the
assumption that their inputs are unique scalars as the ##key## varieties can.
Strongly prefer the ##key## varieties.

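When the values are unique strings or integers, one way to use the ##key## varieties is to turn the values into keys first. This is a sketch under that assumption; the fruit lists are invented for the example.

```php
<?php

$a = array('apple', 'banana', 'cherry');
$b = array('banana', 'cherry', 'date');

// Turn values into keys. This only works for string or integer values,
// and duplicate values collapse into a single key.
$a_keys = array_fill_keys($a, true);
$b_keys = array_fill_keys($b, true);

$common = array_keys(array_intersect_key($a_keys, $b_keys));
$only_a = array_keys(array_diff_key($a_keys, $b_keys));

echo implode(',', $common)."\n"; // Outputs 'banana,cherry'
echo implode(',', $only_a)."\n"; // Outputs 'apple'
```
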
= array_uintersect() and array_udiff() are Definitely Slow Too =

These functions have the problems of both the ##usort()## family and the
`array_diff()` family. Avoid them.

= foreach() Does Not Create Scope =

Variables survive outside of the scope of foreach(). More problematically,
references survive outside of the scope of foreach(). This code mutates
`$array` because the reference leaks from the first loop to the second:

  COUNTEREXAMPLE
  $array = range(1, 3);
  echo implode(',', $array); // Outputs '1,2,3'
  foreach ($array as &$value) {}
  echo implode(',', $array); // Outputs '1,2,3'
  foreach ($array as $value) {}
  echo implode(',', $array); // Outputs '1,2,2'

The easiest way to avoid this is to avoid using foreach-by-reference. If you do
use it, unset the reference after the loop:

  foreach ($array as &$value) {
    // ...
  }
  unset($value);

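For comparison with the counterexample, this sketch runs the same two loops with the unset() in place. The multiplication in the first loop is only there to show that the by-reference loop still works as intended.

```php
<?php

$array = range(1, 3);

foreach ($array as &$value) {
  $value = $value * 10;
}
unset($value); // Break the reference to the last element.

// $value is now a fresh variable, so this loop cannot write into $array.
foreach ($array as $value) {}

echo implode(',', $array)."\n"; // Outputs '10,20,30'
```
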
= unserialize() is Incredibly Slow on Large Datasets =

The performance of unserialize() is nonlinear in the number of zvals you
unserialize, roughly O(N^2).

  zvals       approximate time
  10000       5ms
  100000      85ms
  1000000     8,000ms
  10000000    72 billion years

= call_user_func() Breaks References =

If you use call_user_func() to invoke a function which takes parameters by
reference, the variables you pass in will have their references broken and will
emerge unmodified. That is, if you have a function that takes references:

  function add_one(&$v) {
    $v++;
  }

...and you call it with call_user_func():

  COUNTEREXAMPLE
  $x = 41;
  call_user_func('add_one', $x);

...##$x## will not be modified. The solution is to use call_user_func_array()
and wrap the reference in an array:

  $x = 41;
  call_user_func_array(
    'add_one',
    array(&$x)); // Note '&$x'!

This will work as expected.

= You Can't Throw From __toString() =

In PHP versions before 7.4, if you throw from __toString(), your program will
terminate uselessly and you won't get the exception.

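One defensive pattern is to catch exceptions inside __toString() and return a fallback string. This is only a sketch: the `Temperature` class and its `render()` method are hypothetical, and whether a placeholder string is acceptable depends on the caller.

```php
<?php

// Hypothetical class used only to illustrate the pattern.
final class Temperature {
  private $celsius;

  public function __construct($celsius) {
    $this->celsius = $celsius;
  }

  // The real rendering logic lives here and is free to throw.
  private function render() {
    if (!is_numeric($this->celsius)) {
      throw new Exception('Not a number.');
    }
    return $this->celsius.' C';
  }

  // Never lets an exception escape; returns a placeholder instead.
  public function __toString() {
    try {
      return $this->render();
    } catch (Exception $ex) {
      return '<unrenderable: '.$ex->getMessage().'>';
    }
  }
}

echo new Temperature(21)."\n";     // Outputs '21 C'
echo new Temperature('warm')."\n"; // Outputs '<unrenderable: Not a number.>'
```
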
= An Object Can Have Any Scalar as a Property =

Object properties are not limited to legal variable names:

  $obj = new stdClass();
  $property = '!@#$%^&*()';
  $obj->$property = 'zebra';
  echo $obj->$property;       // Outputs 'zebra'.

So, don't make assumptions about property names.

= There is an (object) Cast =

You can cast a dictionary into an object.

  $obj = (object)array('flavor' => 'coconut');
  echo $obj->flavor;      // Outputs 'coconut'.
  echo get_class($obj);   // Outputs 'stdClass'.

This is occasionally useful, mostly to force an array to be encoded as a
JavaScript dictionary (vs a list) when passed to json_encode().

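The effect on json_encode() is easiest to see with an empty array, which would otherwise be encoded as a list. A minimal sketch:

```php
<?php

$as_list = json_encode(array());
$as_dict = json_encode((object)array());

echo $as_list."\n"; // Outputs '[]'
echo $as_dict."\n"; // Outputs '{}'

// The same applies to arrays with sequential integer keys.
$indexed = json_encode((object)array('a', 'b'));
echo $indexed."\n"; // Outputs '{"0":"a","1":"b"}'
```
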
= Invoking "new" With an Argument Vector is Really Hard =

If you have some ##$class_name## and some ##$argv## of constructor
arguments and you want to do this:

  new $class_name($argv[0], $argv[1], ...);

...you'll probably invent a very interesting, very novel solution that is very
wrong. In a libphutil environment, solve this problem with
@{function@libphutil:newv}. Elsewhere, copy newv()'s implementation.

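One approach that works in plain PHP is reflection. The sketch below is not a copy of newv(); the helper `new_from_argv()` and the `Point` class are hypothetical names used for illustration.

```php
<?php

// Hypothetical example class with a two-argument constructor.
final class Point {
  public $x;
  public $y;

  public function __construct($x, $y) {
    $this->x = $x;
    $this->y = $y;
  }
}

// Hypothetical helper: constructs $class_name with a list of arguments.
function new_from_argv($class_name, array $argv) {
  $reflector = new ReflectionClass($class_name);
  if (!$argv) {
    // Classes without a constructor reject an argument list.
    return $reflector->newInstance();
  }
  // array_values() ensures arguments are passed positionally.
  return $reflector->newInstanceArgs(array_values($argv));
}

$point = new_from_argv('Point', array(3, 4));
echo get_class($point).' '.$point->x.','.$point->y."\n"; // Outputs 'Point 3,4'
```
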
= Equality is not Transitive =

This isn't terribly surprising since equality isn't transitive in a lot of
languages, but the == operator is not transitive:

  $a = ''; $b = 0; $c = '0a';
  $a == $b; // true
  $b == $c; // true
  $c == $a; // false!

In PHP versions before 8.0, when either operand is an integer, the other
operand is cast to an integer before comparison. PHP 8.0 and later only compare
numerically when the string is numeric, so the first two comparisons above are
false there, but == still has similar pitfalls. Avoid them by using the ===
operator, which is transitive.

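As a sketch of a non-transitive chain that does not depend on integer-to-string casting, consider null, false, and the string '0'. The variable names are arbitrary.

```php
<?php

$a = null;
$b = false;
$c = '0';

$a_equals_b = ($a == $b); // null is falsey, so this is true.
$b_equals_c = ($b == $c); // '0' is falsey, so this is true.
$a_equals_c = ($a == $c); // null is compared as '', so this is false.

var_dump($a_equals_b); // bool(true)
var_dump($b_equals_c); // bool(true)
var_dump($a_equals_c); // bool(false)

// With ===, none of these distinct values compare equal.
var_dump($a === $b);   // bool(false)
var_dump($b === $c);   // bool(false)
```
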
= All 676 Letters in the Alphabet =

This doesn't do what you'd expect it to do in C:

  for ($c = 'a'; $c <= 'z'; $c++) {
    // ...
  }

This is because the successor to 'z' is 'aa', which is "less than" 'z'. The
loop will run for 676 iterations until it reaches 'za' and terminates. That is,
`$c` will take on these values:

  a
  b
  ...
  y
  z
  aa // loop continues because 'aa' <= 'z'
  ab
  ...
  mf
  mg
  ...
  yx
  yy
  yz
  za // loop now terminates because 'za' > 'z'

Instead, use this loop:

  foreach (range('a', 'z') as $c) {
    // ...
  }

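The iteration count can be checked directly. This sketch counts how many times the loop body runs and records the last value it sees; the counter variables are introduced only for the demonstration.

```php
<?php

$count = 0;
$last = null;
for ($c = 'a'; $c <= 'z'; $c++) {
  $count++;
  $last = $c;
}

echo $count."\n"; // Outputs '676' (26 single letters plus 'aa' through 'yz')
echo $last."\n";  // Outputs 'yz', the last value that is <= 'z'
echo $c."\n";     // Outputs 'za', the value that ended the loop

$letters = range('a', 'z');
echo count($letters)."\n"; // Outputs '26'
```
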