@title PHP Pitfalls
@group php

This document discusses difficult traps and pitfalls in PHP, and how to avoid,
work around, or at least understand them.

= `array_merge()` is Incredibly Slow When Merging A List of Arrays =

If you merge a list of arrays like this:

  COUNTEREXAMPLE, lang=php
  $result = array();
  foreach ($list_of_lists as $one_list) {
    $result = array_merge($result, $one_list);
  }

...your program now has a huge runtime: every iteration builds a new
intermediate array and copies every element it has previously seen, so the
total work grows roughly quadratically with the number of lists.

In a libphutil environment, you can use @{function@libphutil:array_mergev}
instead.
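Outside libphutil, one way to get the same result is to pass every list to a
single `array_merge()` call, so each element is copied only once. This is a
minimal sketch, not libphutil's implementation; the helper name `merge_lists()`
is hypothetical, and it assumes every element of `$list_of_lists` is an array:

```lang=php
<?php

// Hypothetical helper: merges a list of arrays with one array_merge() call
// instead of one call per list.
function merge_lists(array $list_of_lists) {
  if (!$list_of_lists) {
    // Before PHP 7.4, array_merge() requires at least one argument.
    return array();
  }
  // array_values() discards the outer keys: on PHP 8, string keys passed
  // through call_user_func_array() would be treated as named arguments.
  return call_user_func_array('array_merge', array_values($list_of_lists));
}

$list_of_lists = array(array(1, 2), array(3), array(4, 5));
echo implode(',', merge_lists($list_of_lists)); // Outputs '1,2,3,4,5'
```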

= `var_export()` Hates Baby Animals =

If you try to `var_export()` an object that contains recursive references, your
program will terminate. You have no chance to intercept or react to this or
otherwise stop it from happening. Avoid `var_export()` unless you are certain
you have only simple data. You can use `print_r()` or `var_dump()` to display
complex variables safely.

= `isset()`, `empty()` and Truthiness =

A value is "truthy" if it evaluates to true in an `if` clause:

  lang=php
  $value = something();
  if ($value) {
    // Value is truthy.
  }

If a value is not truthy, it is "falsey". These values are falsey in PHP:

  null      // null
  0         // integer
  0.0       // float
  "0"       // string
  ""        // empty string
  false     // boolean
  array()   // empty array

Disregarding some bizarre edge cases, all other values are truthy. Note that
because "0" is falsey, this sort of thing (intended to prevent users from making
empty comments) is wrong in PHP:

  COUNTEREXAMPLE, lang=php
  if ($comment_text) {
    make_comment($comment_text);
  }

This is wrong because it prevents users from making the comment "0". //THIS
COMMENT IS TOTALLY AWESOME AND I MAKE IT ALL THE TIME SO YOU HAD BETTER NOT
BREAK IT!!!// A better test is `strlen($comment_text)`, which rejects only the
empty string.
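A minimal sketch of the `strlen()` test, assuming `$comment_text` is always a
string (passing `null` to `strlen()` is deprecated as of PHP 8.1). The helper
name `is_nonempty_comment()` is hypothetical:

```lang=php
<?php

// Hypothetical helper: accepts any non-empty string, including "0".
function is_nonempty_comment($comment_text) {
  return strlen($comment_text) > 0;
}

var_dump((bool)'0');                // bool(false): "0" is falsey.
var_dump(is_nonempty_comment('0')); // bool(true)
var_dump(is_nonempty_comment(''));  // bool(false)
```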

In addition to truth tests with `if`, PHP has two special truthiness operators
which look like functions but aren't: `empty()` and `isset()`. These operators
help deal with undeclared variables.

In PHP, there are two major cases where you get undeclared variables. Either
you directly use a variable without declaring it:

  COUNTEREXAMPLE, lang=php
  function f() {
    if ($not_declared) {
      // ...
    }
  }

...or you index into an array with an index which may not exist:

  COUNTEREXAMPLE, lang=php
  function f(array $mystery) {
    if ($mystery['stuff']) {
      // ...
    }
  }

When you do either of these, PHP issues a notice (a warning as of PHP 8.0).
Avoid these messages by using `empty()` and `isset()` to do tests that are safe
to apply to undeclared variables.

`empty()` evaluates truthiness exactly opposite to `if()`. `isset()` returns
`true` for everything except `null`. This is the truth table:

| Value | `if()` | `empty()` | `isset()` |
|-------|--------|-----------|-----------|
| `null` | `false` | `true` | `false` |
| `0` | `false` | `true` | `true` |
| `0.0` | `false` | `true` | `true` |
| `"0"` | `false` | `true` | `true` |
| `""` | `false` | `true` | `true` |
| `false` | `false` | `true` | `true` |
| `array()` | `false` | `true` | `true` |
| Everything else | `true` | `false` | `true` |

The value of these operators is that they accept undeclared variables without
complaint. Specifically, if you try to do this you get a notice:

```lang=php, COUNTEREXAMPLE
if ($not_previously_declared) { // PHP Notice: Undefined variable!
  // ...
}
```

But these are fine:

```lang=php
if (empty($not_previously_declared)) { // No notice, returns true.
  // ...
}
if (isset($not_previously_declared)) { // No notice, returns false.
  // ...
}
```

So, `isset()` really means
`is_declared_and_is_set_to_something_other_than_null()`. `empty()` really means
`is_falsey_or_is_not_declared()`. Thus:

- If a variable is known to exist, test falsiness with `if (!$v)`, not
  `empty()`. In particular, test for empty arrays with `if (!$array)`. There
  is no reason to ever use `empty()` on a declared variable.
- When you use `isset()` on an array key, like `isset($array['key'])`, it
  will evaluate to `false` if the key exists but has the value `null`! Test
  for index existence with `array_key_exists()`.

Put another way, use `isset()` if you want to type `if ($value !== null)` but
are testing something that may not be declared. Use `empty()` if you want to
type `if (!$value)` but are testing something that may not be declared.
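A small illustration of the difference between `isset()` and
`array_key_exists()` on a key whose value is `null`. The array contents are
illustrative only:

```lang=php
<?php

$array = array('key' => null);

// isset() treats a null value the same as a missing key.
var_dump(isset($array['key']));            // bool(false)
// array_key_exists() only asks whether the key is present.
var_dump(array_key_exists('key', $array)); // bool(true)

// Neither isset() nor empty() raises a notice for a missing key.
var_dump(isset($array['missing']));        // bool(false)
var_dump(empty($array['missing']));        // bool(true)
```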

= `usort()`, `uksort()`, and `uasort()` are Slow =

This family of functions is often extremely slow for large datasets, because it
calls back into a PHP comparison function for every comparison it makes. You
should avoid these functions if at all possible. Instead, build an array which
contains surrogate keys that are naturally sortable with a function that uses
native comparison (e.g., `sort()`, `asort()`, `ksort()`, or `natcasesort()`).
Sort this array instead, and use it to reorder the original array.

In a libphutil environment, you can often do this easily with
@{function@libphutil:isort} or @{function@libphutil:msort}.
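A minimal sketch of the surrogate-key approach for a list of records sorted by a
`'name'` field. The helper `sort_records_by_name()` and the record shape are
hypothetical; like `uasort()`, it preserves the original keys:

```lang=php
<?php

// Hypothetical helper: sorts records by their 'name' field without usort(),
// by sorting surrogate keys with native comparison.
function sort_records_by_name(array $records) {
  // Map each original key to a naturally sortable scalar.
  $surrogates = array();
  foreach ($records as $key => $record) {
    $surrogates[$key] = $record['name'];
  }

  // asort() uses native comparison and keeps the original keys.
  asort($surrogates, SORT_STRING);

  // Rebuild the original array in the sorted key order.
  $result = array();
  foreach (array_keys($surrogates) as $key) {
    $result[$key] = $records[$key];
  }
  return $result;
}

$records = array(
  array('name' => 'carrot'),
  array('name' => 'apple'),
  array('name' => 'banana'),
);
$sorted = sort_records_by_name($records);
echo implode(',', array_keys($sorted)); // Outputs '1,2,0'
```

Call `array_values()` on the result if you want a list rather than the original
keys.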

= `array_intersect()` and `array_diff()` are Also Slow =

These functions are much slower, even for moderately large inputs, than
`array_intersect_key()` and `array_diff_key()`, because they cannot assume that
their inputs are unique scalars, as the `key` variants can. Strongly prefer the
`key` variants.
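When the values you want to intersect are strings or integers, you can often
turn them into keys first and let `array_intersect_key()` do the work. A
minimal sketch with a hypothetical helper name; note that duplicate values
collapse, and numeric strings such as `'1'` come back as integers:

```lang=php
<?php

// Hypothetical helper: intersects two lists of strings or integers by
// turning their values into keys, so array_intersect_key() can do the work.
function intersect_values(array $values, array $allowed) {
  $value_set = array_fill_keys($values, true);
  $allowed_set = array_fill_keys($allowed, true);
  // Keeps the order of $values.
  return array_keys(array_intersect_key($value_set, $allowed_set));
}

$common = intersect_values(
  array('banana', 'cherry', 'apple'),
  array('apple', 'banana'));
echo implode(',', $common); // Outputs 'banana,apple'
```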

= `array_uintersect()` and `array_udiff()` are Definitely Slow Too =

These functions have the problems of both the `usort()` family and the
`array_diff()` family. Avoid them.

= `foreach()` Does Not Create Scope =

Variables survive outside of the scope of `foreach()`. More problematically,
references survive outside of the scope of `foreach()`. This code mutates
`$array` because the reference leaks from the first loop to the second:

```lang=php, COUNTEREXAMPLE
$array = range(1, 3);
echo implode(',', $array); // Outputs '1,2,3'
foreach ($array as &$value) {}
echo implode(',', $array); // Outputs '1,2,3'
foreach ($array as $value) {}
echo implode(',', $array); // Outputs '1,2,2'
```

After the first loop, `$value` is still a reference to `$array[2]`, so each
iteration of the second loop copies the current element into `$array[2]`. The
second iteration sets it to `2`, and the last iteration copies that `2` onto
itself.

The easiest way to avoid this is to avoid using foreach-by-reference. If you do
use it, unset the reference after the loop:

```lang=php
foreach ($array as &$value) {
  // ...
}
unset($value);
```
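If the loop needs to modify the array, writing through the key avoids the
reference entirely. A minimal sketch:

```lang=php
<?php

$array = range(1, 3);

// Write through the key instead of taking a reference, so no reference
// survives after the loop.
foreach ($array as $key => $value) {
  $array[$key] = $value * 2;
}

echo implode(',', $array); // Outputs '2,4,6'
```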

= `unserialize()` is Incredibly Slow on Large Datasets =

The performance of `unserialize()` is nonlinear in the number of zvals you
unserialize, roughly `O(N^2)`.

| zvals | Approximate time |
|-------|------------------|
| 10,000 | 5ms |
| 100,000 | 85ms |
| 1,000,000 | 8,000ms |
| 10,000,000 | 72 billion years |

= `call_user_func()` Breaks References =

If you use `call_user_func()` to invoke a function which takes parameters by
reference, the variables you pass in will have their references broken and will
emerge unmodified. That is, if you have a function that takes references:

```lang=php
function add_one(&$v) {
  $v++;
}
```

...and you call it with `call_user_func()`:

```lang=php, COUNTEREXAMPLE
$x = 41;
call_user_func('add_one', $x); // $x is still 41.
```

...`$x` will not be modified. The solution is to use `call_user_func_array()`
and wrap the reference in an array:

```lang=php
$x = 41;
call_user_func_array(
  'add_one',
  array(&$x)); // Note '&$x'!
// $x is now 42.
```

This will work as expected.

= You Can't Throw From `__toString()` =

In PHP versions before 7.4, if you throw from `__toString()`, your program will
terminate uselessly with a fatal error and you won't get the exception. PHP 7.4
and later allow exceptions thrown from `__toString()` to propagate normally.

= An Object Can Have Any Scalar as a Property =

Object properties are not limited to legal variable names:

```lang=php
$obj = new stdClass();
$property = '!@#$%^&*()';
$obj->$property = 'zebra';
echo $obj->$property; // Outputs 'zebra'.
```

So, don't make assumptions about property names.

= There is an `(object)` Cast =

You can cast a dictionary into an object.

```lang=php
$obj = (object)array('flavor' => 'coconut');
echo $obj->flavor; // Outputs 'coconut'.
echo get_class($obj); // Outputs 'stdClass'.
```

This is occasionally useful, mostly to force `json_encode()` to produce a
JavaScript dictionary (a JSON object) rather than a list.
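For example, an empty array and a list both encode as JSON lists unless you
cast them first:

```lang=php
<?php

$empty = array();
$list = array('coconut', 'lime');

echo json_encode($empty), "\n";         // Outputs '[]'
echo json_encode((object)$empty), "\n"; // Outputs '{}'
echo json_encode($list), "\n";          // Outputs '["coconut","lime"]'
echo json_encode((object)$list), "\n";  // Outputs '{"0":"coconut","1":"lime"}'
```

`json_encode()` also accepts the `JSON_FORCE_OBJECT` flag, but it applies to
every nested array, not just the top-level value.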

= Invoking `new` With an Argument Vector is Really Hard =

If you have some `$class_name` and some `$argv` of constructor arguments
and you want to do this:

```lang=php
new $class_name($argv[0], $argv[1], ...);
```

...you'll probably invent a very interesting, very novel solution that is very
wrong. In a libphutil environment, solve this problem with
@{function@libphutil:newv}. Elsewhere, copy `newv()`'s implementation.
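If you can't use libphutil, PHP's `ReflectionClass::newInstanceArgs()` can
construct an object from an argument vector. A minimal sketch, which is not a
copy of `newv()`; the helper `new_with_args()` and the `Point` class are
hypothetical:

```lang=php
<?php

// Hypothetical helper: constructs $class_name with the arguments in $argv.
function new_with_args($class_name, array $argv) {
  $reflector = new ReflectionClass($class_name);
  // array_values() keeps PHP 8 from treating string keys as named arguments.
  return $reflector->newInstanceArgs(array_values($argv));
}

// Hypothetical example class.
class Point {
  public $x;
  public $y;

  public function __construct($x, $y) {
    $this->x = $x;
    $this->y = $y;
  }
}

$point = new_with_args('Point', array(3, 4));
echo $point->x + $point->y; // Outputs '7'
```

On PHP 5.6 and later, argument unpacking (`new $class_name(...$argv)`) is
another option.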

= Equality is not Transitive =

Equality isn't transitive in a lot of languages, so this isn't terribly
surprising, but PHP's `==` operator is not transitive. In PHP versions before
8.0:

```lang=php
$a = ''; $b = 0; $c = '0a';
$a == $b; // true
$b == $c; // true
$c == $a; // false!
```

Before PHP 8.0, when one operand is an integer and the other is a string, the
string is cast to a number before comparison, so both `''` and `'0a'` compare
equal to `0`. PHP 8.0 changed these rules (both of the first two comparisons
are now `false`), but `==` still juggles types. Avoid this and similar pitfalls
by using the `===` operator, which is transitive.

= All 676 Letters in the Alphabet =

This doesn't do what you'd expect it to do in C:

```lang=php
for ($c = 'a'; $c <= 'z'; $c++) {
  // ...
}
```

This is because the successor to `z` is `aa`, which is "less than" `z`. The
loop will run for 676 iterations, terminating only when `$c` reaches `za`, the
first value that is "greater than" `z`. That is, `$c` will take on these values:

```
a
b
...
y
z
aa // loop continues because 'aa' <= 'z'
ab
...
mf
mg
...
yx
yy
yz
za // loop now terminates because 'za' > 'z'
```

Instead, use this loop:

```lang=php
foreach (range('a', 'z') as $c) {
  // ...
}
```