mirror of https://we.phorge.it/source/phorge.git synced 2024-11-23 07:12:41 +01:00

Provide general documentation on how to use performance tools

Summary: Ref T8617. Provide general documentation with tools for debugging hangs and slow pages. Update DarkConsole docs and discuss how to use Services and XHProf. Explain what Multimeter is for and how to use it. Update XHProf docs and provide some usage hints.

Test Plan: Read documentation.

Reviewers: joshuaspence, btrahan

Reviewed By: joshuaspence, btrahan

Subscribers: joshuaspence, epriestley

Maniphest Tasks: T8617

Differential Revision: https://secure.phabricator.com/D13359
epriestley 2015-06-20 05:30:17 -07:00
parent 04516d256b
commit b22fba1ab5
8 changed files with 574 additions and 114 deletions

View file

@ -43,4 +43,13 @@ final class PhabricatorMultimeterApplication
    );
  }
  public function getHelpDocumentationArticles(PhabricatorUser $viewer) {
    return array(
      array(
        'name' => pht('Multimeter User Guide'),
        'href' => PhabricatorEnv::getDoclink('Multimeter User Guide'),
      ),
    );
  }
}

View file

@ -28,6 +28,9 @@
    },
    "userguide": {
      "name": "Application User Guides"
    },
    "fieldmanual": {
      "name": "Field Manuals"
    }
  }
}

View file

@ -1,60 +0,0 @@
@title Using DarkConsole
@group developer
Enabling and using the built-in debugging console.
= Overview =
DarkConsole is a debugging console built into Phabricator which exposes
configuration, performance and error information. It can help you detect,
understand and resolve bugs and performance problems in Phabricator
applications.
DarkConsole was originally implemented as part of the Facebook Lite site; its
name is a bit of play on that (and a reference to the dark color palette its
design uses).
= Warning =
Because DarkConsole exposes some configuration and debugging information, it is
disabled by default (and **you should not enable it in production**). It has
some simple safeguards to prevent leaking credential information, but enabling
it in production may compromise the integrity of an install.
= Enabling DarkConsole =
You enable DarkConsole in your configuration, by setting `darkconsole.enabled`
to `true`, and then turning it on in `Settings` -> `Developer Settings`. Once
DarkConsole is enabled, you can show or hide it by pressing ##`## on your
keyboard.
Since the setting is not available to logged-out users, you can also set
`darkconsole.always-on` if you need to access DarkConsole on logged-out pages.
DarkConsole has a number of tabs, each of which is powered by a "plugin". You
can use them to access different debugging and performance features.
= Plugin: Error Log =
The "Error Log" plugin shows errors that occurred while generating the page,
similar to the httpd `error.log`. You can send information to the error log
explicitly with the @{function@libphutil:phlog} function.
If errors occurred, a red dot will appear on the plugin tab.
= Plugin: Request =
The "Request" plugin shows information about the HTTP request the server
received, and the server itself.
= Plugin: Services =
The "Services" plugin lists calls a page made to external services, like
MySQL and the command line.
= Plugin: XHProf =
The "XHProf" plugin gives you access to the XHProf profiler. To use it, you need
to install the corresponding PHP plugin -- see instructions in the
@{article:Installation Guide}. Once it is installed, you can use XHProf to
profile the runtime performance of a page.

View file

@ -1,54 +0,0 @@
@title Installing XHProf
@group developer
Describes how to install XHProf, a PHP profiling tool.
Overview
========
You can install XHProf to activate the XHProf tab in DarkConsole and the
`--xprofile` flag from the CLI. This will allow you to generate performance
profiles of pages and scripts, which can be tremendously valuable in identifying
and fixing slow code.
Installing XHProf
=================
XHProf is a PHP profiling tool. You don't need to install it unless you are
developing Phabricator and making performance changes.
You can install xhprof with:
$ pecl install xhprof
If you have a PEAR version prior to 1.9.3, you may run into a `phpize` failure.
If so, you can download the source and build it with:
$ cd extension/
$ phpize
$ ./configure
$ make
$ sudo make install
You may also need to add `extension=xhprof.so` to your php.ini.
See <https://bugs.php.net/bug.php?id=59747> for more information.
Using XHProf: Web
=================
To profile a web page, activate DarkConsole and navigate to the XHProf tab.
Use the **Profile Page** button to generate a profile.
Using XHProf: CLI
=================
From the command line, use the `--xprofile <filename>` flag to generate a
profile of any script.
Next Steps
==========
Continue by:
- enabling DarkConsole with @{article:Using DarkConsole}.

View file

@ -0,0 +1,162 @@
@title Using DarkConsole
@group fieldmanual
Enabling and using the built-in debugging and performance console.
Overview
========
DarkConsole is a debugging console built into Phabricator which exposes
configuration, performance and error information. It can help you detect,
understand and resolve bugs and performance problems in Phabricator
applications.
Security Warning
================
WARNING: Because DarkConsole exposes some configuration and debugging
information, it is disabled by default and you should be cautious about
enabling it in production.
Particularly, DarkConsole may expose some information about your session
details or other private material. It has some crude safeguards against this,
but does not completely sanitize output.
This is mostly a risk if you take screenshots or copy/paste output and share
it with others.
Enabling DarkConsole
====================
You enable DarkConsole in your configuration, by setting `darkconsole.enabled`
to `true`, and then turning it on in {nav Settings > Developer Settings}.
Once DarkConsole is enabled, you can show or hide it by pressing ##`## on your
keyboard.
Since the setting is not available to logged-out users, you can also set
`darkconsole.always-on` if you need to access DarkConsole on logged-out pages.
DarkConsole has a number of tabs, each of which is powered by a "plugin". You
can use them to access different debugging and performance features.
Plugin: Error Log
=================
The "Error Log" plugin shows errors that occurred while generating the page,
similar to the httpd `error.log`. You can send information to the error log
explicitly with the @{function@libphutil:phlog} function.
If errors occurred, a red dot will appear on the plugin tab.
Plugin: Request
===============
The "Request" plugin shows information about the HTTP request the server
received, and the server itself.
Plugin: Services
================
The "Services" plugin lists calls a page made to external services, like
MySQL and subprocesses.
The Services tab can help you understand and debug issues related to page
behavior: for example, you can use it to see exactly what queries or commands a
page is running. In some cases, you can re-run those queries or commands
yourself to examine their output and look for problems.
This tab can also be particularly useful in understanding page performance,
because many performance problems are caused by inefficient queries (queries
with bad query plans or which take too long) or repeated queries (queries which
could be better structured or benefit from caching).
When analyzing performance problems, the major things to look for are:
**Summary**: In the summary table at the top of the tab, are any categories
of events dominating the performance cost? For normal pages, the costs should
be roughly along these lines:
| Event Type | Approximate Cost |
|---|---|
| Connect | 1%-10% |
| Query | 10%-40% |
| Cache | 1% |
| Event | 1% |
| Conduit | 0%-80% |
| Exec | 0%-80% |
| All Services | 10%-75% |
| Entire Page | 100ms - 1000ms |
These ranges are rough, but should usually be what you expect from a page
summary. If any of these numbers are way off (for example, "Event" is taking
50% of runtime), that points toward a possible problem in that section of the
code, and can guide you to examining the related service calls more carefully.
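As a rough illustration of reading the summary table, the sketch below compares measured costs against the upper ends of the ranges listed above. The function name, the input format, and the `slack` factor are assumptions made for this example; DarkConsole does not provide this check itself. The "Entire Page" row is a duration rather than a share, so it is left out.

```python
# Upper end of the rough expected share of page time, per event type,
# transcribed from the table above (as fractions of total page time).
UPPER_SHARE = {
    "Connect": 0.10,
    "Query": 0.40,
    "Cache": 0.01,
    "Event": 0.01,
    "Conduit": 0.80,
    "Exec": 0.80,
    "All Services": 0.75,
}


def suspicious_categories(costs_ms, page_ms, slack=2.0):
    """Return (event type, share) pairs that are well above the rough range.

    costs_ms: hypothetical mapping of event type -> milliseconds spent.
    page_ms:  total page time in milliseconds.
    slack:    assumed tolerance; a share counts as "way off" only when it
              exceeds the upper bound multiplied by this factor.
    """
    if page_ms <= 0:
        raise ValueError("page_ms must be positive")
    flagged = []
    for event_type, cost in costs_ms.items():
        upper = UPPER_SHARE.get(event_type)
        if upper is None:
            continue  # Unknown category: nothing to compare against.
        share = cost / page_ms
        if share > upper * slack:
            flagged.append((event_type, round(share, 2)))
    return flagged


# A page where "Event" takes half of the runtime gets flagged.
print(suspicious_categories({"Query": 120, "Event": 250, "Connect": 10}, 500))
```

The ranges are rough, so a flagged category is a prompt to look at the related service calls, not proof of a bug.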
**Duration**: In the Duration column, look for service calls that take a long
time. Sometimes these calls are just what the page is doing, but sometimes they
may indicate a problem.
Some questions that may help you understand this column are: are there a small
number of calls which account for a majority of the total page generation time?
Do these calls seem fundamental to the behavior of the page, or is it not clear
why they need to be made? Do some of them seem like they could be cached?
If there are queries which look slow, using the "Analyze Query Plans" button
may help reveal poor query plans.
Generally, this column can help pinpoint these kinds of problems:
- Queries or other service calls which are huge and inefficient.
- Work the page is doing which it could cache instead.
- Problems with network services.
- Missing keys or poor query plans.
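One way to answer "do a small number of calls account for most of the time?" is to sort calls by duration and accumulate their share. This is a minimal sketch over made-up data; the `(detail, duration_ms)` tuples and the `threshold` parameter are assumptions for illustration and do not reflect a DarkConsole export format.

```python
def dominant_calls(calls, threshold=0.5):
    """Return the slowest calls that together reach `threshold` of total time.

    calls: list of (detail, duration_ms) tuples (hypothetical input).
    """
    total = sum(duration for _, duration in calls)
    if total <= 0:
        return []
    picked = []
    running = 0.0
    # Walk from slowest to fastest until the chosen share is covered.
    for detail, duration in sorted(calls, key=lambda c: c[1], reverse=True):
        picked.append((detail, duration))
        running += duration
        if running / total >= threshold:
            break
    return picked


calls = [("SELECT a", 5), ("git log", 300), ("SELECT b", 20), ("SELECT c", 75)]
print(dominant_calls(calls))
```

If one or two calls come back, those are the ones worth examining first.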
**Repeated Calls**: In the "Details" column, look for service calls that are
being made over and over again. Sometimes this is normal, but usually it
indicates a call that can be batched or cached.
Some things to look for are: are similar calls being made over and over again?
Do calls mostly make sense given what the page is doing? Could any calls be
cached? Could multiple small calls be collected into one larger call? Are any
of the service calls clearly goofy nonsense that shouldn't be happening?
Generally, this column can help pinpoint these kinds of problems:
- Unbatched queries which should be batched (see
@{article:Performance: N+1 Query Problem}).
- Opportunities to improve performance with caching.
- General goofiness in how service calls are working.
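Repeated calls are easier to spot when literal values are masked so that queries with the same shape group together. The sketch below is one way to do that by hand on a copied list of queries. The table names, the regular expressions, and `min_count` are assumptions for this example and are not part of DarkConsole.

```python
import re
from collections import Counter


def normalize(query):
    """Mask quoted strings and bare numbers so similar queries compare equal."""
    query = re.sub(r"'[^']*'", "?", query)
    query = re.sub(r"\b\d+\b", "?", query)
    return query


def repeated_calls(queries, min_count=3):
    """Return (query shape, count) for shapes seen at least `min_count` times."""
    counts = Counter(normalize(q) for q in queries)
    return [(shape, n) for shape, n in counts.most_common() if n >= min_count]


queries = [
    "SELECT * FROM user WHERE id = 1",
    "SELECT * FROM user WHERE id = 2",
    "SELECT * FROM user WHERE id = 3",
    "SELECT * FROM task WHERE phid = 'PHID-TASK-abc'",
]
print(repeated_calls(queries))
```

A shape that repeats once per row of a result set is the usual signature of an unbatched query.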
If the services tab looks fine, and particularly if a page is slow but the
"All Services" cost is small, that may indicate a problem in PHP. The best
tool to understand problems in PHP is XHProf.
Plugin: XHProf
==============
The "XHProf" plugin gives you access to the XHProf profiler. To use it, you need
to install the corresponding PHP plugin.
Once it is installed, you can use XHProf to profile the runtime performance of
a page. This will show you a detailed breakdown of where PHP spent time. This
can help find slow or inefficient application code, and is the most powerful
general-purpose performance tool available.
For instructions on installing and using XHProf, see @{article:Using XHProf}.
Next Steps
==========
Continue by:
- installing XHProf with @{article:Using XHProf}; or
- understanding and reporting performance issues with
@{article:Troubleshooting Performance Problems}.

View file

@ -0,0 +1,179 @@
@title Troubleshooting Performance Problems
@group fieldmanual
Guide to troubleshooting slow pages and hangs.
Overview
========
This document describes how to isolate, examine, understand and resolve or
report performance issues like slow pages and hangs.
This document covers the general process for handling performance problems,
and outlines the major tools available for understanding them:
- **Multimeter** helps you understand sources of load and broad resource
utilization. This is a coarse, high-level tool.
- **DarkConsole** helps you dig into a specific slow page and understand
service calls. This is a general, mid-level tool.
- **XHProf** gives you detailed application performance profiles. This
is a fine-grained, low-level tool.
Performance and the Upstream
============================
Performance issues and hangs will often require upstream involvement to fully
resolve. The intent is for Phabricator to perform well in all reasonable cases,
not require tuning for different workloads (as long as those workloads are
generally reasonable). Poor performance with a reasonable workload is likely a
bug, not a configuration problem.
However, some pages are slow because Phabricator legitimately needs to do a lot
of work to generate them. For example, if you write a 100MB wiki document,
Phabricator will need substantial time to process it, it will take a long time
to download over the network, and your browser will probably not be able to
render it especially quickly.
We may be able to improve performance in some cases, but Phabricator is not
magic and cannot wish away real complexity. The best solution to these problems
is usually to find another way to solve your problem: for example, maybe the
100MB document can be split into several smaller documents.
Here are some examples of performance problems under reasonable workloads that
the upstream can help resolve:
- {icon check, color=green} Commenting on a file and mentioning that same
file results in a hang.
- {icon check, color=green} Creating a new user takes many seconds.
- {icon check, color=green} Loading Feed hangs on 32-bit systems.
The upstream will be less able to help resolve unusual workloads with high
inherent complexity, like these:
- {icon times, color=red} A 100MB wiki page takes a long time to render.
- {icon times, color=red} A Turing-complete simulation of Conway's Game of
Life implemented in 958,000 Herald rules executes slowly.
- {icon times, color=red} Uploading an 8GB file takes several minutes.
Generally, the path forward will be:
- Follow the instructions in this document to gain the best understanding of
the issue (and of how to reproduce it) that you can.
- In particular, is it being caused by an unusual workload (like a 100MB
wiki page)? If so, consider other ways to solve the problem.
- File a report with the upstream by following the instructions in
@{article:Contributing Bug Reports}.
The remaining sections in this document walk through these steps.
Understanding Performance Problems
==================================
To isolate, examine, and understand performance problems, follow these steps:
**General Slowness**: If you are experiencing generally poor performance, use
Multimeter to understand resource usage and look for load-based causes. See
@{article:Multimeter User Guide}. If that isn't fruitful, treat this like a
reproducible performance problem on an arbitrary page.
**Hangs**: If you are experiencing hangs (pages which never return, or which
time out with a fatal after some number of seconds), they are almost always
the result of bugs in the upstream. Report them by following these
instructions:
- Set `debug.time-limit` to a value like `5`.
- Reproduce the hang. The page should exit after 5 seconds with a more useful
stack trace.
- File a report with the reproduction instructions and the stack trace in
the upstream. See @{article:Contributing Bug Reports} for detailed
instructions.
- Clear `debug.time-limit` again to take your install out of debug mode.
If part of the reproduction instructions include "Create a 100MB wiki page",
the upstream may be less sympathetic to your cause than if reproducing the
issue does not require an unusual, complex workload.
In some cases, the hang may really just be a very large amount of processing time.
If you're very excited about 100MB wiki pages and don't mind waiting many
minutes for them to render, you may be able to adjust `max_execution_time` in
your PHP configuration to allow the process enough time to complete, or adjust
settings in your webserver config to let it wait longer for results.
**DarkConsole**: If you have a reproducible performance problem (for example,
loading a specific page is very slow), you can enable DarkConsole (a built-in
debugging console) to examine page performance in detail.
The two most useful tabs in DarkConsole are the "Services" tab and the
"XHProf" tab.
The "Services" module allows you to examine service calls (network calls,
subprocesses, events, etc) and find slow queries, slow services, inefficient
query plans, and unnecessary calls. Broadly, you're looking for slow or
repeated service calls, or calls which don't make sense given what the page
should be doing.
After installing XHProf (see @{article:Using XHProf}) you'll gain access to the
"XHProf" tab, which is a full tracing profiler. You can use the "Profile Page"
button to generate a complete trace of where a page is spending time. When
reading a profile, you're looking for the overall use of time, and for anything
which sticks out as taking unreasonably long or not making sense.
See @{article:Using DarkConsole} for complete instructions on configuring
and using DarkConsole.
**AJAX Requests**: To debug Ajax requests, activate DarkConsole and then turn
on the profiler or query analyzer on the main request by clicking the
appropriate button. The setting will cascade to Ajax requests made by the page
and they'll show up in the console with full query analysis or profiling
information.
**Command-Line Hangs**: If you have a script or daemon hanging, you can send
it `SIGHUP` to have it dump a stack trace to `sys_get_temp_dir()` (usually
`/tmp`).
Do this with:
```
$ kill -HUP <pid>
```
You can use this command to figure out where the system's temporary directory
is:
```
$ php -r 'echo sys_get_temp_dir()."\n";'
```
On most systems, this is `/tmp`. The trace should appear in that directory with
a name like `phabricator_backtrace_<pid>`. Examining this trace may provide
a key to understanding the problem.
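If you are collecting traces from several processes, a small helper can build the path where a trace is expected to appear. This is a sketch only: the file name pattern follows the `phabricator_backtrace_<pid>` form described above, the helper name is hypothetical, and Python's idea of the temporary directory may differ from what PHP's `sys_get_temp_dir()` reports, so pass the directory explicitly when in doubt.

```python
import os
import tempfile


def expected_backtrace_path(pid, tmp_dir=None):
    """Build the path where a backtrace for `pid` is expected to be written.

    tmp_dir: directory reported by PHP's sys_get_temp_dir(). When omitted,
             Python's own temporary directory is used as an approximation.
    """
    if tmp_dir is None:
        tmp_dir = tempfile.gettempdir()
    return os.path.join(tmp_dir, "phabricator_backtrace_%d" % pid)


path = expected_backtrace_path(1234, "/tmp")
print(path, "exists" if os.path.exists(path) else "not written yet")
```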
**Command-Line Performance**: If you have general performance issues with
command-line scripts, you can add `--trace` to see a service call log. This is
similar to the "Services" tab in DarkConsole. This may help identify issues.
After installing XHProf, you can also add `--xprofile <filename>` to emit a
detailed performance profile. You can `arc upload` these files and then view
them in XHProf from the web UI.
Next Steps
==========
If you've done all you can to isolate and understand the problem you're
experiencing, report it to the upstream. Include as much relevant data as
you can, including:
- reproduction instructions;
- traces from `debug.time-limit` for hangs;
- screenshots of service call logs from DarkConsole (review these carefully,
as they can sometimes contain sensitive information);
- traces from CLI scripts with `--trace`;
- traces from sending HUP to processes; and
- XHProf profile files from `--xprofile` or "Download .xhprof Profile" in
the web UI.
After collecting this information:
- follow the instructions in @{article:Contributing Bug Reports} to file
a report in the upstream.

View file

@ -0,0 +1,122 @@
@title Using XHProf
@group fieldmanual
Describes how to install and use XHProf, a PHP profiling tool.
Overview
========
XHProf is a profiling tool which will let you understand application
performance in Phabricator.
After you install XHProf, you can use it from the web UI and the CLI to
generate detailed performance profiles. It is the most powerful tool available
for understanding application performance and identifying and fixing slow code.
Installing XHProf
=================
You are likely to have the most luck building XHProf from source:
$ git clone https://github.com/phacility/xhprof.git
From any source distribution of the extension, build and install it like this:
$ cd xhprof/
$ cd extension/
$ phpize
$ ./configure
$ make
$ sudo make install
You may also need to add `extension=xhprof.so` to your php.ini.
You can also try using PECL to install it, but this may not work well with
recent versions of PHP:
$ pecl install xhprof
Once you've installed it, `php -i` should report it as installed (you may
see a different version number, which is fine):
$ php -i | grep xhprof
...
xhprof => 0.9.2
...
Using XHProf: Web UI
====================
To profile a web page, activate DarkConsole and navigate to the XHProf tab.
Use the **Profile Page** button to generate a profile.
For instructions on activating DarkConsole, see @{article:Using DarkConsole}.
Using XHProf: CLI
=================
From the command line, use the `--xprofile <filename>` flag to generate a
profile of any script.
You can then upload this file to Phabricator (using `arc upload` may be easiest)
and view it in the web UI.
Analyzing Profiles
==================
Understanding profiles is as much art as science, so be warned that you may not
make much headway. Even if you aren't able to conclusively read a profile
yourself, you can attach profiles when submitting bug reports to the upstream
and we can look at them. This may yield new insight.
When looking at profiles, the "Wall Time (Inclusive)" column is usually the
most important. This shows the total amount of time spent in a function or
method and all of its children. Usually, to improve the performance of a page,
we're trying to find something that's slow and make it not slow: this column
can help identify which things are slowest.
The "Wall Time (Exclusive)" column shows time spent in a function or method,
excluding time spent in its children. This can give you a hint about whether the
call itself is slow or it's just making calls to other things that are slow.
You can also get a sense of this by clicking a call to see its children, and
seeing if the bulk of runtime is spent in a child call. This tends to indicate
that you're looking at a problem which is deeper in the stack, and you need
to go down further to identify and understand it.
Conversely, if the "Wall Time (Exclusive)" column is large, or the children
of a call are all cheap, there's probably something expensive happening in the
call itself.
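The relationship between the two columns can be shown on a toy call tree: exclusive time is inclusive time minus the inclusive time of the direct children. The nested-dictionary format and the function names below are assumptions made for this example; they are not the format XHProf stores profiles in.

```python
def exclusive_times(node, out=None):
    """Compute exclusive wall time for every call in a toy call tree.

    node: {"name": str, "inclusive": number, "children": [node, ...]}
          (hypothetical structure; names are assumed to be unique).
    """
    if out is None:
        out = {}
    children = node.get("children", [])
    # Time spent in the call itself is what remains after its children.
    out[node["name"]] = node["inclusive"] - sum(c["inclusive"] for c in children)
    for child in children:
        exclusive_times(child, out)
    return out


tree = {
    "name": "render_page",
    "inclusive": 100,
    "children": [
        {
            "name": "load_data",
            "inclusive": 70,
            "children": [{"name": "run_query", "inclusive": 65, "children": []}],
        },
        {"name": "format_output", "inclusive": 20, "children": []},
    ],
}
print(exclusive_times(tree))
```

In this example `render_page` and `load_data` have large inclusive times but small exclusive times, so the cost sits deeper in the stack, in `run_query`.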
The "Count" column can also sometimes tip you off that something is amiss, if
a method which shouldn't be called very often is being called a lot.
Some general things to look for -- these aren't smoking guns, but are unusual
and can lead to finding a performance issue:
- Is a low-level utility method like `phutil_utf8ize()` or `array_merge()`
taking more than a few percent of the page runtime?
- Do any methods (especially high-level methods) have >10,000 calls?
- Are we spending more than 100ms doing anything which isn't loading data
or rendering data?
- Does anything look suspiciously expensive or out of place?
- Is the profile for the slow page a lot different than the profile for a
fast page?
Some performance problems are obvious and will jump out of a profile; others
may require a more nuanced understanding of the codebase to sniff out which
parts are suspicious. If you aren't able to make progress with a profile,
report the issue upstream and attach the profile to your report.
Next Steps
==========
Continue by:
- enabling DarkConsole with @{article:Using DarkConsole}; or
- understanding and reporting performance problems with
@{article:Troubleshooting Performance Problems}.

View file

@ -0,0 +1,99 @@
@title Multimeter User Guide
@group userguide
Using Multimeter, a sampling profiler.
Overview
========
IMPORTANT: This document describes a prototype application.
Multimeter is a sampling profiler that can give you coarse information about
Phabricator resource usage. In particular, it can help quickly identify sources
of load, like bots or scripts which are making a very large number of requests.
Configuring and Using Multimeter
================================
To access Multimeter, go to {nav Applications > Multimeter}.
By default, Multimeter samples 0.1% of pages. This should be a reasonable rate
for most installs, but you can increase or decrease the rate by adjusting
`debug.sample-rate`. Increasing the rate (by setting the value to a lower
number, like 100, to sample 1% of pages) will increase the granularity of the
data, at a small performance cost.
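The arithmetic behind the setting can be sketched as follows, assuming a value of `N` means that roughly one page in `N` is sampled. That reading is inferred from the figures above (100 corresponds to 1%, so the 0.1% default would correspond to 1000); the helper names are hypothetical.

```python
def sampled_percent(sample_rate):
    """Percent of pages sampled when one page in `sample_rate` is sampled."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return 100.0 / sample_rate


def expected_samples(page_views, sample_rate):
    """Approximate number of samples collected for a given number of pages."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return page_views / sample_rate


print(sampled_percent(1000))          # the assumed default
print(sampled_percent(100))           # a lower value samples more pages
print(expected_samples(50000, 1000))  # samples from 50,000 page views
```

On a quiet install, a lower value may be needed before Multimeter has enough samples to show anything useful.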
Using Multimeter
================
Multimeter shows you what Phabricator has spent time doing recently. By
looking at the samples it collects, you can identify major sources of load
or resource use, whether they are specific users, pages, subprocesses, or
other types of activity.
By identifying and understanding unexpected load, you can adjust usage patterns
or configuration to make better use of resources (for example, rewrite bots
that are making too many calls), or report specific, actionable issues to the
upstream for resolution.
The main screen of Multimeter shows you everything Phabricator has spent
resources on recently, broken down by action type. Categories are folded up
by default, with "(All)" labels.
To filter by a dimension, click the link for it. For example, from the main
page, you can click "Web Request" to filter by only web requests. To expand a
grouped dimension, click the "(All)" link.
For example, suppose we suspect that someone is running a bot that is making
a lot of requests and consuming a lot of resources. We can get a better idea
about this by filtering the results like this:
- Click {nav Web Request}. This will show only web requests.
- Click {nav (All)} under "Viewer". This will expand events by viewer.
Recent resource costs for web requests are now shown, grouped and sorted by
user. The usernames in the "Viewer" column show who is using resources, in
order from greatest use to least use (only administrators can see usernames).
The "Avg" column shows the average cost per event, while the "Cost" column
shows the total cost.
If the top few users account for similar costs and are normal, active users,
there may be nothing amiss and your problem might lie elsewhere. If a user like
`slowbot` is in the top few users and has way higher usage than anyone else,
there might be a script running under that account consuming a disproportionate
amount of resources.
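The "Cost" and "Avg" columns amount to a group-by over sampled events. The sketch below reproduces that grouping on made-up data to show how an outlier stands out. The `(viewer, cost)` tuples and the function name are assumptions for this example and do not reflect how Multimeter stores its samples.

```python
from collections import defaultdict


def cost_by_viewer(samples):
    """Group sampled events by viewer and sort by total cost, highest first.

    samples: list of (viewer, cost) tuples (hypothetical input).
    Returns a list of (viewer, total cost, average cost per event).
    """
    totals = defaultdict(lambda: [0, 0])  # viewer -> [total cost, event count]
    for viewer, cost in samples:
        totals[viewer][0] += cost
        totals[viewer][1] += 1
    rows = [(viewer, total, total / count)
            for viewer, (total, count) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


samples = [
    ("alice", 40),
    ("slowbot", 900),
    ("bob", 50),
    ("slowbot", 1100),
    ("alice", 20),
]
for viewer, total, average in cost_by_viewer(samples):
    print(viewer, total, average)
```

Here `slowbot` accounts for far more cost than the other viewers combined, which is the pattern described above.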
Assuming you find a user with unusual usage, you could dig into their usage
like this:
- Click their name (like {nav slowbot}) to filter to just their requests.
- Click {nav (All)} under "Label". This expands by request detail.
This will show exactly what they spent those resources doing, and can help
identify if they're making a lot of API calls or scraping the site or whatever
else.
This is just an example of a specific kind of problem that Multimeter could
help resolve. In general, exploring Multimeter data by filtering and expanding
resource uses can help you understand how resources are used and identify
unexpected uses of resources. For example:
- Identify a problem with load balancing by filtering on {nav Web Request}
and expanding on {nav Host}. If hosts aren't roughly even, DNS or a load
balancer may be misconfigured.
- Identify which pages cost the most by filtering on {nav Web Request}
and expanding on {nav Label}.
- Find outlier pages by filtering on {nav Web Request} and expanding on
{nav ID}.
- Find where subprocesses are invoked from by filtering on {nav Subprocesses},
then expanding on {nav Context}.
Next Steps
==========
Continue by:
- understanding and reporting performance issues with
@{article:Troubleshooting Performance Problems}.