1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-12 08:36:13 +01:00
phorge-phorge/src/infrastructure/markup/PhabricatorMarkupEngine.php

604 lines
17 KiB
PHP
Raw Normal View History

2011-01-30 22:20:56 +01:00
<?php
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* Manages markup engine selection, configuration, application, caching and
* pipelining.
*
* @{class:PhabricatorMarkupEngine} can be used to render objects which
* implement @{interface:PhabricatorMarkupInterface} in a batched, cache-aware
* way. For example, if you have a list of comments written in remarkup (and
* the objects implement the correct interface) you can render them by first
* building an engine and adding the fields with @{method:addObject}.
*
* $field = 'field:body'; // Field you want to render. Each object exposes
* // one or more fields of markup.
*
* $engine = new PhabricatorMarkupEngine();
* foreach ($comments as $comment) {
* $engine->addObject($comment, $field);
* }
*
* Now, call @{method:process} to perform the actual cache/rendering
* step. This is a heavyweight call which does batched data access and
* transforms the markup into output.
*
* $engine->process();
*
* Finally, do something with the results:
*
* $results = array();
* foreach ($comments as $comment) {
* $results[] = $engine->getOutput($comment, $field);
* }
*
* If you have a single object to render, you can use the convenience method
* @{method:renderOneObject}.
*
* @task markup Markup Pipeline
* @task engine Engine Construction
*/
final class PhabricatorMarkupEngine {
private $objects = array();
private $viewer;
private $version = 12;
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/* -( Markup Pipeline )---------------------------------------------------- */
/**
* Convenience method for pushing a single object through the markup
* pipeline.
*
* @param PhabricatorMarkupInterface The object to render.
* @param string The field to render.
* @param PhabricatorUser User viewing the markup.
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
* @return string Marked up output.
* @task markup
*/
public static function renderOneObject(
PhabricatorMarkupInterface $object,
$field,
PhabricatorUser $viewer) {
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
return id(new PhabricatorMarkupEngine())
->setViewer($viewer)
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
->addObject($object, $field)
->process()
->getOutput($object, $field);
}
2011-01-30 22:20:56 +01:00
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* Queue an object for markup generation when @{method:process} is
* called. You can retrieve the output later with @{method:getOutput}.
*
* @param PhabricatorMarkupInterface The object to render.
* @param string The field to render.
* @return this
* @task markup
*/
public function addObject(PhabricatorMarkupInterface $object, $field) {
$key = $this->getMarkupFieldKey($object, $field);
$this->objects[$key] = array(
'object' => $object,
'field' => $field,
);
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
return $this;
}
/**
* Process objects queued with @{method:addObject}. You can then retrieve
* the output with @{method:getOutput}.
*
* @return this
* @task markup
*/
public function process() {
$keys = array();
foreach ($this->objects as $key => $info) {
if (!isset($info['markup'])) {
$keys[] = $key;
}
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
if (!$keys) {
return;
}
$objects = array_select_keys($this->objects, $keys);
// Build all the markup engines. We need an engine for each field whether
// we have a cache or not, since we still need to postprocess the cache.
$engines = array();
foreach ($objects as $key => $info) {
$engines[$key] = $info['object']->newMarkupEngine($info['field']);
$engines[$key]->setConfig('viewer', $this->viewer);
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
}
// Load or build the preprocessor caches.
$blocks = $this->loadPreprocessorCaches($engines, $objects);
$blocks = mpull($blocks, 'getCacheData');
$this->engineCaches = $blocks;
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
// Finalize the output.
foreach ($objects as $key => $info) {
$engine = $engines[$key];
$field = $info['field'];
$object = $info['object'];
$output = $engine->postprocessText($blocks[$key]);
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
$output = $object->didMarkupText($field, $output, $engine);
$this->objects[$key]['output'] = $output;
}
return $this;
}
/**
* Get the output of markup processing for a field queued with
* @{method:addObject}. Before you can call this method, you must call
* @{method:process}.
*
* @param PhabricatorMarkupInterface The object to retrieve.
* @param string The field to retrieve.
* @return string Processed output.
* @task markup
*/
public function getOutput(PhabricatorMarkupInterface $object, $field) {
$key = $this->getMarkupFieldKey($object, $field);
$this->requireKeyProcessed($key);
return $this->objects[$key]['output'];
}
/**
* Retrieve engine metadata for a given field.
*
* @param PhabricatorMarkupInterface The object to retrieve.
* @param string The field to retrieve.
* @param string The engine metadata field to retrieve.
* @param wild Optional default value.
* @task markup
*/
public function getEngineMetadata(
PhabricatorMarkupInterface $object,
$field,
$metadata_key,
$default = null) {
$key = $this->getMarkupFieldKey($object, $field);
$this->requireKeyProcessed($key);
return idx($this->engineCaches[$key]['metadata'], $metadata_key, $default);
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task markup
*/
private function requireKeyProcessed($key) {
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
if (empty($this->objects[$key])) {
throw new Exception(
"Call addObject() before using results (key = '{$key}').");
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
}
if (!isset($this->objects[$key]['output'])) {
throw new Exception(
'Call process() before using results.');
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
}
}
/**
* @task markup
*/
private function getMarkupFieldKey(
PhabricatorMarkupInterface $object,
$field) {
static $custom;
if ($custom === null) {
$custom = array_merge(
self::loadCustomInlineRules(),
self::loadCustomBlockRules());
$custom = mpull($custom, 'getRuleVersion', null);
ksort($custom);
$custom = PhabricatorHash::digestForIndex(serialize($custom));
}
return $object->getMarkupFieldKey($field).'@'.$this->version.'@'.$custom;
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
}
/**
* @task markup
*/
private function loadPreprocessorCaches(array $engines, array $objects) {
$blocks = array();
$use_cache = array();
foreach ($objects as $key => $info) {
if ($info['object']->shouldUseMarkupCache($info['field'])) {
$use_cache[$key] = true;
}
}
if ($use_cache) {
try {
$blocks = id(new PhabricatorMarkupCache())->loadAllWhere(
'cacheKey IN (%Ls)',
array_keys($use_cache));
$blocks = mpull($blocks, null, 'getCacheKey');
} catch (Exception $ex) {
phlog($ex);
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
}
foreach ($objects as $key => $info) {
// False check in case MySQL doesn't support unicode characters
// in the string (T1191), resulting in unserialize returning false.
if (isset($blocks[$key]) && $blocks[$key]->getCacheData() !== false) {
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
// If we already have a preprocessing cache, we don't need to rebuild
// it.
continue;
}
$text = $info['object']->getMarkupText($info['field']);
$data = $engines[$key]->preprocessText($text);
// NOTE: This is just debugging information to help sort out cache issues.
// If one machine is misconfigured and poisoning caches you can use this
// field to hunt it down.
$metadata = array(
'host' => php_uname('n'),
);
$blocks[$key] = id(new PhabricatorMarkupCache())
->setCacheKey($key)
->setCacheData($data)
->setMetadata($metadata);
if (isset($use_cache[$key])) {
// This is just filling a cache and always safe, even on a read pathway.
$unguarded = AphrontWriteGuard::beginScopedUnguardedWrites();
$blocks[$key]->replace();
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
unset($unguarded);
}
}
return $blocks;
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* Set the viewing user. Used to implement object permissions.
*
* @param PhabricatorUser The viewing user.
* @return this
* @task markup
*/
public function setViewer(PhabricatorUser $viewer) {
$this->viewer = $viewer;
return $this;
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/* -( Engine Construction )------------------------------------------------ */
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newManiphestMarkupEngine() {
return self::newMarkupEngine(array(
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newPhrictionMarkupEngine() {
return self::newMarkupEngine(array(
'header.generate-toc' => true,
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newPhameMarkupEngine() {
return self::newMarkupEngine(array(
'macros' => false,
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newFeedMarkupEngine() {
return self::newMarkupEngine(
array(
'macros' => false,
'youtube' => false,
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newDifferentialMarkupEngine(array $options = array()) {
return self::newMarkupEngine(array(
'differential.diff' => idx($options, 'differential.diff'),
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newDiffusionMarkupEngine(array $options = array()) {
return self::newMarkupEngine(array(
'header.generate-toc' => true,
));
}
/**
* @task engine
*/
public static function getEngine($ruleset = 'default') {
static $engines = array();
if (isset($engines[$ruleset])) {
return $engines[$ruleset];
}
$engine = null;
switch ($ruleset) {
case 'default':
$engine = self::newMarkupEngine(array());
break;
case 'nolinebreaks':
$engine = self::newMarkupEngine(array());
$engine->setConfig('preserve-linebreaks', false);
break;
case 'diffusion-readme':
$engine = self::newMarkupEngine(array());
$engine->setConfig('preserve-linebreaks', false);
$engine->setConfig('header.generate-toc', true);
break;
case 'diviner':
$engine = self::newMarkupEngine(array());
$engine->setConfig('preserve-linebreaks', false);
// $engine->setConfig('diviner.renderer', new DivinerDefaultRenderer());
$engine->setConfig('header.generate-toc', true);
break;
case 'extract':
// Engine used for reference/edge extraction. Turn off anything which
// is slow and doesn't change reference extraction.
$engine = self::newMarkupEngine(array());
$engine->setConfig('pygments.enabled', false);
break;
default:
throw new Exception("Unknown engine ruleset: {$ruleset}!");
}
$engines[$ruleset] = $engine;
return $engine;
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
private static function getMarkupEngineDefaultConfiguration() {
return array(
'pygments' => PhabricatorEnv::getEnvConfig('pygments.enabled'),
'youtube' => PhabricatorEnv::getEnvConfig(
'remarkup.enable-embedded-youtube'),
'differential.diff' => null,
'header.generate-toc' => false,
'macros' => true,
'uri.allowed-protocols' => PhabricatorEnv::getEnvConfig(
'uri.allowed-protocols'),
'syntax-highlighter.engine' => PhabricatorEnv::getEnvConfig(
'syntax-highlighter.engine'),
'preserve-linebreaks' => true,
);
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newMarkupEngine(array $options) {
$options += self::getMarkupEngineDefaultConfiguration();
2011-01-30 22:20:56 +01:00
$engine = new PhutilRemarkupEngine();
$engine->setConfig('preserve-linebreaks', $options['preserve-linebreaks']);
$engine->setConfig('pygments.enabled', $options['pygments']);
$engine->setConfig(
'uri.allowed-protocols',
$options['uri.allowed-protocols']);
$engine->setConfig('differential.diff', $options['differential.diff']);
$engine->setConfig('header.generate-toc', $options['header.generate-toc']);
$engine->setConfig(
'syntax-highlighter.engine',
$options['syntax-highlighter.engine']);
2011-01-30 22:20:56 +01:00
$rules = array();
$rules[] = new PhutilRemarkupRuleEscapeRemarkup();
$rules[] = new PhutilRemarkupRuleMonospace();
2012-08-08 03:51:47 +02:00
$rules[] = new PhutilRemarkupRuleDocumentLink();
if ($options['youtube']) {
$rules[] = new PhabricatorRemarkupRuleYoutube();
}
$applications = PhabricatorApplication::getAllInstalledApplications();
foreach ($applications as $application) {
foreach ($application->getRemarkupRules() as $rule) {
$rules[] = $rule;
}
}
$rules[] = new PhutilRemarkupRuleHyperlink();
if ($options['macros']) {
$rules[] = new PhabricatorRemarkupRuleImageMacro();
$rules[] = new PhabricatorRemarkupRuleMeme();
}
$rules[] = new PhutilRemarkupRuleBold();
$rules[] = new PhutilRemarkupRuleItalic();
$rules[] = new PhutilRemarkupRuleDel();
$rules[] = new PhutilRemarkupRuleUnderline();
foreach (self::loadCustomInlineRules() as $rule) {
$rules[] = $rule;
}
2011-01-30 22:20:56 +01:00
$blocks = array();
$blocks[] = new PhutilRemarkupEngineRemarkupQuotesBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupReplyBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupLiteralBlockRule();
2011-01-30 22:20:56 +01:00
$blocks[] = new PhutilRemarkupEngineRemarkupHeaderBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupHorizontalRuleBlockRule();
2011-01-30 22:20:56 +01:00
$blocks[] = new PhutilRemarkupEngineRemarkupListBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupCodeBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupNoteBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupTableBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupSimpleTableBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupInterpreterRule();
$blocks[] = new PhutilRemarkupEngineRemarkupDefaultBlockRule();
2011-01-30 22:20:56 +01:00
foreach (self::loadCustomBlockRules() as $rule) {
$blocks[] = $rule;
}
2011-01-30 22:20:56 +01:00
foreach ($blocks as $block) {
$block->setMarkupRules($rules);
2011-01-30 22:20:56 +01:00
}
$engine->setBlockRules($blocks);
return $engine;
}
public static function extractPHIDsFromMentions(
PhabricatorUser $viewer,
array $content_blocks) {
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
$mentions = array();
$engine = self::newDifferentialMarkupEngine();
$engine->setConfig('viewer', $viewer);
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
foreach ($content_blocks as $content_block) {
$engine->markupText($content_block);
$phids = $engine->getTextMetadata(
PhabricatorRemarkupRuleMention::KEY_MENTIONED,
array());
$mentions += $phids;
}
return $mentions;
}
public static function extractFilePHIDsFromEmbeddedFiles(
PhabricatorUser $viewer,
array $content_blocks) {
$files = array();
$engine = self::newDifferentialMarkupEngine();
$engine->setConfig('viewer', $viewer);
foreach ($content_blocks as $content_block) {
$engine->markupText($content_block);
$ids = $engine->getTextMetadata(
PhabricatorRemarkupRuleEmbedFile::KEY_EMBED_FILE_PHIDS,
array());
$files += $ids;
}
return $files;
}
/**
* Produce a corpus summary, in a way that shortens the underlying text
* without truncating it somewhere awkward.
*
* TODO: We could do a better job of this.
*
* @param string Remarkup corpus to summarize.
* @return string Summarized corpus.
*/
public static function summarize($corpus) {
// Major goals here are:
// - Don't split in the middle of a character (utf-8).
// - Don't split in the middle of, e.g., **bold** text, since
// we end up with hanging '**' in the summary.
// - Try not to pick an image macro, header, embedded file, etc.
// - Hopefully don't return too much text. We don't explicitly limit
// this right now.
$blocks = preg_split("/\n *\n\s*/", trim($corpus));
$best = null;
foreach ($blocks as $block) {
// This is a test for normal spaces in the block, i.e. a heuristic to
// distinguish standard paragraphs from things like image macros. It may
// not work well for non-latin text. We prefer to summarize with a
// paragraph of normal words over an image macro, if possible.
$has_space = preg_match('/\w\s\w/', $block);
// This is a test to find embedded images and headers. We prefer to
// summarize with a normal paragraph over a header or an embedded object,
// if possible.
$has_embed = preg_match('/^[{=]/', $block);
if ($has_space && !$has_embed) {
// This seems like a good summary, so return it.
return $block;
}
if (!$best) {
// This is the first block we found; if everything is garbage just
// use the first block.
$best = $block;
}
}
return $best;
}
private static function loadCustomInlineRules() {
return id(new PhutilSymbolLoader())
->setAncestorClass('PhabricatorRemarkupCustomInlineRule')
->loadObjects();
}
private static function loadCustomBlockRules() {
return id(new PhutilSymbolLoader())
->setAncestorClass('PhabricatorRemarkupCustomBlockRule')
->loadObjects();
}
2011-01-30 22:20:56 +01:00
}