1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2024-12-02 11:42:42 +01:00
phorge-phorge/src/infrastructure/markup/PhabricatorMarkupEngine.php

499 lines
14 KiB
PHP
Raw Normal View History

2011-01-30 22:20:56 +01:00
<?php
/*
* Copyright 2012 Facebook, Inc.
2011-01-30 22:20:56 +01:00
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* Manages markup engine selection, configuration, application, caching and
* pipelining.
*
* @{class:PhabricatorMarkupEngine} can be used to render objects which
* implement @{interface:PhabricatorMarkupInterface} in a batched, cache-aware
* way. For example, if you have a list of comments written in remarkup (and
* the objects implement the correct interface) you can render them by first
* building an engine and adding the fields with @{method:addObject}.
*
* $field = 'field:body'; // Field you want to render. Each object exposes
* // one or more fields of markup.
*
* $engine = new PhabricatorMarkupEngine();
* foreach ($comments as $comment) {
* $engine->addObject($comment, $field);
* }
*
* Now, call @{method:process} to perform the actual cache/rendering
* step. This is a heavyweight call which does batched data access and
* transforms the markup into output.
*
* $engine->process();
*
* Finally, do something with the results:
*
* $results = array();
* foreach ($comments as $comment) {
* $results[] = $engine->getOutput($comment, $field);
* }
*
* If you have a single object to render, you can use the convenience method
* @{method:renderOneObject}.
*
* @task markup Markup Pipeline
* @task engine Engine Construction
*/
final class PhabricatorMarkupEngine {
private $objects = array();
private $viewer;
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/* -( Markup Pipeline )---------------------------------------------------- */
/**
* Convenience method for pushing a single object through the markup
* pipeline.
*
* @param PhabricatorMarkupInterface The object to render.
* @param string The field to render.
* @param PhabricatorUser User viewing the markup.
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
* @return string Marked up output.
* @task markup
*/
public static function renderOneObject(
PhabricatorMarkupInterface $object,
$field,
PhabricatorUser $viewer) {
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
return id(new PhabricatorMarkupEngine())
->setViewer($viewer)
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
->addObject($object, $field)
->process()
->getOutput($object, $field);
}
2011-01-30 22:20:56 +01:00
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* Queue an object for markup generation when @{method:process} is
* called. You can retrieve the output later with @{method:getOutput}.
*
* @param PhabricatorMarkupInterface The object to render.
* @param string The field to render.
* @return this
* @task markup
*/
public function addObject(PhabricatorMarkupInterface $object, $field) {
$key = $this->getMarkupFieldKey($object, $field);
$this->objects[$key] = array(
'object' => $object,
'field' => $field,
);
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
return $this;
}
/**
* Process objects queued with @{method:addObject}. You can then retrieve
* the output with @{method:getOutput}.
*
* @return this
* @task markup
*/
public function process() {
$keys = array();
foreach ($this->objects as $key => $info) {
if (!isset($info['markup'])) {
$keys[] = $key;
}
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
if (!$keys) {
return;
}
$objects = array_select_keys($this->objects, $keys);
// Build all the markup engines. We need an engine for each field whether
// we have a cache or not, since we still need to postprocess the cache.
$engines = array();
foreach ($objects as $key => $info) {
$engines[$key] = $info['object']->newMarkupEngine($info['field']);
$engines[$key]->setConfig('viewer', $this->viewer);
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
}
// Load or build the preprocessor caches.
$blocks = $this->loadPreprocessorCaches($engines, $objects);
// Finalize the output.
foreach ($objects as $key => $info) {
$data = $blocks[$key]->getCacheData();
$engine = $engines[$key];
$field = $info['field'];
$object = $info['object'];
$output = $engine->postprocessText($data);
$output = $object->didMarkupText($field, $output, $engine);
$this->objects[$key]['output'] = $output;
}
return $this;
}
/**
* Get the output of markup processing for a field queued with
* @{method:addObject}. Before you can call this method, you must call
* @{method:process}.
*
* @param PhabricatorMarkupInterface The object to retrieve.
* @param string The field to retrieve.
* @return string Processed output.
* @task markup
*/
public function getOutput(PhabricatorMarkupInterface $object, $field) {
$key = $this->getMarkupFieldKey($object, $field);
if (empty($this->objects[$key])) {
throw new Exception(
"Call addObject() before getOutput() (key = '{$key}').");
}
if (!isset($this->objects[$key]['output'])) {
throw new Exception(
"Call process() before getOutput().");
}
return $this->objects[$key]['output'];
}
/**
* @task markup
*/
private function getMarkupFieldKey(
PhabricatorMarkupInterface $object,
$field) {
return $object->getMarkupFieldKey($field);
}
/**
* @task markup
*/
private function loadPreprocessorCaches(array $engines, array $objects) {
$blocks = array();
$use_cache = array();
foreach ($objects as $key => $info) {
if ($info['object']->shouldUseMarkupCache($info['field'])) {
$use_cache[$key] = true;
}
}
if ($use_cache) {
$blocks = id(new PhabricatorMarkupCache())->loadAllWhere(
'cacheKey IN (%Ls)',
array_keys($use_cache));
$blocks = mpull($blocks, null, 'getCacheKey');
}
foreach ($objects as $key => $info) {
if (isset($blocks[$key])) {
// If we already have a preprocessing cache, we don't need to rebuild
// it.
continue;
}
$text = $info['object']->getMarkupText($info['field']);
$data = $engines[$key]->preprocessText($text);
// NOTE: This is just debugging information to help sort out cache issues.
// If one machine is misconfigured and poisoning caches you can use this
// field to hunt it down.
$metadata = array(
'host' => php_uname('n'),
);
$blocks[$key] = id(new PhabricatorMarkupCache())
->setCacheKey($key)
->setCacheData($data)
->setMetadata($metadata);
if (isset($use_cache[$key])) {
// This is just filling a cache and always safe, even on a read pathway.
$unguarded = AphrontWriteGuard::beginScopedUnguardedWrites();
try {
$blocks[$key]->save();
} catch (AphrontQueryDuplicateKeyException $ex) {
// Ignore this, we just raced to write the cache.
}
unset($unguarded);
}
}
return $blocks;
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* Set the viewing user. Used to implement object permissions.
*
* @param PhabricatorUser The viewing user.
* @return this
* @task markup
*/
public function setViewer(PhabricatorUser $viewer) {
$this->viewer = $viewer;
return $this;
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/* -( Engine Construction )------------------------------------------------ */
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newManiphestMarkupEngine() {
return self::newMarkupEngine(array(
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newPhrictionMarkupEngine() {
return self::newMarkupEngine(array(
'header.generate-toc' => true,
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newPhameMarkupEngine() {
return self::newMarkupEngine(array(
'macros' => false,
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newFeedMarkupEngine() {
return self::newMarkupEngine(
array(
'macros' => false,
'fileproxy' => false,
'youtube' => false,
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newDifferentialMarkupEngine(array $options = array()) {
return self::newMarkupEngine(array(
'custom-inline' => PhabricatorEnv::getEnvConfig(
'differential.custom-remarkup-rules'),
'custom-block' => PhabricatorEnv::getEnvConfig(
'differential.custom-remarkup-block-rules'),
'differential.diff' => idx($options, 'differential.diff'),
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newDiffusionMarkupEngine(array $options = array()) {
return self::newMarkupEngine(array(
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newProfileMarkupEngine() {
return self::newMarkupEngine(array(
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
public static function newSlowvoteMarkupEngine() {
return self::newMarkupEngine(array(
));
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
public static function newPonderMarkupEngine(array $options = array()) {
return self::newMarkupEngine($options);
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
private static function getMarkupEngineDefaultConfiguration() {
return array(
'pygments' => PhabricatorEnv::getEnvConfig('pygments.enabled'),
'fileproxy' => PhabricatorEnv::getEnvConfig('files.enable-proxy'),
'youtube' => PhabricatorEnv::getEnvConfig(
'remarkup.enable-embedded-youtube'),
'custom-inline' => array(),
'custom-block' => array(),
'differential.diff' => null,
'header.generate-toc' => false,
'macros' => true,
'uri.allowed-protocols' => PhabricatorEnv::getEnvConfig(
'uri.allowed-protocols'),
'syntax-highlighter.engine' => PhabricatorEnv::getEnvConfig(
'syntax-highlighter.engine'),
);
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
/**
* @task engine
*/
private static function newMarkupEngine(array $options) {
$options += self::getMarkupEngineDefaultConfiguration();
2011-01-30 22:20:56 +01:00
$engine = new PhutilRemarkupEngine();
$engine->setConfig('preserve-linebreaks', true);
$engine->setConfig('pygments.enabled', $options['pygments']);
$engine->setConfig(
'uri.allowed-protocols',
$options['uri.allowed-protocols']);
$engine->setConfig('differential.diff', $options['differential.diff']);
$engine->setConfig('header.generate-toc', $options['header.generate-toc']);
$engine->setConfig(
'syntax-highlighter.engine',
$options['syntax-highlighter.engine']);
2011-01-30 22:20:56 +01:00
$rules = array();
$rules[] = new PhutilRemarkupRuleEscapeRemarkup();
$rules[] = new PhutilRemarkupRuleMonospace();
$custom_rule_classes = $options['custom-inline'];
if ($custom_rule_classes) {
foreach ($custom_rule_classes as $custom_rule_class) {
$rules[] = newv($custom_rule_class, array());
}
}
2012-08-08 03:51:47 +02:00
$rules[] = new PhutilRemarkupRuleDocumentLink();
if ($options['fileproxy']) {
$rules[] = new PhabricatorRemarkupRuleProxyImage();
}
if ($options['youtube']) {
$rules[] = new PhabricatorRemarkupRuleYoutube();
}
2011-01-30 22:20:56 +01:00
$rules[] = new PhutilRemarkupRuleHyperlink();
$rules[] = new PhabricatorRemarkupRulePhriction();
$rules[] = new PhabricatorRemarkupRuleDifferentialHandle();
$rules[] = new PhabricatorRemarkupRuleManiphestHandle();
$rules[] = new PhabricatorRemarkupRuleEmbedFile();
$rules[] = new PhabricatorRemarkupRuleDifferential();
2011-04-11 12:02:19 +02:00
$rules[] = new PhabricatorRemarkupRuleDiffusion();
$rules[] = new PhabricatorRemarkupRuleManiphest();
$rules[] = new PhabricatorRemarkupRulePaste();
$rules[] = new PhabricatorRemarkupRuleCountdown();
$rules[] = new PonderRuleQuestion();
if ($options['macros']) {
$rules[] = new PhabricatorRemarkupRuleImageMacro();
}
$rules[] = new PhabricatorRemarkupRuleMention();
$rules[] = new PhutilRemarkupRuleEscapeHTML();
$rules[] = new PhutilRemarkupRuleBold();
$rules[] = new PhutilRemarkupRuleItalic();
$rules[] = new PhutilRemarkupRuleDel();
2011-01-30 22:20:56 +01:00
$blocks = array();
$blocks[] = new PhutilRemarkupEngineRemarkupQuotesBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupLiteralBlockRule();
2011-01-30 22:20:56 +01:00
$blocks[] = new PhutilRemarkupEngineRemarkupHeaderBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupListBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupCodeBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupNoteBlockRule();
$blocks[] = new PhutilRemarkupEngineRemarkupTableBlockRule();
2011-01-30 22:20:56 +01:00
$blocks[] = new PhutilRemarkupEngineRemarkupDefaultBlockRule();
$custom_block_rule_classes = $options['custom-block'];
if ($custom_block_rule_classes) {
foreach ($custom_block_rule_classes as $custom_block_rule_class) {
$blocks[] = newv($custom_block_rule_class, array());
}
}
2011-01-30 22:20:56 +01:00
foreach ($blocks as $block) {
if ($block instanceof PhutilRemarkupEngineRemarkupLiteralBlockRule) {
$literal_rules = array();
$literal_rules[] = new PhutilRemarkupRuleEscapeHTML();
$literal_rules[] = new PhutilRemarkupRuleLinebreaks();
$block->setMarkupRules($literal_rules);
} else if (
!($block instanceof PhutilRemarkupEngineRemarkupCodeBlockRule)) {
2011-01-30 22:20:56 +01:00
$block->setMarkupRules($rules);
}
}
$engine->setBlockRules($blocks);
return $engine;
}
Add a generic multistep Markup cache Summary: The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive. The broader issue is that out markup caches aren't very good right now. They have three major problems: **Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different. To solve this, I created a dedicated cache database that I plan to move all markup caches to use. **Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks. To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases. This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering. **Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward. Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered). I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON. Test Plan: - Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table). - Verified that published documents come out of cache. - Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches. Reviewers: btrahan Reviewed By: btrahan CC: aran Maniphest Tasks: T1366 Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
public static function extractPHIDsFromMentions(array $content_blocks) {
$mentions = array();
$engine = self::newDifferentialMarkupEngine();
foreach ($content_blocks as $content_block) {
$engine->markupText($content_block);
$phids = $engine->getTextMetadata(
PhabricatorRemarkupRuleMention::KEY_MENTIONED,
array());
$mentions += $phids;
}
return $mentions;
}
2011-01-30 22:20:56 +01:00
}