2011-01-30 22:20:56 +01:00
|
|
|
<?php
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
/**
|
|
|
|
* Manages markup engine selection, configuration, application, caching and
|
|
|
|
* pipelining.
|
|
|
|
*
|
|
|
|
* @{class:PhabricatorMarkupEngine} can be used to render objects which
|
|
|
|
* implement @{interface:PhabricatorMarkupInterface} in a batched, cache-aware
|
|
|
|
* way. For example, if you have a list of comments written in remarkup (and
|
|
|
|
* the objects implement the correct interface) you can render them by first
|
|
|
|
* building an engine and adding the fields with @{method:addObject}.
|
|
|
|
*
|
|
|
|
* $field = 'field:body'; // Field you want to render. Each object exposes
|
|
|
|
* // one or more fields of markup.
|
|
|
|
*
|
|
|
|
* $engine = new PhabricatorMarkupEngine();
|
|
|
|
* foreach ($comments as $comment) {
|
|
|
|
* $engine->addObject($comment, $field);
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* Now, call @{method:process} to perform the actual cache/rendering
|
|
|
|
* step. This is a heavyweight call which does batched data access and
|
|
|
|
* transforms the markup into output.
|
|
|
|
*
|
|
|
|
* $engine->process();
|
|
|
|
*
|
|
|
|
* Finally, do something with the results:
|
|
|
|
*
|
|
|
|
* $results = array();
|
|
|
|
* foreach ($comments as $comment) {
|
|
|
|
* $results[] = $engine->getOutput($comment, $field);
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* If you have a single object to render, you can use the convenience method
|
|
|
|
* @{method:renderOneObject}.
|
|
|
|
*
|
|
|
|
* @task markup Markup Pipeline
|
|
|
|
* @task engine Engine Construction
|
|
|
|
*/
|
|
|
|
final class PhabricatorMarkupEngine {
|
|
|
|
|
|
|
|
private $objects = array();
|
2012-09-05 20:40:48 +02:00
|
|
|
private $viewer;
|
2013-01-21 20:57:58 +01:00
|
|
|
private $version = 2;
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
|
|
|
|
|
|
|
|
/* -( Markup Pipeline )---------------------------------------------------- */
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Convenience method for pushing a single object through the markup
|
|
|
|
* pipeline.
|
|
|
|
*
|
|
|
|
* @param PhabricatorMarkupInterface The object to render.
|
|
|
|
* @param string The field to render.
|
2012-09-05 20:40:48 +02:00
|
|
|
* @param PhabricatorUser User viewing the markup.
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
* @return string Marked up output.
|
|
|
|
* @task markup
|
|
|
|
*/
|
|
|
|
public static function renderOneObject(
|
|
|
|
PhabricatorMarkupInterface $object,
|
2012-09-05 20:40:48 +02:00
|
|
|
$field,
|
|
|
|
PhabricatorUser $viewer) {
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
return id(new PhabricatorMarkupEngine())
|
2012-09-05 20:40:48 +02:00
|
|
|
->setViewer($viewer)
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
->addObject($object, $field)
|
|
|
|
->process()
|
|
|
|
->getOutput($object, $field);
|
|
|
|
}
|
2011-01-30 22:20:56 +01:00
|
|
|
|
2011-06-24 20:50:19 +02:00
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
/**
|
|
|
|
* Queue an object for markup generation when @{method:process} is
|
|
|
|
* called. You can retrieve the output later with @{method:getOutput}.
|
|
|
|
*
|
|
|
|
* @param PhabricatorMarkupInterface The object to render.
|
|
|
|
* @param string The field to render.
|
|
|
|
* @return this
|
|
|
|
* @task markup
|
|
|
|
*/
|
|
|
|
public function addObject(PhabricatorMarkupInterface $object, $field) {
|
|
|
|
$key = $this->getMarkupFieldKey($object, $field);
|
|
|
|
$this->objects[$key] = array(
|
|
|
|
'object' => $object,
|
|
|
|
'field' => $field,
|
|
|
|
);
|
2011-06-24 20:50:19 +02:00
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
return $this;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Process objects queued with @{method:addObject}. You can then retrieve
|
|
|
|
* the output with @{method:getOutput}.
|
|
|
|
*
|
|
|
|
* @return this
|
|
|
|
* @task markup
|
|
|
|
*/
|
|
|
|
public function process() {
|
|
|
|
$keys = array();
|
|
|
|
foreach ($this->objects as $key => $info) {
|
|
|
|
if (!isset($info['markup'])) {
|
|
|
|
$keys[] = $key;
|
|
|
|
}
|
2011-06-24 20:50:19 +02:00
|
|
|
}
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
if (!$keys) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
$objects = array_select_keys($this->objects, $keys);
|
|
|
|
|
|
|
|
// Build all the markup engines. We need an engine for each field whether
|
|
|
|
// we have a cache or not, since we still need to postprocess the cache.
|
|
|
|
$engines = array();
|
|
|
|
foreach ($objects as $key => $info) {
|
|
|
|
$engines[$key] = $info['object']->newMarkupEngine($info['field']);
|
2012-09-05 20:40:48 +02:00
|
|
|
$engines[$key]->setConfig('viewer', $this->viewer);
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
// Load or build the preprocessor caches.
|
|
|
|
$blocks = $this->loadPreprocessorCaches($engines, $objects);
|
|
|
|
|
|
|
|
// Finalize the output.
|
|
|
|
foreach ($objects as $key => $info) {
|
|
|
|
$data = $blocks[$key]->getCacheData();
|
|
|
|
$engine = $engines[$key];
|
|
|
|
$field = $info['field'];
|
|
|
|
$object = $info['object'];
|
|
|
|
|
|
|
|
$output = $engine->postprocessText($data);
|
|
|
|
$output = $object->didMarkupText($field, $output, $engine);
|
|
|
|
$this->objects[$key]['output'] = $output;
|
|
|
|
}
|
|
|
|
|
|
|
|
return $this;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Get the output of markup processing for a field queued with
|
|
|
|
* @{method:addObject}. Before you can call this method, you must call
|
|
|
|
* @{method:process}.
|
|
|
|
*
|
|
|
|
* @param PhabricatorMarkupInterface The object to retrieve.
|
|
|
|
* @param string The field to retrieve.
|
|
|
|
* @return string Processed output.
|
|
|
|
* @task markup
|
|
|
|
*/
|
|
|
|
public function getOutput(PhabricatorMarkupInterface $object, $field) {
|
|
|
|
$key = $this->getMarkupFieldKey($object, $field);
|
|
|
|
|
|
|
|
if (empty($this->objects[$key])) {
|
|
|
|
throw new Exception(
|
|
|
|
"Call addObject() before getOutput() (key = '{$key}').");
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!isset($this->objects[$key]['output'])) {
|
|
|
|
throw new Exception(
|
|
|
|
"Call process() before getOutput().");
|
|
|
|
}
|
|
|
|
|
|
|
|
return $this->objects[$key]['output'];
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
* @task markup
|
|
|
|
*/
|
|
|
|
private function getMarkupFieldKey(
|
|
|
|
PhabricatorMarkupInterface $object,
|
|
|
|
$field) {
|
2012-10-23 04:06:56 +02:00
|
|
|
return $object->getMarkupFieldKey($field).'@'.$this->version;
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
* @task markup
|
|
|
|
*/
|
|
|
|
private function loadPreprocessorCaches(array $engines, array $objects) {
|
|
|
|
$blocks = array();
|
|
|
|
|
|
|
|
$use_cache = array();
|
|
|
|
foreach ($objects as $key => $info) {
|
|
|
|
if ($info['object']->shouldUseMarkupCache($info['field'])) {
|
|
|
|
$use_cache[$key] = true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if ($use_cache) {
|
2013-01-19 00:36:41 +01:00
|
|
|
try {
|
|
|
|
$blocks = id(new PhabricatorMarkupCache())->loadAllWhere(
|
|
|
|
'cacheKey IN (%Ls)',
|
|
|
|
array_keys($use_cache));
|
|
|
|
$blocks = mpull($blocks, null, 'getCacheKey');
|
|
|
|
} catch (Exception $ex) {
|
|
|
|
phlog($ex);
|
|
|
|
}
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
foreach ($objects as $key => $info) {
|
|
|
|
if (isset($blocks[$key])) {
|
|
|
|
// If we already have a preprocessing cache, we don't need to rebuild
|
|
|
|
// it.
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
$text = $info['object']->getMarkupText($info['field']);
|
|
|
|
$data = $engines[$key]->preprocessText($text);
|
|
|
|
|
|
|
|
// NOTE: This is just debugging information to help sort out cache issues.
|
|
|
|
// If one machine is misconfigured and poisoning caches you can use this
|
|
|
|
// field to hunt it down.
|
|
|
|
|
|
|
|
$metadata = array(
|
|
|
|
'host' => php_uname('n'),
|
|
|
|
);
|
|
|
|
|
|
|
|
$blocks[$key] = id(new PhabricatorMarkupCache())
|
|
|
|
->setCacheKey($key)
|
|
|
|
->setCacheData($data)
|
|
|
|
->setMetadata($metadata);
|
|
|
|
|
|
|
|
if (isset($use_cache[$key])) {
|
|
|
|
// This is just filling a cache and always safe, even on a read pathway.
|
|
|
|
$unguarded = AphrontWriteGuard::beginScopedUnguardedWrites();
|
2013-01-19 00:36:41 +01:00
|
|
|
$blocks[$key]->replace();
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
unset($unguarded);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return $blocks;
|
2011-06-24 20:50:19 +02:00
|
|
|
}
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
|
2012-09-05 20:40:48 +02:00
|
|
|
/**
|
|
|
|
* Set the viewing user. Used to implement object permissions.
|
|
|
|
*
|
|
|
|
* @param PhabricatorUser The viewing user.
|
|
|
|
* @return this
|
|
|
|
* @task markup
|
|
|
|
*/
|
|
|
|
public function setViewer(PhabricatorUser $viewer) {
|
|
|
|
$this->viewer = $viewer;
|
|
|
|
return $this;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
/* -( Engine Construction )------------------------------------------------ */
|
|
|
|
|
|
|
|
|
2012-09-05 20:40:48 +02:00
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
/**
|
|
|
|
* @task engine
|
|
|
|
*/
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
public static function newManiphestMarkupEngine() {
|
|
|
|
return self::newMarkupEngine(array(
|
|
|
|
));
|
|
|
|
}
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
|
|
|
|
/**
|
|
|
|
* @task engine
|
|
|
|
*/
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
public static function newPhrictionMarkupEngine() {
|
|
|
|
return self::newMarkupEngine(array(
|
2012-01-06 18:08:59 +01:00
|
|
|
'header.generate-toc' => true,
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
));
|
|
|
|
}
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
|
|
|
|
/**
|
|
|
|
* @task engine
|
|
|
|
*/
|
2012-04-12 22:09:04 +02:00
|
|
|
public static function newPhameMarkupEngine() {
|
|
|
|
return self::newMarkupEngine(array(
|
|
|
|
'macros' => false,
|
|
|
|
));
|
|
|
|
}
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
|
|
|
|
/**
|
|
|
|
* @task engine
|
|
|
|
*/
|
2012-07-02 19:44:37 +02:00
|
|
|
public static function newFeedMarkupEngine() {
|
|
|
|
return self::newMarkupEngine(
|
|
|
|
array(
|
|
|
|
'macros' => false,
|
|
|
|
'youtube' => false,
|
|
|
|
|
|
|
|
));
|
|
|
|
}
|
2012-04-12 22:09:04 +02:00
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
|
|
|
|
/**
|
|
|
|
* @task engine
|
|
|
|
*/
|
2011-10-20 18:41:38 +02:00
|
|
|
public static function newDifferentialMarkupEngine(array $options = array()) {
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
return self::newMarkupEngine(array(
|
|
|
|
'custom-inline' => PhabricatorEnv::getEnvConfig(
|
|
|
|
'differential.custom-remarkup-rules'),
|
|
|
|
'custom-block' => PhabricatorEnv::getEnvConfig(
|
|
|
|
'differential.custom-remarkup-block-rules'),
|
2011-10-20 18:41:38 +02:00
|
|
|
'differential.diff' => idx($options, 'differential.diff'),
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
));
|
|
|
|
}
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
|
|
|
|
/**
|
|
|
|
* @task engine
|
|
|
|
*/
|
2012-02-24 23:14:39 +01:00
|
|
|
public static function newDiffusionMarkupEngine(array $options = array()) {
|
|
|
|
return self::newMarkupEngine(array(
|
|
|
|
));
|
|
|
|
}
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
|
|
|
|
/**
|
|
|
|
* @task engine
|
|
|
|
*/
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
public static function newProfileMarkupEngine() {
|
|
|
|
return self::newMarkupEngine(array(
|
|
|
|
));
|
|
|
|
}
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
|
|
|
|
/**
|
|
|
|
* @task engine
|
|
|
|
*/
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
public static function newSlowvoteMarkupEngine() {
|
|
|
|
return self::newMarkupEngine(array(
|
|
|
|
));
|
|
|
|
}
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
|
2012-08-10 19:44:04 +02:00
|
|
|
public static function newPonderMarkupEngine(array $options = array()) {
|
|
|
|
return self::newMarkupEngine($options);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
/**
|
|
|
|
* @task engine
|
|
|
|
*/
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
private static function getMarkupEngineDefaultConfiguration() {
|
|
|
|
return array(
|
|
|
|
'pygments' => PhabricatorEnv::getEnvConfig('pygments.enabled'),
|
|
|
|
'youtube' => PhabricatorEnv::getEnvConfig(
|
|
|
|
'remarkup.enable-embedded-youtube'),
|
|
|
|
'custom-inline' => array(),
|
|
|
|
'custom-block' => array(),
|
2011-10-20 18:41:38 +02:00
|
|
|
'differential.diff' => null,
|
2012-01-07 08:32:15 +01:00
|
|
|
'header.generate-toc' => false,
|
2011-07-17 03:25:45 +02:00
|
|
|
'macros' => true,
|
2011-10-09 22:47:27 +02:00
|
|
|
'uri.allowed-protocols' => PhabricatorEnv::getEnvConfig(
|
|
|
|
'uri.allowed-protocols'),
|
2012-06-27 04:06:53 +02:00
|
|
|
'syntax-highlighter.engine' => PhabricatorEnv::getEnvConfig(
|
|
|
|
'syntax-highlighter.engine'),
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
);
|
|
|
|
}
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
|
|
|
|
/**
|
|
|
|
* @task engine
|
|
|
|
*/
|
2012-11-22 02:24:01 +01:00
|
|
|
public static function newMarkupEngine(array $options) {
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
|
|
|
|
$options += self::getMarkupEngineDefaultConfiguration();
|
|
|
|
|
2011-01-30 22:20:56 +01:00
|
|
|
$engine = new PhutilRemarkupEngine();
|
|
|
|
|
2011-04-10 23:07:00 +02:00
|
|
|
$engine->setConfig('preserve-linebreaks', true);
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
$engine->setConfig('pygments.enabled', $options['pygments']);
|
2011-10-09 22:47:27 +02:00
|
|
|
$engine->setConfig(
|
|
|
|
'uri.allowed-protocols',
|
|
|
|
$options['uri.allowed-protocols']);
|
2011-10-20 18:41:38 +02:00
|
|
|
$engine->setConfig('differential.diff', $options['differential.diff']);
|
2012-01-06 18:08:59 +01:00
|
|
|
$engine->setConfig('header.generate-toc', $options['header.generate-toc']);
|
2012-06-27 04:06:53 +02:00
|
|
|
$engine->setConfig(
|
|
|
|
'syntax-highlighter.engine',
|
|
|
|
$options['syntax-highlighter.engine']);
|
2011-04-10 23:07:00 +02:00
|
|
|
|
2011-01-30 22:20:56 +01:00
|
|
|
$rules = array();
|
|
|
|
$rules[] = new PhutilRemarkupRuleEscapeRemarkup();
|
2012-03-15 14:41:08 +01:00
|
|
|
$rules[] = new PhutilRemarkupRuleMonospace();
|
2011-10-20 18:41:38 +02:00
|
|
|
|
|
|
|
$custom_rule_classes = $options['custom-inline'];
|
|
|
|
if ($custom_rule_classes) {
|
|
|
|
foreach ($custom_rule_classes as $custom_rule_class) {
|
|
|
|
$rules[] = newv($custom_rule_class, array());
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2012-08-08 03:51:47 +02:00
|
|
|
$rules[] = new PhutilRemarkupRuleDocumentLink();
|
|
|
|
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
if ($options['youtube']) {
|
2011-05-27 21:50:02 +02:00
|
|
|
$rules[] = new PhabricatorRemarkupRuleYoutube();
|
|
|
|
}
|
|
|
|
|
2011-01-30 22:20:56 +01:00
|
|
|
$rules[] = new PhutilRemarkupRuleHyperlink();
|
2012-04-08 10:13:00 +02:00
|
|
|
$rules[] = new PhabricatorRemarkupRulePhriction();
|
2011-02-12 03:06:43 +01:00
|
|
|
|
Provide a "reference-with-full-name" syntax for Remarkup
Summary:
Provide a {T123} syntax which pulls in the entire name of an object, not just a
link to it. A major use for this is organizing projects using wiki pages. Since
handle links show object status now, this lets you organize stuff in an ad-hoc
way and get a reasonable overview of it. We can make handles richer in the
future, too.
The performance on this isn't perfect (it adds some potential single gets) but I
think it's okay for now and I don't want to make remarkup engine even more
complex until the preprocess/postprocess stuff has had a chance to settle and
I'm more confident it works.
In Differential and Maniphest we'll also incorrectly cache the object
state/name, but that'll fix itself once I move the cache code to use
preprocess/postprocess correctly.
Test Plan:
- See https://secure.phabricator.com/file/view/PHID-FILE-5f9ca32407bec20899b9/
for an example.
- Generated and looked over the documentation.
Reviewed By: jungejason
Reviewers: jungejason, tuomaspelkonen, aran, hunterbridges
CC: skrul, aran, jungejason, epriestley
Differential Revision: 784
2011-08-05 17:50:26 +02:00
|
|
|
$rules[] = new PhabricatorRemarkupRuleDifferentialHandle();
|
2012-11-21 19:57:29 +01:00
|
|
|
if (PhabricatorEnv::getEnvConfig('maniphest.enabled')) {
|
|
|
|
$rules[] = new PhabricatorRemarkupRuleManiphestHandle();
|
|
|
|
}
|
Provide a "reference-with-full-name" syntax for Remarkup
Summary:
Provide a {T123} syntax which pulls in the entire name of an object, not just a
link to it. A major use for this is organizing projects using wiki pages. Since
handle links show object status now, this lets you organize stuff in an ad-hoc
way and get a reasonable overview of it. We can make handles richer in the
future, too.
The performance on this isn't perfect (it adds some potential single gets) but I
think it's okay for now and I don't want to make remarkup engine even more
complex until the preprocess/postprocess stuff has had a chance to settle and
I'm more confident it works.
In Differential and Maniphest we'll also incorrectly cache the object
state/name, but that'll fix itself once I move the cache code to use
preprocess/postprocess correctly.
Test Plan:
- See https://secure.phabricator.com/file/view/PHID-FILE-5f9ca32407bec20899b9/
for an example.
- Generated and looked over the documentation.
Reviewed By: jungejason
Reviewers: jungejason, tuomaspelkonen, aran, hunterbridges
CC: skrul, aran, jungejason, epriestley
Differential Revision: 784
2011-08-05 17:50:26 +02:00
|
|
|
|
2011-07-15 23:17:55 +02:00
|
|
|
$rules[] = new PhabricatorRemarkupRuleEmbedFile();
|
Provide a "reference-with-full-name" syntax for Remarkup
Summary:
Provide a {T123} syntax which pulls in the entire name of an object, not just a
link to it. A major use for this is organizing projects using wiki pages. Since
handle links show object status now, this lets you organize stuff in an ad-hoc
way and get a reasonable overview of it. We can make handles richer in the
future, too.
The performance on this isn't perfect (it adds some potential single gets) but I
think it's okay for now and I don't want to make remarkup engine even more
complex until the preprocess/postprocess stuff has had a chance to settle and
I'm more confident it works.
In Differential and Maniphest we'll also incorrectly cache the object
state/name, but that'll fix itself once I move the cache code to use
preprocess/postprocess correctly.
Test Plan:
- See https://secure.phabricator.com/file/view/PHID-FILE-5f9ca32407bec20899b9/
for an example.
- Generated and looked over the documentation.
Reviewed By: jungejason
Reviewers: jungejason, tuomaspelkonen, aran, hunterbridges
CC: skrul, aran, jungejason, epriestley
Differential Revision: 784
2011-08-05 17:50:26 +02:00
|
|
|
|
2011-02-12 03:06:43 +01:00
|
|
|
$rules[] = new PhabricatorRemarkupRuleDifferential();
|
2011-04-11 12:02:19 +02:00
|
|
|
$rules[] = new PhabricatorRemarkupRuleDiffusion();
|
2012-11-21 19:57:29 +01:00
|
|
|
if (PhabricatorEnv::getEnvConfig('maniphest.enabled')) {
|
|
|
|
$rules[] = new PhabricatorRemarkupRuleManiphest();
|
|
|
|
}
|
2011-06-27 08:57:43 +02:00
|
|
|
$rules[] = new PhabricatorRemarkupRulePaste();
|
2011-07-17 03:25:45 +02:00
|
|
|
|
2012-08-15 04:19:23 +02:00
|
|
|
$rules[] = new PhabricatorRemarkupRuleCountdown();
|
|
|
|
|
2012-08-10 19:44:04 +02:00
|
|
|
$rules[] = new PonderRuleQuestion();
|
|
|
|
|
2011-07-17 03:25:45 +02:00
|
|
|
if ($options['macros']) {
|
|
|
|
$rules[] = new PhabricatorRemarkupRuleImageMacro();
|
2013-01-24 18:57:58 +01:00
|
|
|
$rules[] = new PhabricatorRemarkupRuleMeme();
|
2011-07-17 03:25:45 +02:00
|
|
|
}
|
|
|
|
|
2011-06-24 19:59:57 +02:00
|
|
|
$rules[] = new PhabricatorRemarkupRuleMention();
|
2011-02-12 03:06:43 +01:00
|
|
|
|
2011-06-02 20:50:24 +02:00
|
|
|
$rules[] = new PhutilRemarkupRuleEscapeHTML();
|
|
|
|
$rules[] = new PhutilRemarkupRuleBold();
|
|
|
|
$rules[] = new PhutilRemarkupRuleItalic();
|
2012-03-21 22:01:12 +01:00
|
|
|
$rules[] = new PhutilRemarkupRuleDel();
|
2011-06-02 20:50:24 +02:00
|
|
|
|
2011-01-30 22:20:56 +01:00
|
|
|
$blocks = array();
|
2011-04-14 21:52:28 +02:00
|
|
|
$blocks[] = new PhutilRemarkupEngineRemarkupQuotesBlockRule();
|
2012-01-04 01:17:40 +01:00
|
|
|
$blocks[] = new PhutilRemarkupEngineRemarkupLiteralBlockRule();
|
2011-01-30 22:20:56 +01:00
|
|
|
$blocks[] = new PhutilRemarkupEngineRemarkupHeaderBlockRule();
|
|
|
|
$blocks[] = new PhutilRemarkupEngineRemarkupListBlockRule();
|
|
|
|
$blocks[] = new PhutilRemarkupEngineRemarkupCodeBlockRule();
|
2011-12-01 18:48:27 +01:00
|
|
|
$blocks[] = new PhutilRemarkupEngineRemarkupNoteBlockRule();
|
2012-07-02 23:44:38 +02:00
|
|
|
$blocks[] = new PhutilRemarkupEngineRemarkupTableBlockRule();
|
2012-10-17 22:18:39 +02:00
|
|
|
$blocks[] = new PhutilRemarkupEngineRemarkupSimpleTableBlockRule();
|
2011-01-30 22:20:56 +01:00
|
|
|
$blocks[] = new PhutilRemarkupEngineRemarkupDefaultBlockRule();
|
|
|
|
|
Generalize the markup engine factory
Summary:
This thing services every app but it lives inside Differential right now. Pull
it out, and separate the factory interfaces per-application.
This will let us accommodate changes we need to make for Phriction to support
wiki linking.
Test Plan: Tested remarkup in differential, diffusion, maniphest, people,
slowvote.
Reviewed By: hsb
Reviewers: hsb, codeblock, jungejason, tuomaspelkonen, aran
CC: aran, hsb
Differential Revision: 646
2011-07-12 00:58:32 +02:00
|
|
|
$custom_block_rule_classes = $options['custom-block'];
|
2011-05-26 22:13:36 +02:00
|
|
|
if ($custom_block_rule_classes) {
|
|
|
|
foreach ($custom_block_rule_classes as $custom_block_rule_class) {
|
|
|
|
$blocks[] = newv($custom_block_rule_class, array());
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2011-01-30 22:20:56 +01:00
|
|
|
foreach ($blocks as $block) {
|
2012-01-04 01:17:40 +01:00
|
|
|
if ($block instanceof PhutilRemarkupEngineRemarkupLiteralBlockRule) {
|
|
|
|
$literal_rules = array();
|
|
|
|
$literal_rules[] = new PhutilRemarkupRuleEscapeHTML();
|
|
|
|
$literal_rules[] = new PhutilRemarkupRuleLinebreaks();
|
|
|
|
$block->setMarkupRules($literal_rules);
|
|
|
|
} else if (
|
|
|
|
!($block instanceof PhutilRemarkupEngineRemarkupCodeBlockRule)) {
|
2011-01-30 22:20:56 +01:00
|
|
|
$block->setMarkupRules($rules);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
$engine->setBlockRules($blocks);
|
|
|
|
|
|
|
|
return $engine;
|
|
|
|
}
|
|
|
|
|
Add a generic multistep Markup cache
Summary:
The immediate issue this addresses is T1366, adding a rendering cache to Phriction. For wiki pages with code blocks especially, rerendering them each time is expensive.
The broader issue is that out markup caches aren't very good right now. They have three major problems:
**Problem 1: the data is stored in the wrong place.** We currently store remarkup caches on objects. This means we're always loading it and passing it around even when we don't need it, can't genericize cache management code (e.g., have one simple script to drop/GC caches), need to update authoritative rows to clear caches, and can't genericize rendering code since each object is different.
To solve this, I created a dedicated cache database that I plan to move all markup caches to use.
**Problem 2: time-variant rules break when cached.** Some rules like `**bold**` are time-invariant and always produce the same output, but some rules like `{Tnnn}` and `@username` are variant and may render differently (because a task was closed or a user is on vacation). Currently, we cache the raw output, so these time-variant rules get locked at whatever values they had when they were first rendered. This is the main reason Phriction doesn't have a cache right now -- I wanted `{Tnnn}` rules to reflect open/closed tasks.
To solve this, I split markup into a "preprocessing" phase (which does all the parsing and evaluates all time-invariant rules) and a "postprocessing" phase (which evaluates time-variant rules only). The preprocessing phase is most of the expense (and, notably, includes syntax highlighting) so this is nearly as good as caching the final output. I did most of the work here in D737 / D738, but we never moved to use it in Phabricator -- we currently just do the two operations serially in all cases.
This diff splits them apart and caches the output of preprocessing only, so we benefit from caching but also get accurate time-variant rendering.
**Problem 3: cache access isn't batched/pipelined optimally.** When we're rendering a list of markup blocks, we should be able to batch datafetching better than we do. D738 helped with this (fetching is batched within a single hunk of markup) and this improves batching on cache access. We could still do better here, but this is at least a step forward.
Also fixes a bug with generating a link in the Phriction history interface ($uri gets clobbered).
I'm using PHP serialization instead of JSON serialization because Remarkup does some stuff with non-ascii characters that might not survive JSON.
Test Plan:
- Created a Phriction document and verified that previews don't go to cache (no rows appear in the cache table).
- Verified that published documents come out of cache.
- Verified that caches generate/regenerate correctly, time-variant rules render properly and old documents hit the right caches.
Reviewers: btrahan
Reviewed By: btrahan
CC: aran
Maniphest Tasks: T1366
Differential Revision: https://secure.phabricator.com/D2945
2012-07-10 00:20:56 +02:00
|
|
|
public static function extractPHIDsFromMentions(array $content_blocks) {
|
|
|
|
$mentions = array();
|
|
|
|
|
|
|
|
$engine = self::newDifferentialMarkupEngine();
|
|
|
|
|
|
|
|
foreach ($content_blocks as $content_block) {
|
|
|
|
$engine->markupText($content_block);
|
|
|
|
$phids = $engine->getTextMetadata(
|
|
|
|
PhabricatorRemarkupRuleMention::KEY_MENTIONED,
|
|
|
|
array());
|
|
|
|
$mentions += $phids;
|
|
|
|
}
|
|
|
|
|
|
|
|
return $mentions;
|
|
|
|
}
|
|
|
|
|
2013-01-25 02:23:05 +01:00
|
|
|
public static function extractFilePHIDsFromEmbeddedFiles(
|
|
|
|
array $content_blocks) {
|
|
|
|
$files = array();
|
|
|
|
|
|
|
|
$engine = self::newDifferentialMarkupEngine();
|
|
|
|
|
|
|
|
foreach ($content_blocks as $content_block) {
|
|
|
|
$engine->markupText($content_block);
|
|
|
|
$ids = $engine->getTextMetadata(
|
|
|
|
PhabricatorRemarkupRuleEmbedFile::KEY_EMBED_FILE_PHIDS,
|
|
|
|
array());
|
|
|
|
$files += $ids;
|
|
|
|
}
|
|
|
|
|
|
|
|
return $files;
|
|
|
|
}
|
2012-10-17 17:36:33 +02:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Produce a corpus summary, in a way that shortens the underlying text
|
|
|
|
* without truncating it somewhere awkward.
|
|
|
|
*
|
|
|
|
* TODO: We could do a better job of this.
|
|
|
|
*
|
|
|
|
* @param string Remarkup corpus to summarize.
|
|
|
|
* @return string Summarized corpus.
|
|
|
|
*/
|
|
|
|
public static function summarize($corpus) {
|
|
|
|
|
|
|
|
// Major goals here are:
|
|
|
|
// - Don't split in the middle of a character (utf-8).
|
|
|
|
// - Don't split in the middle of, e.g., **bold** text, since
|
|
|
|
// we end up with hanging '**' in the summary.
|
|
|
|
// - Try not to pick an image macro, header, embedded file, etc.
|
|
|
|
// - Hopefully don't return too much text. We don't explicitly limit
|
|
|
|
// this right now.
|
|
|
|
|
|
|
|
$blocks = preg_split("/\n *\n\s*/", trim($corpus));
|
|
|
|
|
|
|
|
$best = null;
|
|
|
|
foreach ($blocks as $block) {
|
|
|
|
// This is a test for normal spaces in the block, i.e. a heuristic to
|
|
|
|
// distinguish standard paragraphs from things like image macros. It may
|
|
|
|
// not work well for non-latin text. We prefer to summarize with a
|
|
|
|
// paragraph of normal words over an image macro, if possible.
|
|
|
|
$has_space = preg_match('/\w\s\w/', $block);
|
|
|
|
|
|
|
|
// This is a test to find embedded images and headers. We prefer to
|
|
|
|
// summarize with a normal paragraph over a header or an embedded object,
|
|
|
|
// if possible.
|
|
|
|
$has_embed = preg_match('/^[{=]/', $block);
|
|
|
|
|
|
|
|
if ($has_space && !$has_embed) {
|
|
|
|
// This seems like a good summary, so return it.
|
|
|
|
return $block;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!$best) {
|
|
|
|
// This is the first block we found; if everything is garbage just
|
|
|
|
// use the first block.
|
|
|
|
$best = $block;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return $best;
|
|
|
|
}
|
|
|
|
|
2011-01-30 22:20:56 +01:00
|
|
|
}
|