mirror of https://we.phorge.it/source/phorge.git synced 2024-12-22 21:40:55 +01:00

Correct a prose diff behavior when prose pieces include newlines

Summary:
See <https://discourse.phabricator-community.org/t/bad-regex-in-prose-diff-logic/3969>.

The prose splitting rules normally guarantee that newlines appear only at the beginning or end of blocks. However, if a prose sentence ends with text like "...x\n.", we can end up with a newline inside a "sentence".

If we do, the regular expression that breaks it into pieces fails to match, because "." does not match a newline unless the "s" (dotall) modifier is set.

Arguably, this is an error in how sentences are split apart: we might prefer to split this into two sentences, "x\n" and ".", rather than a single "x\n." sentence. In the general case, though, it's not unreasonable for blocks to contain newlines, so a simple fix is to make the pattern more robust.
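
As a rough illustration of the failure described above, the sketch below runs the old and new patterns against a piece with an interior newline. The `split_piece()` helper is hypothetical and exists only for this example; it is not part of `PhutilProseDifferenceEngine`. The two patterns and the sample string are taken from this change.

```php
<?php

// Hypothetical helper for illustration only: splits a piece into
// leading whitespace, body, and trailing whitespace, or returns null
// if the pattern does not match at all.
function split_piece($pattern, $piece) {
  $matches = null;
  if (!preg_match($pattern, $piece, $matches)) {
    return null;
  }
  return array($matches[1], $matches[2], $matches[3]);
}

// A "sentence" with a newline in the middle, as in the new test case.
$piece = "yyy\n.zzz";

// Old pattern: "." cannot cross the newline, so nothing matches.
$old = split_piece('/^(\s*)(.*?)(\s*)\z/', $piece);

// New pattern: the "s" modifier lets "." match the newline too.
$new = split_piece('/^(\s*)(.*?)(\s*)\z/s', $piece);

var_dump($old);
var_dump($new);
```

With the old pattern the helper returns null. With the "s" modifier the whole piece lands in the middle group, with empty leading and trailing whitespace groups. In the engine itself there is no such guard, so a failed match leaves `$matches` empty and the later `$matches[1]` lookup has nothing to read.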

Test Plan: Added a failing test which includes this behavior, made it pass.

Differential Revision: https://secure.phabricator.com/D21295
epriestley 2020-05-30 13:31:42 -07:00
parent f686a0b827
commit 36075f6ce5
2 changed files with 9 additions and 1 deletion


@@ -148,7 +148,7 @@ final class PhutilProseDifferenceEngine extends Phobject {
   // whitespace at the end.
   $matches = null;
-  preg_match('/^(\s*)(.*?)(\s*)\z/', $result, $matches);
+  preg_match('/^(\s*)(.*?)(\s*)\z/s', $result, $matches);
   if (strlen($matches[1])) {
     $results[] = $matches[1];


@@ -30,6 +30,14 @@ final class PhutilProseDiffTestCase
       ),
       pht('Remove Paragraph'));
+    $this->assertProseParts(
+      'xxx',
+      "xxxyyy\n.zzz",
+      array(
+        '= xxx',
+        "+ yyy\n.zzz",
+      ),
+      pht('Amend paragraph, and add paragraph starting with punctuation'));
     // Without smoothing, the algorithm identifies that "shark" and "cat"
     // both contain the letter "a" and tries to express this as a very