Summary:
Fixes T13554. For certain prose diff inputs and PCRE backtracking limits, this regular expression may back track too often and fail.
A characteristic input is "x x x x ...", i.e. many sequences where `(.*?)\s*\z` looks like it may be able to match but actually can not.
I think writing an expression which has all the behavior we'd like without this backtracking issue isn't trivial (at least, I don't think I know how to do it offhand); just use a strategy based on "trim()" insetad, which avoids any PCRE complexities here.
Test Plan: Locally, this passes the "x x x ..." test which the previous code failed. I'm not including that test because it won't reproduce across values of "pcre.backtrac_limit", PCRE versions, etc.
Maniphest Tasks: T13554
Differential Revision: https://secure.phabricator.com/D21422
Summary:
See <https://discourse.phabricator-community.org/t/bad-regex-in-prose-diff-logic/3969>.
The prose splitting rules normally guarantee that newlines appear only at the beginning or end of blocks. However, if a prose sentence ends with text like "...x\n.", we can end up with a newline inside a "sentence".
If we do, the regular expression that breaks it into pieces will fail.
Arguably, this is an error in how sentences are split apart (we might prefer to split this into two sentences, "x\n" and ".", rather than a single "x\n." sentence) but in the general case it's not unreasonable for blocks to contain newlines, so a simple fix is to make the pattern more robust.
Test Plan: Added a failing test which includes this behavior, made it pass.
Differential Revision: https://secure.phabricator.com/D21295
Summary: Depends on D20838. Fixes T13414. Instead of doing coarse diffing with "PhutilEditDistanceMatrix", use hash-and-diff with "DocumentEngine".
Test Plan:
- On a large document (~3K top level blocks), saw a more sensible diff, instead of the whole thing falling back to "everything changed" mode.
- On a small document, still saw a sensible granular diff.
{F6888249}
Maniphest Tasks: T13414
Differential Revision: https://secure.phabricator.com/D20839
Summary: Depends on D20836. Ref T13414. Ref T13425. Ref T13395. Move these to "phabricator/" before trying to improve the high-level diff engine in prose diffs.
Test Plan: Ran "arc liberate", looked at a prose diff (no behavioral change).
Maniphest Tasks: T13425, T13414, T13395
Differential Revision: https://secure.phabricator.com/D20838