mirror of https://we.phorge.it/source/phorge.git synced 2025-01-24 21:48:21 +01:00
4 commits

Author SHA1 Message Date
epriestley
fcb75d0503 Fix an issue where prose diffing may fail after hitting the PCRE backtracking limit
Summary:
Fixes T13554. For certain prose diff inputs and PCRE backtracking limits, this regular expression may backtrack too often and fail.

A characteristic input is "x x x x ...", i.e. many sequences where `(.*?)\s*\z` looks like it may be able to match but actually cannot.

I think writing an expression which has all the behavior we'd like without this backtracking issue isn't trivial (at least, I don't think I know how to do it offhand); just use a strategy based on "trim()" instead, which avoids any PCRE complexities here.
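
As a rough illustration of the trim-based strategy (not the code from this commit, which is PHP and is not shown here), the split can be sketched in Python. The function name `split_whitespace` and the three-way (prefix, body, suffix) return shape are assumptions made for this example. Python's `lstrip()`/`rstrip()` also strip a slightly different set of characters than PHP's `trim()`.

```python
def split_whitespace(piece: str) -> tuple[str, str, str]:
    """Split a piece into (leading whitespace, body, trailing whitespace).

    Hypothetical helper: it uses only strip operations, so there is no
    regular expression involved and nothing that can backtrack.
    """
    without_prefix = piece.lstrip()
    # Whatever lstrip() removed is the leading whitespace.
    prefix = piece[:len(piece) - len(without_prefix)]
    body = without_prefix.rstrip()
    # Whatever rstrip() removed is the trailing whitespace.
    suffix = without_prefix[len(body):]
    return prefix, body, suffix


if __name__ == "__main__":
    # A long "x x x ..." input is handled in linear time.
    prefix, body, suffix = split_whitespace("x " * 100000)
    print(repr(prefix), len(body), repr(suffix))
```

Because the work is plain string trimming, the result should not depend on the PCRE version or on any backtracking limit.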

Test Plan: Locally, this passes the "x x x ..." test which the previous code failed. I'm not including that test because it won't reproduce across values of "pcre.backtrack_limit", PCRE versions, etc.

Maniphest Tasks: T13554

Differential Revision: https://secure.phabricator.com/D21422
2020-07-23 07:46:15 -07:00
epriestley
36075f6ce5 Correct a prose diff behavior when prose pieces include newlines
Summary:
See <https://discourse.phabricator-community.org/t/bad-regex-in-prose-diff-logic/3969>.

The prose splitting rules normally guarantee that newlines appear only at the beginning or end of blocks. However, if a prose sentence ends with text like "...x\n.", we can end up with a newline inside a "sentence".

If we do, the regular expression that breaks it into pieces will fail.

Arguably, this is an error in how sentences are split apart (we might prefer to split this into two sentences, "x\n" and ".", rather than a single "x\n." sentence) but in the general case it's not unreasonable for blocks to contain newlines, so a simple fix is to make the pattern more robust.
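
The failure mode can be reproduced with a small Python sketch. The pattern below is an assumed stand-in, since the actual expression is not shown in this message. It only illustrates that `.` does not match a newline by default, and that a "dot matches newline" flag is one way to make such a pattern more robust.

```python
import re

# Hypothetical pattern: leading whitespace, body, trailing whitespace.
PIECE = r"(\s*)(.*?)(\s*)\Z"


def split_piece(text: str, dotall: bool):
    """Return the three captured groups, or None if the pattern fails."""
    flags = re.DOTALL if dotall else 0
    match = re.match(PIECE, text, flags)
    return match.groups() if match else None


if __name__ == "__main__":
    sentence = "x\n."
    # Without DOTALL the body cannot cross the interior newline.
    print(split_piece(sentence, dotall=False))
    # With DOTALL the whole sentence is captured as the body.
    print(split_piece(sentence, dotall=True))
```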

Test Plan: Added a failing test which includes this behavior, made it pass.

Differential Revision: https://secure.phabricator.com/D21295
2020-05-30 14:11:37 -07:00
epriestley
884cd74cc4 In prose diffs, use hash-and-diff for coarse "level 0" diffing to scale better
Summary: Depends on D20838. Fixes T13414. Instead of doing coarse diffing with "PhutilEditDistanceMatrix", use hash-and-diff with "DocumentEngine".

Test Plan:
  - On a large document (~3K top level blocks), saw a more sensible diff, instead of the whole thing falling back to "everything changed" mode.
  - On a small document, still saw a sensible granular diff.
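
For readers unfamiliar with the idea, hash-and-diff can be sketched roughly as follows in Python. This is not the "DocumentEngine" implementation: `block_hash` and `coarse_diff` are hypothetical names, and `difflib.SequenceMatcher` stands in for whatever sequence diff the real code uses.

```python
import hashlib
from difflib import SequenceMatcher


def block_hash(block: str) -> str:
    """Reduce a block to a fixed-size token so comparisons are cheap."""
    return hashlib.sha256(block.encode("utf-8")).hexdigest()


def coarse_diff(old_blocks: list[str], new_blocks: list[str]):
    """Diff two block lists by comparing their hashes, not their text."""
    old_hashes = [block_hash(b) for b in old_blocks]
    new_hashes = [block_hash(b) for b in new_blocks]
    matcher = SequenceMatcher(a=old_hashes, b=new_hashes, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Map hash ranges back to the original blocks.
        result.append((tag, old_blocks[i1:i2], new_blocks[j1:j2]))
    return result


if __name__ == "__main__":
    for op in coarse_diff(["a", "b", "c"], ["a", "x", "c"]):
        print(op)
```

Only the ranges reported as changed would then need a finer-grained diff, which is what lets the coarse pass scale to documents with thousands of top-level blocks.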

{F6888249}

Maniphest Tasks: T13414

Differential Revision: https://secure.phabricator.com/D20839
2019-09-25 16:50:49 -07:00
epriestley
9d884f144f Add "PhutilProseDiff" classes to "phabricator/"
Summary: Depends on D20836. Ref T13414. Ref T13425. Ref T13395. Move these to "phabricator/" before trying to improve the high-level diff engine in prose diffs.

Test Plan: Ran "arc liberate", looked at a prose diff (no behavioral change).

Maniphest Tasks: T13425, T13414, T13395

Differential Revision: https://secure.phabricator.com/D20838
2019-09-25 16:49:54 -07:00