Mirror of https://we.phorge.it/source/arcanist.git (synced 2024-11-22 23:02:41 +01:00)
Don't compute intraline diffs if the input fails a coarse check for being huge
Summary:
Fixes T11744. Because intraline diffs are expensive to generate, we already bail out and decline to generate them for very long lines. However, we currently split the inputs into lists of characters first, then check how long they are and make a decision to bail. For //huge// inputs (e.g., 1MB+), this is too late: just splitting them has a large CPU/RAM cost. (These inputs are rare in normal source, but can appear in, e.g., JSON files written without newlines.)

Instead, add an extra "are the inputs really huge?" check first, and bail early if they are.

Test Plan:
- Generated a 1MB "change a file full of Q to a file full of R" diff.
- Before change: purged changeset cache; took about 7 seconds to load.
- After change: purged changeset cache; took about 1 second to load.
- Viewed some normal diffs to make sure intraline edits still displayed correctly.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11744

Differential Revision: https://secure.phabricator.com/D16683
parent 483e985d08
commit 2ad15c499a
1 changed file with 26 additions and 3 deletions
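The summary notes that merely splitting a huge line into a list of characters is costly. The rough measurement below (not part of Arcanist) compares one 100,000-byte string with the list that `str_split()` builds from it. It is only an illustration: the helper `measure_bytes()` is hypothetical, and the ratio it reports depends on the PHP version and memory allocator, so it will not necessarily match the ~200x figure quoted in the code comment of the diff.

```php
<?php
// Rough illustration, not Arcanist code: compare the memory needed to hold a
// 100 KB line as one string versus as a list of one-byte strings, which is
// roughly what splitting an ASCII line into glyphs produces.

// Hypothetical helper: returns how many bytes of PHP heap the value built by
// $build occupies while it is alive.
function measure_bytes(callable $build): int {
  $before = memory_get_usage();
  $value = $build();
  $used = memory_get_usage() - $before;
  unset($value);
  return $used;
}

$length = 100000;

// Build each representation inside the measured closure so its allocation
// is counted. The temporary string passed to str_split() is freed before
// the measurement is taken, so only the list is counted.
$as_string = measure_bytes(function () use ($length) {
  return str_repeat('Q', $length);
});
$as_list = measure_bytes(function () use ($length) {
  return str_split(str_repeat('Q', $length));
});

printf("string: %d bytes, list: %d bytes, ratio: %.1fx\n",
  $as_string, $as_list, $as_list / max(1, $as_string));
```

Every list element needs its own array slot (and, on most PHP versions, its own string allocation), so the list costs many times the single byte of payload each character carries in the string.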
```diff
@@ -56,7 +56,30 @@ final class ArcanistDiffUtils extends Phobject {
       );
     }
 
-    return self::computeIntralineEdits($o, $n);
+    // Do a fast check for certainly-too-long inputs before splitting the
+    // lines. Inputs take ~200x more memory to represent as lists than as
+    // strings, so we can run out of memory quickly if we try to split huge
+    // inputs. See T11744.
+    $ol = strlen($o);
+    $nl = strlen($n);
+
+    $max_glyphs = 80;
+
+    // This has some wiggle room for multi-byte UTF8 characters, and the
+    // fact that we're testing the sum of the lengths of both strings. It can
+    // still generate false positives for, say, Chinese text liberally
+    // slathered with combining characters, but this kind of text should be
+    // vitually nonexistent in real data.
+    $too_many_bytes = (16 * $max_glyphs);
+
+    if ($ol + $nl > $too_many_bytes) {
+      return array(
+        array(array(1, $ol)),
+        array(array(1, $nl)),
+      );
+    }
+
+    return self::computeIntralineEdits($o, $n, $max_glyphs);
   }
 
   public static function applyIntralineDiff($str, $intra_stack) {
```
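The new guard in the first hunk can be read on its own as the sketch below. `intraline_coarse_guard()` is a hypothetical standalone wrapper (in Arcanist the check is inline in `ArcanistDiffUtils`), and returning `null` to mean "go on to the full intraline diff" is an assumption of this sketch. The threshold arithmetic matches the diff: 16 bytes per glyph times 80 glyphs gives 1280 bytes for both lines combined.

```php
<?php
// Minimal sketch of the coarse size guard, written as a standalone function.
// intraline_coarse_guard() is hypothetical; it is not an Arcanist API.

/**
 * Returns a "whole line changed" result if the inputs are certainly too long
 * for intraline diffing, or null if the caller should go on to split the
 * inputs and compute a real intraline diff.
 */
function intraline_coarse_guard(string $o, string $n, int $max_glyphs = 80): ?array {
  // strlen() reads the stored byte length, so this is cheap even for
  // multi-megabyte inputs; nothing has been split into lists yet.
  $ol = strlen($o);
  $nl = strlen($n);

  // Allow up to 16 bytes per glyph, summed over both sides: with the default
  // of 80 glyphs that is 1280 bytes in total.
  $too_many_bytes = 16 * $max_glyphs;

  if ($ol + $nl > $too_many_bytes) {
    // Mark each entire line as a single changed range of its byte length.
    return array(
      array(array(1, $ol)),
      array(array(1, $nl)),
    );
  }

  return null;
}

$huge_old = str_repeat('Q', 1000000);
$huge_new = str_repeat('R', 1000000);
$bailed = intraline_coarse_guard($huge_old, $huge_new);
// $bailed is array(array(array(1, 1000000)), array(array(1, 1000000))).

$short = intraline_coarse_guard('$x = 1;', '$x = 2;');
// $short is null, so the full intraline diff would still be computed.
```

Because `strlen()` does not scan the string, the guard costs the same for a 10-byte line and a 1 MB one, which is why it can safely run before any splitting. Note the comparison is strict: two 640-byte lines (1280 bytes total) still get a full intraline diff.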
```diff
@@ -155,7 +178,7 @@ final class ArcanistDiffUtils extends Phobject {
       ->getEditString();
   }
 
-  public static function computeIntralineEdits($o, $n) {
+  private static function computeIntralineEdits($o, $n, $max_glyphs) {
     if (preg_match('/[\x80-\xFF]/', $o.$n)) {
       $ov = phutil_utf8v_combined($o);
       $nv = phutil_utf8v_combined($n);
```
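The context lines of the second hunk show the splitting step that the new guard protects: if either input contains any byte in the range 0x80-0xFF, both are split with `phutil_utf8v_combined()`. The sketch below mimics that branch under stated assumptions: `split_glyphs_sketch()` is hypothetical, PCRE's `\X` (extended grapheme cluster) stands in for `phutil_utf8v_combined()`, whose exact rules may differ, the byte-per-glyph `str_split()` path is an assumption about the elided ASCII branch, and input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical sketch of splitting a line into glyphs. Pure-ASCII input is
// split byte by byte; anything with a byte in 0x80-0xFF is split into
// grapheme clusters instead (a stand-in for phutil_utf8v_combined()).

function split_glyphs_sketch(string $s): array {
  if (!preg_match('/[\x80-\xFF]/', $s)) {
    // ASCII only: one byte per glyph. str_split('') returns array('') before
    // PHP 8.2, so handle the empty string explicitly.
    return $s === '' ? array() : str_split($s);
  }
  // Multibyte: match each extended grapheme cluster, so a base letter and
  // its combining marks stay together as one glyph. Assumes valid UTF-8.
  preg_match_all('/\X/u', $s, $matches);
  return $matches[0];
}

$ascii = split_glyphs_sketch('abc');
// $ascii is array('a', 'b', 'c').

$combining = split_glyphs_sketch("e\xCC\x81x");
// "e" followed by U+0301 COMBINING ACUTE ACCENT stays one glyph:
// $combining is array("e\xCC\x81", 'x').
```

Either branch allocates one array element per glyph, which is exactly the cost the coarse guard avoids for huge lines.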
```diff
@@ -166,7 +189,7 @@ final class ArcanistDiffUtils extends Phobject {
       $multibyte = false;
     }
 
-    $result = self::generateEditString($ov, $nv);
+    $result = self::generateEditString($ov, $nv, $max_glyphs);
 
     // Now we have a character-based description of the edit. We need to
     // convert into a byte-based description. Walk through the edit string and
```
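The comment at the end of the third hunk says the character-based description of the edit must be converted into a byte-based one. A minimal sketch of that conversion, under assumptions: `glyph_ranges_to_byte_ranges()` is hypothetical, and the range format (a list of `array(type, length)` pairs where 1 marks a changed run, as in the bail-out path of the first hunk, and 0 is assumed to mark an unchanged run) is inferred for illustration rather than taken from Arcanist's edit-string code.

```php
<?php
// Hypothetical sketch: convert runs measured in glyphs into runs measured in
// bytes, by summing the byte length of every glyph each run covers.

function glyph_ranges_to_byte_ranges(array $glyphs, array $ranges): array {
  $result = array();
  $pos = 0; // Index of the next glyph to consume.
  foreach ($ranges as $range) {
    list($type, $glyph_count) = $range;
    $bytes = 0;
    for ($i = 0; $i < $glyph_count; $i++) {
      $bytes += strlen($glyphs[$pos + $i]);
    }
    $pos += $glyph_count;
    $result[] = array($type, $bytes);
  }
  return $result;
}

// "café!" as glyphs: the "é" glyph is two bytes in UTF-8.
$glyphs = array('c', 'a', 'f', "\xC3\xA9", '!');
$byte_ranges = glyph_ranges_to_byte_ranges(
  $glyphs,
  array(array(0, 3), array(1, 2)));
// $byte_ranges is array(array(0, 3), array(1, 3)).
```

Summing `strlen()` per glyph is what lets a multi-byte character such as "é" count as one glyph in the diff but two bytes in the ranges used to mark up the original string.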