From e2cf1e42887afc62631d006726a8ee88beaa56a0 Mon Sep 17 00:00:00 2001
From: epriestley
Date: Fri, 19 Oct 2018 13:20:39 -0700
Subject: [PATCH] Skip copied code detection for changes that are too large for it to be useful

Summary:
Ref T13210. See PHI944.

When parsing certain large diffs (the case in PHI944 is a 2.5-million-line JSON diff), we spend ~66% of runtime and ~80% of memory doing copy detection (the little yellow bar which shows up to give you a hint that code was moved around within a diff).

This is pretty much pointless, and copy hints are almost certainly never useful on large changes. Instead, just bail if the change is larger than some arbitrary "probably too big for copy hints to ever be useful" threshold (here, 65535 lines).

Test Plan:
Roughly, ran this against a 2.5-million-line JSON diff:

```
$changes = id(new ArcanistDiffParser())->parseDiff($raw_diff);
$diff = DifferentialDiff::newFromRawChanges($viewer, $changes);
```

Before the changes, it took 20s + 2.5GB RAM to parse. After the changes, it took 7s + 500MB RAM to parse.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13210

Differential Revision: https://secure.phabricator.com/D19748
---
 .../engine/DifferentialChangesetEngine.php | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/applications/differential/engine/DifferentialChangesetEngine.php b/src/applications/differential/engine/DifferentialChangesetEngine.php
index e8a55a1b0e..d72db025ad 100644
--- a/src/applications/differential/engine/DifferentialChangesetEngine.php
+++ b/src/applications/differential/engine/DifferentialChangesetEngine.php
@@ -88,6 +88,20 @@ final class DifferentialChangesetEngine extends Phobject {
   private function detectCopiedCode(array $changesets) {
+    // See PHI944. If the total number of changed lines is excessively large,
+    // don't bother with copied code detection. This can take a lot of time and
+    // memory and it's not generally of any use for very large changes.
+    $max_size = 65535;
+
+    $total_size = 0;
+    foreach ($changesets as $changeset) {
+      $total_size += ($changeset->getAddLines() + $changeset->getDelLines());
+    }
+
+    if ($total_size > $max_size) {
+      return;
+    }
+
     $min_width = 30;
     $min_lines = 3;
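
The size guard in this change can be exercised on its own to see where the cutoff falls. The sketch below is not part of the patch. `ExampleChangeset` and `example_should_skip_copy_detection()` are hypothetical names standing in for the real changeset objects and for the inline check in `detectCopiedCode()`; only `getAddLines()`, `getDelLines()`, the summing loop, and the 65535 threshold are taken from the diff.

```php
<?php

// Hypothetical stand-in for a changeset. Only the two line counters that the
// guard reads are modeled here.
final class ExampleChangeset {
  private $addLines;
  private $delLines;

  public function __construct($add_lines, $del_lines) {
    $this->addLines = $add_lines;
    $this->delLines = $del_lines;
  }

  public function getAddLines() {
    return $this->addLines;
  }

  public function getDelLines() {
    return $this->delLines;
  }
}

// Hypothetical helper: returns true when the combined added + deleted line
// count across all changesets is strictly greater than the threshold, which
// is the case where the patched method returns early.
function example_should_skip_copy_detection(
  array $changesets,
  $max_size = 65535) {

  $total_size = 0;
  foreach ($changesets as $changeset) {
    $total_size += ($changeset->getAddLines() + $changeset->getDelLines());
  }

  return ($total_size > $max_size);
}

$small = array(new ExampleChangeset(120, 40));

// 40000 + 20000 + 5536 = 65536 changed lines, one over the threshold.
$large = array(
  new ExampleChangeset(40000, 0),
  new ExampleChangeset(20000, 5536),
);

var_dump(example_should_skip_copy_detection($small)); // bool(false)
var_dump(example_should_skip_copy_detection($large)); // bool(true)
```

Because the comparison is strict, a change of exactly 65535 lines still gets copy detection, and the total is summed over every changeset in the diff rather than checked per file.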