1
0
Fork 0
mirror of https://we.phorge.it/source/phorge.git synced 2025-01-26 06:28:19 +01:00

Skip copied code detection for changes that are too large for it to be useful

Summary:
Ref T13210. See PHI944. When parsing certain large diffs (the case in PHI944 is an 2.5-million line JSON diff), we spend ~66% of runtime and ~80% of memory doing copy detection (the little yellow bar which shows up to give you a hint that code was moved around within a diff).

This is pretty much pointless and copy hints are almost certainly never useful on large changes. Instead, just bail if the change is larger than some arbitrary "probably too big for copy hints to ever be useful" threshold (here, 65535 lines).

Test Plan:
Roughly, ran this against a 2.5 million line JSON diff:

```
$changes = id(new ArcanistDiffParser())->parseDiff($raw_diff);
$diff = DifferentialDiff::newFromRawChanges($viewer, $changes);
```

Before the changes, it took 20s + 2.5GB RAM to parse. After the changes, it took 7s + 500MB RAM to parse.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13210

Differential Revision: https://secure.phabricator.com/D19748
This commit is contained in:
epriestley 2018-10-19 13:20:39 -07:00
parent e0dea4c486
commit e2cf1e4288

View file

@ -88,6 +88,20 @@ final class DifferentialChangesetEngine extends Phobject {
private function detectCopiedCode(array $changesets) {
// See PHI944. If the total number of changed lines is excessively large,
// don't bother with copied code detection. This can take a lot of time and
// memory and it's not generally of any use for very large changes.
$max_size = 65535;
$total_size = 0;
foreach ($changesets as $changeset) {
$total_size += ($changeset->getAddLines() + $changeset->getDelLines());
}
if ($total_size > $max_size) {
return;
}
$min_width = 30;
$min_lines = 3;