make clang_tidy_report handle diffs of text files with invalid UTF-8

"git diff" treats files whose contents are not valid UTF-8 (e.g. text
encoded as ISO-8859-1) as text files and produces a diff of them (rather
than saying "Binary files a/x and b/x differ").

This means that the diff output may contain bytes that are not valid
UTF-8. Such output would cause clang_tidy_report.py to hit a
UnicodeDecodeError when reading the diff, even if those bytes were only
on removed lines and regardless of whether the file was listed in the
ignore file.
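Here is a minimal sketch of the failure, using a made-up diff (`raw_diff` is hypothetical) whose removed line contains "café" saved as ISO-8859-1. In that encoding "é" is the single byte 0xE9, which is not a valid UTF-8 sequence by itself. Because `bytes.decode()` defaults to UTF-8 with `errors="strict"`, it raises `UnicodeDecodeError`.

```python
# Made-up "git diff -U0 --no-prefix" output; the removed line holds "café"
# encoded as ISO-8859-1, where "é" is the single byte 0xE9.
raw_diff = b"--- src/x.c\n+++ src/x.c\n@@ -3 +3 @@\n-/* caf\xe9 */\n+/* cafe */\n"

assert "é".encode("iso-8859-1") == b"\xe9"   # one byte in ISO-8859-1
assert "é".encode("utf-8") == b"\xc3\xa9"    # two bytes in UTF-8

error = None
try:
    raw_diff.decode()  # defaults: encoding="utf-8", errors="strict"
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; "exc" is cleared when the except block ends

# 0xE9 starts a 3-byte UTF-8 sequence, but the following byte is a space,
# not a continuation byte, so the strict decoder stops there.
print(f"{error.reason} at byte {error.start}")
```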

By passing the errors mode "replace" to the decode() method, bytes that
are not valid UTF-8 are replaced with the Unicode replacement character
(U+FFFD). When parsing the diff, clang-tidy-diff only looks at the
filenames and line numbers, so it does not matter that the exact byte
sequence inside the changed lines is not preserved.
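The sketch below illustrates why the lossy replacement is acceptable. It parses diffs in the way described above and assumes the `git diff -U0 --no-prefix` format used by the script. The function `changed_lines`, the regex `HUNK_RE` and the sample diff are hypothetical and are not taken from clang-tidy-diff. They only show that filenames and hunk line numbers survive `decode("utf-8", "replace")` unchanged.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified stand-in for the diff parsing step; clang-tidy-diff's
# actual implementation may differ. It relies only on "+++" file headers and
# "@@" hunk headers, so U+FFFD inside changed lines does not affect it.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Map each new-side filename to inclusive (first, last) changed line ranges."""
    result: Dict[str, List[Tuple[int, int]]] = {}
    filename: Optional[str] = None
    for line in diff_text.splitlines():
        # With --no-prefix the header is "+++ path" rather than "+++ b/path".
        # Simplification: an added line whose content starts with "++ " would be
        # mistaken for a header here.
        if line.startswith("+++ "):
            filename = line[4:]
            result.setdefault(filename, [])
            continue
        match = HUNK_RE.match(line)
        if match and filename is not None:
            start = int(match.group(1))
            # With -U0 a single-line hunk omits the count ("+3" means "+3,1").
            count = int(match.group(2)) if match.group(2) is not None else 1
            if count > 0:  # a count of 0 means the hunk only removed lines
                result[filename].append((start, start + count - 1))
    return result


# A made-up diff whose removed line holds an ISO-8859-1 byte (0xE9).
raw_diff = b"--- src/x.c\n+++ src/x.c\n@@ -3 +3 @@\n-/* caf\xe9 */\n+/* cafe */\n"
diff = raw_diff.decode("utf-8", "replace")  # 0xE9 becomes U+FFFD
print(changed_lines(diff))  # {'src/x.c': [(3, 3)]}
```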
Anders Waldenborg 2021-06-21 19:22:18 +02:00 committed by Mikhail Goncharov
parent d41a6769fc
commit 1d9dbf5568


@@ -34,7 +34,7 @@ def run(base_commit, ignore_config, step: Optional[Step], report: Optional[Repor
         step = Step() # For debugging.
     r = subprocess.run(f'git diff -U0 --no-prefix {base_commit}', shell=True, capture_output=True)
     logging.debug(f'git diff {r}')
-    diff = r.stdout.decode()
+    diff = r.stdout.decode("utf-8", "replace")
     if ignore_config is not None and os.path.exists(ignore_config):
         ignore = pathspec.PathSpec.from_lines(pathspec.patterns.GitWildMatchPattern,
                                               open(ignore_config, 'r').readlines())
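For comparison, here is an alternative sketch. It is not what this commit does. `subprocess.run` accepts `encoding` and `errors` arguments, and with them it returns `stdout` as a `str` decoded the same way as `r.stdout.decode("utf-8", "replace")`. To let the example run without a Git repository, a child Python process writes a non-UTF-8 byte. This child process is a hypothetical stand-in for `git diff`.

```python
import subprocess
import sys

# Hypothetical stand-in for `git diff`: a child process that writes the
# non-UTF-8 byte 0xE9 directly to its binary stdout.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'-caf\\xe9\\n')"]

# Passing encoding/errors makes subprocess.run decode the output itself,
# replacing invalid bytes with U+FFFD instead of raising UnicodeDecodeError.
r = subprocess.run(child, capture_output=True, encoding="utf-8", errors="replace")
print(repr(r.stdout))
```

Decoding explicitly, as the commit does, leaves `r.stdout` as raw bytes, so the `logging.debug` call still shows the original output. Passing `encoding` instead moves the decoding into `subprocess.run`.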