From c94c5bbf35f0ea216fda8058946209a6f352c96a Mon Sep 17 00:00:00 2001
From: Christopher Speck
Date: Sun, 27 Jun 2021 20:28:45 -0400
Subject: [PATCH] Force all mercurial commands to use UTF-8 encoding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Summary:
When non-ASCII characters appear in revision titles/summaries, the `patch` and `diff` (to update) commands will fail on Windows systems. This often occurs due to “smart quotes” or "em—dash" characters being inserted into commit messages by editors on "user-friendly" operating systems like macOS.

This can be worked around by forcing all mercurial commands to use the global option `--encoding utf-8`, which applies to any mercurial command. This option was [[ https://www.mercurial-scm.org/repo/hg/rev/a88e02081a88 | added in ~2006 ]], so this should work across all supported versions of mercurial.

Refs T13649

Test Plan:
I created a diff on a mercurial repository using smart quotes in the "Title" and "Summary" fields as well as in the content of a file being changed. Then on macOS, Windows (PowerShell), and Windows (cmd.exe) I was able to `patch` down the revision, make a modification, and `diff` the change back up to Phabricator, as well as `land` the change.

I verified the commit and content looked correct on macOS as well as on Windows by using `nvim`, which seems to properly detect and render the encoding, whereas mercurial displays the smart quotes and em-dashes with odd characters instead.

I did a grep through the Arcanist codebase to find other places where `--encoding` might be specified for mercurial commands and could not find any. In the event that this argument is somehow added elsewhere, I verified that multiple specifications of `--encoding utf-8` do not cause any issues and that the later specification of `--encoding` appears to "win".
```lang=console
$ hg --encoding utf-8 --encoding utf-8 log -r tip
# prints out results in UTF-8 without issue
$ hg --encoding utf-8 log --encoding latin-1 -r tip
# prints out results in latin-1 without issue
```

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin

Maniphest Tasks: T13649

Differential Revision: https://secure.phabricator.com/D21676
---
 src/repository/api/ArcanistMercurialAPI.php | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/repository/api/ArcanistMercurialAPI.php b/src/repository/api/ArcanistMercurialAPI.php
index e5c2078b..f0815985 100644
--- a/src/repository/api/ArcanistMercurialAPI.php
+++ b/src/repository/api/ArcanistMercurialAPI.php
@@ -15,7 +15,10 @@ final class ArcanistMercurialAPI extends ArcanistRepositoryAPI {
   protected function buildLocalFuture(array $argv) {
     $env = $this->getMercurialEnvironmentVariables();
 
-    $argv[0] = 'hg '.$argv[0];
+    // Mercurial deceptively indicates that the default encoding is UTF-8
+    // however the actual default appears to be "something else", at least on
+    // Windows systems. Force all mercurial commands to use UTF-8 encoding.
+    $argv[0] = 'hg --encoding utf-8 '.$argv[0];
 
     $future = newv('ExecFuture', $argv)
       ->setEnv($env)