Re: [PATCH v2] checkpatch: Only encode UTF-8 quoted printable mail headers

From: Andrew Morton
Date: Wed Jul 18 2018 - 17:42:27 EST


On Wed, 18 Jul 2018 16:52:54 +0200 Geert Uytterhoeven <geert+renesas@xxxxxxxxx> wrote:

> As Perl uses its own internal character encoding, always calling
> encode("utf8", ...) on the author name may cause corruption, leading to
> an author signoff mismatch.
>
> This happens in the following cases:
>   - If a patch is in ISO-8859, and contains a non-ASCII author name in
>     the From: line, it is converted to UTF-8, while the Signed-off-by
>     line will still be in ISO-8859.
>   - If a patch is in UTF-8, and contains a non-ASCII author name in the
>     body (not header) From: line, it is assumed to be encoded in Perl's
>     internal character encoding, and converted to UTF-8 incorrectly,
>     while the Signed-off-by line will be in real UTF-8.
>
> Fix this by only doing the encode step if the From: line used UTF-8
> quoted printable encoding.

Works for me, thanks.


Relatedly, would it be worth adding a checkpatch warning if a patch
contains anything other than ASCII or UTF-8?

I added this to my little local patch-checking script.

if ! file "$p" | grep -q -P "ASCII text|Unicode text"
then
	echo "$p: weird charset"
fi
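
For anyone wanting to try the same idea without depending on the exact
wording of file(1) output, here is a rough sketch of a variant. It swaps
in a different technique: it asks iconv to validate each file as UTF-8,
which also accepts plain ASCII since that is a subset. The function name
check_charset is made up for illustration, and a file that cannot be
read at all is reported the same way as one with a bad encoding.

```shell
#!/bin/sh
# Sketch only: flag files that are not valid UTF-8 (plain ASCII passes,
# since ASCII is a subset of UTF-8).  check_charset is a hypothetical name.
check_charset()
{
	rc=0
	for p in "$@"
	do
		# iconv exits non-zero when the input is not valid UTF-8
		# (or when the file cannot be read).
		if ! iconv -f UTF-8 -t UTF-8 "$p" >/dev/null 2>&1
		then
			echo "$p: weird charset"
			rc=1
		fi
	done
	return $rc
}
```

Called as "check_charset *.patch", it prints one line per offending file
and returns non-zero if any were found. Unlike the file(1) check, this
does not reject binary data that happens to be valid UTF-8.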