Re: [PATCH v2 2/2] scripts/gdb: lx-dmesg: Use explicit encoding=utf8 errors=replace
From: Jan Kiszka
Date: Fri Jul 07 2017 - 05:16:47 EST
On 2017-06-26 14:52, Leonard Crestez wrote:
> Use errors=replace because it is never desirable for lx-dmesg to fail on
> string decoding errors, not even if the log buffer is corrupt and we show
> incorrect info.
>
> The kernel will sometimes print utf8, for example the copyright symbol from
> jffs2. To make this work, specify 'utf8' everywhere, because python2
> otherwise defaults to 'ascii'.
>
> In theory the second errors='replace' should not be required because
> everything that can be decoded as utf8 should also be encodable back to
> utf8. But it's better to be extra safe here. It's worth noting that this
> is definitely not true for encoding='ascii': unknown characters are
> replaced with U+FFFD REPLACEMENT CHARACTER and they fail to encode back
> to ascii.
>
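
For illustration, the round-trip behaviour described above can be checked
outside gdb with a minimal sketch. The sample bytes are made up for this
example (a copyright sign followed by one stray 0xff byte) and are not taken
from a real log buffer:

```python
# Made-up sample: valid UTF-8 (the copyright sign) plus one stray byte.
raw = b"jffs2: \xc2\xa9 2001-2006 Red Hat, Inc.\xff"

# Decoding as utf8 with errors='replace' never raises: the stray byte
# becomes U+FFFD and the copyright sign is preserved.
text = raw.decode(encoding='utf8', errors='replace')

# Text obtained this way can be encoded back to utf8 without errors.
utf8_roundtrip = text.encode(encoding='utf8')

# Decoding as ascii replaces every non-ascii byte with U+FFFD ...
ascii_text = raw.decode(encoding='ascii', errors='replace')

# ... and U+FFFD itself cannot be encoded back to ascii.
try:
    ascii_text.encode(encoding='ascii')
    ascii_roundtrip_ok = True
except UnicodeEncodeError:
    ascii_roundtrip_ok = False
```

With encoding='ascii' the strict encode raises UnicodeEncodeError, which is
the failure mode the commit message refers to.
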
> Signed-off-by: Leonard Crestez <leonard.crestez@xxxxxxx>
>
> ---
> Changes since v1:
> * Add encoding='utf8'
> * Only do an explicit encode for python2. On python3 this returns a
> bytes object which formats to b'BLAH' instead.
> * Elaborate commit message explaining what's wrong. The original patch
> was hacked together while debugging something else.
>
> Link: https://lkml.org/lkml/2017/6/23/405
> Signed-off-by: Leonard Crestez <leonard.crestez@xxxxxxx>
> ---
> scripts/gdb/linux/dmesg.py | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
> index f5a0303..6d2e09a 100644
> --- a/scripts/gdb/linux/dmesg.py
> +++ b/scripts/gdb/linux/dmesg.py
> @@ -12,6 +12,7 @@
> #
>
> import gdb
> +import sys
>
> from linux import utils
>
> @@ -52,13 +53,19 @@ class LxDmesg(gdb.Command):
> continue
>
> text_len = utils.read_u16(log_buf[pos + 10:pos + 12])
> - text = log_buf[pos + 16:pos + 16 + text_len].decode()
> + text = log_buf[pos + 16:pos + 16 + text_len].decode(
> + encoding='utf8', errors='replace')
> time_stamp = utils.read_u64(log_buf[pos:pos + 8])
>
> for line in text.splitlines():
> - gdb.write("[{time:12.6f}] {line}\n".format(
> + msg = u"[{time:12.6f}] {line}\n".format(
> time=time_stamp / 1000000000.0,
> - line=line))
> + line=line)
> + # With python2 gdb.write will attempt to convert unicode to
> + # ascii and might fail so pass a utf8-encoded str instead.
> + if sys.hexversion < 0x03000000:
> + msg = msg.encode(encoding='utf8', errors='replace')
> + gdb.write(msg)
>
> pos += length
>
>
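
The formatting and version check in the hunk above can be tried without gdb
using a rough sketch like the one below. format_record is a hypothetical
helper introduced only for this example; it returns the messages instead of
calling gdb.write, and the timestamp and text are invented:

```python
import sys


def format_record(time_stamp, text):
    """Format one log record as dmesg-style lines (hypothetical helper)."""
    out = []
    for line in text.splitlines():
        # time_stamp is in nanoseconds, as in the patch; print seconds.
        msg = u"[{time:12.6f}] {line}\n".format(
            time=time_stamp / 1000000000.0,
            line=line)
        # Only python2 needs an explicit utf8-encoded str; on python3 this
        # branch is skipped and msg stays a unicode str.
        if sys.hexversion < 0x03000000:
            msg = msg.encode(encoding='utf8', errors='replace')
        out.append(msg)
    return out


# Invented input: 1.5 seconds, two lines, one non-ascii character.
lines = format_record(1500000000, u"jffs2: \xa9 sample\nsecond line")
```

Skipping the encode on python3 matters because formatting or writing a bytes
object there would show up as b'...' rather than the text itself.
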
Acked-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
Andrew, please pick this up.
Jan
--
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux