[PATCH v2 2/2] scripts/gdb: lx-dmesg: Use explicit encoding=utf8 errors=replace

From: Leonard Crestez
Date: Mon Jun 26 2017 - 08:53:25 EST


Use errors=replace because it is never desirable for lx-dmesg to fail on
string decoding errors, not even if the log buffer is corrupt and we show
incorrect info.

The kernel will sometimes print utf8, for example the copyright symbol from
jffs2. To make this work, specify 'utf8' everywhere, because python2
otherwise defaults to 'ascii'.

In theory the second errors='replace' is not required because everything
that can be decoded as utf8 should also be encodable back to utf8. But
it's better to be extra safe here. It's worth noting that this is
definitely not true for encoding='ascii': unknown characters are
replaced with U+FFFD REPLACEMENT CHARACTER and they fail to encode back
to ascii.
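
The round-trip argument can be checked outside of gdb with a minimal
sketch like the one below. It assumes python3, and the sample bytes are
made up for illustration (a utf8 copyright symbol followed by one invalid
byte); they are not taken from a real log buffer.

```python
# Hypothetical sample: utf8-encoded copyright symbol plus a stray 0xff byte.
raw = b"jffs2 \xc2\xa9 \xff"

# Strict decoding raises on the invalid byte.
try:
    raw.decode(encoding='utf8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' never raises; the bad byte becomes U+FFFD.
text = raw.decode(encoding='utf8', errors='replace')

# Anything decoded as utf8 can be encoded back to utf8.
utf8_bytes = text.encode(encoding='utf8')

# Decoding as ascii with errors='replace' turns every non-ascii byte
# into U+FFFD, and U+FFFD itself cannot be encoded as ascii.
ascii_text = raw.decode(encoding='ascii', errors='replace')
try:
    ascii_text.encode(encoding='ascii')
    ascii_roundtrip_failed = False
except UnicodeEncodeError:
    ascii_roundtrip_failed = True
```

Here the utf8 round trip succeeds while the ascii one raises
UnicodeEncodeError, which is why the encode side also passes
errors='replace'.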

Signed-off-by: Leonard Crestez <leonard.crestez@xxxxxxx>

---
Changes since v1:
* Add encoding='utf8'
* Only do an explicit encode for python2. On python3 this returns a
bytes object which formats to b'BLAH' instead.
* Elaborate commit message explaining what's wrong. The original patch
was hacked together while debugging something else.
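
As a rough illustration of the python3 point above, the sketch below uses
plain str.format() as a stand-in for what happens when a bytes object is
turned into text; it does not call gdb.write, and the timestamp and line
values are made up.

```python
# Build a message the same way the patch does (values are hypothetical).
msg = u"[{time:12.6f}] {line}\n".format(time=1.5, line=u"hello")

# An unconditional encode yields bytes on python3 ...
encoded = msg.encode(encoding='utf8', errors='replace')

# ... and formatting bytes as text gives the repr, b'...', not the content.
shown = "{}".format(encoded)
```

On python3, shown starts with b' instead of the message text, so the
encode is only wanted on python2.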

Link: https://lkml.org/lkml/2017/6/23/405
---
scripts/gdb/linux/dmesg.py | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
index f5a0303..6d2e09a 100644
--- a/scripts/gdb/linux/dmesg.py
+++ b/scripts/gdb/linux/dmesg.py
@@ -12,6 +12,7 @@
#

import gdb
+import sys

from linux import utils

@@ -52,13 +53,19 @@ class LxDmesg(gdb.Command):
                 continue
 
             text_len = utils.read_u16(log_buf[pos + 10:pos + 12])
-            text = log_buf[pos + 16:pos + 16 + text_len].decode()
+            text = log_buf[pos + 16:pos + 16 + text_len].decode(
+                encoding='utf8', errors='replace')
             time_stamp = utils.read_u64(log_buf[pos:pos + 8])
 
             for line in text.splitlines():
-                gdb.write("[{time:12.6f}] {line}\n".format(
+                msg = u"[{time:12.6f}] {line}\n".format(
                     time=time_stamp / 1000000000.0,
-                    line=line))
+                    line=line)
+                # With python2 gdb.write will attempt to convert unicode to
+                # ascii and might fail so pass an utf8-encoded str instead.
+                if sys.hexversion < 0x03000000:
+                    msg = msg.encode(encoding='utf8', errors='replace')
+                gdb.write(msg)
 
             pos += length

--
2.7.4