Re: kernel BUG at /usr/src/kernel/lib/dynamic_debug.c:267!

From: Petr Mladek
Date: Wed Sep 09 2020 - 04:00:45 EST


On Wed 2020-09-09 14:47:45, Masami Hiramatsu wrote:
> Hi Naresh,
>
> Thanks for reporting, it seems that you have run the kselftests/livepatch/test-livepatch.sh.
> Then, I think it is better to report to Livepatch maintainers too. (I Cc'd them)
>
> Thank you,
>
> On Wed, 9 Sep 2020 10:24:09 +0530
> Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> wrote:
>
> > While testing livepatch test cases on x86_64 with Linux next 20200908 tag kernel
> > this kernel BUG noticed several times.
> >
> > metadata:
> > git branch: master
> > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> > git commit: dff9f829e5b0181d4ed9d35aa62d695292399b54
> > git describe: next-20200908
> > kernel-config:
> > http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/intel-corei7-64/lkft/linux-next/853/config
> >
> > kernel BUG log,
> > ----------------------
> > [ 634.063970] % rmmod test_klp_livepatch
> > [ 634.114787] test_klp_atomic_replace: this has been live patched
> > [ 634.121953] % echo 0 > /sys/kernel/livepatch/test_klp_atomic_replace/enabled
> > [ 634.129391] livepatch: 'test_klp_atomic_replace': starting
> > unpatching transition
> > [ 634.143990] livepatch: 'test_klp_atomic_replace': unpatching complete
> > [ 634.156223] % rmmod test_klp_atomic_replace
> > [ 634.235451] ------------[ cut here ]------------
> > [ 634.240314] kernel BUG at /usr/src/kernel/lib/dynamic_debug.c:267!
> > [ 634.246584] invalid opcode: 0000 [#1] SMP PTI
> > [ 634.250955] CPU: 0 PID: 12791 Comm: test-livepatch. Tainted: G
> > W K 5.9.0-rc4-next-20200908 #1
> > [ 634.260615] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> > 2.2 05/23/2018
> > [ 634.268007] RIP: 0010:ddebug_exec_query+0x77b/0xb90
> > [ 634.272886] Code: 4c 89 ad 70 ff ff ff e9 db fb ff ff b8 03 00 00
> > 00 e9 20 fb ff ff b8 02 00 00 00 e9 16 fb ff ff b8 01 00 00 00 e9 0c
> > fb ff ff <0f> 0b 31 c0 e9 03 fb ff ff 49 89 f4 48 89 f7 e9 78 f9 ff ff
> > 8b 15
> > [ 634.291630] RSP: 0018:ffff9f0c80a5bd18 EFLAGS: 00010246
> > [ 634.296856] RAX: 000000000000003d RBX: 0000000000000000 RCX: 0000000000000000
> > [ 634.303987] RDX: 000000000000003d RSI: 0000000000000000 RDI: ffff90db906583ec
> > [ 634.311111] RBP: ffff9f0c80a5bde8 R08: 000000000000000a R09: 000000000000003b
> > [ 634.318236] R10: ffff90db92610000 R11: 0000000000000246 R12: ffff90db906583ec
> > [ 634.325368] R13: ffffffffbe87cbc0 R14: 0000000000000000 R15: 0000000000000004
> > [ 634.332500] FS: 00007fd37249a740(0000) GS:ffff90dbefa00000(0000)
> > knlGS:0000000000000000
> > [ 634.340578] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 634.346315] CR2: 0000000000e6d00c CR3: 000000026a4b8004 CR4: 00000000003706f0
> > [ 634.353446] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 634.360570] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 634.367693] Call Trace:
> > [ 634.370139] ? lock_acquire+0xa6/0x390
> > [ 634.373892] ? __might_fault+0x34/0x80
> > [ 634.377648] ddebug_exec_queries+0x6e/0x140
> > [ 634.381831] ddebug_proc_write+0x4b/0xa0
> > [ 634.385756] full_proxy_write+0x5f/0x90
> > [ 634.389595] vfs_write+0xed/0x240
> > [ 634.392915] ksys_write+0xad/0xf0
> > [ 634.396233] ? syscall_trace_enter+0x17a/0x240
> > [ 634.400670] __x64_sys_write+0x1a/0x20
> > [ 634.404416] do_syscall_64+0x37/0x50
> > [ 634.407993] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [ 634.413038] RIP: 0033:0x7fd371b84144
> > [ 634.416617] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00
> > 00 00 00 00 66 90 48 8d 05 c1 e7 2c 00 8b 00 85 c0 75 13 b8 01 00 00
> > 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89
> > f5 53
> > [ 634.435362] RSP: 002b:00007ffd19447658 EFLAGS: 00000246 ORIG_RAX:
> > 0000000000000001
> > [ 634.442928] RAX: ffffffffffffffda RBX: 00000000000000bc RCX: 00007fd371b84144
> > [ 634.450059] RDX: 00000000000000bc RSI: 0000000000e6cf70 RDI: 0000000000000001
> > [ 634.457181] RBP: 0000000000e6cf70 R08: 00000000000000e9 R09: 0000000000000000
> > [ 634.464305] R10: 00007ffd19447c48 R11: 0000000000000246 R12: 00007fd371e4e760
> > [ 634.471429] R13: 00000000000000bc R14: 00007fd371e49760 R15: 00000000000000bc
> > [ 634.478559] Modules linked in: trace_printk sch_fq 8021q
> > iptable_filter xt_mark ip_tables cls_bpf sch_ingress veth algif_hash
> > x86_pkg_temp_thermal fuse [last unloaded: test_klp_atomic_replace]
> > [ 634.495695] ---[ end trace d04d7e11bd1458bf ]---
> > [ 634.500333] RIP: 0010:ddebug_exec_query+0x77b/0xb90
> > [ 634.505218] Code: 4c 89 ad 70 ff ff ff e9 db fb ff ff b8 03 00 00
> > 00 e9 20 fb ff ff b8 02 00 00 00 e9 16 fb ff ff b8 01 00 00 00 e9 0c
> > fb ff ff <0f> 0b 31 c0 e9 03 fb ff ff 49 89 f4 48 89 f7 e9 78 f9 ff ff
> > 8b 15
> > [ 634.523969] RSP: 0018:ffff9f0c80a5bd18 EFLAGS: 00010246
> > [ 634.529197] RAX: 000000000000003d RBX: 0000000000000000 RCX: 0000000000000000
> > [ 634.536330] RDX: 000000000000003d RSI: 0000000000000000 RDI: ffff90db906583ec
> > [ 634.543462] RBP: ffff9f0c80a5bde8 R08: 000000000000000a R09: 000000000000003b
> > [ 634.550593] R10: ffff90db92610000 R11: 0000000000000246 R12: ffff90db906583ec
> > [ 634.557727] R13: ffffffffbe87cbc0 R14: 0000000000000000 R15: 0000000000000004
> > [ 634.564869] FS: 00007fd37249a740(0000) GS:ffff90dbefa00000(0000)
> > knlGS:0000000000000000
> > [ 634.572953] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 634.578699] CR2: 0000000000e6d00c CR3: 000000026a4b8004 CR4: 00000000003706f0
> > [ 634.585829] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 634.592964] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

It is caused by the commit 42f07816ac0cc79792811 ("dyndbg:
fix problem parsing format="foo bar"). It modified the condition
when the above BUG() gets triggered.

It has even been reported by the kernel test robot, see
https://lore.kernel.org/lkml/20200902074124.GP4299@shao2-debian/


It is triggered when the livepatching selftest is trying to restore the
original setting of dynamic debug messages in the livepatch framework.

It can get reliably reproduced by the following call:

$> echo -n 'file kernel/livepatch/transition.c line 586 =_' > \
/sys/kernel/debug/dynamic_debug/control


Adding people from the problematic patch into CC.

I haven't dived into the dynamic debug code yet. The logic might
be wrong.

Anyway, the BUG_ON() should get replaced by a normal error message.
There is no reason to panic() when a string can't be parsed !!!

Best Regards,
Petr