Re: POWER9 crash due to STRICT_KERNEL_RWX (WAS: Re: Linux-next POWER9 NULL pointer NIP...)

From: Russell Currey
Date: Thu Apr 16 2020 - 22:27:49 EST


On Thu, 2020-04-16 at 22:17 -0400, Steven Rostedt wrote:
> On Thu, 16 Apr 2020 21:19:10 -0400
> Qian Cai <cai@xxxxxx> wrote:
>
> > OK, reverted the commit,
> >
> > c55d7b5e6426 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility
> > with RELOCATABLE")
> >
> > or set STRICT_KERNEL_RWX=n; either one fixed the crash below, which
> > is also mentioned in this thread,
>
> This may be a symptom and not a cure.

Reverting the patch with the given config has the same effect as
setting STRICT_KERNEL_RWX=n. I'm not discounting that this could be a bug
on the powerpc side (relocatable kernels with strict RWX enabled haven't
been exhaustively tested yet), but we should definitely figure out what's
going on with this bad access first.

>
> > https://lore.kernel.org/lkml/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@xxxxxx/
> >
> > [ 148.110969][T13115] LTP: starting chown04_16
> > [ 148.255048][T13380] kernel tried to execute exec-protected page
> > (c0000000016804ac) - exploit attempt? (uid: 0)
> > [ 148.255099][T13380] BUG: Unable to handle kernel instruction
> > fetch
> > [ 148.255122][T13380] Faulting instruction address:
> > 0xc0000000016804ac
> > [ 148.255136][T13380] Oops: Kernel access of bad area, sig: 11
> > [#1]
> > [ 148.255157][T13380] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256
> > DEBUG_PAGEALLOC NUMA PowerNV
> > [ 148.255171][T13380] Modules linked in: loop kvm_hv kvm xfs
> > sd_mod bnx2x mdio ahci tg3 libahci libphy libata firmware_class
> > dm_mirror dm_region_hash dm_log dm_mod
> > [ 148.255213][T13380] CPU: 45 PID: 13380 Comm: chown04_16 Tainted:
> > G W 5.6.0+ #7
> > [ 148.255236][T13380] NIP: c0000000016804ac LR: c00800000fa60408
> > CTR: c0000000016804ac
> > [ 148.255250][T13380] REGS: c0000010a6fafa00 TRAP: 0400 Tainted:
> > G W (5.6.0+)
> > [ 148.255281][T13380] MSR: 9000000010009033
> > <SF,HV,EE,ME,IR,DR,RI,LE> CR: 84000248 XER: 20040000
> > [ 148.255310][T13380] CFAR: c00800000fa66534 IRQMASK: 0
> > [ 148.255310][T13380] GPR00: c000000000973268 c0000010a6fafc90
> > c000000001648200 0000000000000000
> > [ 148.255310][T13380] GPR04: c000000d8a22dc00 c0000010a6fafd30
> > 00000000b5e98331 ffffffff00012c9f
> > [ 148.255310][T13380] GPR08: c000000d8a22dc00 0000000000000000
> > 0000000000000000 c00000000163c520
> > [ 148.255310][T13380] GPR12: c0000000016804ac c000001ffffdad80
> > 0000000000000000 0000000000000000
> > [ 148.255310][T13380] GPR16: 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000
> > [ 148.255310][T13380] GPR20: 0000000000000000 0000000000000000
> > 0000000000000000 0000000000000000
> > [ 148.255310][T13380] GPR24: 00007fff8f5e2e48 0000000000000000
> > c00800000fa6a488 c0000010a6fafd30
> > [ 148.255310][T13380] GPR28: 0000000000000000 000000007fffffff
> > c00800000fa60400 c000000efd0c6780
> > [ 148.255494][T13380] NIP [c0000000016804ac]
> > sysctl_net_busy_read+0x0/0x4
>
> The instruction pointer is on sysctl_net_busy_read? Isn't that data and
> not code?
>
> In net/socket.c:
>
> #ifdef CONFIG_NET_RX_BUSY_POLL
> unsigned int sysctl_net_busy_read __read_mostly;
> unsigned int sysctl_net_busy_poll __read_mostly;
> #endif
>
> -- Steve
>
>
> > [ 148.255516][T13380] LR [c00800000fa60408] find_free_cb+0x8/0x30
> > [loop]
> > [ 148.255528][T13380] Call Trace:
> > [ 148.255538][T13380] [c0000010a6fafc90] [c0000000009732c0]
> > idr_for_each+0xf0/0x170 (unreliable)
> > [ 148.255572][T13380] [c0000010a6fafd10] [c00800000fa626c4]
> > loop_lookup.part.1+0x4c/0xb0 [loop]
> > [ 148.255597][T13380] [c0000010a6fafd50] [c00800000fa634d8]
> > loop_control_ioctl+0x120/0x1d0 [loop]
> > [ 148.255623][T13380] [c0000010a6fafdb0] [c0000000004ddc08]
> > ksys_ioctl+0xd8/0x130
> > [ 148.255636][T13380] [c0000010a6fafe00] [c0000000004ddc88]
> > sys_ioctl+0x28/0x40
> > [ 148.255669][T13380] [c0000010a6fafe20] [c00000000000b378]
> > system_call+0x5c/0x68
> > [ 148.255699][T13380] Instruction dump:
> > [ 148.255718][T13380] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> > XXXXXXXX XXXXXXXX XXXXXXXX
> > [ 148.255744][T13380] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> > XXXXXXXX XXXXXXXX XXXXXXXX
> > [ 148.255772][T13380] ---[ end trace a5894a74208c22ec ]---
> > [ 148.576663][T13380]
> > [ 149.576765][T13380] Kernel panic - not syncing: Fatal exception
> >
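
As a starting point for looking at the bad access, one detail in the
register dump above is that NIP, CTR and GPR12 all hold the same address.
On ppc64le with the ELFv2 ABI, an indirect call normally loads the target
into r12, moves it to CTR and branches through CTR, so this pattern is
consistent with an indirect call through a pointer that held the address
of sysctl_net_busy_read. It does not say where that pointer came from.
The sketch below is only an illustration in Python: the two log lines are
copied from the oops with timestamps, quote markers and line wrapping
removed, and parse_regs() is a hypothetical helper, not an existing tool.

```python
import re

# Two lines taken from the oops above; timestamps and "> >" markers dropped,
# and the wrapped lines rejoined.
OOPS = """\
NIP: c0000000016804ac LR: c00800000fa60408 CTR: c0000000016804ac
GPR12: c0000000016804ac c000001ffffdad80 0000000000000000 0000000000000000
"""


def parse_regs(text):
    """Return {register name: value} for NIP/LR/CTR and every GPR row found."""
    regs = {}
    # Special-purpose registers appear as "NAME: <16 hex digits>".
    for name, value in re.findall(r"\b(NIP|LR|CTR):\s*([0-9a-f]{16})", text):
        regs[name] = int(value, 16)
    # Each GPR row starts at GPRnn and lists consecutive registers.
    for base, values in re.findall(r"GPR(\d\d):((?:\s+[0-9a-f]{16})+)", text):
        for offset, value in enumerate(values.split()):
            regs[f"GPR{int(base) + offset:02d}"] = int(value, 16)
    return regs


regs = parse_regs(OOPS)
# True when the faulting address is also the indirect-branch target in
# both CTR and r12.
same_target = regs["NIP"] == regs["CTR"] == regs["GPR12"]
print(f"NIP == CTR == GPR12: {same_target}")
```

The same helper can be pointed at a full oops to pull out LR or other GPRs
when comparing several crash reports.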