Re: POWER9 crash due to STRICT_KERNEL_RWX (WAS: Re: Linux-next POWER9 NULL pointer NIP...)

From: Russell Currey
Date: Thu Apr 16 2020 - 22:46:36 EST


On Thu, 2020-04-16 at 22:40 -0400, Qian Cai wrote:
> > On Apr 16, 2020, at 10:27 PM, Russell Currey <ruscur@xxxxxxxxxx> wrote:
> >
> > Reverting the patch with the given config will have the same effect as
> > STRICT_KERNEL_RWX=n. Not discounting that it could be a bug on the
> > powerpc side (i.e. relocatable kernels with strict RWX on haven't been
> > exhaustively tested yet), but we should definitely figure out what's
> > going on with this bad access first.
>
> BTW, this bad access only happened once. The overwhelming majority of
> the crashes have a NULL pointer NIP, like the one below. How do you
> explain that STRICT_KERNEL_RWX=n also makes those NULL NIP crashes
> disappear if STRICT_KERNEL_RWX is just the messenger?

What happens if you revert my patch and test with STRICT_KERNEL_RWX=y
and RELOCATABLE=n? That would tell us whether this is something that
broke recently or whether something else is going on.
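
For concreteness, the relevant part of the resulting .config would
look roughly like the fragment below. This is only a sketch: Kconfig
dependencies in your tree may force either option to a different
value, so check the final .config after running the config step.

    CONFIG_STRICT_KERNEL_RWX=y
    # CONFIG_RELOCATABLE is not set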

>
> [ 215.281666][T16896] LTP: starting chown04_16
> [ 215.424203][T18297] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
> [ 215.424289][T18297] Faulting instruction address: 0x00000000
> [ 215.424313][T18297] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 215.424341][T18297] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
> [ 215.424383][T18297] Modules linked in: loop kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x mdio tg3 ahci libahci libphy libata firmware_class dm_mirror dm_region_hash dm_log dm_mod
> [ 215.424459][T18297] CPU: 85 PID: 18297 Comm: chown04_16 Tainted: G W 5.6.0-next-20200405+ #3
> [ 215.424489][T18297] NIP: 0000000000000000 LR: c00800000fbc0408 CTR: 0000000000000000
> [ 215.424530][T18297] REGS: c000200b8606f990 TRAP: 0400 Tainted: G W (5.6.0-next-20200405+)
> [ 215.424570][T18297] MSR: 9000000040009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 84000248 XER: 20040000
> [ 215.424619][T18297] CFAR: c00800000fbc64f4 IRQMASK: 0
> [ 215.424619][T18297] GPR00: c0000000006c2238 c000200b8606fc20 c00000000165ce00 0000000000000000
> [ 215.424619][T18297] GPR04: c000201a58106400 c000200b8606fcc0 000000005f037e7d ffffffff00013bfb
> [ 215.424619][T18297] GPR08: c000201a58106400 0000000000000000 0000000000000000 c000000001652ee0
> [ 215.424619][T18297] GPR12: 0000000000000000 c000201fff69a600 0000000000000000 0000000000000000
> [ 215.424619][T18297] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 215.424619][T18297] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000007
> [ 215.424619][T18297] GPR24: 0000000000000000 0000000000000000 c00800000fbc8688 c000200b8606fcc0
> [ 215.424619][T18297] GPR28: 0000000000000000 000000007fffffff c00800000fbc0400 c00020068b8c0e70
> [ 215.424914][T18297] NIP [0000000000000000] 0x0
> [ 215.424953][T18297] LR [c00800000fbc0408] find_free_cb+0x8/0x30 [loop]
> find_free_cb at drivers/block/loop.c:2129
> [ 215.424997][T18297] Call Trace:
> [ 215.425036][T18297] [c000200b8606fc20] [c0000000006c2290] idr_for_each+0xf0/0x170 (unreliable)
> [ 215.425073][T18297] [c000200b8606fca0] [c00800000fbc2744] loop_lookup.part.2+0x4c/0xb0 [loop]
> loop_lookup at drivers/block/loop.c:2144
> [ 215.425105][T18297] [c000200b8606fce0] [c00800000fbc3558] loop_control_ioctl+0x120/0x1d0 [loop]
> [ 215.425149][T18297] [c000200b8606fd40] [c0000000004eb688] ksys_ioctl+0xd8/0x130
> [ 215.425190][T18297] [c000200b8606fd90] [c0000000004eb708] sys_ioctl+0x28/0x40
> [ 215.425233][T18297] [c000200b8606fdb0] [c00000000003cc30] system_call_exception+0x110/0x1e0
> [ 215.425274][T18297] [c000200b8606fe20] [c00000000000c9f0] system_call_common+0xf0/0x278
> [ 215.425314][T18297] Instruction dump:
> [ 215.425338][T18297] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> [ 215.425374][T18297] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> [ 215.425422][T18297] ---[ end trace ebed248fad431966 ]---
> [ 215.642114][T18297]
> [ 216.642220][T18297] Kernel panic - not syncing: Fatal exception