Re: [cfs_trace_lock_tcd] BUG: unable to handle kernel NULL pointer dereference at 00000050

From: Fengguang Wu
Date: Wed Apr 18 2018 - 10:13:26 EST


Hi James,

On Wed, Apr 18, 2018 at 02:59:15PM +0100, James Simmons wrote:

Hello,

FYI this happens in mainline kernel 4.17.0-rc1.
It looks like a new regression.

[ 7.587002] lnet_selftest_init+0x2c4/0x5d9:
lnet_selftest_init at drivers/staging/lustre/lnet/selftest/module.c:134
[ 7.587002] ? lnet_selftest_exit+0x8d/0x8d:
lnet_selftest_init at drivers/staging/lustre/lnet/selftest/module.c:90

Are you running lnet selftest?

Perhaps yes -- it's a randconfig boot test and the .config does include
CONFIG_LNET_SELFTEST:

CONFIG_LNET=y
CONFIG_LNET_MAX_PAYLOAD=1048576
==> CONFIG_LNET_SELFTEST=y
CONFIG_LNET_XPRT_IB=y

Is this a UMP setup?

Yes, .config has:

# CONFIG_SMP is not set

The reason I ask is that there is an SMP handling bug in lnet
selftest. If you look at the mailing list I pushed an SMP patch
series. Can you try that series and tell me if it works for you?

So it looks like your fixup patch is not for this case? Anyway the
reproduce-* script attached in the previous email should be fairly
straightforward to try out for reproducing the bug.

Thanks,
Fengguang