[v2 PATCH 2/2] crypto/nx: Fix invalid wait context during kexec reboot

From: Vishal Chourasia
Date: Tue Oct 15 2024 - 06:58:35 EST


nx842_remove() call of_reconfig_notifier_unregister while holding the
devdata_spinlock. This could lead to an invalid wait context error during
kexec reboot, as of_reconfig_notifier_unregister tries to acquire a read-write
semaphore (check logs) while holding a spinlock.

Move the of_reconfig_notifier_unregister() call before acquiring the
spinlock to prevent this race condition invalid wait contexts during system
shutdown or kexec operations.

Log:

[ BUG: Invalid wait context ]
6.11.0-test2-10547-g684a64bf32b6-dirty #79 Not tainted
-----------------------------
kexec/61926 is trying to lock:
c000000002d8b590 ((of_reconfig_chain).rwsem){++++}-{4:4}, at: blocking_notifier_chain_unregister+0x44/0xa0
other info that might help us debug this:
context-{5:5}
4 locks held by kexec/61926:
#0: c000000002926c70 (system_transition_mutex){+.+.}-{4:4}, at: __do_sys_reboot+0xf8/0x2e0
#1: c00000000291af30 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x160/0x310
#2: c000000051011938 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x174/0x310
#3: c000000002d88070 (devdata_mutex){....}-{3:3}, at: nx842_remove+0xac/0x1bc
stack backtrace:
CPU: 2 UID: 0 PID: 61926 Comm: kexec Not tainted 6.11.0-test2-10547-g684a64bf32b6-dirty #79
Hardware name: IBM,9080-HEX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_012) hv:phyp pSeries
Call Trace:
[c0000000bb577400] [c000000001239704] dump_stack_lvl+0xc8/0x130 (unreliable)
[c0000000bb577440] [c000000000248398] __lock_acquire+0xb68/0xf00
[c0000000bb577550] [c000000000248820] lock_acquire.part.0+0xf0/0x2a0
[c0000000bb577670] [c00000000127faa0] down_write+0x70/0x1e0
[c0000000bb5776b0] [c0000000001acea4] blocking_notifier_chain_unregister+0x44/0xa0
[c0000000bb5776e0] [c000000000e2312c] of_reconfig_notifier_unregister+0x2c/0x40
[c0000000bb577700] [c000000000ded24c] nx842_remove+0x148/0x1bc
[c0000000bb577790] [c00000000011a114] vio_bus_remove+0x54/0xc0
[c0000000bb5777c0] [c000000000c1a44c] device_shutdown+0x20c/0x310
[c0000000bb577850] [c0000000001b0ab4] kernel_restart_prepare+0x54/0x70
[c0000000bb577870] [c000000000308718] kernel_kexec+0xa8/0x110
[c0000000bb5778e0] [c0000000001b1144] __do_sys_reboot+0x214/0x2e0
[c0000000bb577a40] [c000000000032f98] system_call_exception+0x148/0x310
[c0000000bb577e50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec
--- interrupt: 3000 at 0x7fffa07e7df8
NIP: 00007fffa07e7df8 LR: 00007fffa07e7df8 CTR: 0000000000000000
REGS: c0000000bb577e80 TRAP: 3000 Not tainted (6.11.0-test2-10547-g684a64bf32b6-dirty)
MSR: 800000000280f033 CR: 48022484 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000058 00007ffff961f1e0 00007fffa08f7100 fffffffffee1dead
GPR04: 0000000028121969 0000000045584543 0000000000000000 0000000000000003
GPR08: 0000000000000003 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fffa0a9b360 ffffffffffffffff 0000000000000000
GPR16: 0000000000000001 0000000000000002 0000000000000001 0000000000000001
GPR20: 000000011710f520 0000000000000000 0000000000000000 0000000000000001
GPR24: 0000000129be0480 0000000000000003 0000000000000003 00007ffff961f2b0
GPR28: 00000001170f2d30 00000001170f2d28 00007fffa08f18d0 0000000129be04a0
NIP [00007fffa07e7df8] 0x7fffa07e7df8
LR [00007fffa07e7df8] 0x7fffa07e7df8
--- interrupt: 3000

Suggested-by: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
Signed-off-by: Vishal Chourasia <vishalc@xxxxxxxxxxxxx>
---
drivers/crypto/nx/nx-common-pseries.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/nx/nx-common-pseries.c b/drivers/crypto/nx/nx-common-pseries.c
index a0eb900383af7..1660c5cf3641c 100644
--- a/drivers/crypto/nx/nx-common-pseries.c
+++ b/drivers/crypto/nx/nx-common-pseries.c
@@ -1122,10 +1122,11 @@ static void nx842_remove(struct vio_dev *viodev)

crypto_unregister_alg(&nx842_pseries_alg);

+ of_reconfig_notifier_unregister(&nx842_of_nb);
+
spin_lock_irqsave(&devdata_spinlock, flags);
old_devdata = rcu_dereference_check(devdata,
lockdep_is_held(&devdata_spinlock));
- of_reconfig_notifier_unregister(&nx842_of_nb);
RCU_INIT_POINTER(devdata, NULL);
spin_unlock_irqrestore(&devdata_spinlock, flags);
synchronize_rcu();
--
2.47.0