Re: [PATCH vfio] pds/vfio: Fix possible sleep while in atomic context

From: Brett Creeley
Date: Wed Sep 13 2023 - 13:57:36 EST


On 9/13/2023 10:51 AM, Dan Carpenter wrote:

> On Wed, Sep 13, 2023 at 10:42:38AM -0700, Brett Creeley wrote:
> > The driver could possibly sleep while in atomic context resulting
> > in the following call trace while CONFIG_DEBUG_ATOMIC_SLEEP=y is
> > set:
> >
> > [ 675.116953] BUG: spinlock bad magic on CPU#2, bash/2481
> > [ 675.116966] lock: 0xffff8d6052a88f50, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> > [ 675.116978] CPU: 2 PID: 2481 Comm: bash Tainted: G S 6.6.0-rc1-next-20230911 #1
> > [ 675.116986] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 01/23/2021
> > [ 675.116991] Call Trace:
> > [ 675.116997] <TASK>
> > [ 675.117002] dump_stack_lvl+0x36/0x50
> > [ 675.117014] do_raw_spin_lock+0x79/0xc0
> > [ 675.117032] pds_vfio_reset+0x1d/0x60 [pds_vfio_pci]
> > [ 675.117049] pci_reset_function+0x4b/0x70
> > [ 675.117061] reset_store+0x5b/0xa0
> > [ 675.117074] kernfs_fop_write_iter+0x137/0x1d0
> > [ 675.117087] vfs_write+0x2de/0x410
> > [ 675.117101] ksys_write+0x5d/0xd0
> > [ 675.117111] do_syscall_64+0x3b/0x90
> > [ 675.117122] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > [ 675.117135] RIP: 0033:0x7f9ebbd1fa28
> > [ 675.117141] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 15 4d 2a 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> > [ 675.117148] RSP: 002b:00007ffdff410728 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > [ 675.117156] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f9ebbd1fa28
> > [ 675.117161] RDX: 0000000000000002 RSI: 000055ffc5fdf7c0 RDI: 0000000000000001
> > [ 675.117166] RBP: 000055ffc5fdf7c0 R08: 000000000000000a R09: 00007f9ebbd7fae0
> > [ 675.117170] R10: 000000000000000a R11: 0000000000000246 R12: 00007f9ebbfc06e0
> > [ 675.117174] R13: 0000000000000002 R14: 00007f9ebbfbb860 R15: 0000000000000002
> > [ 675.117180] </TASK>
>
> This splat doesn't match the sleeping in atomic bug at all. That
> warning should have said, "BUG: sleeping function called from invalid
> context" and the stack trace would have looked totally different.
>
> I don't have a problem with the patch itself, that seems reasonable. I
> really like that you tested it but you're running into a different
> bug here. Hopefully, you just pasted the wrong splat but otherwise we
> need to investigate this other "bad magic" bug.

Hmm, good catch. Let me double check this and get back to you.

Thanks for the quick response and review.

Brett


> regards,
> dan carpenter
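
For readers unfamiliar with the two splats discussed above: "spinlock bad magic" comes from the kernel's spinlock debugging checks, which compare a magic value stored in the lock against SPINLOCK_MAGIC before taking it. The trace shows .magic: 00000000, which is what a lock in zeroed memory would look like if it had never been initialised; whether that is what actually happened in pds_vfio_reset() is not established in this thread. The following is only a userspace sketch of that check, not kernel code. struct model_lock, model_lock_init() and model_lock_acquire() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Magic value the kernel's spinlock debug code expects in an initialised lock. */
#define SPINLOCK_MAGIC 0xdead4eadu

/* Hypothetical userspace stand-in for a debug-enabled spinlock. */
struct model_lock {
	uint32_t magic;
	int locked;
};

/* Models lock initialisation: stamps the magic value into the lock. */
static void model_lock_init(struct model_lock *l)
{
	assert(l != NULL);
	l->magic = SPINLOCK_MAGIC;
	l->locked = 0;
}

/*
 * Models the debug check done before acquiring the lock.
 * Returns 0 on success, -1 if the magic value is wrong.
 */
static int model_lock_acquire(struct model_lock *l)
{
	if (l->magic != SPINLOCK_MAGIC) {
		fprintf(stderr, "BUG: spinlock bad magic, .magic: %08x\n",
			(unsigned int)l->magic);
		return -1;
	}
	l->locked = 1;
	return 0;
}
```

Calling model_lock_acquire() on a lock that was only zeroed with memset() takes the "bad magic" path, while a lock passed through model_lock_init() first is acquired normally. A sleep-in-atomic bug is a different condition and produces the "BUG: sleeping function called from invalid context" report instead.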