Re: [PATCH] ublk_drv: fix NULL pointer dereference in ublk_ctrl_start_recovery()

From: Li Nan
Date: Sat Jun 08 2024 - 02:35:10 EST




On 2024/6/6 17:52, Ming Lei wrote:
On Thu, Jun 06, 2024 at 04:05:33PM +0800, Li Nan wrote:


On 2024/6/6 12:48, Changhui Zhong wrote:

[...]


Hi Changhui,

The hang is actually expected because recovery fails.

Please pull the latest ublksrv and check if the issue can still be
reproduced:

https://github.com/ublk-org/ublksrv

BTW, one ublksrv segfault and two test cleanup issues are fixed.

Thanks,
Ming


Hi, Ming and Nan

After applying the new patch and pulling the latest ublksrv,
I ran the test for 4 hours and did not observe any task hang.
The test results look good!

Thanks,
Changhui



Thanks for your test!

However, I got a NULL pointer dereference bug with ublksrv. It is not
introduced by this patch.

BTW, your patch isn't related to generic/004, which doesn't touch the
recovery code path.

It seems I/O was issued after the disk was deleted, and it can be
reproduced by:

while true; do make test T=generic/004; done
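
The loop above never stops on its own. A bounded variant that exits on the first failing run might look like the sketch below. The helper name repeat_until_fail and the iteration cap are hypothetical, and the sketch assumes the test command returns non-zero when the bug is hit, which may not hold for every kernel oops:

```shell
#!/bin/sh
# Hypothetical helper (not part of ublksrv): rerun a command until it
# fails or the iteration cap is reached.
# Usage: repeat_until_fail MAX CMD [ARGS...]
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        # Stop at the first non-zero exit status and report the iteration.
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
    return 0
}

# Example, run from the ublksrv tree (the cap of 1000 is arbitrary):
#   repeat_until_fail 1000 make test T=generic/004
```

Stopping at the first failure keeps the kernel log close to the crash, which makes the relevant dmesg lines easier to find.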

We didn't see that when running this test on Linus' tree, and Changhui
usually runs the generic tests for hours.


[ 1524.286485] running generic/004
[ 1529.110875] blk_print_req_error: 109 callbacks suppressed
...
[ 1541.171010] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1541.171734] #PF: supervisor write access in kernel mode
[ 1541.172271] #PF: error_code(0x0002) - not-present page
[ 1541.172798] PGD 0 P4D 0
[ 1541.173065] Oops: Oops: 0002 [#1] PREEMPT SMP
[ 1541.173515] CPU: 0 PID: 43707 Comm: ublk Not tainted 6.9.0-next-20240523-00004-g9bc7e95c7323 #454
[ 1541.174417] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
[ 1541.175311] RIP: 0010:io_fallback_tw+0x252/0x300

This one looks like an io_uring issue.

Could you tell us which line of source code 'io_fallback_tw+0x252' points to?

gdb> l *(io_fallback_tw+0x252)

(gdb) list * io_fallback_tw+0x252
0xffffffff81d79dc2 is in io_fallback_tw (./arch/x86/include/asm/atomic64_64.h:25).
20              __WRITE_ONCE(v->counter, i);
21      }
22
23      static __always_inline void arch_atomic64_add(s64 i, atomic64_t *v)
24      {
25              asm volatile(LOCK_PREFIX "addq %1,%0"
26                           : "=m" (v->counter)
27                           : "er" (i), "m" (v->counter) : "memory");
28      }

The corresponding code is:
io_fallback_tw
  percpu_ref_get(&last_ctx->refs);

I have the vmcore of this issue. If you need anything else from it,
please let me know.

--
Thanks,
Nan