Re: CVE-2025-40146: blk-mq: fix potential deadlock while nr_requests grown

From: Zheng Qixing

Date: Fri Nov 28 2025 - 04:44:34 EST

On 2025/11/28 15:15, Nilay Shroff wrote:

commit b86433721f46d934940528f28d49c1dedb690df1 (HEAD -> master)
Author: Yu Kuai <yukuai3@xxxxxxxxxx>
Date:   Wed Sep 10 16:04:43 2025 +0800

    blk-mq: fix potential deadlock while nr_requests grown

    Allocate and free sched_tags while queue is freezed can deadlock[1],
    this is a long term problem, hence allocate memory before freezing
    queue and free memory after queue is unfreezed.

    [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@xxxxxxxxxxxxx/
    Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs")

    Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
    Reviewed-by: Nilay Shroff <nilay@xxxxxxxxxxxxx>
    Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
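A minimal userspace sketch of the ordering the commit message describes may help. It assumes a toy queue with a `frozen` flag. `struct queue`, `struct sched_tags`, `tags_alloc()`, `tags_free()`, `queue_freeze()` and `update_nr_requests()` are hypothetical names, not blk-mq functions. The asserts only encode the rule the fix introduces: no allocation and no free while the queue is frozen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a request queue and its scheduler tags. */
struct sched_tags {
	unsigned int depth;
	unsigned int *slots;          /* one slot per request */
};

struct queue {
	bool frozen;
	unsigned int nr_requests;
	struct sched_tags *tags;
};

/* Allocation may sleep and enter reclaim, so it must run while the queue is live. */
static struct sched_tags *tags_alloc(const struct queue *q, unsigned int depth)
{
	assert(!q->frozen);
	struct sched_tags *t = malloc(sizeof(*t));
	if (!t)
		return NULL;
	t->slots = calloc(depth, sizeof(*t->slots));
	if (!t->slots) {
		free(t);
		return NULL;
	}
	t->depth = depth;
	return t;
}

/* Freeing is also kept outside the frozen section. */
static void tags_free(const struct queue *q, struct sched_tags *t)
{
	assert(!q->frozen);
	if (!t)
		return;
	free(t->slots);
	free(t);
}

static void queue_freeze(struct queue *q)   { q->frozen = true; }
static void queue_unfreeze(struct queue *q) { q->frozen = false; }

static int update_nr_requests(struct queue *q, unsigned int nr)
{
	struct sched_tags *new_tags = NULL;
	struct sched_tags *old_tags = NULL;

	/* Step 1: allocate before freezing, only when the depth grows. */
	if (nr > q->tags->depth) {
		new_tags = tags_alloc(q, nr);
		if (!new_tags)
			return -1;
	}

	/* Step 2: the frozen section only publishes the new state. */
	queue_freeze(q);
	if (new_tags) {
		old_tags = q->tags;
		q->tags = new_tags;
	}
	q->nr_requests = nr;
	queue_unfreeze(q);

	/* Step 3: release the old tags once the queue is live again. */
	tags_free(q, old_tags);
	return 0;
}
```

Shrinking below the current tag depth needs no allocation, so in this sketch it only updates `nr_requests` under the freeze.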

We assume the problem Yu describes is this: when nr_requests is
updated and grows, some memory must be allocated. That allocation may
trigger memory reclaim, which may in turn issue I/O; since the
request_queue has been frozen, that I/O cannot complete, and the
result is a deadlock.

But after checking the source code, we see the call chain
queue_requests_store -> blk_mq_freeze_queue -> memalloc_noio_save, so
no memory allocation made during the whole update can trigger I/O.
So the deadlock cannot happen... and if that's true, this patch does
not fix any problem.
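To make the memalloc_noio_save() argument concrete, here is a rough userspace model of a NOIO scope, loosely following the kernel's memalloc_noio_save()/memalloc_noio_restore() pair and current_gfp_context(). All `MODEL_*` names, `task_flags` and the bit values are invented for illustration. What it shows: the scope strips the IO and FS bits from every allocation mask, but the direct-reclaim bit survives, so an allocation made inside the scope is still allowed to sleep.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's real constants differ. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u   /* allocation may sleep */
#define MODEL_GFP_IO             0x2u   /* reclaim may start I/O */
#define MODEL_GFP_FS             0x4u   /* reclaim may call into filesystems */
#define MODEL_GFP_KERNEL (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOIO   (MODEL_GFP_DIRECT_RECLAIM)

#define MODEL_PF_MEMALLOC_NOIO 0x1u

/* Per-task flags word; a single global because this model has one task. */
static unsigned int task_flags;

/* Enter a NOIO scope and return the previous NOIO bit so scopes can nest. */
static unsigned int model_noio_save(void)
{
	unsigned int old = task_flags & MODEL_PF_MEMALLOC_NOIO;
	task_flags |= MODEL_PF_MEMALLOC_NOIO;
	return old;
}

/* Leave the scope, restoring the NOIO state the caller had before. */
static void model_noio_restore(unsigned int old)
{
	task_flags = (task_flags & ~MODEL_PF_MEMALLOC_NOIO) | old;
}

/* Inside the scope, IO and FS are cleared; DIRECT_RECLAIM is left alone. */
static unsigned int model_gfp_context(unsigned int gfp)
{
	if (task_flags & MODEL_PF_MEMALLOC_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	return gfp;
}
```

As with the kernel pair it imitates, each save is matched by a restore of the value it returned, which is what makes nested scopes safe.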

Yes, memalloc_noio_save() is invoked before we freeze the queue (e.g., in
elv_iosched_store()), but that does not prevent the deadlock scenario described
in the lockdep splat.

If you look closely at the splat, the problematic lock is not fs_reclaim (which
may be the first impression), but rather ->pcpu_alloc_mutex. From the splat, the
chain of dependencies looks like this:

thread #0: blocked on q->elevator_lock
thread #1: blocked on ->pcpu_alloc_mutex
thread #2: blocked on fs_reclaim

Here is the key detail:

Thread #0 is running in a GFP_NOIO scope (due to memalloc_noio_save()).
However, it is not blocked on fs_reclaim. Instead, it is blocked
on ->elevator_lock.

Thread #1 is also running in a GFP_NOIO scope and holds ->elevator_lock
while the queue is frozen. It is blocked on ->pcpu_alloc_mutex,
which is already held by Thread #2 (the thread that is stuck in
fs_reclaim). Thread #2 is running outside any GFP_NOIO scope.
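The blocking relationships above can be checked mechanically with a wait-for graph, where an edge i -> j means thread i is blocked on something thread j holds. This sketch rests on one labeled assumption: the edge from thread #2 back to thread #1 (reclaim waiting on I/O that cannot complete while thread #1 keeps the queue frozen) is inferred from the scenario described earlier in the thread, not read from the splat. `in_cycle()`, `NONE` and the `T0`..`T2` indices are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum { T0, T1, T2, NTHREADS };
#define NONE (-1)

/*
 * waits_on[i] is the thread holding the resource thread i is blocked on,
 * or NONE if thread i is runnable. Each thread waits on at most one thing,
 * so following edges from a thread either stops at NONE or loops.
 */
static bool in_cycle(const int waits_on[NTHREADS], int start)
{
	int t = waits_on[start];

	/* At most NTHREADS hops; getting back to start means a cycle. */
	for (int steps = 0; steps < NTHREADS && t != NONE; steps++) {
		if (t == start)
			return true;
		t = waits_on[t];
	}
	return false;
}
```

With the assumed edge in place, threads #1 and #2 form a cycle, while thread #0 is merely stuck behind it; remove that edge and the chain would drain once thread #2 finishes reclaim.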

In other words:
- GFP_NOIO prevents a thread from entering fs_reclaim, but it does
not prevent triggering per-CPU memory allocations, which require
taking ->pcpu_alloc_mutex.
- This ->pcpu_alloc_mutex is the actual source of contention in the
splat, and it sits outside the protections offered by GFP_NOIO.

That means:
- Even though memalloc_noio_save() avoids fs reclaim recursion,
it does not prevent per-CPU allocations from blocking, and thus
it cannot prevent the deadlock involving ->pcpu_alloc_mutex.
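The distinction drawn in the lists above can be written as two predicates over a gfp mask. This is an assumed model, not code from mm/percpu.c: the `M_GFP_*` bits are invented, and `takes_pcpu_alloc_mutex()` simply encodes the claim made in this thread that a per-CPU allocation which is allowed to sleep takes ->pcpu_alloc_mutex regardless of its IO/FS bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; these are not the kernel's constants. */
#define M_GFP_DIRECT_RECLAIM 0x1u   /* caller may sleep */
#define M_GFP_IO             0x2u   /* reclaim may start I/O */
#define M_GFP_FS             0x4u   /* reclaim may call into filesystems */
#define M_GFP_KERNEL (M_GFP_DIRECT_RECLAIM | M_GFP_IO | M_GFP_FS)
#define M_GFP_NOIO   (M_GFP_DIRECT_RECLAIM)
#define M_GFP_ATOMIC 0x0u           /* no direct reclaim, so never sleeps */

/* fs_reclaim recursion needs both permission to sleep and the FS bit. */
static bool may_enter_fs_reclaim(unsigned int gfp)
{
	return (gfp & M_GFP_DIRECT_RECLAIM) && (gfp & M_GFP_FS);
}

/* Assumed: any sleeping per-CPU allocation serializes on one global mutex. */
static bool takes_pcpu_alloc_mutex(unsigned int gfp)
{
	return gfp & M_GFP_DIRECT_RECLAIM;
}
```

Under this model a GFP_NOIO caller never recurses into fs_reclaim but can still block on the mutex, which is exactly the gap the lists describe.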

Thank you for the detailed explanation.

Now I understand that there could indeed be a deadlock issue here :)

Thanks,

Qixing