Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()

From: Tetsuo Handa

Date: Thu May 28 2026 - 20:16:33 EST


On 2026/05/29 8:00, Hillf Danton wrote:
>> Given the loop workqueue that triggered the jfs warning, can you specify
>> the reason why the workqueue in question is NOT flushed while closing disk?
>>
> Got it, the loop workqueue is NOT flushed to avoid deadlock, see d292dc80686a
> ("loop: don't destroy lo->workqueue in __loop_clr_fd") for detail.
> And the deadlock can be reproduced by flushing the loop workqueue with
> disk->open_mutex held [1].
>
> [1] Subject: Re: [syzbot] possible deadlock in blkdev_put (3)
> https://lore.kernel.org/lkml/000000000000ea753505da2658d5@xxxxxxxxxx/

We can avoid the following lockdep warnings (including [1] you mentioned)

https://syzkaller.appspot.com/bug?extid=2f62807dc3239b8f584e
https://syzkaller.appspot.com/bug?extid=c4e9d077bcc86bee08dc
https://syzkaller.appspot.com/bug?extid=0f427123ae84b3ba6dc7
https://syzkaller.appspot.com/bug?extid=4feabfc9641267769c97
https://syzkaller.appspot.com/bug?extid=fb0ff9bfe34ad282ebd4

caused by "drain_workqueue() with disk->open_mutex held" if we assign
a caller-specific lockdep class to disk->open_mutex

https://sourceforge.net/p/tomoyo/tomoyo.git/ci/c2245c765ebeba9dcb924d9171d8d470a9ac41c8/

.

Also, we can avoid the lockdep warning caused by "drain_workqueue() with
disk->open_mutex held" + "holding system_transition_mutex" if we forbid
binding to pseudo files as the backing file in the loop driver

https://lkml.kernel.org/r/d38e4600-3c32-491f-aa49-905f4fad1bfb@xxxxxxxxxxxxxxxxxxx

which we can reproduce with

  echo 7:0 > /sys/power/resume
  losetup /dev/loop0 /sys/power/resume
  cat /dev/loop0 > /dev/null
  losetup -d /dev/loop0

.

Therefore, I think we can address this problem on the loop driver side by
doing "drain_workqueue() with disk->open_mutex held".



However, even with that change, a last-millisecond writeback request from the
filesystem (issued during the unmount operation) can still fail due to the

	if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
		return BLK_STS_IOERR;

check in loop_queue_rq(). Therefore, addressing this problem within each
individual filesystem would be the stricter solution. But judging from the
pace at which jfs bugs get fixed, it would take a long time before we stop
seeing this problem...
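
To illustrate the window that remains, here is a minimal userspace sketch
(not kernel code). struct loop_model, model_queue_rq() and
model_start_teardown() are hypothetical names, and the enums are simplified
stand-ins for the loop driver's state and block status values. It only
models the shape of the check quoted above: a request submitted after the
device has left Lo_bound is rejected before it is ever queued.

```c
#include <assert.h>

/* Simplified stand-ins (assumptions), not the kernel definitions. */
enum lo_state { Lo_unbound, Lo_bound, Lo_rundown };
enum blk_status { BLK_STS_OK, BLK_STS_IOERR };

/* Hypothetical model of a loop device: its state and a request counter. */
struct loop_model {
	enum lo_state state;
	int queued;	/* requests accepted and handed to the worker */
};

/* Same shape as the quoted check: accept requests only while bound. */
static enum blk_status model_queue_rq(struct loop_model *lo)
{
	if (lo->state != Lo_bound)
		return BLK_STS_IOERR;
	lo->queued++;
	return BLK_STS_OK;
}

/* Models the start of teardown: the device leaves the bound state. */
static void model_start_teardown(struct loop_model *lo)
{
	assert(lo->state == Lo_bound);
	lo->state = Lo_rundown;
}
```

In this model, a request submitted while the state is Lo_bound is accepted,
while one submitted after model_start_teardown() gets BLK_STS_IOERR. Draining
the workqueue cannot help the latter, because that request was never queued.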