Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Hillf Danton
Date: Fri May 29 2026 - 03:08:40 EST
On Fri, 29 May 2026 09:14:47 +0900 Tetsuo Handa wrote:
>On 2026/05/29 8:00, Hillf Danton wrote:
>>> Given the loop workqueue that triggered the jfs warning, can you specify
>>> the reason why the workqueue in question is NOT flushed while closing disk?
>>>
>> Got it, the loop workqueue is NOT flushed to avoid deadlock, see d292dc80686a
>> ("loop: don't destroy lo->workqueue in __loop_clr_fd") for detail.
>> And the deadlock can be reproduced by flushing the loop workqueue with
>> disk->open_mutex held [1].
>>
>> [1] Subject: Re: [syzbot] possible deadlock in blkdev_put (3)
>> https://lore.kernel.org/lkml/000000000000ea753505da2658d5@xxxxxxxxxx/
>
>We can avoid the following lockdep warnings (including [1] you mentioned)
>
> https://syzkaller.appspot.com/bug?extid=2f62807dc3239b8f584e
> https://syzkaller.appspot.com/bug?extid=c4e9d077bcc86bee08dc
> https://syzkaller.appspot.com/bug?extid=0f427123ae84b3ba6dc7
> https://syzkaller.appspot.com/bug?extid=4feabfc9641267769c97
> https://syzkaller.appspot.com/bug?extid=fb0ff9bfe34ad282ebd4
>
>caused by "drain_workqueue() with disk->open_mutex held" if we assign
>caller-specific lockdep class to disk->open_mutex
>
> https://sourceforge.net/p/tomoyo/tomoyo.git/ci/c2245c765ebeba9dcb924d9171d8d470a9ac41c8/
>
>.
>
>Also, we can avoid lockdep warning caused by "drain_workqueue() with disk->open_mutex held" +
>"holding system_transition_mutex" if we forbid binding to pseudo files as backing file
>in the loop driver
>
> https://lkml.kernel.org/r/d38e4600-3c32-491f-aa49-905f4fad1bfb@xxxxxxxxxxxxxxxxxxx
>
>which we can reproduce with
>
> echo 7:0 > /sys/power/resume
> losetup /dev/loop0 /sys/power/resume
> cat /dev/loop0 > /dev/null
> losetup -d /dev/loop0
>
>.
>
>Therefore, I think we can address this problem by "drain_workqueue() with disk->open_mutex
>held" on the loop driver side.
>
Good news.
>
>
>However, the possibility that the last milli-second writeback request
>(which runs during unmount operation) from filesystem fails due to
>
> if (data_race(READ_ONCE(lo->lo_state)) != Lo_bound)
>         return BLK_STS_IOERR;
>
>check in loop_queue_rq() will remain.
This conflicts with "There is no need to destroy the workqueue when
clearing unbinding a loop device from a backing file." in d292dc80686a.
>Therefore, addressing this problem
>within individual filesystems would be a stricter solution. But guessing from
This once more conflicts with "Another thing is, if it's some btrfs bios on-the-fly after
close_ctree(), the most common symptom should be NULL pointer
dereference inside various btrfs endio functions." [2].
And I think you need to pay the fs guys more than two cents for cooking
a fix.
[2] Subject: Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
https://lore.kernel.org/lkml/36571f8a-4df8-4152-b078-d82dbff4ad7e@xxxxxxxx/
>the pace at which jfs fixes bugs, it would take a long time before we stop seeing
>this problem...