Re: [PATCH v3] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Damien Le Moal
Date: Wed May 27 2026 - 14:12:45 EST
On 2026/05/27 20:29, Tetsuo Handa wrote:
> On 2026/05/27 12:00, Ming Lei wrote:
>> On Wed, May 27, 2026 at 10:35:56AM +0900, Tetsuo Handa wrote:
>>> On 2026/05/27 10:20, Ming Lei wrote:
>>>>> Of course we should try to figure out the root cause first, but how can we do that?
>>>>
>>>> Unexpected write IO (after umount and loop close) from btrfs is definitely more serious,
>>>> since it may cause data loss, so CC'ing the btrfs list and maintainer.
>>>
>
> I asked Google AI Mode about this and received the following response.
>
> --------------------------------------------------------------------------------
> Technical Analysis: lo_rw_aio() NULL Pointer Dereference / UAF since v7.1-rc1
>
>
> 1. The Root Cause of the Timing Shift
>
> This regression was introduced during the v7.1-rc1 merge window. It is primarily exposed by
> commit 65565ca5f99b ("block: unify the synchronous bi_end_io callbacks"), along with
> helper refactorings such as commit 92c3737a2473 ("block: add a bio_submit_or_kill helper").
>
> Up to v7.0, the synchronous I/O completion path had inherent delays (serialized completion
> handling and context switches) before notifying upper layers. This latency accidentally acted
> as a safety barrier: by the time a file system completed its final sync_filesystem() and
> initiated umount, the loop driver's workqueue (which issues lo_rw_aio()) had already finished
> processing everything.
>
> In v7.1, unifying and optimizing bi_end_io significantly reduced this latency, so the
> filesystem learns of "I/O completion" much sooner. Consequently, highly concurrent
> execution pipelines such as btrfs or jfs proceed rapidly through kill_sb() and blkdev_put(),
> ultimately invoking lo_release() -> __loop_clr_fd() while the loop driver's backend kworker
> is still executing the last, sub-millisecond, asynchronous file-backed I/O request.
>
>
> 2. Why the Block Layer's Built-in Quiesce/Freeze Fails
>
> It is implicitly assumed that the standard block layer freeze mechanism (blk_mq_freeze_queue())
> protects the device lifetime during release. However, the v7.1 bio helper refactoring introduced
> a synchronization gap:
>
> 1. The filesystem triggers its final metadata or journal updates (e.g., txCommit in jfs or
> delayed refcount updates in btrfs) right at the unmount/close boundary.
> 2. Because of the optimized execution path, these requests escape the block layer's in-flight
> request accounting at the exact moment blk_mq_freeze_queue() or the state validation
> checks observe a count of zero.
> 3. The block layer assumes the queue is idle, allowing __loop_clr_fd() to proceed and
> clear lo->lo_backing_file (or call fput()).
> 4. The leaked asynchronous kworker wakes up a fraction of a millisecond too late and either
> accesses lo->lo_backing_file or invokes kiocb_end_write() -> file_inode(), leading to a
> general protection fault (NULL pointer dereference) or a use-after-free (UAF).
>
>
> 3. Why This Isn't Just an "Unexpected FS Bug"
>
> Although the post-close write I/O originates from file systems such as btrfs and jfs, blaming
> the file systems entirely ignores the underlying infrastructure change. The core issue is that the
> block layer altered its synchronization behavior, breaking the barrier contract that
> VFS and file systems historically relied on during the device release path.
>
> Papering over this in individual file systems would require adding heavy, duplicated
> barriers to every file system's unmount path.

It sounds like the VFS unmount path needs something that waits for sync() to
complete. That said, it feels very strange that an FS can complete unmount
without itself ensuring that no more IOs are in flight. The generic VFS layer
cannot know what the FS needs to flush on unmount, so waiting on a generic sync
might not be enough.
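
As an illustration of the "no IOs in flight before teardown" invariant, here is a
minimal user-space sketch (pthreads, not kernel code): teardown first refuses new
I/O, then waits until the in-flight count drains to zero, and only then drops the
backing file. Everything here (struct fake_loop, fake_io_begin(), fake_io_end(),
fake_worker(), fake_loop_release()) is hypothetical and only models
lo->lo_backing_file; it is not how the loop driver, the VFS, or any FS actually
implements this.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loop device; NOT the kernel's struct loop_device. */
struct fake_loop {
	pthread_mutex_t lock;
	pthread_cond_t  drained;      /* signalled when inflight drops to zero */
	int             inflight;     /* requests started but not yet completed */
	bool            closing;      /* set once teardown has begun */
	const char     *backing_file; /* models lo->lo_backing_file */
};

void fake_loop_init(struct fake_loop *lo, const char *file)
{
	pthread_mutex_init(&lo->lock, NULL);
	pthread_cond_init(&lo->drained, NULL);
	lo->inflight = 0;
	lo->closing = false;
	lo->backing_file = file;
}

/* Register an I/O before touching backing_file; refuse once teardown started. */
bool fake_io_begin(struct fake_loop *lo)
{
	bool ok;

	pthread_mutex_lock(&lo->lock);
	ok = !lo->closing;
	if (ok)
		lo->inflight++;
	pthread_mutex_unlock(&lo->lock);
	return ok;
}

/* Complete an I/O; the last one wakes a teardown waiting in release. */
void fake_io_end(struct fake_loop *lo)
{
	pthread_mutex_lock(&lo->lock);
	if (--lo->inflight == 0)
		pthread_cond_broadcast(&lo->drained);
	pthread_mutex_unlock(&lo->lock);
}

/* Worker body: backing_file is guaranteed valid between begin and end. */
void *fake_worker(void *arg)
{
	struct fake_loop *lo = arg;

	if (!fake_io_begin(lo))
		return NULL;  /* device is going away: reject the request */
	assert(lo->backing_file != NULL);  /* the invariant the drain protects */
	/* ... the file-backed read/write would happen here ... */
	fake_io_end(lo);
	return NULL;
}

/* Teardown: block new I/O, wait for in-flight I/O, only then drop the file. */
void fake_loop_release(struct fake_loop *lo)
{
	pthread_mutex_lock(&lo->lock);
	lo->closing = true;
	while (lo->inflight > 0)
		pthread_cond_wait(&lo->drained, &lo->lock);
	lo->backing_file = NULL;  /* safe: no registered I/O can still use it */
	pthread_mutex_unlock(&lo->lock);
}
```

With this scheme, starting several fake_worker() threads and then calling
fake_loop_release() cannot trip the assert regardless of interleaving: each
worker either registers before closing is set (and is then waited for) or is
refused. Build with -pthread.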

It really feels like this is a btrfs and jfs issue, unless the same can be
reproduced with other file systems (XFS, ext4, f2fs, ...).

Just my 2 cents.
--
Damien Le Moal
Western Digital Research