Re: [PATCH v2] loop: Fix NULL pointer dereference by synchronizing lo_release and loop_queue_rq

From: Tetsuo Handa

Date: Tue May 19 2026 - 05:31:09 EST


On 2026/05/19 9:40, Andrew Morton wrote:
> AI review asked a couple of questions:
> https://sashiko.dev/#/patchset/9b2032d6-3f36-4d2b-8128-985c08a4fa37@xxxxxxxxxxxxxxxxxxx

To: gemini/gemini-3.1-pro-preview

Thank you for your valuable feedback. Your point about asynchronous I/O completing after drain_workqueue()
and potentially causing a UAF at file_inode() from kiocb_end_write() from lo_rw_aio_do_completion() is correct.
The drain_workqueue() alone does not wait for in-flight AIOs that have already returned -EIOCBQUEUED. However,
I'm not convinced that use of blk_mq_freeze_queue() inside __loop_clr_fd() where disk->open_mutex was already
held by bdev_release() is absolutely deadlock-free.

1. VFS and Block Layer Lock Contention:
__loop_clr_fd() is exclusively invoked from the lo_release() path during the final close of the device.
At this stage, the block layer is holding disk->open_mutex. If we call blk_mq_freeze_queue() here, it will
synchronously wait for all in-flight AIOs to complete. However, the completion paths of those in-flight AIOs
(or subsequent metadata processing in the underlying filesystem) may attempt to acquire resources or execute
code paths that depend on the very same device state or open/close status. This creates a circular dependency,
leading to an unrecoverable hang.

2. Memory Reclaim Deadlock:
blk_mq_freeze_queue() blocks until the queue's usage counter drops to zero. If an in-flight AIO requires memory
allocation for metadata updates upon completion, and the system is under heavy memory pressure, it can trigger
direct memory reclaim. If the reclaim path attempts to sync other buffers or interact with the frozen loop
device/queue, a circular deadlock occurs.

Therefore, I would like to choose SRCU-based synchronization instead of blk_mq_freeze_queue().

* Locking: We call srcu_read_lock(&loop_io_srcu) only for asynchronous paths (cmd->use_aio) immediately
before submitting the I/O to the underlying filesystem in lo_rw_aio().

* Unlocking: The reader lock is released via srcu_read_unlock() at the very end of the AIO completion handler
(lo_rw_aio_do_completion()).

* Synchronization: We place synchronize_srcu(&loop_io_srcu) immediately after drain_workqueue() in __loop_clr_fd().

I think that this guarantees that __loop_clr_fd() safely blocks until all pending AIO callbacks are 100% completed,
fully eliminating the UAF risk and ensuring the safety of the subsequent mapping_set_gfp_mask() and fput(), while
remaining entirely deadlock-free.

What do you think about this approach?

drivers/block/loop.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0000913f7efc..7c3961f3cbc9 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -80,6 +80,7 @@ struct loop_cmd {
struct list_head list_entry;
bool use_aio; /* use AIO interface to handle I/O */
atomic_t ref; /* only for aio */
+ int srcu_idx;
long ret;
struct kiocb iocb;
struct bio_vec *bvec;
@@ -93,6 +94,7 @@ struct loop_cmd {
static DEFINE_IDR(loop_index_idr);
static DEFINE_MUTEX(loop_ctl_mutex);
static DEFINE_MUTEX(loop_validate_mutex);
+DEFINE_SRCU(loop_io_srcu);

/**
* loop_global_lock_killable() - take locks for safe loop_validate_file() test
@@ -327,6 +329,8 @@ static void lo_rw_aio_do_completion(struct loop_cmd *cmd)
kiocb_end_write(&cmd->iocb);
if (likely(!blk_should_fake_timeout(rq->q)))
blk_mq_complete_request(rq);
+ if (cmd->use_aio)
+ srcu_read_unlock(&loop_io_srcu, cmd->srcu_idx);
}

static void lo_rw_aio_complete(struct kiocb *iocb, long ret)
@@ -392,6 +396,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
if (cmd->use_aio) {
cmd->iocb.ki_complete = lo_rw_aio_complete;
cmd->iocb.ki_flags = IOCB_DIRECT;
+ cmd->srcu_idx = srcu_read_lock(&loop_io_srcu);
} else {
cmd->iocb.ki_complete = NULL;
cmd->iocb.ki_flags = 0;
@@ -1118,6 +1123,22 @@ static void __loop_clr_fd(struct loop_device *lo)
struct file *filp;
gfp_t gfp = lo->old_gfp_mask;

+ /*
+ * Now that loop_queue_rq() sees lo->lo_state != Lo_bound,
+ * wait for already started loop_queue_rq() to complete.
+ */
+ synchronize_rcu();
+ /*
+ * Now that no more works are scheduled by loop_queue_rq(),
+ * wait for already scheduled works to complete.
+ */
+ drain_workqueue(lo->workqueue);
+ /*
+ * Now that no more AIO requests are scheduled by lo_rw_aio(),
+ * wait for already started AIO to complete.
+ */
+ synchronize_srcu(&loop_io_srcu);
+
spin_lock_irq(&lo->lo_lock);
filp = lo->lo_backing_file;
lo->lo_backing_file = NULL;