Re: [PATCH] net/9p: fix race condition on rdma->state in trans_rdma.c

From: Dominique Martinet

Date: Mon Jun 15 2026 - 09:22:52 EST


Yizhou Zhao wrote on Fri, May 29, 2026 at 03:39:31PM +0800:
> The rdma->state field is modified without holding req_lock in both
> recv_done() and p9_cm_event_handler(), while rdma_request() accesses
> the same field under the req_lock spinlock. This inconsistent locking
> creates a race condition:
>
> - recv_done() running in softirq completion context sets
> rdma->state = P9_RDMA_FLUSHING without acquiring req_lock
>
> - p9_cm_event_handler() modifies rdma->state at multiple points
> (ADDR_RESOLVED, ROUTE_RESOLVED, ESTABLISHED, CLOSED) without
> req_lock
>
> - rdma_request() uses spin_lock_irqsave(&rdma->req_lock, flags) to
> protect the read-modify-write of rdma->state
>
> The race can cause lost state transitions: recv_done() or the CM
> event handler could set state to FLUSHING/CLOSED while rdma_request()
> is concurrently checking or modifying state under the lock, leading to
> the FLUSHING transition being silently overwritten by CLOSING. This
> corrupts the connection state machine and can cause use-after-free on
> RDMA request objects during teardown.
>
> Fix by adding req_lock protection to all rdma->state modifications in
> recv_done() and p9_cm_event_handler(), matching the pattern already
> used in rdma_request(). Use spin_lock_irqsave/spin_unlock_irqrestore
> in the CM event handler since it can race with recv_done() which runs
> in softirq context.
>
> Tested with a kernel module that races two threads (simulating
> rdma_request and recv_done/CM handler) on rdma->state with proper
> locking: 5.5M+ FLUSHING writes over 27M iterations with 0 lost
> transitions.
>
> Fixes: 473c7dd1d7b5 ("9p/rdma: remove useless check in cm_event_handler")
> Reported-by: Yizhou Zhao <zhaoyz24@xxxxxxxxxxxxxxxxxxxxx>
> Reported-by: Yuxiang Yang <yangyx22@xxxxxxxxxxxxxxxxxxxxx>
> Reported-by: Ao Wang <wangao@xxxxxxxxxx>
> Reported-by: Xuewei Feng <fengxw06@xxxxxxx>
> Reported-by: Qi Li <qli01@xxxxxxxxxxxxxxx>
> Reported-by: Ke Xu <xuke@xxxxxxxxxxxxxxx>
> Assisted-by: GLM:GLM-5.1
> Signed-off-by: Yizhou Zhao <zhaoyz24@xxxxxxxxxxxxxxxxxxxxx>
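
For readers following along, the locking pattern described above can be
sketched in userspace roughly as follows. This is only an illustration,
not the patch itself: a pthread mutex stands in for the
spin_lock_irqsave()/spin_unlock_irqrestore() pair on req_lock, and every
sketch_* name, including the reduced state enum, is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, reduced state machine; ordering matters for the '<' check. */
enum sketch_state {
	SKETCH_CONNECTED,
	SKETCH_FLUSHING,
	SKETCH_CLOSING,
	SKETCH_CLOSED,
};

/* Hypothetical stand-in for the transport: one lock guarding one state field. */
struct sketch_trans {
	pthread_mutex_t req_lock;	/* assumption: models rdma->req_lock */
	enum sketch_state state;	/* assumption: models rdma->state */
};

void sketch_init(struct sketch_trans *t)
{
	pthread_mutex_init(&t->req_lock, NULL);
	t->state = SKETCH_CONNECTED;
}

/*
 * Writer side (models the completion and CM event paths): every store to
 * the state field happens with the lock held.
 */
void sketch_set_state(struct sketch_trans *t, enum sketch_state s)
{
	pthread_mutex_lock(&t->req_lock);
	t->state = s;
	pthread_mutex_unlock(&t->req_lock);
}

/*
 * Request side: the check and the update form one critical section, so a
 * writer cannot slip in between them. Returns true if this call moved the
 * state to CLOSING.
 */
bool sketch_mark_closing(struct sketch_trans *t)
{
	bool changed = false;

	pthread_mutex_lock(&t->req_lock);
	if (t->state < SKETCH_CLOSING) {
		t->state = SKETCH_CLOSING;
		changed = true;
	}
	pthread_mutex_unlock(&t->req_lock);
	return changed;
}
```

The point of the sketch is only that both sides go through the same
lock; in the kernel the irqsave variant is needed because one of the
writers runs in softirq context, which a mutex cannot model.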

None of this is frequent, so taking the lock is sound; picking this up.

--
Dominique