Re: [PATCH] 9p/client: fix data race on req->status

From: Dominique Martinet
Date: Fri Dec 09 2022 - 16:13:10 EST


Christian Schoenebeck wrote on Fri, Dec 09, 2022 at 02:45:51PM +0100:
> > > What about p9_tag_alloc()?
> >
> > I think that one's ok: it happens during the allocation before the
> > request is enqueued in the idr, so it should be race-free by defition.
> >
> > tools/memory-model/Documentation/access-marking.txt says
> > "Initialization-time and cleanup-time accesses" should use plain
> > C-language accesses, so I stuck to that.
>
> When it is allocated then it is safe, but the object may also come from a pool
> here. It's probably not likely to cause an issue here, just saying.

If it comes from the pool then it is gated by the refcount... But that
would require a similar barrier indeed (init stuff, wmb, init refcount
// get req + check refcount, rmb, read stuff e.g. tag); just a
write_once would not help.

For the init side I assume unlocking c->lock acts as a write barrier
after tag is set, which is conveniently the last step, but we'd need a
read barrier here in tag lookup:
--------
diff --git a/net/9p/client.c b/net/9p/client.c
index fef6516a0639..68585ad9003c 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -363,6 +363,7 @@ struct p9_req_t *p9_tag_lookup(struct p9_client *c, u16 tag)
*/
if (!p9_req_try_get(req))
goto again;
+ smp_rmb();
if (req->tc.tag != tag) {
p9_req_put(c, req);
goto again;
--------

OTOH this cannot happen with a normal server (a req should only be looked
up after it has been sent to the server and comes back, which involves a
few round trip and a few locks in the recv paths for tcp); but if syzbot
tries hard enough I guess that could be hit...
I don't have a strong opinion on this: I don't think anything really bad
can happen here as long as the refcount is correct (status is read under
lock when it matters before extra decrements of the refcount, and writes
to the buffer itself are safe from a memory pov), even if it's obviously
not correct strictly speaking.
(And I have no way of measuring what impact that extra barrier would have
tbh; for virtio at least lookup is actually never used...)

--
Dominique