Re: [PATCH] 9p: fix crash when transaction killed
From: asmadeus
Date: Wed Nov 30 2022 - 01:17:07 EST
(fixed Christophe's address, hopefully that will do for good...)
Schspa Shi wrote on Wed, Nov 30, 2022 at 10:22:44AM +0800:
> > I'm happy to believe we have a race somewhere (even if no sane server
> > would produce it), but right now I don't see it looking at the code.. :/
>
> And I think there is a race too. because the syzbot report about 9p fs
> memory corruption multi times.
Yes, no point in denying that :)
> As for the problem, the p9_tag_lookup only takes the rcu_read_lock when
> accessing the IDR, why it doesn't take the p9_client->lock? Maybe the
> root cause is that a lock is missing here.
It shouldn't need to, but happy to try adding it.
For the logic:
- idr_find is RCU-safe (trusting the comment above it)
- reqs are alloced in a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
This means that if we get a req from idr_find, even if it has just been
freed, it either is still in the state it was freed at (hence refcount
0, we ignore it) or is another req coming from the same cache (if
refcount isn't zero, we can check its tag)
The refcount itself is an atomic operation so doesn't require lock.
... And in the off chance I hadn't considered that we're already
dealing with a new request with the same tag here, we'll be updating
its status so another receive for it shouldn't use it?...
I don't think adding the client lock helps with anything here, but it'll
certainly simplify this logic as we then are guaranteed not to get
obsolete results from idr_find.
Unfortunately adding a lock will slow things down regardless of
correctness, so it might just make the race much harder to hit without
fixing it and we might not notice that, so it'd be good to understand
the race.
--
Dominique