Re: [PATCH] net/9p: fix infinite loop in p9_client_rpc on fatal signal

From: Dominique Martinet

Date: Sun Jun 21 2026 - 09:01:14 EST


Dominique Martinet wrote on Fri, Apr 17, 2026 at 07:52:52AM +0900:
> > While the ideal long-term goal is the asynchronous implementation (as
> > seen in your 9p-async-v2 branch [2]), this patch serves as a reliable
> > intermediate solution for a critical regression.
> > [2] https://github.com/martinetd/linux/commits/9p-async-v2
>
> iirc one of the problem with the async branch is that the process would
> quit immediately on, say, ^C, before the IO has completed, but it's
> possible for the server to process the IO (and not the flush) afterwards
> and you'd get something that's not supposed to happen e.g.
>
> p1 p2
>
> write(1)
> ^C/sigkill
> flush sent but process exit without waiting for server ack
> 1 not written yet
> write(2) in same spot
> write(2) done
> write(1) completes
> data isn't 2 as expected after p2 completed
>
>
> So it's quite possible async isn't the way to go, but that there is no
> good solution for this
> (given this is true even without async on sigkill: if we have something
> that works safely, there's no reason to wait only for non-fatal signals...)

Sorry to come back to this after two months but I'm still a bit worried
about this patch, and just came back to it as I'm about to send the PR
to Linus...
And I'm still thinking about the problem above, or rather possible
variants involving cache (e.g. write going through the server, but
client believing it didn't because the response didn't make it in time)

.. But the thing is, I couldn't actually hit the `if
(fatal_signal_pending(current))` you added (adding some print
statement):
- if cache is enabled, the actual I/Os are done by the vfs in the
background, so any kill to user processes won't have any impact (and
thus I guess my main worry about cache is alleviated there)
- with cache=none I'm not sure why I can't hit it, I tried with an
external server, breaking on the write() call while running dd, and
killing dd with SIGKILL a few times but that doesn't appear to be
enough? (task still stuck in write > rpc > flush > rpc, but it doesn't
appear to ever get out of io_wait_event_killable() even when I hammer it
with more signals?)

So, given that my worry with cache is irrelevant (runs in background &
won't ever hit this), I can't seem to hit this with what I consider
to be normal workloads, and assuming it does fix your problems given you
were able to test it... I'll leave it in and send to Linus now but I'd
appreciate clarifications on how to test this more thoroughly as time
permits...
(I honestly probably should drop the patch at this point, but it'll
still be time to revert if I figure something out in the next few weeks
given it's been in -next for almost 2 months already)

Thanks,
--
Dominique Martinet