Re: 9pfs hangs since 4.7

From: Al Viro
Date: Wed Jan 04 2017 - 19:09:43 EST


> Here's logs that should be complete this time:
>
> https://gist.githubusercontent.com/dezgeg/08629d4c8ca79da794bc087e5951e518/raw/a1a82b9bc24e5282c82beb43a9dc91974ffcf75a/9p.qemu.log
> https://gist.githubusercontent.com/dezgeg/1d5f1cc0647e336c59989fc62780eb2e/raw/d7623755a895b0441c608ddba366d20bbf47ab0b/9p.dmesg.log

Fun. All requests prior to
[ 360.110282] dd-1899 1.... 18497262us : 9p_client_req: client 18446612134390128640 request P9_TWALK tag 25
line in dmesg had been seen by the servers; all requests starting with that
one had not. Replies to earlier requests kept arriving just fine.

From the server side, everything looks nice and sane - it has processed
all requests it had seen, and aside from a slight difference in arrival
order the server-side and client-side logs match, except for the last 26
requests the client claims to have sent and the server has never seen.

All traffic for another client (there had been less of it) has ceased long
before that point, so we can't really tell if it's just this client that
got buggered. Interesting...

Looking at the tracepoints, those requests got through p9_client_prepare_req();
we have no idea whether they got through p9_virtio_request(). OTOH, AFAICS
nothing had been sleeping in there...

FWIW, it might be interesting to try
	WARN_ON(!virtqueue_kick(chan->vq));
in p9_virtio_request() (instead of blind virtqueue_kick(chan->vq)) and see
if it triggers. Incidentally, it looks like p9_virtio_request() ought to
return an error if that happens...
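
To make the "return an error" part concrete, here is a rough userspace
sketch of the idea, not a kernel patch. The mock_* names and the struct are
made up for illustration; the only thing taken from the real API is that
virtqueue_kick() returns a bool that is false when the kick failed. The
choice of -EIO is an assumption as well.

```c
/* Userspace mock, NOT kernel code: every mock_* name here is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_virtqueue {
	bool broken;		/* simulates a queue whose kick fails */
	unsigned int kicks;	/* number of successful notifications */
};

/* Mirrors the bool return of virtqueue_kick(): false means the kick failed. */
static bool mock_virtqueue_kick(struct mock_virtqueue *vq)
{
	assert(vq != NULL);
	if (vq->broken)
		return false;
	vq->kicks++;
	return true;
}

/* Returns 0 if the other side was notified, -EIO otherwise. */
static int mock_p9_virtio_request(struct mock_virtqueue *vq)
{
	/* ... the request would have been added to the ring here ... */
	if (!mock_virtqueue_kick(vq))
		return -EIO;	/* assumed error code, not from this thread */
	return 0;
}
```

With something along those lines the caller would get an error back instead
of waiting for a reply to a request the other side was never told about.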