Re: [V9fs-developer] 9pfs hangs since 4.7

From: Al Viro
Date: Sat Jan 07 2017 - 01:27:23 EST


On Fri, Jan 06, 2017 at 02:52:35PM +0100, Greg Kurz wrote:

> Looking at the tag numbers, I think we're hitting the hardcoded limit of 128
> simultaneous requests in QEMU (which doesn't produce any error, new requests
> are silently dropped).
>
> Tuomas, can you change MAX_REQ to some higher value (< 65535, since the tag
> is 2 bytes and 0xffff is reserved) to confirm?

Huh?

Just how is a client supposed to cope with that behaviour? 9P is not
SunRPC - there's a reason why it doesn't live on top of UDP. Sure, it's
datagram-oriented, but it really wants a reliable transport...

Setting the ring size at MAX_REQ is fine; that'll give you ENOSPC on an
attempt to put a request there, and p9_virtio_request() will wait for
things to clear. But if you've accepted a request, that's bloody it -
you really should go and handle it.
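
FWIW, the wait I mean is the ENOSPC path in p9_virtio_request() - roughly
this, simplified from net/9p/trans_virtio.c with most of the error handling
trimmed:

req_retry:
	spin_lock_irqsave(&chan->lock, flags);
	/* scatterlists for request/reply buffers have been set up in sgs[] */
	err = virtqueue_add_sgs(chan->vq, sgs, out_sgs, in_sgs, req,
				GFP_ATOMIC);
	if (err == -ENOSPC) {
		/* ring full - note it and sleep until a slot is released */
		chan->ring_bufs_avail = 0;
		spin_unlock_irqrestore(&chan->lock, flags);
		err = wait_event_killable(chan->vc_wq,
					  chan->ring_bufs_avail);
		if (err == -ERESTARTSYS)
			return err;
		goto req_retry;
	}
	virtqueue_kick(chan->vq);
	spin_unlock_irqrestore(&chan->lock, flags);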

How does it happen, anyway? qemu-side, I mean... Does it move the buffer
to the used ring as soon as it has fetched the request? AFAICS, it doesn't -
virtqueue_push() is called just before pdu_free(). We might get complications
in case of TFLUSH handling: queue with MAX_REQ-1 requests submitted, TFLUSH
arrives, cancel_pdu is found and ->cancelled is set on it, then v9fs_flush()
waits for it to complete. Once the damn thing is done, the buffer is released
by virtqueue_push(), but pdu freeing is delayed until v9fs_flush() gets woken
up. In the meantime, another request arrives into the slot freed by that
virtqueue_push() and we are out of pdus.
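
I.e., paraphrasing the qemu-side code from memory (names approximate, so
take this as a sketch rather than a quote):

	/* normal completion path, pdu_complete(): */
	pdu_push_and_notify(pdu);           /* virtqueue_push() + virtio_notify() */
	qemu_co_queue_next(&pdu->complete); /* wake v9fs_flush(), if it's waiting */
	pdu_free(pdu);                      /* no-op while ->cancelled is set */

	/* TFLUSH side, v9fs_flush(): */
	cancel_pdu->cancelled = 1;
	qemu_co_queue_wait(&cancel_pdu->complete); /* until pdu_complete() above */
	cancel_pdu->cancelled = 0;
	pdu_free(cancel_pdu);   /* only now does the pdu return to the free list */

That leaves a window between virtqueue_push() and the second pdu_free() where
the guest may validly reuse the ring slot while the pdu is still tied up.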

So it could happen, and things might get unpleasant to some extent, but...
no TFLUSH had been present in any of that traffic. And none of the stuck
processes had been spinning in p9_virtio_request(), so they *did* find
ring slots...