Re: [PATCH 4/5] 9p: introduce async read requests
From: Stefano Stabellini
Date: Mon Dec 12 2016 - 20:16:11 EST
Hi Al,
thanks for looking at the patch.
On Sat, 10 Dec 2016, Al Viro wrote:
> On Thu, Dec 08, 2016 at 12:59:05PM -0800, Stefano Stabellini wrote:
>
>
> > + } else {
> > + req = p9_client_get_req(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
> > + if (IS_ERR(req)) {
> > + *err = PTR_ERR(req);
> > + break;
> > + }
> > + req->rsize = iov_iter_get_pages_alloc(to, &req->pagevec,
> > + (size_t)rsize, &req->offset);
> > + req->kiocb = iocb;
> > + for (i = 0; i < req->rsize; i += PAGE_SIZE)
> > + page_cache_get_speculative(req->pagevec[i/PAGE_SIZE]);
> > + req->callback = p9_client_read_complete;
> > +
> > + *err = clnt->trans_mod->request(clnt, req);
> > + if (*err < 0) {
> > + clnt->status = Disconnected;
> > + release_pages(req->pagevec,
> > + (req->rsize + PAGE_SIZE - 1) / PAGE_SIZE,
> > + true);
> > + kvfree(req->pagevec);
> > + p9_free_req(clnt, req);
> > + break;
> > + }
> > +
> > + *err = -EIOCBQUEUED;
>
> IDGI. AFAICS, your code will result in shitloads of short reads - every
> time when you give it a multi-iovec array, only the first one will be
> issued and the rest won't be even looked at. Sure, it is technically
> legal, but I very much doubt that aio users will be happy with that.
>
> What am I missing here?
There is a problem with short reads, but I don't think it is the one
you described, unless I made a mistake somewhere.
The code above will issue one request as big as possible (size is
limited by clnt->msize, which is the size of the channel). No matter how
many segs in the iov_iter, the request will cover the maximum amount of
bytes allowed by the channel, typically 4K but can be larger. So
multi-iovec arrays should be handled correctly up to msize. Please let
me know if I am wrong, I am not an expert on this. I tried, but couldn't
actually get any multi-iovec arrays in my tests.
That said, reading the code again, there are indeed two possible causes
of short reads. The first one are responses which have a smaller byte
count than requested. Currently this case is not handled, forwarding
the short read up to the user. But I wrote and tested successfully a
patch that issues another follow-up request from the completion
function. Example:
p9_client_read, issue request, size 4K
p9_client_read_complete, receive response, count 2K
p9_client_read_complete, issue request size 4K-2K = 2K
p9_client_read_complete, receive response size 2K
p9_client_read_complete, call ki_complete
The second possible cause of short reads are requests which are bigger
than msize. For example a 2MB read over a 4K channel. In this patch we
simply issue one request as big as msize, and eventually return to the
caller a smaller byte count. One option is to simply fall back to sync
IO in these cases. Another solution is to use the same technique
described above: we issue the first request as big as msize, then, from
the callback function, we issue a follow-up request, and again, and
again, until we fully complete the large read. This way, although we are
not issuing any simultaneous requests for a specific large read, at
least we can issue other aio requests in parallel for other reads or
writes. It should still be more performant than sync IO.
I am thinking of implementing this second option in the next version of
the series.