Re: [RFC v2 PATCH] fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE

From: Abhijith Das
Date: Wed Dec 16 2015 - 10:55:35 EST




----- Original Message -----
> From: "Jan Kara" <jack@xxxxxxx>
> To: "Abhi Das" <adas@xxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, "Bob Peterson" <rpeterso@xxxxxxxxxx>
> Sent: Wednesday, December 16, 2015 9:34:04 AM
> Subject: Re: [RFC v2 PATCH] fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
>
> On Tue 15-12-15 09:43:25, Abhi Das wrote:
> > During testing, I discovered that __generic_file_splice_read() returns
> > 0 (EOF) when aops->readpage fails with AOP_TRUNCATED_PAGE on the first
> > page of a single/multi-page splice read operation. This EOF return code
> > causes the userspace test to (correctly) report a zero-length read error
> > when it was expecting otherwise.
> >
> > The current strategy of returning a partial non-zero read when ->readpage
> > returns AOP_TRUNCATED_PAGE works only when the failed page is not the
> > first of the lot being processed.
> >
> > This patch attempts to retry lookup and call ->readpage again on pages
> > that had previously failed with AOP_TRUNCATED_PAGE. With this patch, my
> > tests pass and I haven't noticed any unwanted side effects.
> >
> > This version fixes a return code issue pointed out by Bob Peterson.
> >
> > Signed-off-by: Abhi Das <adas@xxxxxxxxxx>
> > Cc: Bob Peterson <rpeterso@xxxxxxxxxx>
> > ---
> > fs/splice.c | 13 +++++++------
> > 1 file changed, 7 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/splice.c b/fs/splice.c
> > index 801c21c..365cd2a 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -387,6 +387,7 @@ __generic_file_splice_read(struct file *in, loff_t
> > *ppos,
> > spd.nr_pages = 0;
> > for (page_nr = 0; page_nr < nr_pages; page_nr++) {
> > unsigned int this_len;
> > + int retries = 0;
> >
> > if (!len)
> > break;
> > @@ -415,6 +416,7 @@ __generic_file_splice_read(struct file *in, loff_t
> > *ppos,
> > */
> > if (!page->mapping) {
> > unlock_page(page);
> > +retry_lookup:
> > page = find_or_create_page(mapping, index,
> > mapping_gfp_mask(mapping));
> >
> > @@ -439,14 +441,13 @@ __generic_file_splice_read(struct file *in, loff_t
> > *ppos,
> > error = mapping->a_ops->readpage(in, page);
> > if (unlikely(error)) {
> > /*
> > - * We really should re-lookup the page here,
> > - * but it complicates things a lot. Instead
> > - * lets just do what we already stored, and
> > - * we'll get it the next time we are called.
> > + * Re-lookup the page
> > */
> > - if (error == AOP_TRUNCATED_PAGE)
> > + if (error == AOP_TRUNCATED_PAGE) {
> > error = 0;
> > -
> > + if (retries++ < 3)
> > + goto retry_lookup;
> > + }
>
> I don't like this retry-three-times loop. That is still leaving the
> possibility of 0 return just much less likely (so it will lead to even
> weirder and harded to debug failures). IMO we should just terminate the
> loop like we did previously if spd.nr_pages > 0 and we retry indefinitely
> if it is the first page that failed to read.
>
> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR
>

The 3-retry loop was the thing I was unsure about too. With regards to the
indefinite retry, I was wondering if there's some corner case where we might
get into an infinite retry loop...

If we're doing the retry for the first page, why not for other pages too? Is
it because we'd potentially be increasing the odds for an infinite loop and/or
affecting performance by doing more lookups?

Cheers!
--Abhi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/