Re: [PATCH 2/4] direct-io: make O_DIRECT IO path be page based

From: Jens Axboe
Date: Mon Aug 24 2009 - 04:00:57 EST


On Thu, Aug 20 2009, Andrew Morton wrote:
> On Thu, 20 Aug 2009 12:17:38 +0200
> Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
>
> > Currently we pass in the iovec array and let the O_DIRECT core
> > handle the get_user_pages() business. This work, but it means that
> > we can ever only use user pages for O_DIRECT.
> >
> > Switch the aops->direct_IO() and below code to use page arrays
> > instead, so that it doesn't make any assumptions about who the pages
> > belong to. This works directly for all users but NFS, which just
> > uses the same helper that the generic mapping read/write functions
> > also call.
> >
>
> This:
>
> > + struct page **pages; /* page buffer */
> > + unsigned int head_page; /* next page to process */
> > + unsigned int total_pages; /* last valid page + 1 */
> > + unsigned int first_page_off; /* offset into first page in map */
>
> Is a disaster waiting to happen.

It was a disaster already, there's actually not a lot of change. So I do
agree that that the hole page/cur_page/head_page needs cleaning up.

> > static struct page *dio_get_page(struct dio *dio)
> > {
> > - if (dio_pages_present(dio) == 0) {
> > - int ret;
> > + if (dio->head_page < dio->total_pages)
> > + return dio->pages[dio->head_page++];
> >
> > - ret = dio_refill_pages(dio);
> > - if (ret)
> > - return ERR_PTR(ret);
> > - BUG_ON(dio_pages_present(dio) == 0);
> > - }
> > - return dio->pages[dio->head++];
> > + return NULL;
> > }
> >
> > ...
> >
> > @@ -351,8 +280,10 @@ static void dio_bio_submit(struct dio *dio)
> > */
> > static void dio_cleanup(struct dio *dio)
> > {
> > - while (dio_pages_present(dio))
> > - page_cache_release(dio_get_page(dio));
> > + struct page *page;
> > +
> > + while ((page = dio_get_page(dio)) != NULL)
> > + page_cache_release(page);
> > }
>
> What is the value of dio->head_page here?
>
> How are callers to use this function at the start of the operation, at
> the end and partway through is IO error, EFAULT etc is detected?
>
> There's good potential for double-releases, leaks etc here. We need to
> be very careful and explicit in documenting the usage and state and
> meaning of head_page, and the protocol around managing it.
>
> It's just like that ghastly bio.bi_idx thing - how many bugs did we add
> and have to fix due to that thing?
>
> Really I think it'd be better to just nuke .head_page altogether and get
> callers to manage their index openly.

Completely agree, I'll attempt to sanitize it. Was postponed due to the
messy nature of it, but with that done it should be easier to verify
that it does the right thing.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/