Re: [PATCH] mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations

From: Michal Hocko
Date: Thu Jul 02 2015 - 11:13:41 EST


On Thu 02-07-15 10:25:51, Theodore Ts'o wrote:
> On Wed, Jul 01, 2015 at 03:37:15PM +0200, Michal Hocko wrote:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 37e90db1520b..6c44d424968e 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -995,7 +995,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > goto keep_locked;
> >
> > /* Case 3 above */
> > - } else {
> > + } else if (sc->gfp_mask & __GFP_FS) {
> > wait_on_page_writeback(page);
> > }
> > }
>
> Um, I've just taken a closer look at this code now that I'm back from
> vacation, and I'm not sure this is right. This Case 3 code occurs
> inside an
>
> if (PageWriteback(page)) {
> ...
> }
>
> conditional, and if I'm not mistaken, if the flow of control exits
> this conditional, it is assumed that the page is *not* under writeback.
> This patch will assume the page has been cleaned if __GFP_FS is set,
> which could lead to a dirty page getting dropped, so I believe this is
> a bug. No?

Yes you are right! My bad. I should have noticed that. Sorry about that.

> It would seem to me that a better fix would be to change the Case 2
> handling:
>
> /* Case 2 above */
> } else if (global_reclaim(sc) ||
> - !PageReclaim(page) || !(sc->gfp_mask & __GFP_IO)) {
> + !PageReclaim(page) || !(sc->gfp_mask & __GFP_FS)) {

OK, this should work because the loopback path should clear both
__GFP_IO and __GFP_FS. I would be tempted to use may_enter_fs here
as the original patch which introduced wait_on_page_writeback did
but this sounds more clear.

> /*
> * This is slightly racy - end_page_writeback()
> * might have just cleared PageReclaim, then
> * setting PageReclaim here end up interpreted
> * as PageReadahead - but that does not matter
> * enough to care. What we do want is for this
> * page to have PageReclaim set next time memcg
> * reclaim reaches the tests above, so it will
> * then wait_on_page_writeback() to avoid OOM;
> * and it's also appropriate in global reclaim.
> */
> SetPageReclaim(page);
> nr_writeback++;
>
> goto keep_locked;
>
>
> Am I missing something?

You are not missing anything and thanks for the double checking. This
was wery well spotted!
The updated patch with the full changelog:
---