Re: [PATCH v2 0/2] mm: soft-offline: fix race against page allocation

From: Michal Hocko
Date: Wed Aug 22 2018 - 04:00:33 EST


On Wed 22-08-18 01:37:48, Naoya Horiguchi wrote:
> On Wed, Aug 15, 2018 at 03:43:34PM -0700, Andrew Morton wrote:
> > On Tue, 17 Jul 2018 14:32:30 +0900 Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> wrote:
> >
> > > I've updated the patchset based on feedbacks:
> > >
> > > - updated comments (from Andrew),
> > > - moved calling set_hwpoison_free_buddy_page() from mm/migrate.c to mm/memory-failure.c,
> > > which is necessary to check the return code of set_hwpoison_free_buddy_page(),
> > > - lkp bot reported a build error when only 1/2 is applied.
> > >
> > > > mm/memory-failure.c: In function 'soft_offline_huge_page':
> > > > >> mm/memory-failure.c:1610:8: error: implicit declaration of function
> > > > 'set_hwpoison_free_buddy_page'; did you mean 'is_free_buddy_page'?
> > > > [-Werror=implicit-function-declaration]
> > > > if (set_hwpoison_free_buddy_page(page))
> > > > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > is_free_buddy_page
> > > > cc1: some warnings being treated as errors
> > >
> > > set_hwpoison_free_buddy_page() is defined in 2/2, so we can't use it
> > > in 1/2. Simply doing s/set_hwpoison_free_buddy_page/!TestSetPageHWPoison/
> > > will fix this.
> > >
> > > v1: https://lkml.org/lkml/2018/7/12/968
> > >
> >
> > Quite a bit of discussion on these two, but no actual acks or
> > review-by's?
>
> Really sorry for late response.
> Xishi provided feedback on previous version, but no final ack/reviewed-by.
> This fix should work on the reported issue, but rewriting soft-offlining
> without PageHWPoison flag would be the better fix (no actual patch yet.)

If we can go with the latter then I would obviously prefer that. I cannot
promise to work on the patch though. I can help with reviewing, of
course.

If this is important enough that people are hitting the issue in normal
workloads then sure, let's go with the simple fix and continue on top of
that.
--
Michal Hocko
SUSE Labs