Re: [PATCH v5 08/11] mm, memory_failure: Teach memory_failure() about dev_pagemap pages

From: Naoya Horiguchi
Date: Tue Jul 17 2018 - 02:37:49 EST


On Fri, Jul 13, 2018 at 05:28:05PM -0700, Dan Williams wrote:
> On Fri, Jul 13, 2018 at 1:52 AM, Naoya Horiguchi
> <n-horiguchi@xxxxxxxxxxxxx> wrote:
> > On Wed, Jul 04, 2018 at 02:41:06PM -0700, Dan Williams wrote:
...
> >> +
> >> + /*
> >> + * Use this flag as an indication that the dax page has been
> >> + * remapped UC to prevent speculative consumption of poison.
> >> + */
> >> + SetPageHWPoison(page);
> >
> > The number of hwpoison pages is maintained by num_poisoned_pages,
> > so you can call num_poisoned_pages_inc()?
>
> I don't think we want these pages accounted in num_poisoned_pages().
> We have the badblocks infrastructure in libnvdimm to track how many
> errors and where they are located, and since they can be repaired via
> driver actions I think we should track them separately.

OK.

> > Related to this, I'm interested in whether/how unpoison_page() works
> > on a hwpoisoned dev_pagemap page.
>
> unpoison_page() is only triggered via freeing pages to the page
> allocator, and that never happens for dev_pagemap / ZONE_DEVICE pages.

sorry, my bad comment.
I meant unpoison_memory() in mm/memory-failure.c, which is triggered
via debugfs:hwpoison/unpoison-pfn. This interface looks like below

int unpoison_memory(unsigned long pfn)
{
struct page *page;
struct page *p;
int freeit = 0;
static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);

if (!pfn_valid(pfn))
return -ENXIO;

p = pfn_to_page(pfn);
page = compound_head(p);

if (!PageHWPoison(p)) {
unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
pfn, &unpoison_rs);
return 0;
}
...

so I think that we can add is_zone_device_page() check at the beginning
of this function to call hwpoison_clear() introduced in patch 13/13?
Otherwise maybe compound_head() will cause some critical issue like
general protection fault.

Thanks,
Naoya Horiguchi