Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
From: Naoya Horiguchi
Date: Tue Apr 25 2017 - 23:19:59 EST
On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > this should be skipped when onlining the pages too.
> >
> > Signed-off-by: Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx>
> > ---
> > mm/memory_hotplug.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 6fa7208bcd56..741ddb50e7d2 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> >          if (PageReserved(pfn_to_page(start_pfn)))
> >                  for (i = 0; i < nr_pages; i++) {
> >                          page = pfn_to_page(start_pfn + i);
> > +                        if (PageHWPoison(page)) {
> > +                                ClearPageReserved(page);
>
> Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> was never offlined to begin with? Or do you expect this to be set on newly
> hotplugged memory? Also don't we need to skip the entire pageblock?

If I read it correctly, "skip HWPoisoned page" in commit b023f46813cd means
that we skip the page status check for hwpoisoned pages so that they do *not*
prevent memory offlining of memblocks that contain hwpoisoned pages. That
means that hwpoisoned pages can be offlined.
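
To illustrate the point, here is a rough userspace sketch of that check. It
is not the kernel code: struct toy_page, its two fields and
toy_range_can_offline() are hypothetical names made up for this example, and
the real check looks at more page state than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the two states this sketch needs. */
struct toy_page {
	bool isolated_free;	/* page is free and isolated, ready to offline */
	bool hwpoison;		/* page is marked as hit by a memory error */
};

/*
 * Return true if every page in the range may be offlined.
 * With skip_hwpoisoned set, a poisoned page is stepped over instead of
 * failing the whole range, so it no longer blocks offlining.
 */
static bool toy_range_can_offline(const struct toy_page *pages, size_t n,
				  bool skip_hwpoisoned)
{
	for (size_t i = 0; i < n; i++) {
		if (pages[i].isolated_free)
			continue;
		if (skip_hwpoisoned && pages[i].hwpoison)
			continue;	/* skip the status check for this page */
		return false;		/* a busy page blocks the offline */
	}
	return true;
}
```

With one poisoned page in an otherwise free range, the range can only be
offlined when skip_hwpoisoned is true, so the poisoned page goes offline
together with the rest of the memblock.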

Another reason to clear PageReserved is that we could reuse the hwpoisoned
page after onlining it again, once the broken DIMM has been replaced.
In this use case, we first unpoison the page to clear PageHWPoison,
but that doesn't work if PageReserved is set. My simple testing shows
the BUG below when unpoisoning (without the ClearPageReserved):
Unpoison: Software-unpoisoned page 0x18000
BUG: Bad page state in process page-types pfn:18000
page:ffffda5440600000 count:0 mapcount:0 mapping: (null) index:0x70006b599
flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x800(reserved)
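
The last two lines of the dump can be reproduced with a small userspace
sketch. The only bit value taken from the log is 0x800 for "reserved"; the
TOY_* names and both helpers are hypothetical, and the mask here is reduced
to that single bit as a simplification (the kernel's
PAGE_FLAGS_CHECK_AT_FREE covers more flags).

```c
#include <stdint.h>

/* 0x800 is the value the dump above reports for the "reserved" flag. */
#define TOY_PG_RESERVED		0x800ULL

/* Simplified free-time mask: only the reserved bit (an assumption). */
#define TOY_CHECK_AT_FREE	TOY_PG_RESERVED

/* Return the flag bits that would make the page "bad" when it is freed. */
static uint64_t toy_bad_flags_at_free(uint64_t flags)
{
	return flags & TOY_CHECK_AT_FREE;
}

/* Model of what ClearPageReserved() does to the flags word. */
static uint64_t toy_clear_reserved(uint64_t flags)
{
	return flags & ~TOY_PG_RESERVED;
}
```

Feeding in the flags word from the dump, 0x1fffc00004081a, gives 0x800 from
toy_bad_flags_at_free(), and 0 once toy_clear_reserved() has been applied
first. That is why the page has to lose PageReserved before unpoisoning can
free it cleanly.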

Thanks,
Naoya Horiguchi