Re: [PATCH 0/7] HWPOISON for hugepage (v5)

From: Naoya Horiguchi
Date: Fri May 14 2010 - 03:39:16 EST


(Add Cc: Andi and Fengguang)

On Thu, May 13, 2010 at 03:27:50PM +0100, Mel Gorman wrote:
> On Thu, May 13, 2010 at 04:55:19PM +0900, Naoya Horiguchi wrote:
> > This patchset enables error handling for hugepage by containing error
> > in the affected hugepage.
> >
> > Until now, memory error (classified as SRAO in MCA language) on hugepage
>
> What does SRAO stand for? It doesn't matter, I'm just curious.

SRAO stands for "Software Recoverable Action Optional."
SRAO errors can be contained by software and then become harmless.

> > was simply ignored, which means if someone accesses the error page later,
> > a second MCE (more severe than the first one) occurs and the system panics.
> >
> > It's useful for some aggressive hugepage users if only affected processes
> > are killed. Then other unrelated processes aren't disturbed by the error
> > and can continue operation.
> >
>
> Surely, it's useful for any user of huge pages?

Yes.

> > Moreover, for other extensive hugetlb users which have own "pagecache"
> > on hugepage, the most valued feature would be being able to receive
> > the early kill signal BUS_MCEERR_AO, because the cache pages have
> > good opportunity to be dropped without side effects on BUS_MCEERR_AO.
> >
>
> Be careful here. The page cache that hugetlb uses is for MAP_SHARED
> mappings. If the pages are discarded, they are gone and the result is data
> loss. I think what you are suggesting is that a kill signal can instead be
> translated into a harmless loss of page cache. That works for normal files
> but not hugetlb.

The "pagecache" I meant here is not the Linux kernel's page cache,
but a cache managed by an application, e.g. the application reads/writes
the cache contents with direct I/O and manages clean/dirty status itself.
If a HWPOISON-aware application catches the BUS_MCEERR_AO signal, it can
discard the hugepage used as a cache and reread the data from the file.
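
To illustrate, here is a rough userspace sketch of such a handler. It is
only an example, not code from any real application: the cache layout
(cache[], NR_SLOTS, SLOT_SIZE), the helper names, and the 2MB hugepage
size are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5 /* Linux si_code value; fallback for old headers */
#endif

#define NR_SLOTS  4                     /* hypothetical cache size */
#define SLOT_SIZE (2UL * 1024 * 1024)   /* assumes 2MB hugepages */

/* Hypothetical application-managed cache: one hugepage per slot. */
struct cache_slot {
	void *base;                     /* start of the backing hugepage */
	volatile sig_atomic_t poisoned; /* set when the hugepage is hwpoisoned */
};

struct cache_slot cache[NR_SLOTS];

/*
 * Mark the slot containing the reported address as poisoned.
 * Returns the slot index, or -1 if this is not an action-optional
 * memory error on one of our cache hugepages.
 */
int handle_mce_ao(const siginfo_t *info)
{
	uintptr_t addr = (uintptr_t)info->si_addr;
	int i;

	if (info->si_code != BUS_MCEERR_AO)
		return -1;

	for (i = 0; i < NR_SLOTS; i++) {
		uintptr_t base = (uintptr_t)cache[i].base;

		if (base && addr >= base && addr - base < SLOT_SIZE) {
			cache[i].poisoned = 1;
			return i;
		}
	}
	return -1;
}

/* SIGBUS handler: only async-signal-safe operations are used here. */
void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;

	if (handle_mce_ao(info) < 0)
		_exit(1);	/* anything else is fatal in this sketch */
}

/* Install the handler with SA_SIGINFO so si_code and si_addr are available. */
int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

The handler only sets a flag. The application's main loop would then
notice the poisoned slot, stop using that hugepage, and reread clean
data from the file into a fresh one (dirty data would still be lost).
Note that the process receives BUS_MCEERR_AO only if early kill is
enabled, e.g. with prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY,
0, 0) or the vm.memory_failure_early_kill sysctl.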

Thanks,
Naoya Horiguchi

> > The design of hugepage error handling is based on that of non-hugepage
> > error handling, where we:
> > 1. mark the error page as hwpoison,
> > 2. unmap the hwpoisoned page from processes using it,
> > 3. invalidate error page, and
> > 4. block later accesses to the hwpoisoned pages.
> >
> > Similarities and differences between huge and non-huge case are
> > summarized below:
> >
> > 1. (Difference) when error occurs on a hugepage, PG_hwpoison bits on all pages
> > in the hugepage are set, because we have no simple way to break up
> > hugepage into individual pages for now. This means there is some
> > risk of being killed by touching non-guilty pages within the error hugepage.
> >
>
> You're right in that you cannot easily demote a hugepage. It is possible but
> I cannot see the value of the required effort. If there is an error within
> the hugepage and touching another part of it results in the process being
> unnecessarily killed, then so be it.
>
> > 2. (Similarity) hugetlb entry for the error hugepage is replaced by hwpoison
> > swap entry, with which we can detect hwpoisoned memory in VM code.
> > This is accomplished by adding rmapping code for hugepage, which makes it
> > possible to use try_to_unmap() for hugepage.
> >
>
> This will be interesting. hugetlbfs pages could look like a file or anon
> depending on whether it has been mapped shared or private. It's odd.
>
> > 3. (Difference) since hugepage is not linked to LRU list and is unswappable,
> > there are not many things to do for page invalidation (only dequeuing
> > free/reserved hugepage from freelist. See patch 5/7.)
> > If we want to contain the error into one page, there may be more to do.
> >
> > 4. (Similarity) we block later accesses by forcing page requests for
> > hwpoisoned hugepage to fail as done in non-hugepage case in do_wp_page().
> >
> > ToDo:
> > - Narrow down the containment region into one raw page.
> > - Soft-offlining for hugepage is not supported due to the lack of migration
> > for hugepage.
> > - Counting file-mapped/anonymous hugepage in NR_FILE_MAPPED/NR_ANON_PAGES.
> >
> > [PATCH 1/7] hugetlb, rmap: add reverse mapping for hugepage
> > [PATCH 2/7] HWPOISON, hugetlb: enable error handling path for hugepage
> > [PATCH 3/7] HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage
> > [PATCH 4/7] HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error
> > [PATCH 5/7] HWPOISON, hugetlb: isolate corrupted hugepage
> > [PATCH 6/7] HWPOISON, hugetlb: detect hwpoison in hugetlb code
> > [PATCH 7/7] HWPOISON, hugetlb: support hwpoison injection for hugepage
> >
> > Dependency:
> > - patch 2 depends on patch 1.
> > - patch 3 to patch 6 depend on patch 2.
> >
> >  include/linux/hugetlb.h |    3 +
> >  mm/hugetlb.c            |   98 ++++++++++++++++++++++++++++++++++++++-
> >  mm/hwpoison-inject.c    |   15 ++++--
> >  mm/memory-failure.c     |  120 +++++++++++++++++++++++++++++++++++------------
> >  mm/rmap.c               |   16 ++++++
> >  5 files changed, 215 insertions(+), 37 deletions(-)
> >
> > ChangeLog from v4:
> > - rebased to 2.6.34-rc7
> > - add isolation code for free/reserved hugepage in me_huge_page()
> > - set/clear PG_hwpoison bits of all pages in hugepage.
> > - mce_bad_pages counts all pages in hugepage.
> > - rename __hugepage_set_anon_rmap() to hugepage_add_anon_rmap()
> > - add huge_pte_offset() dummy function in header file on !CONFIG_HUGETLBFS
> >
> > ChangeLog from v3:
> > - rebased to 2.6.34-rc5
> > - support for privately mapped hugepage
> >
> > ChangeLog from v2:
> > - rebase to 2.6.34-rc3
> > - consider mapcount of hugepage
> > - rename pointer "head" into "hpage"
> >
> > ChangeLog from v1:
> > - rebase to 2.6.34-rc1
> > - add comment from Wu Fengguang
> >
> > Thanks,
> > Naoya Horiguchi
> >
>
> --
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab