Re: [PATCH 09/22] HWPOISON: Handle hardware poisoned pages intry_to_unmap

From: Wu Fengguang
Date: Wed Jun 17 2009 - 10:04:18 EST


On Wed, Jun 17, 2009 at 09:43:29PM +0800, Minchan Kim wrote:
> On Wed, Jun 17, 2009 at 10:37 PM, Wu Fengguang<fengguang.wu@xxxxxxxxx> wrote:
> > On Wed, Jun 17, 2009 at 09:27:36PM +0800, Minchan Kim wrote:
> >> On Wed, Jun 17, 2009 at 4:23 PM, Wu Fengguang<fengguang.wu@xxxxxxxxx> wrote:
> >> > On Wed, Jun 17, 2009 at 08:28:26AM +0800, Minchan Kim wrote:
> >> >> On Tue, 16 Jun 2009 21:49:44 +0800
> >> >> Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> >> >>
> >> >> > On Tue, Jun 16, 2009 at 08:03:08AM +0800, Minchan Kim wrote:
> >> >> > > On Mon, 15 Jun 2009 23:26:12 +0800
> >> >> > > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> >> >> > >
> >> >> > > > On Mon, Jun 15, 2009 at 09:09:03PM +0800, Minchan Kim wrote:
> >> >> > > > > On Mon, Jun 15, 2009 at 11:45 AM, Wu Fengguang<fengguang.wu@xxxxxxxxx> wrote:
> >> >> > > > > > From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> >> >> > > > > >
> >> >> > > > > > When a page has the poison bit set replace the PTE with a poison entry.
> >> >> > > > > > This causes the right error handling to be done later when a process runs
> >> >> > > > > > into it.
> >> >> > > > > >
> >> >> > > > > > Also add a new flag to not do that (needed for the memory-failure handler
> >> >> > > > > > later)
> >> >> > > > > >
> >> >> > > > > > Reviewed-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> >> >> > > > > > Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> >> >> > > > > >
> >> >> > > > > > ---
> >> >> > > > > > Âinclude/linux/rmap.h | Â Â1 +
> >> >> > > > > > Âmm/rmap.c      Â|  Â9 ++++++++-
> >> >> > > > > > Â2 files changed, 9 insertions(+), 1 deletion(-)
> >> >> > > > > >
> >> >> > > > > > --- sound-2.6.orig/mm/rmap.c
> >> >> > > > > > +++ sound-2.6/mm/rmap.c
> >> >> > > > > > @@ -958,7 +958,14 @@ static int try_to_unmap_one(struct page
> >> >> > > > > > Â Â Â Â/* Update high watermark before we lower rss */
> >> >> > > > > > Â Â Â Âupdate_hiwater_rss(mm);
> >> >> > > > > >
> >> >> > > > > > - Â Â Â if (PageAnon(page)) {
> >> >> > > > > > + Â Â Â if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
> >> >> > > > > > + Â Â Â Â Â Â Â if (PageAnon(page))
> >> >> > > > > > + Â Â Â Â Â Â Â Â Â Â Â dec_mm_counter(mm, anon_rss);
> >> >> > > > > > + Â Â Â Â Â Â Â else if (!is_migration_entry(pte_to_swp_entry(*pte)))
> >> >> > > > >
> >> >> > > > > Isn't it straightforward to use !is_hwpoison_entry ?
> >> >> > > >
> >> >> > > > Good catch! ÂIt looks like a redundant check: the
> >> >> > > > page_check_address() at the beginning of the function guarantees that
> >> >> > > > !is_migration_entry() or !is_migration_entry() tests will all be TRUE.
> >> >> > > > So let's do this?
> >> >> > > It seems you expand my sight :)
> >> >> > >
> >> >> > > I don't know migration well.
> >> >> > > How page_check_address guarantee it's not migration entry ?
> >> >> >
> >> >> > page_check_address() calls pte_present() which returns the
> >> >> > (_PAGE_PRESENT | _PAGE_PROTNONE) bits. While x86-64 defines
> >> >> >
> >> >> > #define __swp_entry(type, offset) Â Â Â ((swp_entry_t) { \
> >> >> > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â((type) << (_PAGE_BIT_PRESENT + 1)) \
> >> >> > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â| ((offset) << SWP_OFFSET_SHIFT) })
> >> >> >
> >> >> > where SWP_OFFSET_SHIFT is defined to the bigger one of
> >> >> > max(_PAGE_BIT_PROTNONE + 1, _PAGE_BIT_FILE + 1) = max(8+1, 6+1) = 9.
> >> >> >
> >> >> > So __swp_entry(type, offset) := (type << 1) | (offset << 9)
> >> >> >
> >> >> > We know that the swap type is 5 bits. So the bit 0 _PAGE_PRESENT and bit 8
> >> >> > _PAGE_PROTNONE will all be zero for swap entries.
> >> >> >
> >> >>
> >> >> Thanks for kind explanation :)
> >> >
> >> > You are welcome~
> >> >
> >> >> >
> >> >> > > In addtion, If the page is poison while we are going to
> >> >> > > migration((PAGE_MIGRATION && migration) == TRUE), we should decrease
> >> >> > > file_rss ?
> >> >> >
> >> >> > It will die on trying to migrate the poisoned page so we don't care
> >> >> > the accounting. But normally the poisoned page shall already be
> >> >>
> >> >>
> >> >> Okay. then, how about this ?
> >> >> We should not increase file_rss on trying to migrate the poisoned page
> >> >>
> >> >> - Â Â Â Â Â Â Â else if (!is_migration_entry(pte_to_swp_entry(*pte)))
> >> >> + Â Â Â Â Â Â Â else if (!(PAGE_MIGRATION && migration))
> >> >
> >> > This is good if we are going to stop the hwpoison page from being
> >> > consumed by move_to_new_page(), but I highly doubt we'll ever add
> >> > PageHWPoison() checks into the migration code.
> >> >
> >> > Because this race window is small enough:
> >> >
> >> > Â Â Â ÂTestSetPageHWPoison(p);
> >> > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â lock_page(page);
> >> > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â try_to_unmap(page, TTU_MIGRATION|...);
> >> > Â Â Â Âlock_page_nosync(p);
> >> >
> >> > such small race windows can be found all over the kernel, it's just
> >> > insane to try to fix any of them.
> >>
> >> Sorry for too late response.
> >>
> >> I see your point.
> >> My opinion is that at least we must be notified when such situation happen.
> >> So I think it would be better to add some warning to fix up it when it
> >> happen even thought Âit is small race window.
> >
> > Notification is also pointless here: we'll die hard on
> > accessing/consuming the poisoned page anyway :(
>
> My intention wasn't to recover it.

Yes, that's not the point.

> It just add something like WARN_ON.
> You said it is small window enough. but I think it can happen more
> hight probability in migration-workload.(At a moment, I don't know
> what kinds of app)
> For such case, If we can hear reporting of warning, at that time we
> can consider migration handling for HWPoison.

The point is, any page can go corrupted any time. We don't need to add
1000 PageHWPoison() tests in the kernel like this. We don't aim for
100% protection, that's impossible. I'd be very contented if ever it
can reach 80% coverage :)

Thanks,
Fengguang

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/