Re: [PATCH 09/22] HWPOISON: Handle hardware poisoned pages intry_to_unmap
From: Wu Fengguang
Date: Wed Jun 17 2009 - 03:23:33 EST
On Wed, Jun 17, 2009 at 08:28:26AM +0800, Minchan Kim wrote:
> On Tue, 16 Jun 2009 21:49:44 +0800
> Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
>
> > On Tue, Jun 16, 2009 at 08:03:08AM +0800, Minchan Kim wrote:
> > > On Mon, 15 Jun 2009 23:26:12 +0800
> > > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> > >
> > > > On Mon, Jun 15, 2009 at 09:09:03PM +0800, Minchan Kim wrote:
> > > > > On Mon, Jun 15, 2009 at 11:45 AM, Wu Fengguang<fengguang.wu@xxxxxxxxx> wrote:
> > > > > > From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> > > > > >
> > > > > > When a page has the poison bit set replace the PTE with a poison entry.
> > > > > > This causes the right error handling to be done later when a process runs
> > > > > > into it.
> > > > > >
> > > > > > Also add a new flag to not do that (needed for the memory-failure handler
> > > > > > later)
> > > > > >
> > > > > > Reviewed-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> > > > > > Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> > > > > >
> > > > > > ---
> > > > > > Âinclude/linux/rmap.h | Â Â1 +
> > > > > > Âmm/rmap.c      Â|  Â9 ++++++++-
> > > > > > Â2 files changed, 9 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > --- sound-2.6.orig/mm/rmap.c
> > > > > > +++ sound-2.6/mm/rmap.c
> > > > > > @@ -958,7 +958,14 @@ static int try_to_unmap_one(struct page
> > > > > > Â Â Â Â/* Update high watermark before we lower rss */
> > > > > > Â Â Â Âupdate_hiwater_rss(mm);
> > > > > >
> > > > > > - Â Â Â if (PageAnon(page)) {
> > > > > > + Â Â Â if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
> > > > > > + Â Â Â Â Â Â Â if (PageAnon(page))
> > > > > > + Â Â Â Â Â Â Â Â Â Â Â dec_mm_counter(mm, anon_rss);
> > > > > > + Â Â Â Â Â Â Â else if (!is_migration_entry(pte_to_swp_entry(*pte)))
> > > > >
> > > > > Isn't it straightforward to use !is_hwpoison_entry ?
> > > >
> > > > Good catch! It looks like a redundant check: the
> > > > page_check_address() at the beginning of the function guarantees that
> > > > !is_migration_entry() or !is_migration_entry() tests will all be TRUE.
> > > > So let's do this?
> > > It seems you expand my sight :)
> > >
> > > I don't know migration well.
> > > How page_check_address guarantee it's not migration entry ?
> >
> > page_check_address() calls pte_present() which returns the
> > (_PAGE_PRESENT | _PAGE_PROTNONE) bits. While x86-64 defines
> >
> > #define __swp_entry(type, offset) ((swp_entry_t) { \
> > ((type) << (_PAGE_BIT_PRESENT + 1)) \
> > | ((offset) << SWP_OFFSET_SHIFT) })
> >
> > where SWP_OFFSET_SHIFT is defined to the bigger one of
> > max(_PAGE_BIT_PROTNONE + 1, _PAGE_BIT_FILE + 1) = max(8+1, 6+1) = 9.
> >
> > So __swp_entry(type, offset) := (type << 1) | (offset << 9)
> >
> > We know that the swap type is 5 bits. So the bit 0 _PAGE_PRESENT and bit 8
> > _PAGE_PROTNONE will all be zero for swap entries.
> >
>
> Thanks for kind explanation :)
You are welcome~
> >
> > > In addtion, If the page is poison while we are going to
> > > migration((PAGE_MIGRATION && migration) == TRUE), we should decrease
> > > file_rss ?
> >
> > It will die on trying to migrate the poisoned page so we don't care
> > the accounting. But normally the poisoned page shall already be
>
>
> Okay. then, how about this ?
> We should not increase file_rss on trying to migrate the poisoned page
>
> - Â Â Â Â Â Â Â else if (!is_migration_entry(pte_to_swp_entry(*pte)))
> + Â Â Â Â Â Â Â else if (!(PAGE_MIGRATION && migration))
This is good if we are going to stop the hwpoison page from being
consumed by move_to_new_page(), but I highly doubt we'll ever add
PageHWPoison() checks into the migration code.
Because this race window is small enough:
TestSetPageHWPoison(p);
lock_page(page);
try_to_unmap(page, TTU_MIGRATION|...);
lock_page_nosync(p);
such small race windows can be found all over the kernel, it's just
insane to try to fix any of them.
For example, if the newly allocated page get corrupted, this kind of code who
assumes it is the only user of the page (but memory_failure() comes in between
like a ghost) will go BUG():
/*
* Block others from accessing the page when we get around to
* establishing additional references. We are the only one
* holding a reference to the new page at this point.
*/
if (!trylock_page(newpage))
BUG();
Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/