Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3

From: Nick Piggin
Date: Tue Jun 02 2009 - 10:33:39 EST


On Tue, Jun 02, 2009 at 08:51:34PM +0800, Wu Fengguang wrote:
> On Tue, Jun 02, 2009 at 08:19:40PM +0800, Nick Piggin wrote:
> > On Tue, Jun 02, 2009 at 07:14:07PM +0800, Wu Fengguang wrote:
> > > On Mon, Jun 01, 2009 at 10:40:51PM +0800, Nick Piggin wrote:
> > > > But you just said that you try to intercept the IO. So the underlying
> > > > data is not necessarily corrupt. And even if it was then what if it
> > > > was reinitialized to something else in the meantime (such as filesystem
> > > > metadata blocks?) You'd just be introducing worse possibilities for
> > > > corruption.
> > >
> > > The IO interception will be based on PFN instead of file offset, so it
> > > won't affect innocent pages such as your example of reinitialized data.
> >
> > OK, if you could intercept the IO so it never happens at all, yes
> > of course that could work.
> >
> > > poisoned dirty page == corrupt data => process shall be killed
> > > poisoned clean page == recoverable data => process shall survive
> > >
> > > In the case of dirty hwpoison page, if we reload the on disk old data
> > > and let application proceed with it, it may lead to *silent* data
> > > corruption/inconsistency, because the application will first see v2
> > > then v1, which is illogical and hence may mess up its internal data
> > > structure.
> >
> > Right, but how do you prevent that? There is no way to reconstruct the
> > most up-to-date data because it was destroyed.
>
> To kill the application ruthlessly, rather than allow it to go rotten quietly.

Right, but you don't, because you just do EIO in a lot of cases. See the
EIO subthread.
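
As a rough illustration only, the dirty/clean policy as stated in the quoted text could be modelled like this. This is a user-space sketch, not kernel code; struct policy_page, enum poison_action and poison_policy() are names made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum poison_action {
    POISON_RECOVER, /* clean page: data can be reloaded, process survives */
    POISON_KILL     /* dirty page: latest data is lost, process is killed */
};

struct policy_page {
    bool dirty; /* page holds data newer than the on-disk copy */
};

/*
 * Policy from the quoted text:
 *   poisoned dirty page == corrupt data     => kill
 *   poisoned clean page == recoverable data => survive
 */
enum poison_action poison_policy(const struct policy_page *page)
{
    assert(page);
    return page->dirty ? POISON_KILL : POISON_RECOVER;
}
```

Reloading the old on-disk data for a dirty page would correspond to returning POISON_RECOVER in the dirty case, which is the v2-then-v1 inconsistency described in the quoted text.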


> > > > You will need to demonstrate a *big* advantage before doing crazy things
> > > > with writeback ;)
> > >
> > > OK. We can do two things about poisoned writeback pages:
> > >
> > > 1) to stop IO for them, so that corrupted data does not hit disk and/or
> > > trigger further machine checks
> >
> > 1b) At which point, you invoke the end-io handlers, and the page is
> > no longer writeback.
> >
> > > 2) to isolate them from page cache, thus preventing possible
> > > references in the writeback time window
> >
> > And then this is possible because you aren't violating mm
> > assumptions due to 1b. This proceeds just as the existing
> > pagecache mce error handler case which exists now.
>
> Yeah that's a good scheme - we are talking about two interception
> schemes. Mine is the passive one and yours is the active one.

Oh, hmm, not quite. I had assumed your IO interception was based
on another MCE from the DMA transfer (Andi said you get another exception
in that case).

If you are just hoping to get an MCE from CPU access in order to
intercept IO, then you may as well not bother because it is not
closing the window much (very likely that the page will never be
touched again by the CPU).

So if you can get an MCE from the DMA, then you would fail the
request, which will automatically clear writeback, so your CPU MCE
handler never has to bother with writeback pages.
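
A minimal user-space sketch of that ordering, under the assumption that a DMA-side MCE is actually delivered. Every name here (struct wb_page, model_end_io(), model_dma_mce(), model_isolate()) is hypothetical and only models the state transitions; it is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the page state discussed above. */
struct wb_page {
    bool writeback;    /* IO to disk is in flight */
    bool hwpoison;     /* hardware reported the page as corrupted */
    bool io_error;     /* the write request was failed */
    bool in_pagecache; /* page is still reachable through the page cache */
};

/* Completion path: runs whether the request succeeded or failed. */
void model_end_io(struct wb_page *page, bool failed)
{
    assert(page);
    if (failed)
        page->io_error = true;
    page->writeback = false; /* completion always clears writeback */
}

/* MCE raised by the DMA transfer: fail the request (step 1 and 1b). */
void model_dma_mce(struct wb_page *page)
{
    assert(page);
    page->hwpoison = true;
    model_end_io(page, true);
}

/*
 * Isolation from the page cache (step 2). It refuses pages that are
 * still under writeback, so it never has to handle that state itself.
 */
bool model_isolate(struct wb_page *page)
{
    assert(page);
    if (page->writeback)
        return false;
    page->in_pagecache = false;
    return true;
}
```

In this model, model_isolate() only succeeds after model_dma_mce() has failed the request, which is the point of 1b: by the time the page is isolated it is no longer writeback.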
