Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3

From: Wu Fengguang
Date: Tue Jun 02 2009 - 09:33:10 EST


On Tue, Jun 02, 2009 at 08:37:20PM +0800, Nick Piggin wrote:
> On Tue, Jun 02, 2009 at 02:34:50PM +0200, Andi Kleen wrote:
> > On Tue, Jun 02, 2009 at 02:10:31PM +0200, Nick Piggin wrote:
> > > > It's not, there are various differences (like the reference count)
> > >
> > > No. If there are, then it *really* needs better documentation. I
> > > don't think there are, though.
> >
> > Better documentation on what? You want a detailed listing in a comment
> > how it is different from truncate?
> >
> > To be honest I have some doubts of the usefulness of such a comment
> > (why stop at truncate and not list the differences to every other
> > page cache operation? @) but if you insist (do you?) I can add one.
>
> Because I don't see any difference (see my previous patch). I
> still don't know what it is supposed to be doing differently.
> So if you reinvent your own that looks close enough to truncate
> to warrant a comment to say /* this is close to truncate but
> not quite */, then yes I insist that you say exactly why it is
> not quite like truncate ;)

The truncate topic is getting boring. EIO is more interesting and more pressing, hehe.

> > > I'm suggesting that EIO is traditionally for when the data is still
> > > dirty in pagecache and could not get back to backing
> > > store. Do you deny that?
> >
> > Yes. That is exactly the case when memory-failure triggers EIO
> >
> > Memory error on a dirty file mapped page.
>
> But it is no longer dirty, and the problem was not that the data
> was unable to be written back.

Or rather, cannot be written back ;)

> > > And I think the application might try to handle the case of a
> > > page becoming corrupted differently. Do you deny that?
> >
> > You mean a clean file-mapped page? In this case there is no EIO,
> > memory-failure just drops the page and it is reloaded.
> >
> > If the page is dirty we trigger EIO which as you said above is the
> > right reaction.
>
> No I mean the difference between the case of dirty page unable to
> be written to backing store, and the case of dirty page becoming
> corrupted.

legacy EIO: may succeed on retry (perhaps after doing something first)?
hwpoison EIO: a permanent unrecoverable error

> > > OK, given the range of errors that APIs are defined to return,
> > > then maybe EIO is the best option. I don't suppose it is possible
> > > to expand them to return something else?
> >
> > Expand the syscalls to return other errnos on specific
> > kinds of IO error?
> >
> > Of course that's possible, but it has the problem that you
> > would need to fix all the applications that expect EIO for
> > IO error. The latter I consider infeasible.
>
> They would presumably exit or do some default thing, which I
> think would be fine. Actually if your code catches them in the
> act of manipulating a corrupted page (ie. if it is mmapped),
> then it gets a SIGBUS.

That's OK. filemap_fault() returns VM_FAULT_SIGBUS for legacy EIO,
while hwpoison pages will return VM_FAULT_HWPOISON. Both kill the
application, I guess?

read()/write() are the more interesting cases.

With read IO interception, the read() call will succeed.

The write() call has to fail. But interestingly, writes are mostly
delayed, and we have only one AS_EIO bit for the entire file, which
is cleared once the EIO has been reported. And the poisoned page will
be isolated (if isolation succeeds), so later read()/write() calls
won't even notice there was a poisoned page!
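
To make the report-once behaviour concrete, here is a toy userspace
model. It is only a sketch of the semantics described above, not kernel
code: struct toy_mapping, toy_record_write_error() and toy_fsync() are
made-up names, and the single bool stands in for the per-file AS_EIO bit.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a file's mapping; not the kernel struct. */
struct toy_mapping {
    bool as_eio;          /* models the single per-file AS_EIO bit */
};

/*
 * A delayed write of some page fails. Only the fact that an error
 * happened is recorded; which page it was is lost.
 */
static void toy_record_write_error(struct toy_mapping *m)
{
    m->as_eio = true;
}

/* Models error reporting at sync time: report once, then clear. */
static int toy_fsync(struct toy_mapping *m)
{
    if (m->as_eio) {
        m->as_eio = false;
        return -EIO;
    }
    return 0;
}
```

In this model, after one toy_record_write_error() the first toy_fsync()
returns -EIO and every later call returns 0, so a second caller never
learns that data was lost.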

How are we going to fix this mess? EIO errors seem to be fuzzy and
temporary by nature, at least in the current implementation, and they
are hard to make exact and/or permanent in both implementation and
interface:
- can/shall we remember the exact EIO page? maybe not.
- can EIO reporting be permanent? sounds like a horrible user interface..


Thanks,
Fengguang