Re: [PATCH, RFC/T] Fix handling of write failures to swap devices

From: Nick Piggin
Date: Wed Nov 01 2006 - 00:38:27 EST


Richard Purdie wrote:
On Sat, 2006-10-28 at 22:10 +1000, Nick Piggin wrote:

That isn't your only problem though, and we simply don't want to do
this (potentially expensive) unusing from interrupt context. Noting
the error and dealing with it in process context I think is the best
way to do it.


The reasoning was that this circumstance should be extremely rare. If it
happens, we have a hardware problem. Recovering from that hardware
problem gracefully is more important than a slightly longer interrupt.
But yes, process context would be nicer, *if* we can find a way to do
it.

Also note that the current code *should* "gracefully" handle the failure.
In that, it will not reclaim the page on a write error, so it isn't going
to cause a data loss...

It's just that it currently results in unswappable pages.

Handling it more gracefully by allowing the page to be retried with another
swap entry is OK I guess, but given the added complexity, I'm not even sure
it is worthwhile.

Perhaps we should just do the ClearPageError in the try_to_unuse path,
because the sysadmin should take down that swap device on failure. So if a
new device is added, we want to be able to unpin the failed pages.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/