[34-longterm 091/179] md: avoid endless recovery loop when waiting for fail device to complete.

From: Paul Gortmaker
Date: Mon May 14 2012 - 22:53:37 EST

From: NeilBrown <neilb@xxxxxxx>

This is a commit scheduled for the next v2.6.34 longterm release.
If you see a problem with using this for longterm, please comment.

commit 4274215d24633df7302069e51426659d4759c5ed upstream.

If a device fails in a way that causes pending request to take a while
to complete, md will not be able to immediately remove it from the
array in remove_and_add_spares.
It will then incorrectly look like a spare device and md will try to
recover it even though it is failed.
This leads to a recovery process starting and instantly aborting over
and over again.

We should check if the device is faulty before considering it to be a
spare. This will avoid trying to start a recovery that cannot

This bug was introduced in 2.6.26 so that patch is suitable for any
kernel since then.

Reported-by: Jim Paradis <james.paradis@xxxxxxxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>
Signed-off-by: Paul Gortmaker <paul.gortmaker@xxxxxxxxxxxxx>
drivers/md/md.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1287b03..d26df7f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6863,6 +6863,7 @@ static int remove_and_add_spares(mddev_t *mddev)
list_for_each_entry(rdev, &mddev->disks, same_set) {
if (rdev->raid_disk >= 0 &&
!test_bit(In_sync, &rdev->flags) &&
+ !test_bit(Faulty, &rdev->flags) &&
!test_bit(Blocked, &rdev->flags))
if (rdev->raid_disk < 0

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/