[34-longterm 050/209] md/raid1: really fix recovery looping when single good device fails.

From: Paul Gortmaker
Date: Thu Apr 14 2011 - 14:29:45 EST

Next message: Paul Gortmaker: "[34-longterm 039/209] ssb: b43-pci-bridge: Add new vendor for BCM4318"
Previous message: Paul Gortmaker: "[34-longterm 046/209] net: NETIF_F_HW_CSUM does not imply FCoE CRC offload"
In reply to: Paul Gortmaker: "[34-longterm 046/209] net: NETIF_F_HW_CSUM does not imply FCoE CRC offload"
Next in thread: Paul Gortmaker: "[34-longterm 039/209] ssb: b43-pci-bridge: Add new vendor for BCM4318"

From: NeilBrown <neilb@xxxxxxx>

=====================================================================
| This is a commit scheduled for the next v2.6.34 longterm release. |
| If you see a problem with using this for longterm, please comment.|
=====================================================================

commit 8f9e0ee38f75d4740daa9e42c8af628d33d19a02 upstream.

Commit 4044ba58dd15cb01797c4fd034f39ef4a75f7cc3 supposedly fixed a
problem where if a raid1 with just one good device gets a read-error
during recovery, the recovery would abort and immediately restart in
an infinite loop.

However it depended on raid1_remove_disk removing the spare device
from the array. But that does not happen in this case. So add a test
so that in the 'recovery_disabled' case, the device will be removed.

This is suitable for any kernel since 2.6.29, which is when
recovery_disabled was introduced.

Reported-by: Sebastian Färber <faerber@xxxxxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>
Signed-off-by: Paul Gortmaker <paul.gortmaker@xxxxxxxxxxxxx>
---
drivers/md/raid1.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 52c6b5f..aaa49f1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1214,6 +1214,7 @@ static int raid1_remove_disk(mddev_t *mddev, int number)
 		 * is not possible.
 		 */
 		if (!test_bit(Faulty, &rdev->flags) &&
+		    !mddev->recovery_disabled &&
 		    mddev->degraded < conf->raid_disks) {
 			err = -EBUSY;
 			goto abort;
--
1.7.4.4
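
For readers who want to see the effect of the one-line change in isolation,
here is a minimal userspace sketch of the condition before and after the
patch. It is not kernel code: struct raid1_model and both function names are
hypothetical, and the fields merely stand in for the Faulty bit,
mddev->recovery_disabled, mddev->degraded and conf->raid_disks that the real
check reads.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state read by the check in
 * raid1_remove_disk(); the names are illustrative, not kernel API. */
struct raid1_model {
	bool faulty;            /* models test_bit(Faulty, &rdev->flags) */
	bool recovery_disabled; /* models mddev->recovery_disabled */
	int degraded;           /* models mddev->degraded */
	int raid_disks;         /* models conf->raid_disks */
};

/* Condition as it stood before the patch: a non-faulty device in an
 * array that still has at least one working device is refused. */
int remove_check_before(const struct raid1_model *m)
{
	if (!m->faulty && m->degraded < m->raid_disks)
		return -EBUSY;
	return 0;
}

/* Condition after the patch: once recovery has been disabled, the
 * non-faulty spare is no longer refused, so it can be removed. */
int remove_check_after(const struct raid1_model *m)
{
	if (!m->faulty && !m->recovery_disabled &&
	    m->degraded < m->raid_disks)
		return -EBUSY;
	return 0;
}
```

Assuming a two-device array with one good device and one spare whose recovery
was aborted (faulty = false, recovery_disabled = true, degraded = 1,
raid_disks = 2), the first function returns -EBUSY and the second returns 0.
That matches the commit message: without the extra test the spare stays in
the array and recovery can restart.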

