I've got a problem: two drives in my array developed bad sectors at the
same time. The bad sectors are in completely different places, so the
data should still be recoverable, but I can't read the array because md
simply fails out both drives and marks the array unusable.
Now, I can either buy new drives and dd the raw partitions over, or I
can hack the kernel to make it a bit smarter about unrecoverable
reads.
Obviously, if I have raid5_error not mark the drive bad, md will hammer
away on the bad sectors, failing over and over. My thought was to let
the read fail, catch it in raid5_end_read_request, and tag the
stripe_head with the device that failed. If another device has already
failed on that stripe, return EIO. This way further reads on the
stripe_head get reconstructed from the remaining disks plus parity,
until the stripe is eventually freed. (One I/O error per stripe isn't
too harsh a price to pay for disaster recovery.)
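Concretely, something like this is what I had in mind, against the
shape of the 2.4 end_read_request path. Completely untested sketch:
read_failed_disk is a hypothetical new field I'd add to struct
stripe_head (reset to -1 whenever a stripe is set up), and
handle_stripe() would still need to learn to treat that disk as failed
for just this one stripe.

static void raid5_end_read_request(struct buffer_head *bh, int uptodate)
{
	struct stripe_head *sh = bh->b_private;
	raid5_conf_t *conf = sh->raid_conf;
	int disks = conf->raid_disks, i;

	/* Find which cached buffer (i.e. which disk) this completion
	 * belongs to, as the existing code does. */
	for (i = 0; i < disks; i++)
		if (bh == sh->bh_cache[i])
			break;
	if (i == disks)
		return;	/* shouldn't happen */

	if (uptodate) {
		set_bit(BH_Uptodate, &bh->b_state);
	} else if (sh->read_failed_disk < 0 || sh->read_failed_disk == i) {
		/* First bad sector on this stripe: remember the disk
		 * instead of calling md_error(), and leave the buffer
		 * !uptodate so handle_stripe() can reconstruct the
		 * block from the other disks plus parity. */
		sh->read_failed_disk = i;
		clear_bit(BH_Uptodate, &bh->b_state);
	} else {
		/* A second device failed within the same stripe; parity
		 * can't cover two missing blocks, so fail as before. */
		md_error(conf->mddev, bh->b_dev);
		clear_bit(BH_Uptodate, &bh->b_state);
	}

	clear_bit(BH_Lock, &bh->b_state);
	set_bit(STRIPE_HANDLE, &sh->state);
	release_stripe(sh);
}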
In 2.4.20, I'm at raid5.c:421, where we're about to call md_error.
What happens to the bh from that point? Obviously it's not up-to-date,
so when one drive fails, how does the read get re-issued so the block
can be reconstructed from the remaining drives plus parity?
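For what it's worth, my understanding is that the reconstruction itself
is just an XOR over the surviving blocks of the stripe, along the lines
of what compute_block() does. Illustrative only, not the actual kernel
code (which works on buffer heads with the optimized xor_block()
routines):

#include <stddef.h>

/* RAID5 recovers a missing block by XOR-ing the corresponding blocks
 * of every other disk in the stripe, data and parity alike. */
static void reconstruct_block(unsigned char *dest,
			      unsigned char *const blocks[],
			      int ndisks, int missing, size_t len)
{
	size_t off;
	int d;

	for (off = 0; off < len; off++)
		dest[off] = 0;

	for (d = 0; d < ndisks; d++) {
		if (d == missing)
			continue;
		for (off = 0; off < len; off++)
			dest[off] ^= blocks[d][off];
	}
}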
Please CC me, I read via the archives.
Thanks,
--Dan
Obviously, this would ONLY be for recovery in the face of bad sectors;
the bad drives still need to be replaced as quickly as possible.