Re: ALERT: md/raid6 data corruption risk.

From: Dan Williams
Date: Mon Aug 18 2014 - 12:33:47 EST


On Sun, Aug 17, 2014 at 11:16 PM, NeilBrown <neilb@xxxxxxx> wrote:
>
> Hi all,
> There is a risk of data loss with md/raid6 arrays running on Linux since
> 2.6.32.
> If:
> - the array is doubly degraded
> - one or both failed devices are being recovered, and
> - the array is written to
>
> then it is possible for data on the array to be lost. The patch below fixes
> the problem. If you apply the patch to an older kernel which has separate
> handle_stripe5() and handle_stripe6() functions, be sure that the patch
> changes handle_stripe6().
>
> There is no risk to an optimal array or a singly-degraded array. There is
> also no risk on a doubly-degraded array which is not recovering a device or
> is not receiving write requests.
>
> If you have data on a RAID6 array, please consider how to avoid corruption,
> possibly by applying the patch, possibly by removing any hot spares so
> recovery does not automatically start.
>
> This patch will be sent upstream shortly and will subsequently appear in
> future "-stable" kernels.
>
> NeilBrown
>
> From f94e37dce722ec7b6666fd04be357f422daa02b5 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@xxxxxxx>
> Date: Wed, 13 Aug 2014 09:57:07 +1000
> Subject: [PATCH] md/raid6: avoid data corruption during recovery of
> double-degraded RAID6
>
> During recovery of a double-degraded RAID6 it is possible for
> some blocks not to be recovered properly, leading to corruption.
>
> If a write happens to one block in a stripe that would be written to a
> missing device, and at the same time that stripe is recovering data
> to the other missing device, then that recovered data may not be written.
>
> This patch skips, in the double-degraded case, an optimisation that is
> only safe for single-degraded arrays.
>
> Bug was introduced in 2.6.32 and fix is suitable for any kernel since
> then. In an older kernel with separate handle_stripe5() and
> handle_stripe6() functions, the patch must change handle_stripe6().
>
> Cc: stable@xxxxxxxxxxxxxxx (2.6.32+)
> Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8
> Cc: Yuri Tikhonov <yur@xxxxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Reported-by: "Manibalan P" <pmanibalan@xxxxxxxxxxxxxx>
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423
> Signed-off-by: NeilBrown <neilb@xxxxxxx>
>

Acked-by: Dan Williams <dan.j.williams@xxxxxxxxx>

...with a capital "ACK"!
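
For anyone triaging their own arrays, the risk conditions listed in the
quoted alert can be written down as a simple predicate. This is only an
illustrative sketch in Python: the ArrayState type, its field names, and
the at_risk() helper are hypothetical and are not part of the kernel,
mdadm, or the patch. It assumes an unpatched kernel from 2.6.32 onward.

```python
from dataclasses import dataclass


@dataclass
class ArrayState:
    """Hypothetical snapshot of an md array, for illustration only."""
    level: str              # RAID level, e.g. "raid6"
    degraded: int           # number of missing or failed devices
    recovering: bool        # at least one failed device is being rebuilt
    receiving_writes: bool  # the array is being written to


def at_risk(state: ArrayState) -> bool:
    """Return True only when all conditions from the alert hold:
    RAID6, doubly degraded, recovery in progress, and writes arriving."""
    return (
        state.level == "raid6"
        and state.degraded == 2
        and state.recovering
        and state.receiving_writes
    )


if __name__ == "__main__":
    # Doubly degraded, recovering, and written to: the risky combination.
    print(at_risk(ArrayState("raid6", 2, True, True)))
    # Singly degraded: not affected according to the alert.
    print(at_risk(ArrayState("raid6", 1, True, True)))
```

The predicate also shows why removing hot spares helps: with no spare,
recovery does not start automatically, so the "recovering" condition
stays false.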