Re: modifying degraded raid 1 then re-adding other members is bad

From: Michael Tokarev
Date: Tue Aug 08 2006 - 07:16:54 EST

Next message: Andi Kleen: "Re: [PATCH 1/3] uml: use -mcmodel=kernel for x86_64"
Previous message: Jens Axboe: "Re: swsusp regression [Was: 2.6.18-rc3-mm2]"
In reply to: Neil Brown: "Re: modifying degraded raid 1 then re-adding other members is bad"
Next in thread: Krzysztof Halasa: "Re: modifying degraded raid 1 then re-adding other members is bad"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Neil Brown wrote:
> On Tuesday August 8, aoliva@xxxxxxxxxx wrote:
>> Assume I have a fully-functional raid 1 between two disks, one
>> hot-pluggable and the other fixed.
>>
>> If I unplug the hot-pluggable disk and reboot, the array will come up
>> degraded, as intended.
>>
>> If I then modify a lot of the data in the raid device (say it's my
>> root fs and I'm running daily Fedora development updates :-), which
>> modifies only the fixed disk, and then plug the hot-pluggable disk in
>> and re-add its members, it appears that it comes up without resyncing
>> and, well, major filesystem corruption ensues.
>>
>> Is this a known issue, or should I try to gather more info about it?
>
> Looks a lot like
> http://bugzilla.kernel.org/show_bug.cgi?id=6965
>
> Attached are two patches. One against -mm and one against -linus.
>
> They are below.
>
> Please confirm if the appropriate one helps.
>
> NeilBrown
>
> (-mm)
>
> Avoid backward event updates in md superblock when degraded.
>
> If we
> - shut down a clean array,
> - restart with one (or more) drive(s) missing
> - make some changes
> - pause, so that the array gets marked 'clean',
> the event count on the superblock of included drives
> will be the same as that of the removed drives.
> So adding the removed drive back in will cause it
> to be included with no resync.
>
> To avoid this, we only update the eventcount backwards when the array
> is not degraded. In this case there can (should) be no non-connected
> drives that we can get confused with, and this is the particular case
> where updating-backwards is valuable.
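
The scenario in the description above can be sketched as a toy model.
This is not the md code: the class and function names are made up for
illustration, and the model assumes (a) the event count goes up by one
on a clean->dirty transition, (b) it goes back down by one on a
dirty->clean transition unless the policy forbids that, and (c) a
re-added member whose event count equals the array's is accepted
without a resync.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical stand-in for one disk's superblock."""
    name: str
    events: int
    present: bool = True


class ToyArray:
    """Toy model of the event counter only; not the real md logic."""

    def __init__(self, members, backward_only_when_whole):
        self.members = members
        self.events = max(m.events for m in members)
        self.dirty = False
        # True models the proposed behaviour: never step the counter
        # backwards while the array is degraded.
        self.backward_only_when_whole = backward_only_when_whole

    def degraded(self):
        return any(not m.present for m in self.members)

    def _write_superblocks(self):
        # Only members that are physically present get the new count.
        for m in self.members:
            if m.present:
                m.events = self.events

    def write_data(self):
        # Assumed: clean -> dirty bumps the counter once.
        if not self.dirty:
            self.dirty = True
            self.events += 1
            self._write_superblocks()

    def go_idle(self):
        # Assumed: dirty -> clean steps the counter back, unless the
        # policy forbids going backwards on a degraded array.
        if self.dirty:
            self.dirty = False
            if self.backward_only_when_whole and self.degraded():
                self.events += 1
            else:
                self.events -= 1
            self._write_superblocks()

    def readd_needs_resync(self, member):
        # Assumed: equal counts mean "in sync, no resync needed".
        return member.events != self.events


def scenario(backward_only_when_whole):
    fixed = Member("fixed", 10)
    hot = Member("hot", 10)
    arr = ToyArray([fixed, hot], backward_only_when_whole)
    hot.present = False   # unplug the hot-pluggable disk
    arr.write_data()      # make some changes
    arr.go_idle()         # pause, array gets marked clean
    return arr.readd_needs_resync(hot)
```

In this model scenario(False) returns False, i.e. the stale disk is
taken back with no resync, while scenario(True) returns True because
the counter on the remaining disk has moved past the stale one.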

Why are we updating it BACKWARD in the first place?

Also, why is the event counter checked at all when we add something to
the array -- shouldn't it resync regardless?

Thanks.

/mjt
