Re: [RFC][PATCH] md: avoid fullsync if a faulty member missed a dirty transition
From: Mike Snitzer
Date: Mon May 19 2008 - 00:34:52 EST
On Fri, May 16, 2008 at 7:54 AM, Neil Brown <neilb@xxxxxxx> wrote:
> On Friday May 9, snitzer@xxxxxxxxx wrote:
>> On Fri, May 9, 2008 at 2:01 AM, Neil Brown <neilb@xxxxxxx> wrote:
>> >
>> > On Friday May 9, snitzer@xxxxxxxxx wrote:
>>
>> > > Unfortunately my testing with this patch results in a full resync.
...
>> > diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
>> > --- .prev/drivers/md/bitmap.c 2008-05-09 11:02:13.000000000 +1000
>> > +++ ./drivers/md/bitmap.c 2008-05-09 16:00:07.000000000 +1000
>> >
>> > @@ -465,8 +465,6 @@ void bitmap_update_sb(struct bitmap *bit
>> > spin_unlock_irqrestore(&bitmap->lock, flags);
>> > sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
>> > sb->events = cpu_to_le64(bitmap->mddev->events);
>> > - if (!bitmap->mddev->degraded)
>> > - sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
>>
>> Before, events_cleared was _not_ updated if the array was degraded.
>> Your patch doesn't appear to maintain that design.
>
> It does, but it is well hidden.
> Bits in the bitmap are only cleared when the array is not degraded.
> The new code for updating events_cleared is only triggered when a bit
> is about to be cleared.
Hi Neil,
Sorry for not getting back to you sooner. Thanks for putting
significant time into chasing this problem.
I tested your most recent patch and unfortunately still hit the case
where the nbd member becomes degraded yet the array continues to clear
bits (events_cleared of the non-degraded member is higher than that of
the degraded member). Is this behavior somehow expected/correct?
This was the state of the array after the nbd0 member became degraded
and the array was stopped:
# mdadm -X /dev/nbd0 /dev/sdq
        Filename : /dev/nbd0
           Magic : 6d746962
         Version : 4
            UUID : 7140cc3c:8681416c:12c5668a:984ca55d
          Events : 2642
  Events Cleared : 2642
           State : OK
       Chunksize : 128 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 52428736 (50.00 GiB 53.69 GB)
          Bitmap : 409600 bits (chunks), 1 dirty (0.0%)

        Filename : /dev/sdq
           Magic : 6d746962
         Version : 4
            UUID : 7140cc3c:8681416c:12c5668a:984ca55d
          Events : 2646
  Events Cleared : 2645
           State : OK
       Chunksize : 128 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 52428736 (50.00 GiB 53.69 GB)
          Bitmap : 409600 bits (chunks), 1 dirty (0.0%)
At the time the nbd0 member became degraded, events_cleared was 2642.
What I'm failing to understand is how sdq's events_cleared could be
allowed to rise above 2642.
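To make the expectation concrete, here is a toy user-space model of the
rule as I understand it from your explanation: bits are cleared only
while the array is not degraded, and events_cleared moves only when a
bit is cleared. This is not the kernel code; struct toy_array and the
toy_* helpers are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of the array state; NOT the kernel's struct mddev. */
struct toy_array {
	uint64_t events;          /* current event counter */
	uint64_t events_cleared;  /* event count recorded at the last bit clear */
	bool     degraded;        /* true once a member has failed */
};

/* Bump the event counter, as a superblock update would. */
static void toy_bump_events(struct toy_array *a)
{
	a->events++;
}

/*
 * Try to clear a bitmap bit. Assumed rule: clearing is refused while
 * degraded, and events_cleared is updated only when a bit is cleared.
 * Returns true if the bit was cleared.
 */
static bool toy_try_clear_bit(struct toy_array *a)
{
	/* events_cleared should never run ahead of events */
	assert(a->events_cleared <= a->events);

	if (a->degraded)
		return false;

	a->events_cleared = a->events;
	return true;
}
```

Under this model, an array that goes degraded at events == 2642 keeps
events_cleared at 2642 no matter how far events advances afterwards,
which is why the 2645 reported for sdq surprises me.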
I've not yet taken steps to understand/verify your test script. As
such I'm not sure it models my test scenario yet.
Mike