Re: [RFC][PATCH] md: avoid fullsync if a faulty member missed a dirty transition
From: Neil Brown
Date: Mon May 19 2008 - 01:27:28 EST
On Monday May 19, snitzer@xxxxxxxxx wrote:
>
> Hi Neil,
>
> Sorry about not getting back to you sooner. Thanks for putting
> significant time into chasing this problem.
>
> I tested your most recent patch and unfortunately still hit the case
> where the nbd member becomes degraded yet the array continues to clear
> bits (events_cleared of the non-degraded member is higher than the
> degraded member). Is this behavior somehow expected/correct?

It shouldn't be..... ahhh.

There is a delay between noting that the bit can be cleared, and
actually writing the zero to disk. This is obviously intentional
in case the bit gets set again quickly.
I'm sampling the event count at the latter point instead of the
former, and there is time for it to change.
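
To make the window concrete, here is a rough user-space sketch of the
race. It is not kernel code: struct model and every model_* function
are hypothetical names made up for illustration, and the "can skip
full sync" comparison is an assumed simplification of how
events_cleared gets used when a member is re-added. It only shows why
the sampling point matters if a member fails between write completion
and the daemon run.

```c
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct model {
	uint64_t events;         /* stands in for mddev->events */
	uint64_t events_cleared; /* stands in for bitmap->events_cleared */
	int need_sync;           /* events_cleared still has to reach disk */
};

/* Record the current event count as the point where bits may be cleared. */
static void model_sample(struct model *m)
{
	if (m->events_cleared < m->events) {
		m->events_cleared = m->events;
		m->need_sync = 1;
	}
}

/* A write completes successfully: the bit *can* now be cleared. */
static void model_endwrite(struct model *m, int sample_here)
{
	if (sample_here)
		model_sample(m);
}

/* A member fails, which bumps the array event count. */
static void model_member_fails(struct model *m)
{
	m->events++;
}

/* The daemon runs later and actually writes the zero to disk. */
static void model_daemon(struct model *m, int sample_here)
{
	if (sample_here)
		model_sample(m);
	if (m->need_sync)
		m->need_sync = 0; /* the superblock write would happen here */
}

/*
 * Run the scenario "write completes, member fails, daemon runs" and
 * return the events_cleared value that ends up recorded.
 */
static uint64_t model_run(int sample_at_endwrite)
{
	struct model m = { .events = 10, .events_cleared = 0, .need_sync = 0 };

	model_endwrite(&m, sample_at_endwrite);
	model_member_fails(&m);			/* events: 10 -> 11 */
	model_daemon(&m, !sample_at_endwrite);
	return m.events_cleared;
}

/*
 * Assumed re-add rule (a simplification): a member may use the bitmap
 * instead of a full sync only if no bits were cleared after the event
 * count it last saw.
 */
static int model_can_skip_fullsync(uint64_t member_events,
				   uint64_t events_cleared)
{
	return member_events >= events_cleared;
}
```

In this model, model_run(0) (sampling in the daemon) records 11, which
is past the failed member's last event count of 10, while model_run(1)
(sampling at write completion) records 10, so the member would still
qualify for a bitmap-based resync under the assumed rule.
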

Maybe this patch on top of what I recently sent out?

Thanks,
NeilBrown

Signed-off-by: Neil Brown <neilb@xxxxxxx>
### Diffstat output
./drivers/md/bitmap.c | 10 ++++++++--
./include/linux/raid/bitmap.h | 1 +
2 files changed, 9 insertions(+), 2 deletions(-)
diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c 2008-05-19 15:23:42.000000000 +1000
+++ ./drivers/md/bitmap.c 2008-05-19 15:24:56.000000000 +1000
@@ -1092,9 +1092,9 @@ void bitmap_daemon_work(struct bitmap *b
/* We are possibly going to clear some bits, so make
* sure that events_cleared is up-to-date.
*/
- if (bitmap->events_cleared < bitmap->mddev->events) {
+ if (bitmap->need_sync) {
bitmap_super_t *sb;
- bitmap->events_cleared = bitmap->mddev->events;
+ bitmap->need_sync = 0;
wait_event(bitmap->mddev->sb_wait,
!test_bit(MD_CHANGE_CLEAN,
&bitmap->mddev->flags));
@@ -1273,6 +1273,12 @@ void bitmap_endwrite(struct bitmap *bitm
return;
}
+ if (success &&
+ bitmap->events_cleared < bitmap->mddev->events) {
+ bitmap->events_cleared = bitmap->mddev->events;
+ bitmap->need_sync = 1;
+ }
+
if (!success && ! (*bmc & NEEDED_MASK))
*bmc |= NEEDED_MASK;
diff .prev/include/linux/raid/bitmap.h ./include/linux/raid/bitmap.h
--- .prev/include/linux/raid/bitmap.h 2008-05-19 15:23:50.000000000 +1000
+++ ./include/linux/raid/bitmap.h 2008-05-19 15:24:56.000000000 +1000
@@ -221,6 +221,7 @@ struct bitmap {
unsigned long syncchunk;
__u64 events_cleared;
+ int need_sync;
/* bitmap spinlock */
spinlock_t lock;