Bug in 2.6.16-rc6 RAID size reporting

From: Joshua Kugler
Date: Wed Mar 15 2006 - 16:46:13 EST


In my quest to get some stability in my system with large RAID devices, I
installed mdadm 2.3.1 and compiled and installed 2.6.16-rc6.

Ran this command:

mdadm -C /dev/md1 --auto=yes -l raid1 -n 2 /dev/etherd/e0.0 /dev/etherd/e1.0

Those are AoE devices, by the way.

Started fine, syncing at 43000K/sec or so. Came in this morning,
and /proc/mdstat had this to report:

Personalities : [raid1]
md1 : active raid1 etherd/e1.0[1] etherd/e0.0[0]
      5469958900 blocks super 1.0 [2/2] [UU]
      [==========================================================>] resync =292.8% (3440402688/1174991604) finish=785.8min speed=43043K/sec

You'll notice that it says 5469958900 blocks, but 3440402688/1174991604 done.
Oops.

I tried this with two 400GB drives:

mdadm -C /dev/md2 --auto=yes -l raid1 -n 2 /dev/sdc /dev/sdd

And they report correctly:

md2 : active raid1 sdd[1] sdc[0]
      390711296 blocks [2/2] [UU]
      [=>...................] resync = 8.1% (31782144/390711296) finish=152.8min speed=39133K/sec

Line 4045 and following in drivers/md/md.c seem to be the offending code:

	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
		max_blocks = mddev->resync_max_sectors >> 1;	/* 512-byte sectors -> 1K blocks */
	else
		max_blocks = mddev->size;			/* already in 1K blocks */

Should max_blocks be mddev->size even if it is resyncing?

The resync isn't done yet, so I can't tell you whether the block count is
correct once it finishes. I'll let you know as soon as it's done (about 37
hours, since I had to do a hard reboot due to another, unrelated
issue...i.e. PEBKAC).

j----- k-----

--
Joshua Kugler PGP Key: http://pgp.mit.edu/
CDE System Administrator ID 0xDB26D7CE
http://distance.uaf.edu/