Re: [PATCH 000 of 5] md: Introduction

From: Neil Brown
Date: Sun Jan 22 2006 - 17:51:28 EST


On Saturday January 21, akropel1@xxxxxxxxxxxxxxxx wrote:
> NeilBrown <neilb@xxxxxxx> wrote:
> > In line with the principle of "release early", following are 5 patches
> > against md in 2.6.latest which implement reshaping of a raid5 array.
> > By this I mean adding 1 or more drives to the array and then re-laying
> > out all of the data.
>
> I've been looking forward to a feature like this, so I took the
> opportunity to set up a vmware session and give the patches a try. I
> encountered both success and failure, and here are the details of both.
>
> On the first try I neglected to read the directions and increased the
> number of devices first (which worked) and then attempted to add the
> physical device (which didn't work; at least not the way I intended).
> The result was an array of size 4, operating in degraded mode, with
> three active drives and one spare. I was unable to find a way to coax
> mdadm into adding the 4th drive as an active device instead of a
> spare. I'm not an mdadm guru, so there may be a method I overlooked.
> Here's what I did, interspersed with trimmed /proc/mdstat output:

Thanks, this is exactly the sort of feedback I was hoping for - people
testing things that I didn't think to try...

>
> mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc
>
> md0 : active raid5 sda[0] sdc[2] sdb[1]
> 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>
> mdadm --grow -n4 /dev/md0
>
> md0 : active raid5 sda[0] sdc[2] sdb[1]
> 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

I assume that no "resync" started at this point? It should have done.
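
If you run the same sequence again, it is worth checking immediately
after the --grow whether md has scheduled anything (the second command
assumes your kernel exposes the md sysfs attributes):

  cat /proc/mdstat                    # a resync/recovery progress line should appear
  cat /sys/block/md0/md/sync_action   # anything other than "idle" means it has started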

>
> mdadm --manage --add /dev/md0 /dev/sdd
>
> md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1]
> 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>
> mdadm --misc --stop /dev/md0
> mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd
>
> md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1]
> 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

This really should have started a recovery.... I'll look into that
too.
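
Until I track that down, if you feel like experimenting, it would be
interesting to know whether simply bouncing the spare kicks the recovery
off (just a guess at a workaround, not something I have tested against
these patches):

  mdadm /dev/md0 --remove /dev/sdd    # take the idle spare back out
  mdadm /dev/md0 --add /dev/sdd       # re-add it; recovery onto it should then start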


>
> For my second try I actually read the directions and things went much
> better, aside from a possible /proc/mdstat glitch shown below.
>
> mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc
>
> md0 : active raid5 sda[0] sdc[2] sdb[1]
> 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>
> mdadm --manage --add /dev/md0 /dev/sdd
>
> md0 : active raid5 sdd[3](S) sdc[2] sdb[1] sda[0]
> 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>
> mdadm --grow -n4 /dev/md0
>
> md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0]
> 2097024 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
> ...should this be... --> [4/3] [UUU_] perhaps?

Well, part of the array is "4/4 UUUU" and part is "3/3 UUU". How do
you represent that? I think "4/4 UUUU" is best.


> [>....................] recovery = 0.4% (5636/1048512) finish=9.1min speed=1878K/sec
>
> [...time passes...]
>
> md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0]
> 3145536 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
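
For the record, that is the order the directions describe: add the new
disk as a spare first, then grow. Condensed, with the same device names
as above:

  mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc   # initial 3-disk raid5
  mdadm --manage --add /dev/md0 /dev/sdd                       # new disk goes in as a spare
  mdadm --grow -n4 /dev/md0                                    # reshape onto 4 disks, using the spare
  cat /proc/mdstat                                             # watch the reshape progress
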
>
> My final test was a repeat of #2, but with data actively being written
> to the array during the reshape (the previous tests were on an idle,
> unmounted array). This one failed pretty hard, with several processes
> ending up in the D state. I repeated it twice and sysrq-t dumps can be
> found at <http://www.kroptech.com/~adk0212/md-raid5-reshape-wedge.txt>.
> The writeout load was a kernel tree untar started shortly before the
> 'mdadm --grow' command was given. mdadm hung, as did tar. Any process
> which subsequently attempted to access the array hung as well. A second
> attempt at the same thing hung similarly, although only pdflush shows up
> hung in that trace. mdadm and tar are missing for some reason.

Hmmm... I tried similar things but didn't get this deadlock. Somehow
the fact that mdadm is holding the reconfig_sem semaphore means that
some IO cannot proceed and so mdadm cannot grab and resize all the
stripe heads... I'll have to look more deeply into this.
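
If you get a chance, it would also be useful to know whether this
reproduces without vmware in the picture, e.g. on loop devices.
Something along these lines should do (sizes, paths and the choice of
ext3 are only illustrative):

  for i in 0 1 2 3; do
      dd if=/dev/zero of=/tmp/d$i bs=1M count=1 seek=1023   # ~1GB sparse backing file
      losetup /dev/loop$i /tmp/d$i
  done
  mdadm --create -l5 -n3 /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2
  mkfs -t ext3 /dev/md0
  mount /dev/md0 /mnt
  mdadm --manage --add /dev/md0 /dev/loop3
  tar xf /path/to/linux-2.6.tar -C /mnt &                   # writeout load during the reshape
  mdadm --grow -n4 /dev/md0

If that wedges the same way, sysrq-t traces from that run would be just
as useful.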

>
> I'm happy to do more tests. It's easy to conjure up virtual disks and
> load them with irrelevant data (like kernel trees ;)

Great. I'll probably be putting out a new patch set late this week
or early next. Hopefully it will fix the issues you have found and you
can try it again.


Thanks again,
NeilBrown