Re: [PATCH 000 of 5] md: Introduction

From: Phillip Susi
Date: Tue Jan 17 2006 - 17:37:44 EST


Michael Tokarev wrote:
<snip>
Compare this with my statement about "offline" "reshaper" above:
separate userspace (easier to write/debug compared with kernel
space) program which operates on an inactive array (no locking
needed, no need to worry about other I/O operations going to the
array at the time of reshaping etc), with an ability to plan it's
I/O strategy in alot more efficient and safer way... Yes this
apprpach has one downside: the array has to be inactive. But in
my opinion it's worth it, compared to more possibilities to lose
your data, even if you do NOT use that feature at all...

I also like the idea of this kind of thing going in user space. I was also under the impression that md was going to be phased out and replaced by the device mapper. I've been kicking around the idea of a user space utility that manipulates the device mapper tables and performs block moves itself to reshape a raid array. It doesn't seem like it would be that difficult and would not require modifying the kernel at all. The basic idea is something like this:

/dev/mapper/raid is your raid array, which is mapped to a stripe between /dev/sda, /dev/sdb. When you want to expand the stripe to add /dev/sdc to the array, you create three new devices:

/dev/mapper/raid-old: copy of the old mapper table, striping sda and sdb
/dev/mapper/raid-progress: linear map with size = new stripe width, and pointing to raid-new
/dev/mapper/raid-new: what the raid will look like when done, i.e. stripe of sda, sdb, and sdc

Then you replace /dev/mapper/raid with a linear map to raid-new, raid-progress, and raid-old, in that order. Initially the length of the chunks from raid-progress and raid-new are zero, so you will still be entirely accessing raid-old. For each stripe in the array, you change raid-progress to point to the corresponding blocks in raid-new, but suspended, so IO to this stripe will block. Then you update the raid map so raid-progress overlays the stripe you are working on to catch IO instead of allowing it to go to raid-old. After you read that stripe from raid-old and write it to raid-new, resume raid-progress to flush any blocked writes to the raid-new stripe. Finally update raid so the previously in progress stripe now maps to raid-new.

Repeat for each stripe in the array, and finally replace the raid table with raid-new's table, and delete the 3 temporary devices.


Adding transaction logging to the user mode utility wouldn't be very hard either.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/