How would a device replace work in general?
Device replace is implemented in largely the same manner as most other live data migration tools (for example, LVM2's pvmove command).
On 08/13/2018 08:42 PM, David Sterba wrote:
On Fri, Aug 10, 2018 at 03:04:33AM +0900, Naohiro Aota wrote:
This series adds zoned block device support to btrfs.
Yay, thanks!
As this is an RFC, I'll give you some feedback. The code looks ok for what it claims
to do, I'll skip style and unimportant implementation details for now as
there are bigger questions.
The zoned devices bring some constraints so not all filesystem features
can be expected to work, so this rules out any form of in-place
updates like NODATACOW.
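To make the in-place update constraint concrete, here is a minimal toy model of a sequential-write-required zone. It is only a sketch for illustration, not btrfs or kernel code; the `Zone` class and its block-based sizes are hypothetical names chosen for this example.

```python
class Zone:
    """Toy model of a sequential-write-required zone (sizes in blocks)."""

    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.write_pointer = start  # next block that may be written

    def write(self, block, nblocks):
        # Writes are only accepted exactly at the write pointer.
        if block != self.write_pointer:
            raise ValueError("non-sequential write rejected")
        if block + nblocks > self.start + self.size:
            raise ValueError("write crosses the end of the zone")
        self.write_pointer += nblocks

    def reset(self):
        # Resetting discards the whole zone and rewinds the write pointer.
        self.write_pointer = self.start


zone = Zone(start=0, size=8)
zone.write(0, 4)          # append: accepted
try:
    zone.write(0, 1)      # in-place overwrite (NODATACOW-style): rejected
except ValueError as err:
    overwrite_error = str(err)
zone.write(4, 2)          # CoW-style: new data lands at the write pointer
```

In this model an overwrite of already-written blocks can only happen after a reset of the whole zone, which is why copy-on-write fits and NODATACOW does not.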
Then there's a list of 'how will zoned devices work with feature X?' questions.
You disable fallocate and DIO. I haven't looked closer at the fallocate
case, but DIO could work in the sense that open() will open the file but
any write will fall back to buffered writes. This is implemented so it
would need to be wired together.
Mixed device types are not allowed, and I tend to agree with that,
though this could work in principle. Just that the chunk allocator
would have to be aware of the device types and tweaked to allocate from
the same group. The btrfs code is not ready for that in terms of the
allocator capabilities and configuration options.
Device replace is disabled, but the changelog suggests there's a way to
make it work, so it's a matter of implementation. And this should be
implemented at the time of merge.
While I do understand that device replace is possible with RAID thingies, I somewhat fail to see how one could do a device replacement without RAID functionality.
Is it even possible?
If so, how would it be different from a simple umount?
RAID5/6 + zoned support is highly desired and lack of it could be
considered a NAK for the whole series. The drive sizes are expected to
be several terabytes, so it sounds too risky to lack the redundancy
options (RAID1 is not sufficient here).
That really depends on the allocator. If we can make the RAID code work with zone-sized stripes it should be pretty trivial. I can have a look at that; RAID support was on my agenda anyway (albeit for MD, not for btrfs).
The changelog does not explain why this does not or cannot work, so I
cannot reason about that or possibly suggest workarounds or solutions.
But I think it should work in principle.
As mentioned, it really should work for zone-sized stripes. I'm not sure we can make it work with stripes smaller than the zone size.
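As a rough illustration of the zone-sized stripe idea, the sketch below computes RAID5-style XOR parity over whole zones and rebuilds a lost zone from the survivors. It is a toy under stated assumptions, not the btrfs or MD implementation: `ZONE_SIZE`, `make_stripe` and `rebuild` are hypothetical names, and a zone is shrunk to a few bytes for readability.

```python
from functools import reduce

ZONE_SIZE = 4  # bytes per zone in this toy example (hypothetical value)


def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripe(data_zones):
    """Return the data zones plus one XOR parity zone (RAID5-style)."""
    if any(len(z) != ZONE_SIZE for z in data_zones):
        raise ValueError("each stripe unit must be exactly one zone")
    parity = reduce(xor_bytes, data_zones)
    return list(data_zones) + [parity]


def rebuild(stripe, lost):
    """Recompute the unit at index `lost` from the surviving units."""
    survivors = [z for i, z in enumerate(stripe) if i != lost]
    return reduce(xor_bytes, survivors)


stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])
recovered = rebuild(stripe, lost=1)
```

Because every stripe unit here is a full zone, the parity unit is itself a full zone, so in this model no unit ever needs a partial rewrite.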
As this is the first post and an RFC I don't expect that everything is
implemented, but at least the known missing points should be documented.
You've implemented lots of the low-level zoned support and extent
allocation, so even if the raid56 might be difficult, it should be the
smaller part.
FYI, I've run a simple stress-test on a zoned device (git clone linus && make) and haven't found any issues; compilation ran without a problem, and with quite decent speed.
Good job!
Cheers,
Hannes