Re: (reiserfs) Re: LVM / Filesystems / High availability

Erik Corry (erik@arbat.com)
Thu, 25 Jun 1998 01:33:42 +0200


In article <199806241751.NAA10373@dcl.MIT.EDU> you wrote:

> OK, suppose I have a 54 terabyte filesystem, with all PEs in use, and
> I need to remove a 9 gigabyte disk from the middle of the filesystem,
> because it's failing.

> I can't do (2) because I don't have a spare PE to use.

> I could do (1) but that would mean temporarily copying *more* data to
> the failing 9 gigabyte disk as part of the compaction process, thus
> putting that data at risk.

This is a rather special situation, with hardware failing,
and a bad basis for design decisions. Normally in the
kernel we assume that the hardware works. If you want to
guard against that, then you can use more advanced RAID
features. If your disk is failing, perhaps you should be
getting the backup tapes out of the fireproof safe anyway.

> The increased disk activity would also make
> it more likely for the disk to fail completely. Once the filesystem has
> been compacted you now have space to exchange the PE, but that involves
> a needless 9 gigabyte copy on top of the overhead of the filesystem
> compaction.

Sure, there's some needless copying, but filesystem
resizing isn't supposed to happen all the time. Performance
doesn't seem like the first priority for this operation.

> If the filesystem is LVM aware, and is using structured block addresses,
> then all it needs to do is to stop allocating blocks in that particular
> PE, and start vacating blocks and inodes out of the failing disk to
> others, on-line. This is faster and more robust.

Sounds like a messy non-solution, where real solutions
(RAID, backups) exist. No serious HA-solution is going to
be based on "If the disk starts failing, then get as much
data as possible off it before it blows up entirely".
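
To be concrete about what is being proposed, here is a toy sketch
of the "stop allocating in that PE, then vacate it" scheme. It is
only an illustration under assumptions of my own: the PE class,
the frozen flag and the evacuate() function are hypothetical names,
blocks are plain integers that are unique across PEs, and nothing
here corresponds to real LVM or filesystem code.

```python
# Toy model of on-line evacuation of one physical extent (PE).
# All names are hypothetical; this is not LVM or filesystem code.

class PE:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # number of blocks this PE can hold
        self.blocks = set()       # ids of the blocks currently stored here
        self.frozen = False       # True: the allocator must skip this PE

    def free(self):
        return self.capacity - len(self.blocks)


def evacuate(failing, others):
    """Freeze `failing`, then move its blocks to PEs that have room.

    Returns the number of blocks that could not be moved.
    """
    failing.frozen = True  # step 1: no new allocations land here
    for block in sorted(failing.blocks):  # sorted() copies, so removal is safe
        target = next(
            (pe for pe in others if not pe.frozen and pe.free() > 0), None
        )
        if target is None:
            break  # nowhere left to put the data
        target.blocks.add(block)      # step 2: copy to a healthy PE
        failing.blocks.remove(block)  # step 3: drop it from the bad one
    return len(failing.blocks)
```

Note that in this sketch the scheme only works if the other PEs
have free blocks. If every PE is full, evacuate() returns with
the blocks still stranded on the failing disk.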

-- 
Erik Corry
