Re: EXT2 and BadBlock updating.....

From: Theodore Y. Ts'o (tytso@MIT.EDU)
Date: Tue Apr 11 2000 - 17:02:43 EST


   Date: Tue, 11 Apr 2000 16:37:49 -0500
   From: Ed Carp <erc@pobox.com>

> When the kernel detects a bad block, it's not so simple to just throw it
> into the badblocks list --- the block may very well be in use, either as
> filesystem metadata or as a file data block. It might be possible to
> have the kernel handle more of these cases automatically without
> requiring an fsck, but past a certain point, you're introducing *way*
> too much hair into the kernel.

   The problem with this approach is that if you're working with systems
   that are up 24x7, *not* having the ability to automatically detect a
   bad block, copy the data to another block, and then mark that block as
   bad is a real pain at best and completely unacceptable at worst. One
   of my clients is using Linux in a network communications controller
   (SONET/ATM backplane), and this sort of thing is going to raise the
   pain level around here as soon as someone realizes that bad blocks
   aren't taken care of.

It's one thing if the bad block is in a file data block; there, you can
relocate the data to another block, assuming you can still read the
block by the time you find you have a disk error. It's quite another if
the disk failure happens in a critical piece of the filesystem metadata.
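To make the data-versus-metadata distinction concrete, here is a toy simulation in Python. It is not kernel code, and every class and function in it (`Disk`, `Filesystem`, `handle_bad_block`) is hypothetical. It assumes the filesystem can tell who owns a failing block: a file data block that can still be read is copied to a free block, the file's block list is repointed at the copy, and the old block goes on the badblocks list. A metadata block is simply left for fsck, and a data block that can no longer be read is reported as lost.

```python
from dataclasses import dataclass, field


class MediaError(Exception):
    """Simulated read failure on a bad sector."""


@dataclass
class Disk:
    # Hypothetical in-memory disk: block number -> contents.
    blocks: dict
    unreadable: set = field(default_factory=set)

    def read(self, n):
        if n in self.unreadable:
            raise MediaError(n)
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data


@dataclass
class Filesystem:
    disk: Disk
    owner: dict   # block -> ("data", filename) or ("metadata", description)
    files: dict   # filename -> ordered list of data block numbers
    free: list    # free block numbers, allocated from the front
    badblocks: set = field(default_factory=set)


def handle_bad_block(fs, n):
    """Try to retire block n; return a short description of what happened."""
    kind, who = fs.owner[n]
    if kind == "metadata":
        # Inode tables, bitmaps, directories: too much hair to fix in place.
        return "metadata: leave for fsck"
    try:
        data = fs.disk.read(n)
    except MediaError:
        # Too late: the contents can no longer be recovered from this disk.
        return "unreadable: data lost"
    if not fs.free:
        return "no free blocks"
    new = fs.free.pop(0)
    fs.disk.write(new, data)              # copy the data out first
    blocks = fs.files[who]
    blocks[blocks.index(n)] = new         # repoint the file at the copy
    fs.owner[new] = ("data", who)
    del fs.owner[n]
    fs.badblocks.add(n)                   # never allocate block n again
    return f"relocated {n} -> {new}"


disk = Disk({1: b"inode table", 2: b"hello", 3: b"world", 4: b"", 5: b""})
fs = Filesystem(disk,
                owner={1: ("metadata", "inode table"),
                       2: ("data", "a.txt"), 3: ("data", "a.txt")},
                files={"a.txt": [2, 3]},
                free=[4, 5])
print(handle_bad_block(fs, 3))  # relocated 3 -> 4
print(handle_bad_block(fs, 1))  # metadata: leave for fsck
```

The metadata branch just gives up; handling it in place would mean teaching the kernel to rewrite inode tables, bitmaps, and directories on the fly.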

The right solution to this problem, if you have this kind of
reliability requirement, is either to use disks that do bad-block
sparing at a low level or (better yet) to use RAID. If reliability
matters that much to you, that's what you should really be doing.
Or, if you're using Linux in a somewhat embedded system (such as a
network communications controller), then perhaps you should be booting
off of flash ROM and keeping temporary files on a RAM disk.
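As a rough illustration of why RAID handles the case the filesystem cannot, here is a toy RAID-1 mirror in Python. It is a simulation, not the Linux md driver; the `Disk` and `Mirror` classes are hypothetical, and it assumes that rewriting a bad sector makes it readable again. Because every block exists on both disks, a read error on one copy, even one under critical metadata, is satisfied from the other, and the filesystem above never sees it.

```python
class MediaError(Exception):
    """Simulated read failure on a bad sector."""


class Disk:
    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks
        self.unreadable = set()

    def read(self, n):
        if n in self.unreadable:
            raise MediaError(n)
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data
        # Simulation assumption: a successful rewrite cures the sector
        # (e.g. the drive spares it out). Real drives may or may not do this.
        self.unreadable.discard(n)


class Mirror:
    """Toy RAID-1: every write goes to all members; reads fall back."""

    def __init__(self, members):
        self.members = members

    def write(self, n, data):
        for d in self.members:
            d.write(n, data)

    def read(self, n):
        failed = []
        for d in self.members:
            try:
                data = d.read(n)
            except MediaError:
                failed.append(d)
                continue
            for bad in failed:      # rewrite bad copies from the good one
                bad.write(n, data)
            return data
        raise MediaError(n)         # every copy is bad: real data loss


a, b = Disk(8), Disk(8)
array = Mirror([a, b])
array.write(3, b"superblock")
a.unreadable.add(3)                 # media failure under a metadata block
print(array.read(3))                # b'superblock'
```

When every copy fails, the error still reaches the caller, so mirroring makes data loss much less likely rather than impossible.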

But don't expect a filesystem to be able to magically recover from
arbitrary media failures. There are things we could do to make things
better, but they come at the cost of increased kernel complexity, and
they still won't solve the problem 100%. The right tool for the sort of
problem you've outlined really is RAID.

                                                - Ted




This archive was generated by hypermail 2b29 : Sat Apr 15 2000 - 21:00:17 EST