Re: The ext3 way of journalling

From: Kyle Moffett
Date: Tue Jan 08 2008 - 22:22:53 EST


On Jan 08, 2008, at 15:51:53, Andi Kleen wrote:
> Theodore Tso <tytso@xxxxxxx> writes:
>> Now, there are good reasons for doing periodic checks every N mounts and after M months. And it has to do with PC class hardware. (Ted's aphorism: "PC class hardware is cr*p").
>
> If these reasons are good ones (some skepticism here) then the correct way to really handle this would be to do regular background scrubbing during runtime; ideally with metadata checksums so that you can actually detect all corruption.

Poor man's background scrubbing:

(A) Use LVM, which virtually all modern distros offer.
(B) Leave some extra space in your LVM volume group (enough to hold 1 snapshot for the time it takes to do an fsck).
(C) Periodically run the following scriptlet:

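set -e
# VG and VOLUME must be set by the caller. SNAPSIZE (e.g. 1G) is an
# assumed extra variable: lvcreate needs a size for a classic snapshot.
START="$(date +'%Y%m%d%H%M%S')"
lvcreate -s -L "${SNAPSIZE}" -n "${VOLUME}-snap" "${VG}/${VOLUME}"
# Use the /dev/VG/LV paths; names under /dev/mapper are mangled differently.
if nice -n 19 fsck -fy "/dev/${VG}/${VOLUME}-snap"; then
    echo 'Background scrubbing succeeded!'
    # Record the snapshot time as the "last checked" time.
    tune2fs -T "${START}" "/dev/${VG}/${VOLUME}"
else
    echo 'Background scrubbing failed! Reboot to fsck soon!'
    # Force a full check at the next boot.
    tune2fs -C 16383 -T "19000101" "/dev/${VG}/${VOLUME}"
fi
# -f skips the interactive confirmation prompt.
lvremove -f "${VG}/${VOLUME}-snap"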

Basically you can fsck the offline snapshot in the background. If it succeeds, you can adjust the "last checked" date to the time when the snapshot was taken; if it fails, you can schedule an fsck at the next reboot (and possibly remount the filesystem read-only or reboot immediately).
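
As a rough sketch of the "periodically run" part, a small wrapper can walk a list of volumes and call the scriptlet once per volume, one at a time, so that only one snapshot exists at any moment. Everything named here is an assumption for illustration: the scriptlet is assumed to be saved as /usr/local/sbin/lvscrub, the volume names are made up, and DRY_RUN defaults to printing the commands instead of running them. Any other variables the scriptlet needs are simply inherited from the environment.

```shell
#!/bin/sh
# Hypothetical wrapper around the scrubbing scriptlet.
# SCRUB is an assumed path to a saved copy of the scriptlet.
SCRUB="${SCRUB:-/usr/local/sbin/lvscrub}"
# With DRY_RUN=1 (the default) commands are only printed, never run.
DRY_RUN="${DRY_RUN:-1}"

scrub_all() {
    for pair in "$@"; do
        vg="${pair%%/*}"    # text before the first slash
        lv="${pair#*/}"     # text after the first slash
        if [ "$DRY_RUN" = 1 ]; then
            echo "VG=$vg VOLUME=$lv $SCRUB"
        else
            # Volumes are handled sequentially, so the volume group only
            # needs free space for one snapshot at a time.
            VG="$vg" VOLUME="$lv" "$SCRUB" || echo "scrub of $pair failed" >&2
        fi
    done
}

# Example volume names (made up for illustration).
scrub_all vg0/root vg0/home
```

Calling this from a weekly or monthly cron job during off-hours, with DRY_RUN=0, would be one way to schedule it.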

You can do the same thing for your /boot volume, although you probably have to manually use dmsetup since most bootloaders can't interpret LVM volumes.

I've always been surprised that distros like Red Hat, which automatically use LVM, don't stuff this in their weekly or monthly checks on desktop systems. User experience could also be dramatically improved with automated smartd configuration and user-interactive logging and warning messages.


> But since fsck is so slow and disks are so big this whole thing is a ticking time bomb now. e.g. it is not uncommon to require tens of minutes or even hours of fsck time and some server that reboots only every few months will eat that when it happens to reboot. This means you get a quite long downtime.

My servers all have an "interval-between-checks" of 2-6 weeks and are configured to run low-priority (niced) background "fsck" checks during off-hours, between once every few days and once every few weeks. I also have the "max mount count" numbers set to primes between 7 and 37 (depending on the filesystem) so that troubled or frequently-rebooted systems are verified more often. The end result is that I almost never have the dreaded 4-hour-fsck-on-boot problem. Every drive has been fscked within the last few weeks of operation, and multiple large filesystems all get fscked on the same boot only very rarely (once per lcm of their max-mount-counts, which for distinct primes is their product).
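
To make the prime-mount-count reasoning concrete, here is a minimal sketch. The device names and the particular values are made-up examples, and the tune2fs commands are only printed, not run. It shows the `-c` (max mount count) and `-i` (interval between checks) settings, and computes how many mounts pass between boots on which two filesystems hit their limits together, assuming both are mounted on every boot and their counters started together.

```shell
#!/bin/sh
# Sketch only: device names and values are hypothetical examples.

# Greatest common divisor (Euclid's algorithm).
gcd() {
    a=$1; b=$2
    while [ "$b" -ne 0 ]; do
        t=$((a % b)); a=$b; b=$t
    done
    echo "$a"
}

# Least common multiple: mounts between coinciding boot-time checks.
lcm() {
    echo $(( $1 / $(gcd "$1" "$2") * $2 ))
}

# Print (rather than run) the tune2fs command that sets the max mount
# count (-c) and the interval between checks in weeks (-i).
set_check_policy() {
    echo tune2fs -c "$2" -i "${3}w" "$1"
}

set_check_policy /dev/vg0/root 7 2
set_check_policy /dev/vg0/home 37 6
lcm 7 37    # distinct primes, so this is just 7 * 37
```

With non-coprime counts such as 10 and 20 the checks would coincide every 20 mounts, which is why distinct primes are the better choice.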

Cheers,
Kyle Moffett
