Re: 0-nlink inodes can lead to dirty filesystems being marked "clean"

Theodore Y. Ts'o (tytso@mit.edu)
Fri, 26 Jun 1998 21:56:48 -0400


   From: buhr@stat.wisc.edu (Kevin Buhr)
   Date: 26 Jun 1998 13:36:50 -0500

   As far as I can see, the current semantics are as follows.  The
   filesystem on which the inode resides can be remounted read-only.  For
   "ext2", this means the filesystem will be marked "clean".  However,
   the 0-nlink inode and its blocks are still marked "in-use" in the
   bitmaps.  The filesystem will only *truly* be clean (i.e., not in need
   of an "fsck") if the process exits; then, the implied "iput" will
   deallocate the inode and its blocks in the associated bitmaps.

You're right. This is not a disaster, as the filesystem is not invalid,
per se, but it does mean that the space isn't reclaimed until the next
fsck, which ideally should be soon.
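
For anyone who wants to see the state being discussed from user space,
here is a minimal sketch in C. It assumes a POSIX system with a writable
/tmp; the function name orphan_link_count is made up for illustration.
It creates a file, unlinks it while keeping the descriptor open, and
reports the link count that fstat() sees.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a temporary file, unlink it while the
 * descriptor is still open, and return the link count from fstat().
 * Returns -1 on error.  On success *fd_out holds the open descriptor,
 * which the caller must close; only then can the inode be released.
 */
long orphan_link_count(int *fd_out)
{
    char path[] = "/tmp/nlink-demo-XXXXXX";
    struct stat st;
    int fd;

    assert(fd_out != NULL);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    /* Remove the only name; the inode stays alive through fd. */
    if (unlink(path) != 0 || fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    *fd_out = fd;
    return (long)st.st_nlink;
}
```

On a typical local Linux filesystem I would expect this to return 0
while the descriptor remains usable for reads and writes; network
filesystems may behave differently.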

   2.  When a filesystem is remounted read-only, it is *not* marked
       clean if it has in-use, 0-nlink inodes.  If a following "umount"
       or "remount" finds no such inodes, *then* the filesystem can be
       marked clean.

       Implementation would require changes in the VFS layer (to
       communicate to the "remount" functions that the filesystem is
       still dirty) and in various filesystem-specific functions,
       however.

This is certainly the cleaner solution, but it does require more work.
It doesn't have to require VFS layer changes, by the way; you can do
this by having the ext2 code bump a counter in the in-core ext2
superblock structure when an unlink drops an inode's link count to zero,
and decrement the counter when ext2_free_inode actually frees the inode.
If that counter is non-zero, then the ext2 code knows that there must be
in-use, 0-nlink inodes, and it can leave the filesystem marked dirty
when it remounts it r/o.
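
A rough user-space model of that counter idea, in C, might look like the
following. This is only a sketch, not kernel code: struct sb_model and
all of the function names are hypothetical stand-ins, and the
marked_clean field merely models the "clean" state that ext2 records in
the on-disk superblock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the in-core ext2 superblock information. */
struct sb_model {
    unsigned long zero_nlink_count; /* inodes with nlink == 0, not yet freed */
    bool read_only;
    bool marked_clean;              /* models the on-disk "clean" state */
};

/* Would be called when an unlink drops an inode's link count to zero. */
void note_nlink_zero(struct sb_model *sb)
{
    sb->zero_nlink_count++;
}

/* Would be called at the point where ext2_free_inode frees the inode. */
void note_inode_freed(struct sb_model *sb)
{
    assert(sb->zero_nlink_count > 0);
    sb->zero_nlink_count--;
}

/*
 * Remount read-only: the filesystem is marked clean only if no
 * 0-nlink inodes are still waiting to be freed.  Calling this again
 * later (a following remount or umount) re-evaluates the counter.
 */
void remount_ro(struct sb_model *sb)
{
    sb->read_only = true;
    sb->marked_clean = (sb->zero_nlink_count == 0);
}
```

In this model an inode that nobody holds open is counted and then freed
right away, so the counter only stays non-zero while some process still
has an unlinked file open.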

   3.  We refuse to remount a filesystem read-only if it has in-use,
       0-nlink inodes; that is, we have "fs_may_remount_ro" return 0 so
       that the remount request returns EBUSY.

This is certainly simpler, and easier to implement. The one potential
danger with this is that since the filesystem isn't mounted read-only,
badly misbehaving init scripts might actually try to write files on the
root filesystem (which would normally fail), and then leave the
filesystem in a state which might require a manual e2fsck. This would
arguably be a badly written init script, but it would be nice to be able
to avoid such a scenario.
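
For comparison, the check that (3) describes could be modeled in user
space roughly as below. Again this is a sketch under assumptions: struct
inode_model and both function names are invented for illustration and
are not the kernel's fs_may_remount_ro; the only thing taken from the
proposal is "return 0 when an in-use, 0-nlink inode exists, so the
remount fails with EBUSY".

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical in-core inode: link count plus number of active users. */
struct inode_model {
    unsigned int nlink;
    unsigned int use_count;
};

/* Returns 0 if any inode is still in use with no links, 1 otherwise. */
int model_may_remount_ro(const struct inode_model *inodes, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (inodes[i].use_count > 0 && inodes[i].nlink == 0)
            return 0;
    }
    return 1;
}

/* Returns 0 if the remount may proceed, -EBUSY if it is refused. */
int model_remount_ro(const struct inode_model *inodes, size_t n)
{
    return model_may_remount_ro(inodes, n) ? 0 : -EBUSY;
}
```

The refusal path is where the concern above comes from: after -EBUSY the
filesystem is still mounted read-write, so anything that runs afterwards
can keep writing to it.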

Anyway, Kevin, would you be willing to try implementing (2) using the
implementation strategy I suggested?  If not, I'll be happy to implement
it myself, but given that you were the first to note this problem, I
thought I'd give you first crack at this alternative fix, if you want
it.  (Otherwise I should have time to get to this sometime next week.)

- Ted

