Re: NTFS clear_inode w/ patch (Was: Extra per-inode data)

Bill Hawes (whawes@star.net)
Sun, 01 Feb 1998 09:44:47 -0500


Martin von Loewis wrote:

> Sorry to go through such a lengthy discussion, but I think a clear understanding
> of this is important. I promise to put the conclusions we get into
> Documentation/filesystems/vfs.txt.
>
> > By an "active hard link" I mean that the dentry for hard link path
> > exists, and therefore has incremented i_count. The i_nlink value
> > most certainly is "related" to i_count, as it defines the upper
> > bound for i_count for a file in a well-behaved filesystem.
>
> No it is not, or ext2 is not a well-behaved filesystem. The sequence
> link("foo","bar");
> open("foo");
> open("bar");
> unlink("foo");
> will produce an inode with i_count>i_nlink.

The inode can't be removed until i_count goes to 0, so this is just a special
case of a temporary state until the last use of the inode is retired.

> i_nlink counts the on-disk references, whereas i_count counts the in-memory
> references. When it comes to cleaning up in-memory resources, i_nlink should
> not matter. If you tell me I have to do this in delete_inode, i_nlink does
> matter.

i_nlink == 0 can simply be used as a flag to call the delete_inode operation. It
sounds as though much of your objection is to the calling convention -- would
you prefer an inode flag meaning "free on last use"?

> No, the name very well describes of what the operation should do, and
> indeed does for all local file systems I've been looking at (which
> also support deleting file systems in the first place).
>
> Please note that affs has the very same problem: it releases resources in
> put_inode. I guess you can construct bad scenarios here as well. They are
> typically exposed by doing 'cat' to a file repeatedly.

The VFS doesn't prescribe the exact semantics to be followed in delete_inode or
the other operations. It provides calls to cover the operations likely to be
needed by the filesystems, and performs the prerequisite operations to make the
calls safe. In the case of delete_inode, the inode is unhashed prior to the
call, thereby protecting the inode from being reused if the cleanup operations
block. This is obviously needed if you're deleting a file from the disk, but
applies equally to cleanup that may block.

> > A file that has been unlinked is just one case
> > in which the in-memory inode isn't needed any more.
>
> Wrong. In the above example, the in-memory inode will be needed even
> after the unlink. Opening, then unlinking a file is a common Unix idiom
> for temporary files. All local file systems handle it well, AFAICT.

We were speaking in the context of operations to be done when i_count goes to 0,
at which point the in-memory inode for an unlinked isn't needed.

> Network file systems are somewhat different, though. NFS does silly renames.
> How would smbfs behave if you unlink a file that is still open?

Smbfs is stateful and the server won't allow a file to be deleted if it's open.
Inodes are freed on the last use, and the close call is made from delete_inode.

> Well, I decided to extend the VFS instead. All discussions aside, can you
> please review this patch? At least, I could not reproduce the Oops with it
> anymore.
>
> Everybody testing this patch: you'll have to recompile the entire kernel, not
> just the NTFS code.

The patch looks OK, but for the case here it isn't necessary to extend the VFS
to fix NTFS. As I said for Martin Mares's original posting, adding a clear_inode
operation is very reasonable, but additions should be made in response to
necessity -- when the current interface can't do the job.

There are some efficiency considerations that make it advantageous to drop
directly into delete_inode from put_inode, and to do the clear_inode there
rather than waiting for the VFS to reclaim the inode later. The iput() call is
going to get the inode spinlock anyway, and if you implement a delete_inode and
clean up the inode, the inode can go directly to the unused list. Otherwise, the
VFS has to gather freeable inodes in try_to_free_inode and put them on a
temporary list to use clear_inode.

You can get the same effect as your patch by simply moving the current put_inode
code into a delete_inode operation, and setting i_nlink to 0 in put_inode when
i_count is 1.

Regards,
Bill