generic_drop_inode

From: Mark Fasheh
Date: Mon Jun 27 2005 - 17:10:58 EST


On Mon, Jun 27, 2005 at 09:26:48PM +0100, Christoph Hellwig wrote:
> I think this still qualifies as calling generic_delete_inode because it's
> a trivial wrapper. Manipulating i_nlink seems rather odd to me, I'd
> say you should rather call into generic_delete_inode directly if
> OCFS2_INODE_MAYBE_ORPHANED is set (that's what generic_drop_inode will
> do for i_nlink == 0 anyway).
Allowing the file system to make the full decision and call the proper
function would be fine, but then we'd need generic_forget_inode exported
too :)

> In fact given every cluster and possibly many network filesystems will
> need this it might make sense to take the OCFS2_INODE_MAYBE_ORPHANED into
> the VFS, i.e. make it an i_state flag (after fixing can_unuse to not do
> something totally stupid with i_state)
Yes, the problem certainly isn't necessarily specific to OCFS2 so I'd be
more than happy to see that as a generic VFS feature. As is, the export and
callback get us past quite a few problems with tracking orphaned inodes in
the cluster.

This however, brings me to a related issue for which I'd appreciate some
advice.

If a node crashes or otherwise fails to complete the unlink(2) then the
inode in qusetion will *not* have actually been orphaned. Of course, the
local node doesn't know this and eventually gets to ocfs2_delete_inode() (via
the MAYBE_ORPHANED flag) which will do the work of determining whether the
inode is actually orphaned. The problem is that by the time we've gotten
there, generic_delete_inode() has done a truncate_inode_pages() which might
throw out data for a file which otherwise wants it on disk.

In fact, because OCFS2 supports POSIX style unlink-while-open across the
entire cluster, we might get to ocfs2_delete_inode() and during the voting
process, discover that the file is still open on another node in which case
we don't want to wipe it but would have lost file data due to the
truncate_inode_pages() anyway.

Essentially, we get to ocfs2_delete_inode() before we have a chance to
take the cluster lock and determine the true state of the inode. By
then it is too late. We really need to be able to make the
determination before calling generic_drop_inode(), but drop_inode() is
under the inode lock, and we can't call out to the cluster. The
question becomes where and how to do this work?
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@xxxxxxxxxx

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/