Re: [RFC] st_nlink after rmdir() and rename()
From: Al Viro
Date: Thu Mar 03 2011 - 16:24:00 EST
On Thu, Mar 03, 2011 at 12:05:43PM -0800, Linus Torvalds wrote:
> The thing is, I don't think it's a QoI question at all - since any
> user that _depends_ on this kind of behavior is simply broken. We
> aren't going to guarantee it, exactly because some filesystems simply
> will not ever guarantee it.
Umm... Not really. "If you decide to use idiotify, at least retain some
measure of sanity and keep it to local filesystems" is more or less
feasible. "Just say no to CONFIG_INOTIFY" is getting harder and harder -
recent udev won't run without that and recent X depends on udev.
And inotify *does* expose that to userland. Set an inotify watch on a
directory. Ask to be notified of IN_DELETE_SELF. Then do an overwriting
rename() (note that it doesn't have to be busy - just sitting there
being watched). You will get an event on most of the local filesystems,
same as you would from rmdir(). However, do that on jffs2 and the event
will be generated only on rmdir(). The directory itself will be killed
by rename() just as thoroughly as by rmdir(); it's not something like
sillyrename on NFS where we really have different behaviour.
> Now, for FAT we do in fact try rather hard to fake the i_nlink count,
> but I'm not at all convinced that's a good idea. It makes reading
> directory inodes on FAT much more expensive (we have to basically do a
> readdir for each open). So I'd hate to make that whole "you need to
> emulate i_nlink even if you really don't care" be something that we
> actually end up thinking is a quality issue.
That is completely separate story; keeping st_nlink for live directories
equal to 2 + number of subdirectories is, IMO, fairly silly on such
filesystems. The only reason for doing that was to allow find(1) some
optimizations, IIRC.
But we are talking about a very different thing - not "I can tell if it's
a leaf directory by looking at st_nlink", but "link count 0 means that it's
only kept alive by being busy". And it's trivial to maintain.
Look, in rename() we *must* check that victim is empty anyway. IOW, any
instance on a local fs will have the place where it has decided that we
are killing a directory. Ditto for successful rmdir(). We don't need
to count subdirectories or anything like that - it's a matter of saying
victim->i_nlink = 0 instead of victim->i_nlink-- in a specific place in
foo_rename().
> There are other filesystems where i_nlink can be even _less_
> meaningful, ie if the filesystem does any kind of fancy
> content-indexing thing or lazy COW (think "union filesystem within the
> filesystem") or whatever, I could easily see i_nlink not having any
> traditional unix filesystem semantics.
>
> Seriously - how did you even notice this?
Code review. Went crawling through i_nlink users in the tree after the
ext2_rename() fun caught by Josh, found a bunch of obvious bugs and
this oddity.
> I'm not opposed to fix actual bugs, but I _do_ think it is
> questionable to make this kind of nonsense semantic issue be an issue.
See above re exposure to userland via inotify. I don't think that it's
an earth-shattering problem, obviously, but since fixing it really doesn't
cost anything on the few local filesystems that have it...