Re: processes hung after sys_renameat, and 'missing' processes

From: Eric W. Biederman
Date: Fri Jun 08 2012 - 03:54:08 EST


Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Thu, Jun 07, 2012 at 10:25:46PM -0700, Eric W. Biederman wrote:
>
>> I am still learly of d_materialise_unique as it allows to create alias's
>> on non-directories. It isn't a functional problem as d_revalidate will
>> catch the issue and make it look like we have a unlink/link pair instead
>> of a proper rename. However since it is possible I would like to aim
>> for the higher quality of implemntation and use show renames as renames.
>
> ???
>
> Please, explain. link/unlink pair in which sense?

In the sense that if we don't use d_move. A rename will look to
userspace like a pair of sys_link and sys_unlink operations.

If I happen to have a file open with the old name and the dentry passes
through d_drop. The /proc/self/fd/N will show the filename as
"...(deleted)".

And in every other way I can think of that is userspace visible this
will look like a pair of link and unlink operations.

> I don't see what kind of *notify hookup do you have in mind. Anything that
> treats "dentry failed revalidation or got evicted by memory pressure" as
> "unlink" is completely nuts, IMO.

In this case much as it might be convinient to have a *notify report,
what I was thinking of were the much simpler userspace visible aspects,
like what /proc/self/fd/N symlinks report.


In the little corner case user visible details the current state of vfs
support for distributed filesystems looks nuts to me, especially where
we can't apply an appropriate d_move onto a renamed dentry.

The fact that open files, open directories and mount points pin dentries
in memory cause interesting challenges for keeping the local vfs state
in sync with the state of a remote filesystem.

What I would love to be able to do is to replay some kind of journal
that reports what happened to the filesystem outside of the linux vfs
onto the linux vfs so that we can get a more accurate picture of what
really happened to the filesystem. Which should allow *notify and the
like to actually work. Would make the /proc/self/fd/* symlinks more
useful, and make allow files that are mount points to be renamed.

But ultimately the change in semantics bugs me. Using d_move less
often feels user visible and because d_materialise_unique because it
does not handle renames of files feels like a lurking maintenance bomb
for sysfs.

Especially since renames on files with mount points on them should be
treated differently from normal files.

Speaking of I just found a small unhandled case in __d_unalias. We need
to deny renaming of mount points.

Eric

From: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
Subject: dcache: Deny renaming via __d_unalias dentries of mountpoints

Make __d_unalias match vfs_rename_dir and vfs_rename_other and don't
allow renaming mount points.

Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

---
diff --git a/fs/dcache.c b/fs/dcache.c
index 85c9e2b..d236722 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2380,14 +2380,17 @@ static struct dentry *__d_unalias(struct inode *inode,
struct dentry *dentry, struct dentry *alias)
{
struct mutex *m1 = NULL, *m2 = NULL;
- struct dentry *ret;
+ struct dentry *ret = ERR_PTR(-EBUSY);
+
+ /* Linux does not rename mount points */
+ if (d_mountpoint(alias))
+ goto out_err;

/* If alias and dentry share a parent, then no extra locks required */
if (alias->d_parent == dentry->d_parent)
goto out_unalias;

/* See lock_rename() */
- ret = ERR_PTR(-EBUSY);
if (!mutex_trylock(&dentry->d_sb->s_vfs_rename_mutex))
goto out_err;
m1 = &dentry->d_sb->s_vfs_rename_mutex;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/