Re: lock_kernel called under spinlock in NFS

From: Trond Myklebust
Date: Thu Jun 01 2006 - 16:13:02 EST


On Thu, 2006-06-01 at 15:55 -0400, Joe Korty wrote:
> Tree 5fdccf2354269702f71beb8e0a2942e4167fd992
>
> [PATCH] vfs: *at functions: core
>
> introduced a bug where lock_kernel() can be called from
> under a spinlock. To trigger the bug one must have
> CONFIG_PREEMPT_BKL=y and be using NFS heavily. It is
> somewhat rare and, so far, I haven't traced down the userland
> sequence that causes the fatal path to be taken.
>
> The bug was caused by the insertion into do_path_lookup()
> of a call to file_permission(). do_path_lookup()
> read-locks current->fs->lock for most of its operation.
> file_permission() calls permission() which calls
> nfs_permission(), which has one path through it
> that uses lock_kernel().
>
> I am not sure how to fix this bug. It is not clear what
> the lock_kernel() call is protecting. Nor is it clear why,
> as part of the introduction of the openat() etc services,
> it was desirable to add a call to file_permission()
> to do_path_lookup().
>
> For now, I plan to turn off CONFIG_PREEMPT_BKL.

Nowhere should anyone be calling file_permission() under a spinlock.

Why would you need to read-protect current->fs in the case where you are
starting from a file? The correct thing to do there would appear to be
to read-protect only the cases where (*name == '/') or (dfd == AT_FDCWD).

Something like the attached patch...

Cheers,
Trond

commit 9ac4cdbb32d593132e03ecf9679e6bbfe04ed358
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Thu Jun 1 16:12:47 2006 -0400

fs/namei.c: Call to file_permission() under a spinlock in do_path_lookup()

We should in any case not need to hold the current->fs->lock for a codepath
that doesn't use current->fs.

Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

diff --git a/fs/namei.c b/fs/namei.c
index 96723ae..a2f79d2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1080,8 +1080,8 @@ static int fastcall do_path_lookup(int d
 	nd->flags = flags;
 	nd->depth = 0;
 
-	read_lock(&current->fs->lock);
 	if (*name=='/') {
+		read_lock(&current->fs->lock);
 		if (current->fs->altroot && !(nd->flags & LOOKUP_NOALT)) {
 			nd->mnt = mntget(current->fs->altrootmnt);
 			nd->dentry = dget(current->fs->altroot);
@@ -1092,9 +1092,12 @@ static int fastcall do_path_lookup(int d
 		}
 		nd->mnt = mntget(current->fs->rootmnt);
 		nd->dentry = dget(current->fs->root);
+		read_unlock(&current->fs->lock);
 	} else if (dfd == AT_FDCWD) {
+		read_lock(&current->fs->lock);
 		nd->mnt = mntget(current->fs->pwdmnt);
 		nd->dentry = dget(current->fs->pwd);
+		read_unlock(&current->fs->lock);
 	} else {
 		struct dentry *dentry;
 
@@ -1118,7 +1121,6 @@ static int fastcall do_path_lookup(int d
 
 		fput_light(file, fput_needed);
 	}
-	read_unlock(&current->fs->lock);
 	current->total_link_count = 0;
 	retval = link_path_walk(name, nd);
 out: