[PATCH] fs: don't use igrab() while holding i_lock (was Re: [RFCPATCH 1/2] Add unlocked version of igrab.)

From: Dave Chinner
Date: Mon Mar 28 2011 - 01:19:39 EST


On Mon, Mar 28, 2011 at 05:39:13PM +1300, Ryan Mallon wrote:
> On 03/28/2011 03:54 PM, Matthew Wilcox wrote:
> > On Mon, Mar 28, 2011 at 02:56:00PM +1300, Ryan Mallon wrote:
> >> Commit 250df6ed274d767da844a5d9f05720b804240197 "fs: protect
> >> inode->i_state with inode->i_lock" changes igrab to acquire inode->i_lock,
> >> however some callers, notably nfs_inode_add_request, already hold the lock
> >> when calling igrab.
> >
> > I think a better solution to your problem is to notice that this is
> > called in the context of doing a write to an inode. That means we
> > must already have a reference count on this inode, so it can't possibly
> > be in I_FREEING or I_WILL_FREE. That means we can just call __iget()
> > instead ... except that __iget isn't exported to modules.
>
> Ah, okay. Thanks for the hint.
>
> A few other locations that I can see that call igrab with inode->i_lock
> held are:
>
> fs/ceph/snap.c::ceph_queue_cap_snap
> fs/ceph/addr.c::ceph_set_page_dirty

I don't know how I missed these uses when auditing Nick's code - we
caught the use of the dcache_lock inside i_lock and got that fixed,
but missed these ones.

> fs/nfs/nfs4state.c::nfs4_get_open_state

I know I fixed this one once, along with the first NFS issue you
tripped over. Somehow I lost them along the way.

> There may be some more cases where the locking is less obvious. I don't
> know enough about the filesystem code to say whether each of those can
> skip the (I_FREEING | I_WILL_FREE) check, or whether the correct
> approach is to modify the filesystems themselves so that they do not
> hold i_lock when calling igrab (i.e. rework to use a different outer lock)?
>
> If the correct approach is to use __iget or __igrab then I can prepare a
> patch for this. In the case of __iget, should it just be marked
> EXPORT_SYMBOL and added to include/linux/fs.h?

All of them should simply be a conversion from igrab() to ihold(),
which is already exported. Patch below for all 4 you've reported.
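
To illustrate why the conversion matters (separately from the patch
itself), here is a rough userspace model of the two calls. Everything
in it is a hypothetical stand-in: struct model_inode, model_trylock(),
model_igrab() and model_ihold() are not the kernel's types or
functions, and the lock is a plain flag rather than a spinlock. It
only assumes what is discussed above, namely that igrab() takes
i_lock internally while ihold() just bumps the reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: hypothetical stand-ins, not kernel code. */
struct model_inode {
	bool i_lock_held;	/* models a non-recursive spinlock */
	int i_count;		/* models the reference count */
};

/* Returns false at the point where a real spinlock would spin forever. */
static bool model_trylock(struct model_inode *inode)
{
	if (inode->i_lock_held)
		return false;
	inode->i_lock_held = true;
	return true;
}

static void model_unlock(struct model_inode *inode)
{
	inode->i_lock_held = false;
}

/* Models igrab(): takes i_lock, then takes a reference. */
static bool model_igrab(struct model_inode *inode)
{
	if (!model_trylock(inode))
		return false;	/* caller already holds i_lock */
	inode->i_count++;
	model_unlock(inode);
	return true;
}

/* Models ihold(): takes a reference without touching i_lock. */
static void model_ihold(struct model_inode *inode)
{
	assert(inode->i_count > 0);	/* caller must already own a reference */
	inode->i_count++;
}

/* Returns the final count; *igrab_ok reports the igrab attempt under i_lock. */
int demo(bool *igrab_ok)
{
	struct model_inode inode = { .i_lock_held = false, .i_count = 1 };

	model_trylock(&inode);			/* caller takes i_lock first */
	*igrab_ok = model_igrab(&inode);	/* would hang in the kernel */
	model_ihold(&inode);			/* fine while i_lock is held */
	model_unlock(&inode);
	return inode.i_count;
}
```

Calling demo() from a small main() shows the igrab-style call failing
to take the lock while the ihold-style call still gains the extra
reference.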

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

fs: don't use igrab() while holding i_lock

From: Dave Chinner <dchinner@xxxxxxxxxx>

If we are already holding the i_lock, we have a reference to the
inode so we can safely use ihold() to gain an extra reference. This
avoids hangs due to lock recursion on the i_lock.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
fs/ceph/addr.c | 2 +-
fs/ceph/snap.c | 4 ++--
fs/nfs/nfs4state.c | 3 ++-
fs/nfs/write.c | 2 +-
4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 561438b..37368ba 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -92,7 +92,7 @@ static int ceph_set_page_dirty(struct page *page)
ci->i_head_snapc = ceph_get_snap_context(snapc);
++ci->i_wrbuffer_ref_head;
if (ci->i_wrbuffer_ref == 0)
- igrab(inode);
+ ihold(inode);
++ci->i_wrbuffer_ref;
dout("%p set_page_dirty %p idx %lu head %d/%d -> %d/%d "
"snapc %p seq %lld (%d snaps)\n",
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index f40b913..0aee66b 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -463,8 +463,8 @@ void ceph_queue_cap_snap(struct ceph_inode_info *ci)

dout("queue_cap_snap %p cap_snap %p queuing under %p\n", inode,
capsnap, snapc);
- igrab(inode);
-
+ ihold(inode);
+
atomic_set(&capsnap->nref, 1);
capsnap->ci = ci;
INIT_LIST_HEAD(&capsnap->ci_item);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index ab1bf5b..da6e895 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -590,7 +590,8 @@ nfs4_get_open_state(struct inode *inode, struct nfs4_state_owner *owner)
state->owner = owner;
atomic_inc(&owner->so_count);
list_add(&state->inode_states, &nfsi->open_states);
- state->inode = igrab(inode);
+ ihold(inode);
+ state->inode = inode;
spin_unlock(&inode->i_lock);
/* Note: The reclaim code dictates that we add stateless
* and read-only stateids to the end of the list */
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 85d7525..3236951 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -390,7 +390,7 @@ static int nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
error = radix_tree_insert(&nfsi->nfs_page_tree, req->wb_index, req);
BUG_ON(error);
if (!nfsi->npages) {
- igrab(inode);
+ ihold(inode);
if (nfs_have_delegation(inode, FMODE_WRITE))
nfsi->change_attr++;
}