Re: fs: break out inode operations from inode_lock V4

From: Al Viro
Date: Fri Oct 29 2010 - 05:29:44 EST


On Fri, Oct 29, 2010 at 07:59:55PM +1100, Dave Chinner wrote:
> Hi Al,
>
> Another update to the inode_lock splitting patch set. It's still
> based on your merge-stem branch. I'm going to be out all weekend, so
> any further changes will take a couple of days to turn around.
>
> Version 4:
> - whitespace cleanup
> - moved setting state on new inodes till after the hash search fails
>   in insert_inode_locked
> - made hash insert operations atomic with state changes by holding
>   inode->i_lock while doing hash inserts
> - made inode hash removals atomic with state changes by taking the
>   inode_lock (later inode_hash_lock) and inode->i_lock. Combined
>   with the insert changes, this means the inode_unhashed check in
>   ->drop_inode is safely protected by just holding the
>   inode->i_lock.
> - protect inode_unhashed() checks in insert_inode_locked with
>   inode->i_lock

The last one is not needed at all; look at what's getting done there - we
drop that ->i_lock immediately after the check, so it doesn't buy us anything.
The stuff before that *is* a race fix; namely, the race with BS iget()
triggered by nfsd. This check is just verifying that it was a race and not
a badly confused filesystem. IOW, no need to lock anything and no _point_
locking anything. We are repeating the hash walk anyway; this is just making
sure that we hadn't run into infinite retries.
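
To illustrate the point about the retry loop, here is a minimal userspace
sketch of its shape. It is a toy model, not the kernel code: every toy_*
name is hypothetical, the wait_for callback merely stands in for grabbing a
reference and waiting on the old inode, and there is no real locking or
concurrency here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an inode on a single hash chain (hypothetical type). */
struct toy_inode {
	unsigned long ino;
	bool hashed;
	struct toy_inode *next;
};

struct toy_chain {
	struct toy_inode *head;
};

/* Walk the chain looking for an inode with the same number. */
struct toy_inode *toy_find(struct toy_chain *c, unsigned long ino)
{
	struct toy_inode *p;

	for (p = c->head; p; p = p->next)
		if (p->ino == ino)
			return p;
	return NULL;
}

/* Remove an inode from the chain and mark it unhashed. */
void toy_unhash(struct toy_chain *c, struct toy_inode *inode)
{
	struct toy_inode **pp;

	for (pp = &c->head; *pp; pp = &(*pp)->next) {
		if (*pp == inode) {
			*pp = inode->next;
			inode->next = NULL;
			inode->hashed = false;
			return;
		}
	}
}

/* "Wait" during which a racing teardown of the old inode completes. */
void toy_wait_teardown(struct toy_chain *c, struct toy_inode *old)
{
	toy_unhash(c, old);
}

/* "Wait" after which the old inode is still alive and hashed. */
void toy_wait_stuck(struct toy_chain *c, struct toy_inode *old)
{
	(void)c;
	(void)old;
}

int toy_insert_locked(struct toy_chain *c, struct toy_inode *new,
		      void (*wait_for)(struct toy_chain *, struct toy_inode *))
{
	assert(!new->hashed);

	for (;;) {
		struct toy_inode *old = toy_find(c, new->ino);

		if (!old) {
			/* Nothing in the way: hash the new inode. */
			new->next = c->head;
			c->head = new;
			new->hashed = true;
			return 0;
		}

		wait_for(c, old);

		/*
		 * Plain read, nothing held across it. The answer only picks
		 * between "give up" and "walk the chain again", and the
		 * next walk is what actually decides.
		 */
		if (old->hashed)
			return -EBUSY;	/* not a race: stop retrying */
	}
}
```

In this model the check after the wait does nothing except bound the loop:
if the old inode went away we simply repeat the walk, and if it did not we
bail out with -EBUSY instead of spinning.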

Other than that I'm OK with that set; could you add "lift ->i_lock from
the beginning of writeback_single_inode()" to the series and post your
current RCU-for-i_hash patch for review?

Nick, can you live with the results of that set as an intermediate point
for merge? Note that RCU for other lists (sb, wb, lru) is bloody pointless -
all heavy users are going to modify the lists in question anyway, so we'll
need exclusion for them.

And yes, removals from the lists ought to be conditional on presence in the
lists, but that's (a) easy to rediff on top of that and (b) is of somewhat
dubious usefulness - eviction will generally find the inode on lists; the
only likely exception is final iput() of unhashed inode. I'm not saying
it's not worth doing, just that the benefits will need to be verified...
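
For reference, a rough userspace sketch of what "conditional on presence"
would mean. All toy_* names are hypothetical, the "lock" is just a counter
of acquisitions rather than a real spinlock, and the unlocked emptiness
check is assumed to be safe only because nothing can add the node to the
list concurrently at that point.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list node (hypothetical toy type). */
struct toy_list {
	struct toy_list *prev, *next;
};

/* Stand-in for a list lock: counts acquisitions, excludes nobody. */
struct toy_lock {
	unsigned long acquisitions;
};

void toy_list_init(struct toy_list *n)
{
	n->prev = n->next = n;
}

/* A node that points to itself is not on any list. */
bool toy_list_empty(const struct toy_list *n)
{
	return n->next == n;
}

void toy_list_add(struct toy_list *n, struct toy_list *head)
{
	assert(toy_list_empty(n));	/* must not already be on a list */
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink and reinitialise; harmless on a node that is on no list. */
void toy_list_del_init(struct toy_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	toy_list_init(n);
}

/* Unconditional variant: always pays for the lock. */
void toy_remove_always(struct toy_lock *l, struct toy_list *n)
{
	l->acquisitions++;		/* lock */
	toy_list_del_init(n);
					/* unlock */
}

/* Conditional variant: skips the lock when the node is on no list. */
void toy_remove_if_present(struct toy_lock *l, struct toy_list *n)
{
	if (toy_list_empty(n))
		return;
	l->acquisitions++;		/* lock */
	toy_list_del_init(n);
					/* unlock */
}
```

The saving only shows up for nodes that are not on the list when removal is
attempted, which is why the benefit depends on how often that case occurs.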

NOTE: this is obviously not the end of the road; e.g. i_count is still atomic
at that point, RCU is not done, finer splitting of locks is not done, etc.