Re: Inode Lock Scalability V4

From: Nick Piggin
Date: Sat Oct 16 2010 - 13:55:47 EST

Next message: Patrick Ringl: "2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]"
Previous message: Maciej Rutecki: "[2.6.36-rc8] ERROR: "dump_write" [fs/binfmt_aout.ko] undefined!"
In reply to: Christoph Hellwig: "Re: [PATCH 02/19] kernel: add bl_list"
Next in thread: Dave Chinner: "Re: Inode Lock Scalability V4"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, Oct 16, 2010 at 07:13:54PM +1100, Dave Chinner wrote:
> Version 2 of this series is a complete rework of the original patch
> series. Nick's original code nested list locks inside the the
> inode->i_lock, resulting in a large mess of trylock operations to
> get locks out of order all over the place. In many cases, the reason
> fo this lock ordering is removed later on in Nick's series as
> cleanups are introduced.

I think you must misunderstand how my patchset is structured. It is not
that it was simply thrown together with cleanups piled on top of crap as
I discovered new ways to fix trylocks.

It is structured so that for the first steps, each (semi-)independent
aspect of inode_lock's concurrency protection is taken over by a new
lock. I've just made these locks dumb globals for simplicity at this
point.

Often, the required locking of these different aspects of the icache
overlap. Also inode_lock remains in place until the end. This makes lock
ordering get ugly.

But the point is that once we get past the trylocks and actually have
the locks held, it is relatively easy to demonstrate that the protection
provided is exactly what inode_lock provided.

After inode_lock is removed, I start to scale the locks properly and
introduce more complicated transforms to the code to improve the
locking. I really like how I split it up.

In your patch set you've basically pulled all these steps into the
inode_lock splitting itself. This is my first big objection.

Second I actually prefer how I cover all the icache state of an inode
with i_lock. You have avoided some coverage in the interest of reducing
lock ordering complexity, but as I have demonstrated RCU tends to work
as well for this too (and is mostly all solved in the second half of my
series).

> This patch set is just the basic inode_lock breakup patches plus a
> few more simple changes to the inode code. It stops short of
> introducing RCU inode freeing because those changes are not
> completely baked yet.

It also doesn't contain per-zone locking and lrus, or scalability of
superblock list locking.

And while the rcu-walk path walking is not fully baked, it has been
reviewed by Linus and is in pretty good shape. So I prefer to utilise
RCU locking here too, seeing as we know it will go in.

I much prefer to fix all of the problematic global locks in icache up
front and do the icache merge in a single release so we don't have
locking changes there stringing out over several releases.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Patrick Ringl: "2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]"
Previous message: Maciej Rutecki: "[2.6.36-rc8] ERROR: "dump_write" [fs/binfmt_aout.ko] undefined!"
In reply to: Christoph Hellwig: "Re: [PATCH 02/19] kernel: add bl_list"
Next in thread: Dave Chinner: "Re: Inode Lock Scalability V4"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]