rcu-walk and dcache scaling tree update and status

From: Nick Piggin
Date: Sun Dec 12 2010 - 21:37:46 EST


The vfs-scale branch has made some progress, but it now needs wider
testing and detailed review, particularly of the fine details of
dcache_lock lifting and of the rcu-walk synchronisation and its
documentation.

Linus has suggested pretty strongly that he wants to pull this in the
next merge window (most recently by saying that "inodes will be RCU
freed in 2.6.38" in an unrelated discussion). As far as I know, that's
what he's going to do. I'd like to get this into linux-next for some
time to improve test coverage (there are many filesystems I can't even
test, so there are bound to be a few silly crashes). Stephen, how do I
arrange that?

From my point of view, it has had nowhere near enough review. In
particular, I want Al to be happy with it, the filesystem changes looked
at and tested by the respective fs maintainers, and review from anybody
who is good at concurrency. However, if Linus still wants to merge it to
kick things along, I am not going to stop him this time, because I have
no known bugs or pending changes required.

I, like everybody else, would prefer bugs or design flaws to be found
*before* it goes upstream, of course. I would be happy to spend time on
IRC with reviewers (ask me offline). And if anybody has reasonable
concerns or suggestions, I will be happy to take that into account. I
will not flame anybody who reads my replies, even if it takes a while
for one or both of us to understand.

Documentation/filesystems/path-lookup.txt is a good place to start
reviewing the fun stuff. I would much appreciate review of documentation
and comments too, if anything is not clear, omitted, or not matching the
code.

Also, please keep an eye on the end result when reviewing patches,
particularly the locking patches that come before dcache_lock is lifted.
These are supposed to provide the lock coverage needed to lift
dcache_lock with minimal complexity. They are not supposed to be nice
looking code that you'd want to run on your production box; they are
supposed to be nice changesets (from a review and verification point of
view).

Git tree is here:

git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git

Branch is:

vfs-scale-working

Changes since last posting:
* Add a lot more comments for rcu-walk code and functions
* Fix reported d_compare vfat crash
* Incorporate review suggestions
* Make rcu-walk bail out if we have to call a security subsystem
* Fix for filesystems rewriting dentry name in-place
* Audit d_seq barrier write-side, add a few places where it was missing
* Optimised dentry memcmp
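
For reviewers who want a concrete picture of what a "d_seq barrier
write-side" means, here is a minimal userspace sketch of a sequence-count
protocol using C11 atomics. It is only an analogy: the struct, the toy_*
function names and the two data fields are hypothetical and are not taken
from the branch, and the kernel uses its own seqcount primitives and
barriers rather than <stdatomic.h>. The point it illustrates is that every
writer must bracket its update with the begin/end pair; a write path that
skips it leaves readers with no way to notice the change.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dentry; NOT the kernel's struct dentry. */
struct toy_dentry {
    atomic_uint seq;        /* even: stable, odd: update in progress */
    atomic_uint name_hash;  /* two fields that must be seen together */
    atomic_uint name_len;
};

static void toy_init(struct toy_dentry *d)
{
    atomic_init(&d->seq, 0);
    atomic_init(&d->name_hash, 0);
    atomic_init(&d->name_len, 0);
}

/* Write side. Assumes writers are already serialised by some lock. */
static void toy_write_begin(struct toy_dentry *d)
{
    unsigned int s = atomic_load_explicit(&d->seq, memory_order_relaxed);

    assert((s & 1u) == 0);  /* no nested or concurrent writer */
    atomic_store_explicit(&d->seq, s + 1, memory_order_relaxed);
    /* Order the odd count before the data stores that follow. */
    atomic_thread_fence(memory_order_release);
}

static void toy_write_end(struct toy_dentry *d)
{
    unsigned int s = atomic_load_explicit(&d->seq, memory_order_relaxed);

    assert((s & 1u) == 1);  /* must pair with toy_write_begin() */
    /* Release: data stores become visible before the even count. */
    atomic_store_explicit(&d->seq, s + 1, memory_order_release);
}

/* Read side: lockless, so it may have to retry. */
static unsigned int toy_read_begin(struct toy_dentry *d)
{
    return atomic_load_explicit(&d->seq, memory_order_acquire);
}

static bool toy_read_retry(struct toy_dentry *d, unsigned int start)
{
    /* Order the data loads before the re-read of the count. */
    atomic_thread_fence(memory_order_acquire);
    return (start & 1u) ||
           atomic_load_explicit(&d->seq, memory_order_relaxed) != start;
}

/* A correctly bracketed update of both fields. */
static void toy_rename(struct toy_dentry *d, unsigned int hash,
                       unsigned int len)
{
    toy_write_begin(d);
    atomic_store_explicit(&d->name_hash, hash, memory_order_relaxed);
    atomic_store_explicit(&d->name_len, len, memory_order_relaxed);
    toy_write_end(d);
}
```

A reader would call toy_read_begin(), load the fields, and then discard
what it loaded if toy_read_retry() returns true.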

Testing:
Testing filesystems is difficult, particularly remote filesystems and
ones without mkfs packaged in Debian. I'm running LTP and xfstests among
others, but those cover a tiny portion of what you can do with the
dcache. The more testing the merrier.

I have been unable to break anything for a long time, but the race
windows can be tiny. I've been trying to insert random delays into
different parts of critical sections, and write tests specifically
targeting particular races, but that's slow going -- review of locking,
or testing on different configurations should be much more productive.
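
As an illustration of the delay-injection idea, here is one way such a
helper could look in a userspace test harness. This is a sketch under
stated assumptions, not the instrumentation used on the branch: the
toy_* names are hypothetical, the PRNG is a plain xorshift32, and in
kernel code one would use the kernel's own delay facilities rather than
nanosleep().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for nanosleep() */
#endif
#include <stdint.h>
#include <time.h>

/*
 * Pick a pseudo-random delay in [0, max_ns) using xorshift32.
 * *state is the per-thread PRNG state; zero is mapped to 1 because
 * xorshift32 would otherwise stay stuck at zero.
 */
static uint32_t toy_pick_delay_ns(uint32_t *state, uint32_t max_ns)
{
    uint32_t x = *state ? *state : 1u;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return max_ns ? x % max_ns : 0u;
}

/*
 * Sleep for a pseudo-random time to widen a suspected race window.
 * Intended to be dropped into the middle of a critical section in a
 * test build. Returns the delay that was requested, in nanoseconds.
 */
static uint32_t toy_race_delay(uint32_t *state, uint32_t max_ns)
{
    struct timespec ts;
    uint32_t ns;

    if (max_ns > 999999999u)      /* tv_nsec must stay below 1e9 */
        max_ns = 999999999u;
    ns = toy_pick_delay_ns(state, max_ns);
    ts.tv_sec = 0;
    ts.tv_nsec = (long)ns;
    nanosleep(&ts, NULL);
    return ns;
}
```

Seeding each thread's state differently makes the threads drift against
each other, and keeping the seed makes a failing interleaving somewhat
easier to chase again.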

Final note:
You won't be able to reproduce the parallel path walk scalability
numbers that I've posted, because the vfsmount refcounting scalability
patch is not included. I have a new idea for that now, so I'll be asking
for comments with that soon.

Thanks,
Nick
