Re: [patch 00/27] [rfc] vfs scalability patchset

From: Eric W. Biederman
Date: Tue Apr 28 2009 - 07:32:40 EST


Christoph Hellwig <hch@xxxxxxxxxxxxx> writes:

> On Sat, Apr 25, 2009 at 09:06:49AM +0100, Al Viro wrote:
>> Maybe... What Eric proposed is essentially a reuse of s_list for per-inode
>> list of struct file. Presumably with something like i_lock for protection.
>> So that's not a conflict.
>
> But what do we actually want it for? Right now it's only used for
> ttys, which Nick has split out, and for remount r/o. For the normal
> remount r/o case it will go away once we have proper per-sb writer
> counts. And the forced remount r/o from sysrq is completely broken.

The plan is to post my updated patches tomorrow after I have slept.

What I am looking at is that the tty layer is not a special case. Any
subsystem that wants any revoke-like functionality starts wanting
the list of files that are open. My current list of places where we have
something like this is: sysfs, proc, sysctl, tun, tty, sound.

I am in the process of generalizing the handling and bringing all of this
into the VFS, where we only need to maintain it once, and can see
clearly what is going on so we can optimize it.

For that I essentially need per inode lists of files. Devices don't
have inodes but they usually have some kind of equivalent like the
tty struct we can attach inodes to.
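
A minimal userspace sketch of the idea, not the code from the patches:
each inode carries its own list of open files, protected by a per-inode
lock in the spirit of the i_lock suggestion quoted above, and a revoke
walks only that list. Every name here (demo_inode, demo_file,
demo_revoke, files_lock) is hypothetical, and a pthread mutex stands in
for whatever lock the kernel code would really use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's struct inode / struct file. */
struct demo_inode;

struct demo_file {
	struct demo_file *next;    /* next open file on the same inode */
	struct demo_file **pprev;  /* address of the pointer that points at us */
	struct demo_inode *inode;  /* owning inode */
	bool revoked;              /* set once access has been revoked */
};

struct demo_inode {
	pthread_mutex_t files_lock; /* per-inode lock (assumed analogue of i_lock) */
	struct demo_file *files;    /* head of this inode's list of open files */
};

static void demo_inode_init(struct demo_inode *inode)
{
	pthread_mutex_init(&inode->files_lock, NULL);
	inode->files = NULL;
}

/* Link a newly opened file at the head of its inode's list. */
static void demo_file_open(struct demo_inode *inode, struct demo_file *f)
{
	f->inode = inode;
	f->revoked = false;
	pthread_mutex_lock(&inode->files_lock);
	f->next = inode->files;
	f->pprev = &inode->files;
	if (inode->files)
		inode->files->pprev = &f->next;
	inode->files = f;
	pthread_mutex_unlock(&inode->files_lock);
}

/* Unlink a file on close; only this inode's lock is taken. */
static void demo_file_close(struct demo_file *f)
{
	struct demo_inode *inode = f->inode;

	assert(f->pprev != NULL); /* must still be on a list */
	pthread_mutex_lock(&inode->files_lock);
	*f->pprev = f->next;
	if (f->next)
		f->next->pprev = f->pprev;
	pthread_mutex_unlock(&inode->files_lock);
	f->next = NULL;
	f->pprev = NULL;
}

/* Mark every file currently open on the inode as revoked; return the count. */
static int demo_revoke(struct demo_inode *inode)
{
	int n = 0;

	pthread_mutex_lock(&inode->files_lock);
	for (struct demo_file *f = inode->files; f; f = f->next) {
		if (!f->revoked) {
			f->revoked = true;
			n++;
		}
	}
	pthread_mutex_unlock(&inode->files_lock);
	return n;
}
```

The point of the sketch is only that open, close and revoke on one
inode never touch a lock shared with unrelated inodes. What revoking a
file actually has to do in each subsystem is left out entirely.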

It looks like what I have could pretty easily be used to implement
mount -f except for some weird cases like nfsd where the usual vfs
rules are not followed. In particular things like vfs_sync are a pain.

> A while ago Peter had patches for files_lock scalability that went even
> further than Nick's, and if I remember the arguments correctly just
> splitting the lock wasn't really enough and he required additional
> batching because there just were too many lock roundtrips. (Peter, do
> you remember the details?)

I would love to hear what the issues are. Since everyone is worried
about performance and contention I have gone ahead and made the
files_list_lock per inode in my patches. We will see how well that works.
My goal has simply been to add functionality without making a significant
change in performance on the current workloads.

Eric