Re: Nick's vfs-scalability patches ported to 2.6.33-rt

From: Dave Chinner
Date: Thu Mar 11 2010 - 23:41:40 EST


On Thu, Mar 11, 2010 at 07:08:32PM -0800, john stultz wrote:
> On Wed, 2010-03-10 at 04:01 -0500, Christoph Hellwig wrote:
> > On Tue, Mar 09, 2010 at 06:51:02PM -0800, john stultz wrote:
> > > So this all means that with Nick's patch set, we're no longer getting
> > > bogged down in the vfs (at least at 8-way) at all. All the contention is
> > > in the actual filesystem (ext2 in group_adjust_blocks, and ext3 in the
> > > journal and block allocation code).
> >
> > Can you check if you're running into any fs scaling limit with xfs?
>
>
> Here's the charts from some limited testing:
> http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/xfs-dbench.png

What's the X-axis? Number of clients?

If so, I have previously tested XFS to make sure throughput is flat
out to about 1000 clients, not 8. i.e. I'm not interested in peak
throughput from dbench (generally a meaningless number), I'm much
more interested in sustaining that throughput under the sorts of
loads a real fileserver would see...

> They're not great. And compared to ext3, the results are basically
> flat.
> http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ext3-dbench.png
>
> Now, I've not done any real xfs work before, so if there is any tuning
> needed for dbench, please let me know.

Dbench does lots of transactions, which makes XFS log IO bound.
Make sure you have at least a 128MB log and are using lazy-count=1,
and perhaps even the logbsize=262144 mount option. But in general it
only takes 2-4 clients to reach maximum throughput on XFS....
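
A minimal sketch of the settings above, assuming lazy-count is chosen
at mkfs time. The device /dev/sdX1 and the mount point /mnt/test are
hypothetical placeholders, and mkfs destroys whatever is on the device:

```shell
# Hypothetical scratch device and mount point; substitute your own.
DEV=/dev/sdX1
MNT=/mnt/test

# 128MB internal log with lazy superblock counters enabled.
mkfs.xfs -f -l size=128m,lazy-count=1 "$DEV"

# 256KB in-memory log buffers (logbsize is given in bytes here).
mount -t xfs -o logbsize=262144 "$DEV" "$MNT"
```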

> The odd bit is that perf doesn't show huge overheads in the xfs runs.
> The spinlock contention is supposedly under 5%. So I'm not sure what's
> causing the numbers to be so bad.

It's bound by sleeping locks or IO. Call-graph based profiles
triggered on context switches are the easiest way to find the
contending lock.

Last time I did this (around 2.6.16, IIRC) it involved patching the
kernel to put the sample point in the context switch code - can we
do that now without patching the kernel?
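
For reference, on kernels whose perf tool exposes the context-switches
software event, something like the following may do it without a
patch. This is only a sketch: the 10 second window is an arbitrary
choice, and it assumes the benchmark is already running elsewhere on
the machine and that you have root privileges.

```shell
# Sample every context switch (-c 1) system-wide (-a) with call
# graphs (-g) for 10 seconds while the benchmark runs.
perf record -e context-switches -c 1 -a -g -- sleep 10

# Summarize the call chains that led into the scheduler; the
# contended sleeping lock should show up near the top.
perf report --stdio
```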

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx