Re: XFS vs Elevators (was Re: [PATCH RFC] nilfs2: continuous snapshotting file system)

From: Dave Chinner
Date: Thu Aug 21 2008 - 11:56:47 EST


On Thu, Aug 21, 2008 at 05:53:10AM -0600, Matthew Wilcox wrote:
> On Thu, Aug 21, 2008 at 04:04:18PM +1000, Dave Chinner wrote:
> > One thing I just found out - my old *laptop* is 4-5x faster than the
> > 10krpm scsi disk behind an old cciss raid controller. I'm wondering
> > if the long delays in dispatch is caused by an interaction with CTQ
> > but I can't change it on the cciss raid controllers. Are you using
> > ctq/ncq on your machine? If so, can you reduce the depth to
> > something less than 4 and see what difference that makes?
>
> I don't think that's going to make a difference when using CFQ. I did
> some tests that showed that CFQ would never issue more than one IO at a
> time to a drive. This was using sixteen userspace threads, each doing a
> 4k direct I/O to the same location. When using noop, I would get 70k
> IOPS and when using CFQ I'd get around 40k IOPS.

Not obviously the same sort of issue. The traces clearly show
multiple nested dispatches and completions so CTQ is definitely
active...

Anyway, after an exercise about as pleasant as pulling teeth to find
the latest firmware for the machine in a format I could apply, I
upgraded the firmware throughout the machine (disks, raid
controller, system, etc) and XFS is a *lot* faster. In fact, it is
mostly back to within +/- a small amount of ext3.

run complete:
==========================================================================
                                  avg MB/s         user          sys
                          runs     xfs    ext3    xfs   ext3    xfs   ext3
initial create total        30    6.36    6.29   4.48   3.79   7.03   5.22
create total                 7    5.20    5.68   4.47   3.69   7.34   5.23
patch total                  6    4.53    5.87   2.26   1.96   6.27   4.86
compile total                9   16.46    9.61   1.74   1.72   9.02   9.74
clean total                  4  478.50  553.22   0.09   0.06   0.92   0.70
read tree total              2   13.07   15.62   2.39   2.19   3.68   3.44
read compiled tree           1   53.94   60.91   2.57   2.71   7.35   7.27
delete tree total            3  15.94s   6.82s   1.38   1.06   4.10   1.49
delete compiled tree         1  24.07s   8.70s   1.58   1.18   5.56   2.30
stat tree total              5   3.30s   3.22s   1.09   1.07   0.61   0.53
stat compiled tree total     3   2.93s   3.85s   1.17   1.22   0.59   0.55
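
One way to put numbers on "+/- a small amount" is to compute the relative difference per phase. The sketch below is only an illustration: the figures are copied from the table above, it assumes the entries with an "s" suffix are elapsed seconds rather than MB/s, and the names (THROUGHPUT_MBS, ELAPSED_S, pct_diff, report) are made up for this example.

```python
# Figures copied from the table above as (xfs, ext3) pairs.
# Assumption: entries printed with an "s" suffix are elapsed seconds,
# the rest are average MB/s.
THROUGHPUT_MBS = {
    "initial create": (6.36, 6.29),
    "create": (5.20, 5.68),
    "patch": (4.53, 5.87),
    "compile": (16.46, 9.61),
    "clean": (478.50, 553.22),
    "read tree": (13.07, 15.62),
    "read compiled tree": (53.94, 60.91),
}
ELAPSED_S = {
    "delete tree": (15.94, 6.82),
    "delete compiled tree": (24.07, 8.70),
    "stat tree": (3.30, 3.22),
    "stat compiled tree": (2.93, 3.85),
}


def pct_diff(xfs, ext3):
    """Percentage by which the XFS figure differs from the ext3 figure."""
    return (xfs - ext3) / ext3 * 100.0


def report():
    """Return one formatted line per benchmark phase."""
    lines = []
    for name, (xfs, ext3) in THROUGHPUT_MBS.items():
        # Higher throughput is better, so a positive value favours XFS.
        lines.append(f"{name:<22}{pct_diff(xfs, ext3):+8.1f}%  MB/s")
    for name, (xfs, ext3) in ELAPSED_S.items():
        # Lower elapsed time is better, so a positive value favours ext3.
        lines.append(f"{name:<22}{pct_diff(xfs, ext3):+8.1f}%  seconds")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

The sign convention differs between the two groups (throughput versus elapsed time), which is why the unit is printed on every line.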


The blocktrace looks very regular, too. All the big bursts of
dispatch and completion are gone as are the latencies on
log I/Os. It would appear that ext3 is not sensitive to
concurrent I/O latency like XFS is...

At this point, I'm still interested to know whether the original
results were obtained with ctq/ncq enabled and, if so, whether it
is introducing latencies or not.
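
For anyone wanting to check this on their own machine, here is a minimal sketch of reading the active elevator and the tagged queue depth from sysfs. It assumes a SCSI/libata style device that exposes /sys/block/<dev>/device/queue_depth; some drivers do not expose that attribute or do not allow it to be changed (as with the cciss controller mentioned above), so its absence is treated as "unknown". The function names and the sysfs parameter are hypothetical helpers for this example only.

```python
from pathlib import Path


def active_scheduler(text):
    """Return the elevator shown in [brackets], e.g. 'noop deadline [cfq]' -> 'cfq'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Fall back to the raw text if nothing is bracketed.
    return text.strip()


def read_queue_settings(dev, sysfs=Path("/sys/block")):
    """Read the active elevator and queue depth for a block device such as 'sda'."""
    base = Path(sysfs) / dev
    depth_file = base / "device" / "queue_depth"
    return {
        "scheduler": active_scheduler((base / "queue" / "scheduler").read_text()),
        # Not every driver exposes queue_depth; report None in that case.
        "queue_depth": int(depth_file.read_text()) if depth_file.exists() else None,
    }


def set_queue_depth(dev, depth, sysfs=Path("/sys/block")):
    """Write a new queue depth; needs root, and the driver may reject the value."""
    (Path(sysfs) / dev / "device" / "queue_depth").write_text(f"{depth}\n")
```

Calling read_queue_settings("sda") before and after set_queue_depth("sda", 2) (with "sda" standing in for the real device) would show whether the depth change took effect before rerunning the benchmark.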

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx