Re: XFS write speed drop

From: Nathan Scott
Date: Mon May 22 2006 - 18:42:47 EST


Hi Jan,

On Mon, May 22, 2006 at 02:21:48PM +0200, Jan Engelhardt wrote:
> I shuffled a bit in the source code and the word 'order' is quite often
> around 'barrier', so I thought the 'barrier' option makes the IO
> scheduler strictly ordered, that is, write log first, then data. (like

Not quite; there's an explanation here:
http://oss.sgi.com/projects/xfs/faq.html#wcache

> >Can you send the benchmark results themselves please? (as in,
> >the test(s) you've run that lead you to see 6-8x, and the data
> >those tests produced). Also, xfs_info output, and maybe list
> >the device driver(s) involved here too.

Thanks!

> xfs_info
> ========
> 14:20 shanghai:/root # xfs_info /
> meta-data=/dev/hda3              isize=256    agcount=19, agsize=462621 blks
>          =                       sectsz=512   attr=1
> data     =                       bsize=4096   blocks=8586742, imaxpct=25
>          =                       sunit=0      swidth=0 blks, unwritten=1
> naming   =version 2              bsize=4096
> log      =internal               bsize=4096   blocks=3614, version=1
>          =                       sectsz=512   sunit=0 blks
> realtime =none                   extsz=65536  blocks=0, rtextents=0
> 14:20 shanghai:/root # xfs_info /D
> meta-data=/dev/hdc2              isize=256    agcount=16, agsize=3821963 blks
>          =                       sectsz=512   attr=1
> data     =                       bsize=4096   blocks=61151408, imaxpct=25
>          =                       sunit=0      swidth=0 blks, unwritten=1
> naming   =version 2              bsize=4096
> log      =internal               bsize=4096   blocks=29859, version=1
>          =                       sectsz=512   sunit=0 blks
> realtime =none                   extsz=65536  blocks=0, rtextents=0
> ...
> Script started on Mon May 22 15:32:29 2006
> 15:32 (none):~ # mount /dev/hdc2 /D -o barrier
> 15:32 (none):~ # df -Th / /D
> Filesystem    Type    Size  Used Avail Use% Mounted on
> /dev/hda3     xfs      33G  6.7G   27G  21% /
> /dev/hdc2     xfs     234G  142G   92G  61% /D

> A small nitpick BTW, the barrier flag is not listed in /proc/mounts.

It's the default, so we don't list it. A quirk of the code, I guess;
we could add code to distinguish the default from an explicitly
requested option, but it doesn't really seem worth it.
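
So with barriers active you'd still just see something like the
following in /proc/mounts (illustrative output, not taken from
your machine):

	$ grep hdc2 /proc/mounts
	/dev/hdc2 /D xfs rw 0 0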

> hda is always in nobarrier mode.
> <6>hda: cache flushes not supported
> <6>hdc: cache flushes supported
> <5>Filesystem "hda3": Disabling barriers, not supported by the underlying
> device
> hdc can do barriers if wanted.
>
> CASE 1: Copying from one disk to another
> ========================================
> Copying a compiled 2.6.17-rc4 tree; 306907 KB in 28566 files in 2090
> directories.

OK, we can call this a metadata-intensive workload - lots of small
files, lots of creates. Barriers will hurt the most here: we were
most likely already log-I/O bound, and I'd expect barriers to slow
that down further.

> 15:33 (none):/tmp # time rsync -PHSav kernel-source/ /D/kernel-0/ >/dev/null
> real 7m31.776s
> user 0m7.030s
> sys 0m14.450s

versus:

> 15:42 (none):/tmp # rsync -PHSav kernel-source/ /D/kernel-1/ >/dev/null
> real 2m14.401s
> user 0m7.290s
> sys 0m14.900s

Yep, note that the user/sys times show no change: we're basically
I/O bound in both tests, and barriers are hurting (as expected).

You may be able to combat this by switching to the version 2 log
format (see the xfs_db "version" command in the _very_ latest
xfs_db, as of a few days ago, which allows this switch to be made
on an unmounted filesystem). You will then be able to use larger
in-core log buffers, which should reduce the impact of the barriers
(fewer, larger log I/Os, hence fewer barriers).
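
Roughly like this (a sketch only, reusing your device names; the
"version log2" command needs that bleeding-edge xfs_db, and the
filesystem must be cleanly unmounted when you run it):

	umount /D
	xfs_db -x -c "version log2" /dev/hdc2
	mount /dev/hdc2 /D -o logbufs=8,logbsize=262144

A v1 log caps logbsize at 32k; v2 allows up to 256k, so each log
write (and with it each barrier) covers much more log data.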

> CASE 2: Removing
> ================
> Remove the copies we created in case 1.
>
> 15:45 (none):/tmp # mount /dev/hdc2 /D -o barrier
> 15:45 (none):/D # time rm -Rf kernel-0
> real 3m31.901s
> user 0m0.050s
> sys 0m3.140s

versus:

> 15:49 (none):/ # mount /dev/hdc2 /D -o nobarrier
> 15:49 (none):/D # time rm -Rf kernel-1
>
> real 0m53.471s
> user 0m0.070s
> sys 0m1.990s
> 15:50 (none):/D # cd /
> 15:50 (none):/ # umount /D

Also metadata intensive, of course. All the same issues as above,
and the same techniques should be used to address it.

Odd that the system time jumped with barriers here. The overall
slowdown is roughly the same as in case 1, though (3-4x).

> CASE 3: Copying to the same disk
> ================================
> Copy a file from hdc2 to hdc2. The result is _extremely_ interesting. So the
> 'barrier' thing is only relevant for large sets of files, it seems.

It's consistent, and easily understood, however...

> 15:50 (none):/ # mount /dev/hdc2 /D -o barrier
> 15:50 (none):/D/Video # ls -l The*
> -rw-r--r-- 1 jengelh users 6861772800 Mar 17 23:02 TLK3.iso
> 15:50 (none):/D/Video # time cp TLK3.iso ../tlk-0.iso
> real 8m1.051s
> user 0m0.670s
> sys 0m41.120s

versus:

> 15:59 (none):/ # mount /dev/hdc2 /D -o nobarrier
> 15:59 (none):/D/Video # time cp TLK3.iso ../tlk-1.iso
> real 7m53.275s
> user 0m0.560s
> sys 0m40.010s

This one is a file throughput workload, and so there are far fewer
metadata updates involved here. This means far fewer barrier
operations (far fewer log writes), so we should expect the results
to be as they are - slightly worse with barriers enabled, nothing
drastic though.

So, I agree, you're seeing the cost of write barriers here, but I
don't see anything unexpected (unfortunately for you, I guess).
The FAQ entry above explains why they're enabled by default.
Try the v2 log change I suggested; hopefully that will mitigate
the problem somewhat. Alternatively, buy some decent hardware if
you're after performance. ;)
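
(If you do decide to run nobarrier instead, the safe way is to
disable the drive's write cache first; a sketch, assuming an IDE
disk that honours the setting:

	hdparm -W0 /dev/hdc
	mount /dev/hdc2 /D -o nobarrier

With the cache off you lose some of its speedup, of course, but
ordering then no longer depends on barriers.)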

cheers.

--
Nathan