Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

From: Daniel Phillips
Date: Thu Apr 30 2015 - 07:14:45 EST


On Wednesday, April 29, 2015 5:20:08 PM PDT, Dave Chinner wrote:
It's easy to be fast on empty filesystems. XFS does not aim to be
fast in such situations - it aims to have consistent performance
across the life of the filesystem.

In this case, ext4, btrfs and tux3 have optimal allocation filling
from the outside of the disk, while XFS is spreading the files
across (at least) 4 separate regions of the whole disk. Hence XFS is
seeing seek times on read are much larger than the other filesystems
when the filesystem is empty as it is doing full disk seeks rather
than being confined to the outer edges of spindle.

Thing is, once you've abused those filesytsems for a couple of
months, the files in ext4, btrfs and tux3 are not going to be laid
out perfectly on the outer edge of the disk. They'll be spread all
over the place and so all the filesystems will be seeing large seeks
on read. The thing is, XFS will have roughly the same performance as
when the filesystem is empty because the spreading of the allocation
allows it to maintain better locality and separation and hence
doesn't fragment free space nearly as badly as the oher filesystems.
Free space fragmentation is what leads to performance degradation in
filesystems, and all the other filesystem will have degraded to be
*much worse* than XFS.

Put simply: empty filesystem benchmarking does not show the real
performance of the filesystem under sustained production workloads.
Hence benchmarks like this - while interesting from a theoretical
point of view and are widely used for bragging about whose got the
fastest - are mostly irrelevant to determining how the filesystem
will perform in production environments.

We can also look at this algorithm in a different way: take a large
filesystem (say a few hundred TB) across a few tens of disks in a
linear concat. ext4, btrfs and tux3 will only hit the first disk in
the concat, and so go no faster because they are still bound by
physical seek times. XFS, however, will spread the load across many
(if not all) of the disks, and so effectively reduce the average
seek time by the number of disks doing concurrent IO. Then you'll
see that application level IO concurrency becomes the performance
limitation, not the physical seek time of the hardware.

IOWs, what you don't see here is that the XFS algorithms that make
your test slow will keep *lots* of disks busy. i.e. testing empty
filesystem performance a single, slow disk demonstrates that an
algorithm designed for scalability isn't designed to acheive
physical seek distance minimisation. Hence your storage makes XFS
look particularly poor in comparison to filesystems that are being
designed and optimised for the limitations of single slow spindles...

To further demonstrate that it is physical seek distance that is the
issue here, lets take the seek time out of the equation (e.g. use a
SSD). Doing that will result in basically no difference in
performance between all 4 filesystems as performance will now be
determined by application level concurrency and that is the same for
all tests.

Lovely sounding argument, but it is wrong because Tux3 still beats XFS
even with seek time factored out of the equation.

Even with SSD, if you just go splattering files all over the disk you
will pay for it in latency and lifetime when the disk goes into
continuous erase and your messy layout causes write multiplication.
But of course you can design your filesystem any way you want. Tux3
is designed to be fast on the hardware that people actually have.

Regards,

Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/