Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

From: Theodore Ts'o
Date: Thu Apr 30 2015 - 10:57:21 EST


On Thu, Apr 30, 2015 at 11:00:05AM +0200, Martin Steigerwald wrote:
> > IOWS, XFS just hates your disk. Spend $50 and buy a cheap SSD and
> > the problem goes away. :)
>
> I am quite surprised that a traditional filesystem that was created in the
> age of rotating media does not like this kind of media and even seems to
> excel on BTRFS on the new non rotating media available.

You shouldn't be surprised; XFS was designed in an era where RAID was
extremely important. To this day, on very large RAID arrays, I'm
pretty sure none of the other file systems will come close to touching
XFS, because it was optimized by some really, really good file system
engineers for that hardware. And while RAID systems are certainly not
identical to SSDs, the fact that you have multiple disk heads means
that a good file system will optimize for that parallelism, and that's
how SSDs get their speed (individual SSD channels aren't really all
that fast; it's the fact that you can be reading or writing large
numbers of them in parallel that gives high-end flash its really great
performance numbers).

> > Thing is, once you've abused those filesystems for a couple of
> > months, the files in ext4, btrfs and tux3 are not going to be laid
> > out perfectly on the outer edge of the disk. They'll be spread all
> > over the place and so all the filesystems will be seeing large seeks
> > on read. The thing is, XFS will have roughly the same performance as
> > when the filesystem is empty because the spreading of the allocation
> > allows it to maintain better locality and separation and hence
> > doesn't fragment free space nearly as badly as the other filesystems.
> > Free space fragmentation is what leads to performance degradation in
> > filesystems, and all the other filesystems will have degraded to be
> > *much worse* than XFS.

In fact, ext4 doesn't actually lay out things perfectly on the outer
edge of the disk either, because we try to do spreading as well.
Worse, we use a random algorithm to try to do the spreading, so that
means that results from run to run on an empty file system will show a
lot more variation. I won't claim that we're best in class with
either our spreading techniques or our ability to manage free space
fragmentation, although we do a lot of work to manage free space
fragmentation as well.
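
To illustrate why randomized spreading makes empty-file-system results
vary from run to run, here is a toy sketch. It is not ext4's actual
allocator; the function name place_directories and its parameters are
hypothetical, and it only models picking a block group for each new
directory.

```python
import random

def place_directories(n_dirs, n_groups, rng=None):
    """Pick a block group index for each of n_dirs new directories.

    rng=None  -> deterministic round-robin (same layout on every run)
    rng given -> random choice (layout depends on the RNG state)
    This is a toy model, not the policy any real file system uses.
    """
    if rng is None:
        return [i % n_groups for i in range(n_dirs)]
    return [rng.randrange(n_groups) for _ in range(n_dirs)]

if __name__ == "__main__":
    # Round-robin gives the same placement every time.
    print(place_directories(8, 4))
    # Random placement changes with the seed, so where files land on
    # the disk (and hence the seek pattern) changes between runs.
    for seed in (1, 2, 3):
        print(seed, place_directories(8, 4, random.Random(seed)))
```

With a policy like the random one, two runs of the same benchmark on a
freshly made file system can place the same files in different
regions of the disk, which is one source of the variation described
above.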

One of the problems is that it's *hard* to get good benchmarking
numbers that take into account file system aging and measure how
fragmented the free space has become over time. Most of the benchmark
results that I've seen do a really lousy job at this, and the vast
majority don't even try.
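
As a rough idea of what "measuring free space fragmentation" could
mean, here is a minimal sketch that summarizes the free extents in a
block bitmap. The bitmap representation (a list of booleans, True
meaning allocated) and the function names are assumptions made for
illustration; real tools read the file system's own on-disk
structures.

```python
def free_extents(bitmap):
    """Return the lengths of maximal runs of free (False) blocks."""
    extents = []
    run = 0
    for used in bitmap:
        if used:
            if run:
                extents.append(run)
            run = 0
        else:
            run += 1
    if run:
        extents.append(run)
    return extents

def fragmentation_summary(bitmap):
    """Summarize free space: block count, extent count, largest extent."""
    extents = free_extents(bitmap)
    total_free = sum(extents)
    largest = max(extents, default=0)
    return {
        "free_blocks": total_free,
        "free_extents": len(extents),
        "largest_extent": largest,
        # Fraction of the free space that sits in the largest extent.
        "largest_fraction": largest / total_free if total_free else 0.0,
    }

if __name__ == "__main__":
    aged = [True, False, False, True, False, True, True,
            False, False, False]
    print(fragmentation_summary(aged))
```

Taking a summary like this before and after an aging workload, rather
than only on an empty file system, is the kind of measurement most
benchmarks skip.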

This is one of the reasons why I find head-to-head "competitions"
between file systems to be not very helpful for anything other than
benchmarketing. It's almost certain that the benchmark won't be
"fair" in some way, and it doesn't really matter whether the person
doing the benchmark was doing it with malice aforethought, or was just
incompetent and didn't understand the issues --- or did understand the
issues and didn't really care, because what they _really_ wanted to do
was to market their file system.

And even if the benchmark is fair, it might not match up with the end
user's hardware, or their use case. There will always be some use
case where file system A is better than file system B, for pretty much
any file system. Don't get me wrong --- I will do comparisons between
file systems, but only so I can figure out ways of making _my_ file
system better. And more often than not, it's comparisons of the same
file system before and after adding some new feature that are the most
interesting.

> Those are the allocation groups. I always wondered how it can be beneficial
> to spread the allocations onto 4 areas of one partition on expensive seek
> media. Now that makes more sense to me. I always had the gut impression
> that XFS may not be the fastest in all cases, but it is one of the
> filesystems with the most consistent performance over time, but never was
> able to fully explain why that is.

Yep, pretty much all of the traditional update-in-place file systems
since the BSD FFS have done this, and for the same reason. COW file
systems, which are constantly moving data and metadata blocks around,
will need different strategies for trying to avoid the free space
fragmentation problem as the file system ages.

Cheers,

- Ted