On Friday, May 1, 2015 8:38:55 AM PDT, Dave Chinner wrote:
Well, yes - I never claimed XFS is a general purpose filesystem. It
is a high performance filesystem. It is also becoming more relevant
to general purpose systems as low cost storage gains capabilities
that used to be considered the domain of high performance storage...
OK. Well, Tux3 is general purpose and that means we care about single
spinning disk and small systems.
So, to demonstrate, I'll run the same tests but using a 256GB
samsung 840 EVO SSD and show how much the picture changes.
I will go you one better, I ran a series of fsync tests using
tmpfs, and I now have a very clear picture of how the picture
changes. The executive summary is: Tux3 is still way faster, and
still scales way better to large numbers of tasks. I have every
confidence that the same is true of SSD.
/dev/ramX can't be compared to an SSD. Yes, they both have low
seek/IO latency but they have very different dispatch and IO
concurrency models. One is synchronous, the other is fully
asynchronous.
I had ram available and no SSD handy to abuse. I was interested in
measuring the filesystem overhead with the device factored out. I
mounted loopback on a tmpfs file, which seems to be about the same as
/dev/ram, maybe slightly faster, but much easier to configure. I ran
some tests on a ramdisk just now and was mortified to find that I have
to reboot to empty the disk. It would take a compelling reason before
I do that again.
This is an important distinction, as we'll see later on....
I regard it as predictive of Tux3 performance on NVM.
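For readers who want to try this at home, a minimal sketch of this sort of many-task fsync load is below. This is an illustrative stand-in, not the actual harness behind the numbers in this thread; the file names, task counts, and sync counts are placeholder choices, and the work directory should point at the filesystem under test.

```python
import multiprocessing
import os
import time

def fsync_worker(path, n_syncs):
    # Each task appends a small record to its own file and fsyncs after
    # every write, so wall time is dominated by the fsync path.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_APPEND, 0o644)
    for _ in range(n_syncs):
        os.write(fd, b"x" * 1024)
        os.fsync(fd)
    os.close(fd)

def run(tasks, n_syncs, workdir="/tmp"):
    # Spawn `tasks` processes and time from first start to last join.
    procs = [multiprocessing.Process(
                 target=fsync_worker,
                 args=(os.path.join(workdir, "fsync-%d" % i), n_syncs))
             for i in range(tasks)]
    start = time.time()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.time() - start

if __name__ == "__main__":
    # Placeholder task counts; scale up to probe parallel fsync behavior.
    for tasks in (1, 10, 100):
        print("%5d tasks: %.3fs" % (tasks, run(tasks, 10)))
```

The interesting variable is how the elapsed time grows as the task count rises; a flat curve means the filesystem batches concurrent fsyncs well.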
Running the same thing on tmpfs, Tux3 is significantly faster:
Ext4: 1.40s
XFS: 1.10s
Btrfs: 1.56s
Tux3: 1.07s
3% is not "significantly faster". It's within run-to-run variation!
You are right, XFS and Tux3 are within experimental error for single
syncs on the ram disk, while Ext4 and Btrfs are way slower:
Ext4: 1.59s
XFS: 1.11s
Btrfs: 1.70s
Tux3: 1.11s
A distinct performance gap appears between Tux3 and XFS as parallel
tasks increase.
You wish. In fact, Tux3 is a lot faster. ...
Yes, it's easy to be fast when you have simple, naive algorithms and
an empty filesystem.
No it isn't, or the others would be fast too. In any case, our algorithms
are far from naive, except for allocation. You can rest assured that
when allocation is brought up to a respectable standard in the fullness
of time, it will be competitive and will not harm our clean filesystem
performance at all.
There is no call for you to disparage our current achievements, which
are significant. I do not mind some healthy skepticism about the
allocation work, you know as well as anyone how hard it is. However your
denial of our current result is irritating and creates the impression
that you have an agenda. If you want to complain about something real,
complain that our current code drop is not done yet. I will humbly
apologize, and the same for enospc.
That's roughly 10x faster than your numbers. Can you describe your
test setup in detail? e.g. post the full log from block device
creation to benchmark completion so I can reproduce what you are
doing exactly?
Mine is a lame i5 minitower with 4GB from Fry's. Yours is clearly way
more substantial, so I can't compare my numbers directly to yours.
Clearly the curve is the same: your numbers increase 10x going from 100
to 1,000 tasks and 12x going from 1,000 to 10,000. The Tux3 curve is
significantly flatter and starts from a lower base, so it ends with a
really wide gap. You will need to take my word for that for now. I
promise that the beer is on me should you not find that reproducible.
The repository delay is just about not bothering Hirofumi for a merge
while he finishes up his inode table anti-fragmentation work.
Note: you should recheck your final number for Btrfs. I have seen
Btrfs fall off the rails and take wildly longer on some tests just
like that.
Completely reproducible...
I believe you. I found that Btrfs does that way too much. So does XFS
from time to time, when it gets up into lots of tasks. Read starvation
on XFS is much worse than Btrfs, and XFS also exhibits some very
undesirable behavior with initial file create. Note: Ext4 and Tux3 have
roughly zero read starvation in any of these tests, which pretty much
proves it is not just a block scheduler thing. I don't think this is
something you should dismiss.
I wouldn't be so sure about that...
Tasks: 8 16 32
Ext4: 93.06 MB/s 98.67 MB/s 102.16 MB/s
XFS: 81.10 MB/s 79.66 MB/s 73.27 MB/s
Btrfs: 43.77 MB/s 64.81 MB/s 90.35 MB/s ...
Ext4: 807.21 MB/s 1089.89 MB/s 867.55 MB/s
XFS: 997.77 MB/s 1011.51 MB/s 876.49 MB/s
Btrfs: 55.66 MB/s 56.77 MB/s 60.30 MB/s
Numbers are again very different for XFS and ext4 on /dev/ramX on my
system. Need to work out why yours are so low....
Your machine makes mine look like a PCjr.
I said then that when we
got around to a proper fsync it would be competitive. Now here it
is, so you want to change the topic. I understand.
I haven't changed the topic, just the storage medium. The simple
fact is that the world is moving away from slow sata storage at a
pretty rapid pace and it's mostly going solid state. Spinning disks
are also changing - they are going to ZBC-based SMR, which is a
completely different problem space that doesn't even appear to be
on the tux3 radar....
So where does tux3 fit into a storage future of byte addressable
persistent memory and ZBC based SMR devices?
You won't convince us to abandon spinning rust; it's going to be around
a lot longer than you think. Obviously, we care about SSD and I believe
you will find that Tux3 is more than competitive there. We lay things
out in a very erase block friendly way. We need to address the volume
wrap issue of course, and that is in progress. This is much easier than
spinning disk.
Tux3's redirect-on-write[1] is obviously a natural for SMR, however
I will not get excited about it unless a vendor waves money.