Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

From: David Lang
Date: Tue May 12 2015 - 17:31:16 EST


On Tue, 12 May 2015, Daniel Phillips wrote:

On 05/12/2015 11:39 AM, David Lang wrote:
On Mon, 11 May 2015, Daniel Phillips wrote:
...it's the mm and core kernel developers that need to
review and accept that code *before* we can consider merging tux3.

Please do not say "we" when you know that I am just as much a "we"
as you are. Merging Tux3 is not your decision. The people whose
decision it actually is are perfectly capable of recognizing your
agenda for what it is.

http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM
"XFS Developer Takes Shots At Btrfs, EXT4"

umm, Phoronix has no input on what gets merged into the kernel. They also have a reputation for
trying to turn anything into click-bait by making it sound like a fight when it isn't.

Perhaps you misunderstood. Linus decides what gets merged. Andrew
decides. Greg decides. Dave Chinner does not decide, he just does
his level best to create the impression that our project is unfit
to merge. Any chance there might be an agenda?

Phoronix published a headline that identifies Dave Chinner as
someone who takes shots at other projects. Seems pretty much on
the money to me, and it ought to be obvious why he does it.

Phoronix turns any correction or criticism into an attack.

You need to get out of the mindset that Ted and Dave are Enemies that you need to overcome; they are friendly competitors, not Enemies. They assume that you are working in good faith (but are inexperienced compared to them), and you need to assume that they are working in good faith. If they ever do resort to underhanded means to sabotage you, Linus and the other kernel developers will take action. But pointing out limits in your current implementation, problems in your benchmarks based on how they are run, and concepts that are going to be difficult to merge is not underhanded; it's exactly the type of assistance that you should be grateful for in friendly competition.

You were the one who started crowing about how badly XFS performed. Dave gave a long and detailed explanation of the reasons for the differences, and showed benchmarks on other hardware demonstrating that XFS works very well there. That's not an attack on EXT4 (or Tux3), it's an explanation.

The real question is, has the Linux development process become
so political and toxic that worthwhile projects fail to benefit
from supposed grassroots community support? You are the poster
child for that.

The Linux development process is making code available, responding to concerns from the experts in
the community, and letting the code talk for itself.

Nice idea, but it isn't working. Did you let the code talk to you?
Right, you let the code talk to Dave Chinner, then you listen to
what Dave Chinner has to say about it. Any chance that there might
be some creative licence acting somewhere in that chain?

I have my own concerns about how things are going to work (I've voiced some of them), but no, I haven't tried running Tux3 because you say it's not ready yet.

There have been many people pushing code for inclusion that has not gotten into the kernel, or has
not been used by any distros after it's made it into the kernel, in spite of benchmarks being posted
that seem to show how wonderful the new code is. ReiserFS was one of the first, and part of what
tarnished its reputation with many people was how much they were pushing the benchmarks that were
shown to be faulty (the one I remember most vividly was that the entire benchmark completed in <30
seconds, and they had the FS tuned to not start flushing data to disk for 30 seconds, so the entire
'benchmark' ran in RAM without ever touching the disk).
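A minimal sketch of the trap described above, in Python, assuming a Linux-style page cache where written data stays dirty in RAM until writeback runs (on current kernels the age threshold is the vm.dirty_expire_centisecs sysctl, 3000 centiseconds, i.e. 30 seconds, by default). The function timed_write and the 16 MiB size are hypothetical, chosen only for illustration; this is not the ReiserFS benchmark or any real suite. With sync=False the clock stops as soon as the kernel has accepted the data; with sync=True it also waits for os.fsync() to return.

```python
import os
import tempfile
import time


def timed_write(path: str, total_bytes: int, sync: bool, chunk: int = 1 << 20) -> float:
    """Hypothetical helper: write total_bytes of zeros to path, return elapsed seconds.

    sync=False stops the clock once the data has been handed to the kernel,
    so it may still be sitting dirty in the page cache. sync=True calls
    os.fsync() first, so the clock includes getting the data to stable storage.
    """
    buf = b"\0" * chunk
    start = time.perf_counter()
    with open(path, "wb") as f:
        remaining = total_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(buf[:n])
            remaining -= n
        f.flush()  # move Python's userspace buffer into the kernel
        if sync:
            os.fsync(f.fileno())  # block until the kernel reports the data durable
    return time.perf_counter() - start


if __name__ == "__main__":
    size = 16 * 1024 * 1024  # 16 MiB; illustrative size only
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "bench.dat")
        cached = timed_write(p, size, sync=False)
        durable = timed_write(p, size, sync=True)
    print(f"without fsync: {cached:.3f}s  with fsync: {durable:.3f}s")
```

If the two timings come out close, check where the temporary directory lives: on tmpfs or a ramdisk, fsync has nothing to flush, which reproduces the ramdisk problem discussed in this thread.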

You know what to do about checking for faulty benchmarks.

That requires that the code be readily available, which last I heard, Tux3 wasn't. Has this been fixed?

So when Ted and Dave point out problems with the benchmark (the difference in behavior between a
single spinning disk, different partitions on the same disk, SSDs, and ramdisks), you would be
better off acknowledging them and, if you can't adjust and re-run the benchmarks, not attacking
them as a result.

Ted and Dave failed to point out any actual problem with any
benchmark. They invented issues with benchmarks and promoted those
as FUD.

They pointed out problems with using a ramdisk to simulate an SSD, and the huge differences between spinning rust and an SSD (or disk array). Those aren't FUD.

As Dave says above, it's not the other filesystem people you have to convince, it's the core VFS and
Memory Management folks you have to convince. You may need a little benchmarking to show that there
is a real advantage to be gained, but the real discussion is going to be on the impact that page
forking is going to have on everything else (both in complexity and in performance impact to other
things).

Yet he clearly wrote "we" as if he believes he is part of it.

He is part of the group of people who use and work with this stuff, so he is part of it.

Now that ENOSPC is done to a standard way beyond what Btrfs had
when it was merged, the next item on the agenda is writeback. That
involves us and VFS people as you say, and not Dave Chinner, who
only intends to obstruct the process as much as he possibly can. He
should get back to work on his own project. Nobody will miss his
posts if he doesn't make them. They contribute nothing of value,
create a lot of bad blood, and just serve to further besmirch the
famously tarnished reputation of LKML.

BTRFS is a perfect example of how not to introduce a new filesystem. Lots of hype, the presumption that it is going to replace all the existing filesystems because it's so much better (especially according to benchmarks). But then progress stalled before it was really ready, and it's still something most people avoid.

You know that Tux3 is already fast. Not just that of course. It
has a higher standard of data integrity than your metadata-only
journalling filesystem and a small enough code base that it can
be reasonably expected to reach the quality expected of an
enterprise class filesystem, quite possibly before XFS gets
there.

We wouldn't expect anyone developing a new filesystem to believe any differently.

It is not a matter of belief, it is a matter of testable fact. For
example, you can count the lines. You can run the same benchmarks.

Proving the data consistency claims would be a little harder, you
need tools for that, and some of those aren't built yet. Or, if you
have technical ability, you can read the code and the copious design
material that has been posted and convince yourself that, yes, there
is something cool here, why didn't anybody do it that way before?
But of course that starts to sound like work. Debating nontechnical
issues and playing politics seems so much more like fun.

Why are you picking a fight? There was no attack in my statement.

If they didn't believe this, why would they be working on the filesystem instead of just using an existing filesystem?

Right, and it is my job to convince you that what I believe, for
perfectly valid, demonstrable technical reasons, is really true. I do
not see why you feel it is your job to convince me that the obviously
broken Linux community process is not in fact broken, and that a
certain person who obviously has an agenda, is not actually obstructing.

You will need to have a fully working, usable system before you can convince people that you are right. A partial system may look good, but how much is fixing the corner cases that you haven't gotten to yet going to hurt it? That there are going to be such cases is pretty much a given, and that changing things to add code to work around the pathological conditions is going to hurt the common case is pretty close to a given (it's one of those things that isn't mathematically guaranteed, but happens on 99.99999+% of projects).

The ugly reality is that everyone's early versions of their new filesystem look really good. The
problem is when they extend it to cover the corner cases and when it gets stressed by real-world (as
opposed to benchmark) workloads. This isn't saying that you are wrong in your belief, just that you
may not be right, and nobody will know until you are to a usable state and other people can start
beating on it.

With ENOSPC we are at that state. Tux3 would get more testing and advance
faster if it was merged. Things like ifdefs, grandiose new schemes for
writeback infrastructure, dumb little hooks in the mkwrite path, those
are all just manufactured red herrings. Somebody wanted those to be
issues, so now they are issues. Fake ones.

OK, so you are happy with your allocation strategy? You didn't seem to be a few e-mails ago.

But if you think it's ready for users, then start working to submit it in the next merge window. Dave said that except for one part, there was no reason not to merge it. That's pretty good. So you need to be discussing that one part with the folks that Dave pointed you at.

Nobody is trying to trick you. Just stating a fact. You ought to be able
to figure out by now that Tux3 is worth merging.

You might possibly have an argument that merging a filesystem that
crashes as soon as it fills the disk is just sheer stupidity that can
only lead to embarrassment in the long run, but then you would need to
explain why Btrfs was merged. As I recall, it went something like, Chris
had it on a laptop, so it must be a filesystem, and wow look at that
feature list. Then it got merged in a completely unusable state and got
worked on. If it had not been merged, Btrfs would most likely be dead
right now. After all, who cares about an out of tree filesystem?

As I said above, Btrfs is a perfect example of how not to do things.

The other thing you need to realize is that getting something in the kernel isn't a one-time effort; the code needs to be maintained over time (especially for a filesystem), and it's very possible for a developer/team/company to be so toxic and hostile to others that the Linux folks don't want the hassle of dealing with them. You are starting out on a path to put yourself into that category. Calm down and stop taking offense at everything. Your succeeding doesn't require that other people lose, so stop talking as if it's a zero-sum game and you have to beat down the enemy to get your code accepted.

David Lang

By the way, I gave my Tux3 presentation at SCALE 7x in Los Angeles in
2009, with Tux3 running as my root filesystem. By the standard applied
to Btrfs, Tux3 should have been merged then, right? After all, our
nospace handling worked just as well as theirs at that time.

Regards,

Daniel
