Re: pagecache locking (was: bcachefs status update) merged)

From: Dave Chinner
Date: Fri Jun 14 2019 - 03:36:42 EST


On Thu, Jun 13, 2019 at 04:30:36PM -1000, Linus Torvalds wrote:
> On Thu, Jun 13, 2019 at 1:56 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > That said, the page cache is still far, far slower than direct IO,
>
> Bullshit, Dave.
>
> You've made that claim before, and it's been complete bullshit before
> too, and I've called you out on it then too.

Yes, your last run of insulting rants on this topic resulted in me
pointing out your CoC violations because you were unable to listen
or discuss the subject matter in a civil manner. And you've started
right where you left off last time....

> Why do you continue to make this obviously garbage argument?
>
> The key word in the "page cache" name is "cache".
>
> Caches work, Dave.

Yes, they do, I see plenty of cases where the page cache works just
fine because it is still faster than most storage. But that's _not
what I said_.

Indeed, you haven't even bothered to ask me to clarify what I was
refering to in the statement you quoted. IOWs, you've taken _one
single statement_ I made from a huge email about complexities in
dealing with IO concurency, the page cache and architectural flaws n
the existing code, quoted it out of context, fabricated a completely
new context and started ranting about how I know nothing about how
caches or the page cache work.

Not very professional but, unfortunately, an entirely predictable
and _expected_ response.

Linus, nobody can talk about direct IO without you screaming and
tossing all your toys out of the crib. If you can't be civil or you
find yourself writing a some condescending "caching 101" explanation
to someone who has spent the last 15+ years working with filesystems
and caches, then you're far better off not saying anything.

---

So, in the interests of further _civil_ discussion, let me clarify
my statement for you: for a highly concurrent application that is
crunching through bulk data on large files on high throughput
storage, the page cache is still far, far slower than direct IO.

Which comes back to this statement you made:

> Is direct IO faster when you *know* it's not cached, and shouldn't
> be cached? Sure. But that/s actually quite rare.

This is where I think you get the wrong end of the stick, Linus.

The world I work in has a significant proportion of applications
where the data set is too large to be cached effectively or is
better cached by the application than the kernel. IOWs, data being
cached efficiently by the page cache is the exception rather than
the rule. Hence, they use direct IO because it is faster than the
page cache. This is common in applications like major enterprise
databases, HPC apps, data mining/analysis applications, etc. and
there's an awful lot of the world that runs on these apps....

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx