Re: [PATCH 0/3] TLB flush multiple pages per IPI v5

From: Mel Gorman
Date: Wed Jun 10 2015 - 06:15:49 EST


On Wed, Jun 10, 2015 at 11:08:13AM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> > Stop this crap.
> >
> > I made a really clear and unambiguous chain of arguments:
> >
> > - I'm unconvinced about the benefits of INVLPG in general, and your patches adds
> > a whole new bunch of them. [...]
>
> ... and note that your claim that 'we were doing them before, this is just an
> equivalent transformation' is utter bullsh*t technically: what we were doing
> previously was a hideously expensive IPI combined with an INVLPG.
>

And replacing it with an INVLPG without excessive IPI transmission changes
one major variable. Going straight to a full TLB flush changes two major
variables. I thought the refill cost was high, partially based on the
estimate of 22,000 cycles in https://lkml.org/lkml/2014/7/31/825.
I've been told in these discussions that I'm wrong and the cost is not
high. As it will always be variable, we can never be sure, which is why
I do not see the value in building a complex test around it that will be
invalidated the instant we use a different CPU. When/if a workload shows
up that really cares about those refill costs, there will be a stable
test case to work from.

> The behavior was dominated by the huge overhead of the remote flushing IPI, which
> does not prove or disprove either your or my opinion!
>
> Preserving that old INVLPG logic without measuring its benefits _again_ would be
> cargo cult programming.
>
> So I think this should be measured, and I don't mind worst-case TLB trashing
> measurements, which would be relatively straightforward to construct and the
> results should be unambiguous.
>
> The batching limit (which you set to 32) should then be tuned by comparing it to a
> working full-flushing batching logic, not by comparing it to the previous single
> IPI per single flush approach!
>

We can decrease it easily, but increasing it means we also have to change
SWAP_CLUSTER_MAX because otherwise not enough pages are unmapped for the
flushes, and it is a requirement that we flush before freeing the pages.
That changes another complex variable because, at the very least, it alters
LRU lock hold times.
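
As a rough illustration of that coupling, here is a userspace toy model,
not kernel code. It assumes each reclaim pass unmaps at most CLUSTER_MAX
pages (a stand-in for SWAP_CLUSTER_MAX) and must flush before freeing them.
The struct, the function names and the page counts are all hypothetical.

```c
#include <stddef.h>

/* Stand-in for SWAP_CLUSTER_MAX; assumed to be 32 for this sketch. */
#define CLUSTER_MAX 32

struct flush_batch {
	size_t limit;    /* batching limit under discussion */
	size_t pending;  /* pages unmapped but not yet flushed */
	size_t flushes;  /* number of flush rounds issued */
	size_t max_seen; /* largest batch actually flushed */
};

/* Flush whatever is pending and record how large the batch was. */
static void batch_flush(struct flush_batch *b)
{
	if (b->pending == 0)
		return;
	if (b->pending > b->max_seen)
		b->max_seen = b->pending;
	b->flushes++;
	b->pending = 0;
}

/* Queue one unmapped page; flush early if the batch limit is reached. */
static void batch_add(struct flush_batch *b)
{
	b->pending++;
	if (b->pending == b->limit)
		batch_flush(b);
}

/* One reclaim pass: unmap up to CLUSTER_MAX pages, then flush before freeing. */
static void reclaim_pass(struct flush_batch *b, size_t nr_pages)
{
	size_t n = nr_pages < CLUSTER_MAX ? nr_pages : CLUSTER_MAX;

	for (size_t i = 0; i < n; i++)
		batch_add(b);
	batch_flush(b); /* required before the pages can be freed */
}

/* Largest batch ever flushed when reclaiming total_pages with a given limit. */
static size_t largest_batch(size_t limit, size_t total_pages)
{
	struct flush_batch b = { .limit = limit };

	while (total_pages > 0) {
		size_t n = total_pages < CLUSTER_MAX ? total_pages : CLUSTER_MAX;

		reclaim_pass(&b, n);
		total_pages -= n;
	}
	return b.max_seen;
}
```

In this model, lowering the limit below CLUSTER_MAX takes effect immediately,
while raising it above CLUSTER_MAX changes nothing: the mandatory flush at
the end of each pass caps the batch at the number of pages unmapped per pass.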

> ... and if the benefits of a complex algorithm are not measurable and if there are
> doubts about the cost/benefit tradeoff then frankly it should not exist in the
> kernel in the first place. It's not like the Linux TLB flushing code is too boring
> due to overwhelming simplicity.
>
> and yes, it's my job as a maintainer to request measurements justifying complexity
> and your ad hominem attacks against me are disgusting - you should know better.
>

It was not intended as an ad hominem attack, and I apologise for that.
I wanted to express my frustration that a series that adjusted one variable
with a known benefit will be rejected in favour of a series that adjusts
two major variables instead, with the second variable being very sensitive
to workload and CPU.

--
Mel Gorman
SUSE Labs