Re: Current mainline git (24e700e291d52bd2) hangs when building e.g. perf

From: Linus Torvalds
Date: Sat Sep 09 2017 - 14:03:28 EST


On Sat, Sep 9, 2017 at 10:49 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> Anyway, if I need to change the behavior back, I can do it in one of two
> ways. I can just switch to init_mm instead of going lazy, which is
> expensive, but not *that* expensive on CPUs with PCID. Or I can do it
> the way we used to do it and send the flush IPI to lazy CPUs. The
> latter will only have a performance impact when a flush happens, but
> the performance hit is much higher when there's a flush.

Why not both?

Let's at least entertain the idea. In particular, we don't send IPI's
to *all* CPU's. We only send them to the set of CPU's that could have
that MM cached.

And that set _may_ be very limited. In the best case, it's just the
current CPU, and no IPI is needed at all.

Which means that maybe we can use that set of CPU's as guidance for how
we should treat lazy mode.

We can *also* take PCID support into account.

So what I would suggest is something like

- if we have PCID support, _and_ the set of CPU's is more than just
  us, just switch to init_mm. The switch is cheaper than the IPI's.

- otherwise do what we used to do, with the IPI.
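
To make the two cases above concrete, here is a rough, untested sketch
in plain C. Everything in it is made up for illustration:
choose_lazy_policy(), enum lazy_policy and the 64-bit mask standing in
for the mm's cpumask are hypothetical names, not kernel code. The real
thing would use the cpumask helpers and whatever PCID feature check the
tree actually has.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only (not kernel APIs). */
enum lazy_policy {
	LAZY_SWITCH_TO_INIT_MM,   /* leave the mm entirely, no flush IPI needed later */
	LAZY_STAY_LAZY_WITH_IPI,  /* old behavior: stay lazy, take the flush IPI */
};

/*
 * mm_cpus is a toy stand-in for the mm's cpumask: bit N set means
 * CPU N could have that mm cached. Assumes cpu < 64.
 */
static bool mm_only_on_cpu(uint64_t mm_cpus, unsigned int cpu)
{
	return (mm_cpus & ~(UINT64_C(1) << cpu)) == 0;
}

static enum lazy_policy choose_lazy_policy(bool has_pcid, uint64_t mm_cpus,
					   unsigned int this_cpu)
{
	/*
	 * PCID makes the switch cheap, and other CPUs in the mask mean a
	 * later flush would have to IPI us, so just switch to init_mm.
	 */
	if (has_pcid && !mm_only_on_cpu(mm_cpus, this_cpu))
		return LAZY_SWITCH_TO_INIT_MM;

	/* Otherwise do what we used to do, with the IPI. */
	return LAZY_STAY_LAZY_WITH_IPI;
}
```

So with PCID and a mask of 0x3, a lazy CPU 0 would drop to init_mm.
Without PCID, or when CPU 0 is the only bit set, it stays lazy and
relies on the flush IPI as before.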

The exact heuristics could be tuned later, but considering Markus's
report, and considering that not so many people have really even
heavily tested the new code yet (so _one_ report now means that there
are probably a shitload of machines that would show it later), I
really think we need to steer back towards our old behavior. But at
the same time, I think we can take advantage of newer CPU's that _do_
have PCID.

Hmm?

Linus