Re: Current mainline git (24e700e291d52bd2) hangs when building e.g. perf

From: Linus Torvalds
Date: Sat Sep 09 2017 - 14:47:41 EST


On Sat, Sep 9, 2017 at 11:29 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Sat, Sep 09, 2017 at 11:26:27AM -0700, Linus Torvalds wrote:
>> But the fact that that fixes it for you does indicate that it's not
>> just a stale TLB entry or something, it really is some CPU using page
>> tables after they have been free'd and been re-allocated to something
>> else (and *then* they may point to garbage).
>
> Cool, I was trying to think of a good use case how we'd hit that. I
> guess you just gave one. :)

The thing is, even with the delayed TLB flushing, I don't think it
should be *so* delayed that we should be seeing a TLB fill from
garbage page tables.

But the part in Andy's patch that worries me the most is that

+ cpumask_clear_cpu(cpu, mm_cpumask(mm));

in enter_lazy_tlb(). It means that we won't be notified by people
invalidating the page tables, and while we then do re-validate the TLB
when we switch back from lazy mode, I still worry. I'm not entirely
convinced by that tlb_gen logic.

I can't actually see anything *wrong* in the tlb_gen logic, but it worries me.
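
To make the concern concrete, here is a rough user-space toy model of
the generation-counter idea, written only as an illustration. It is not
the kernel's code: every name in it (toy_mm, toy_cpu, toy_invalidate,
toy_enter_lazy, toy_leave_lazy) is hypothetical, and it assumes a single
thread and a plain bitmask instead of a real cpumask. It shows a CPU
dropping out of the notification mask when it goes lazy, missing an
invalidation, and catching up by comparing generations when it comes
back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct toy_mm {
    uint64_t tlb_gen;      /* bumped on every page-table invalidation */
    unsigned int cpumask;  /* bit n set: CPU n gets flush notifications */
};

struct toy_cpu {
    uint64_t seen_gen;     /* last generation this CPU has caught up to */
    unsigned int flushes;  /* how many local flushes this CPU has done */
};

/* Model a local TLB flush: the CPU is now current with the mm. */
static void toy_local_flush(struct toy_cpu *c, const struct toy_mm *mm)
{
    c->seen_gen = mm->tlb_gen;
    c->flushes++;
}

/* Invalidate page tables: bump the generation, notify only CPUs in the mask. */
static void toy_invalidate(struct toy_mm *mm, struct toy_cpu cpus[])
{
    mm->tlb_gen++;
    for (int i = 0; i < TOY_NR_CPUS; i++)
        if (mm->cpumask & (1u << i))
            toy_local_flush(&cpus[i], mm);
}

/* Going lazy: stop receiving notifications (the cpumask_clear_cpu step). */
static void toy_enter_lazy(struct toy_mm *mm, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    mm->cpumask &= ~(1u << cpu);
}

/* Leaving lazy: rejoin the mask and re-validate; returns true if a flush was needed. */
static bool toy_leave_lazy(struct toy_mm *mm, struct toy_cpu cpus[], int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    mm->cpumask |= 1u << cpu;
    bool stale = cpus[cpu].seen_gen != mm->tlb_gen;
    if (stale)
        toy_local_flush(&cpus[cpu], mm);
    return stale;
}
```

In this model the lazy CPU stays on an old generation for the whole
window between toy_enter_lazy() and toy_leave_lazy(). The model says
nothing about what real hardware may do with its TLB during that
window, which is the part the worry above is about.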

Linus