Re: Current mainline git (24e700e291d52bd2) hangs when building e.g. perf

From: Andy Lutomirski
Date: Fri Sep 08 2017 - 19:08:21 EST


[Linus, I added you to get your opinion on whether the last bit here
is a problem.]

On Fri, Sep 8, 2017 at 2:56 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Fri, Sep 08, 2017 at 02:47:00PM -0700, Andy Lutomirski wrote:
>> Any chance you could test with CONFIG_DEBUG_VM=y? There are lots of
>> potentially useful assertions in that code.
>>
>> Can you also post your /proc/cpuinfo? And can you re-confirm that a
>> problematic guest kernel is causing problems in the *host*?
>
> Also, have you seen any MCEs during early boot, after the freezes?
>
> You probably wouldn't have because we don't log them on F10h due to
> broken BIOSen. So add "mce=bootlog" to your grub and warm-reset your box
> after one of those freezes and send me dmesg. It should have an MCE in
> there, if what I think is happening is actually happening.
>

Here's my theory as to what's happening.

Before my patch, flush_tlb_mm_range() guaranteed that the range would
be flushed on all CPUs before returning. With the patch, it only
promises that the range will be flushed on each CPU before anything
running on that CPU tries to access it. This has two consequences:
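A toy model of the two guarantees may make the difference concrete. It is a sketch, not kernel code: the names CPU, MM, flush_eager, flush_deferred, switch_to and the per-mm generation counter are hypothetical. The model assumes, as described above, that CPUs not actively running the mm are skipped at flush time and catch up before they can access the mm again.

```python
# Hypothetical toy model of "flush everywhere before returning" versus
# "flush each CPU before it next accesses the mm". Not kernel code.

class CPU:
    def __init__(self, active):
        self.active = active    # currently running a thread of this mm
        self.stale = True       # still caches the old translation
        self.seen_gen = 0       # last flush generation applied on this CPU


class MM:
    def __init__(self, cpus):
        self.cpus = cpus
        self.tlb_gen = 0        # bumped on every flush request

    def flush_eager(self):
        """Old guarantee: every CPU is flushed before this returns."""
        self.tlb_gen += 1
        for cpu in self.cpus:
            cpu.stale = False
            cpu.seen_gen = self.tlb_gen

    def flush_deferred(self):
        """New guarantee: only active CPUs are flushed before this returns."""
        self.tlb_gen += 1
        for cpu in self.cpus:
            if cpu.active:
                cpu.stale = False
                cpu.seen_gen = self.tlb_gen

    def switch_to(self, cpu):
        """An inactive CPU catches up before it can touch the mm again."""
        if cpu.seen_gen != self.tlb_gen:
            cpu.stale = False
            cpu.seen_gen = self.tlb_gen
        cpu.active = True


cpus = [CPU(active=True), CPU(active=False)]
mm = MM(cpus)
mm.flush_deferred()
print([c.stale for c in cpus])  # [False, True]: CPU 1 still holds the entry
mm.switch_to(cpus[1])
print([c.stale for c in cpus])  # [False, False]
```

In the window between flush_deferred() returning and switch_to() running, CPU 1 still holds the old translation. That window is where both consequences below live.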

1. A kernel thread that accidentally reads or writes a user address
could hit a stale TLB entry. This seems harmless in the sense that it
can only happen if we already have a bug.

2. The CPU itself could see the TLB entry and do nefarious,
architecturally invisible things with it.

I bet that #2 dramatically increases the chance that we hit erratum 383.

I can imagine a case where we have a problem even in the absence of an
erratum. Specifically, suppose some page is mapped. CPU A writes to
it using write combining (the page is mapped WC, or an explicit
streaming store is used). CPU B removes the mapping and calls
flush_tlb_mm_range(). CPU B would expect all writes to the page to be
complete, but CPU A's write is still sitting in its write-combining
buffers.

I *think* this is impossible because CPU A's mm_cpumask manipulations
are atomic and should therefore force out the streaming write buffers,
but maybe there's some other scenario where this matters.
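The write-combining argument can be sketched the same way. This is a toy model, not a hardware-accurate one: ToyCPU, streaming_write, drain, switch_away and cpu_b_sees_write are hypothetical names, and the only property modeled is the assumption stated above, namely that switching away from the mm updates mm_cpumask with a locked RMW, which forces pending streaming stores out.

```python
# Hypothetical toy model of the WC scenario. Real WC buffers are a hardware
# detail; here they are just a per-CPU dict of stores not yet visible.

class ToyCPU:
    def __init__(self, memory):
        self.memory = memory    # globally visible memory (shared dict)
        self.wc_buffer = {}     # streaming stores not yet visible to others

    def streaming_write(self, addr, value):
        # A WC or non-temporal store lands in the buffer, not in memory.
        self.wc_buffer[addr] = value

    def drain(self):
        self.memory.update(self.wc_buffer)
        self.wc_buffer.clear()

    def switch_away(self, atomic_cpumask_update):
        # Switching away from the mm touches mm_cpumask. If that update is a
        # locked RMW, the pending streaming stores are forced out first.
        if atomic_cpumask_update:
            self.drain()


def cpu_b_sees_write(atomic_cpumask_update):
    memory = {}
    cpu_a = ToyCPU(memory)
    cpu_a.streaming_write(0x1000, 42)
    cpu_a.switch_away(atomic_cpumask_update)
    # CPU B now clears the mapping and does a flush that sends no IPI to
    # CPU A, then assumes every write to the page has landed.
    return memory.get(0x1000) == 42


print(cpu_b_sees_write(atomic_cpumask_update=True))   # True
print(cpu_b_sees_write(atomic_cpumask_update=False))  # False
```

The False case is the hazard described above; the True case is why the atomic mm_cpumask update is expected to rule it out.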

--Andy