Re: [PATCH v3 01/11] x86/mm: Don't reenter flush_tlb_func_common()

From: Andy Lutomirski
Date: Wed Jun 21 2017 - 11:16:10 EST


On Wed, Jun 21, 2017 at 1:49 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Tue, Jun 20, 2017 at 10:22:07PM -0700, Andy Lutomirski wrote:
>> It was historically possible to have two concurrent TLB flushes
>> targeting the same CPU: one initiated locally and one initiated
>> remotely. This can now cause an OOPS in leave_mm() at
>> arch/x86/mm/tlb.c:47:
>>
>>     if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
>>         BUG();
>>
>> with this call trace:
>> flush_tlb_func_local arch/x86/mm/tlb.c:239 [inline]
>> flush_tlb_mm_range+0x26d/0x370 arch/x86/mm/tlb.c:317
>
> These line numbers would most likely mean nothing soon. I think you
> should rather explain why the bug can happen so that future lookers at
> that code can find the spot...
>

That's why I gave function names and the actual code :)

> I'm assuming this is going away in a future patch, as disabling IRQs
> around a TLB flush is kinda expensive. I guess I'll see if I continue
> reading...

No, it's still there. It's possible that it could be removed with
lots of care, but I'm not convinced it's worth it.
local_irq_disable() and local_irq_enable() are fast, though (3 cycles
each last time I benchmarked them?) -- it's local_irq_save() that
really hurts.
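
To make the semantic difference between the two pairs concrete, here is a
rough userspace model, not kernel code. Everything named model_* and
flush_with_* is hypothetical and exists only for illustration; a single
bool stands in for the CPU's interrupt-enable flag. On x86, as far as I
know, disable/enable boil down to a bare cli/sti, while save has to read
the flags register first (pushf + pop) before masking, which is where the
extra cost would come from. The model shows behaviour only, not timing.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: one flag stands in for the CPU's interrupt-enable bit. */
static bool model_irqs_enabled = true;

/* Models local_irq_disable(): unconditionally mask interrupts. */
static void model_irq_disable(void)
{
	model_irqs_enabled = false;
}

/* Models local_irq_enable(): unconditionally unmask interrupts. */
static void model_irq_enable(void)
{
	model_irqs_enabled = true;
}

/* Models local_irq_save(): read the previous state first, then mask. */
static bool model_irq_save(void)
{
	bool flags = model_irqs_enabled;

	model_irqs_enabled = false;
	return flags;
}

/* Models local_irq_restore(): put back whatever state the caller saved. */
static void model_irq_restore(bool flags)
{
	model_irqs_enabled = flags;
}

/*
 * Hypothetical critical section using disable/enable. Cheap, but it always
 * leaves interrupts on at exit, so it is only correct when the caller is
 * known to have interrupts enabled on entry.
 */
static void flush_with_disable_enable(void)
{
	model_irq_disable();
	assert(!model_irqs_enabled);	/* cannot be re-entered from an IRQ here */
	model_irq_enable();
}

/*
 * Hypothetical critical section using save/restore. It pays for reading the
 * old state, but leaves the interrupt state exactly as the caller had it.
 */
static void flush_with_save_restore(void)
{
	bool flags = model_irq_save();

	assert(!model_irqs_enabled);	/* cannot be re-entered from an IRQ here */
	model_irq_restore(flags);
}
```

If a caller enters flush_with_disable_enable() with interrupts already
off, it comes back with them on, which is the hazard that save/restore
avoids. So the cheap pair is only an option on paths where interrupts are
known to be enabled at entry.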

--Andy