Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition

From: Peter Zijlstra
Date: Fri Aug 24 2018 - 04:43:23 EST


On Wed, Aug 22, 2018 at 09:54:48PM -0700, Linus Torvalds wrote:

> It honored it for the *normal* case, which is why it took so long to
> notice that the TLB shootdown had been broken on x86 when it moved to
> the "generic" code. The *normal* case does this all right, and batches
> things up, and then when the batch fills up it does a
> tlb_table_flush() which does the TLB flush and schedules the actual
> freeing.
>
> But there were two cases that *didn't* do that. The special "I'm the
> only thread" fast case, and the "oops I ran out of memory, so now I'll
> fake it, and just synchronize with page walkers manually, and then do
> that special direct remove without flushing the tlb".

The actual RCU batching case was also busted; there was no guarantee
that by the time we run the RCU callbacks the invalidate would've
happened. Exceedingly unlikely, but no guarantee.

So really, all 3 cases in tlb_remove_table() were busted in this
respect.
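
To make the ordering requirement concrete, here is a rough user-space toy
model, not the kernel code. Every name in it (struct toy_table,
toy_tlb_invalidate(), toy_free(), toy_remove_table_broken(),
toy_remove_table_fixed()) is made up for illustration. It only assumes what
is said above: a page-table page must not be freed while the hardware may
still hold a cached translation to it, whichever of the paths does the
freeing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a single page-table page (hypothetical, not a kernel type). */
struct toy_table {
	bool tlb_cached;      /* hardware may still hold a cached entry for it */
	bool freed;           /* page has been handed back to the allocator */
	bool use_after_free;  /* set if it was freed while still cached */
};

/* Stand-in for the TLB invalidate: drops the cached entry. */
static void toy_tlb_invalidate(struct toy_table *t)
{
	t->tlb_cached = false;
}

/* Stand-in for the actual free, direct or from a deferred callback. */
static void toy_free(struct toy_table *t)
{
	assert(!t->freed);              /* the model does not allow double free */
	if (t->tlb_cached)
		t->use_after_free = true;   /* hardware could still walk this page */
	t->freed = true;
}

/* Broken ordering: nothing guarantees the invalidate ran before the free. */
static void toy_remove_table_broken(struct toy_table *t)
{
	toy_free(t);
}

/* Required ordering: invalidate first, and only then free. */
static void toy_remove_table_fixed(struct toy_table *t)
{
	toy_tlb_invalidate(t);
	toy_free(t);
}
```

In this model, calling toy_remove_table_broken() on a table with tlb_cached
set leaves use_after_free set, while toy_remove_table_fixed() does not. The
model deliberately ignores batching and RCU grace periods; it only shows
that the invalidate has to be ordered before the free on every path.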