Re: [PATCH v2 3/4] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem

From: Mel Gorman
Date: Tue Aug 01 2017 - 06:59:34 EST

On Tue, Aug 01, 2017 at 02:56:16PM +0900, Minchan Kim wrote:
> Nadav reported that parallel MADV_DONTNEED on the same range has a stale TLB
> problem. Mel fixed it[1] and found the same problem with MADV_FREE[2].
> Quote from Mel Gorman:
> "The race in question is CPU 0 running madv_free and updating some PTEs
> while CPU 1 is also running madv_free and looking at the same PTEs.
> CPU 1 may have writable TLB entries for a page but fail the pte_dirty
> check (because CPU 0 has updated it already) and potentially fail to flush.
> Hence, when madv_free on CPU 1 returns, there are still potentially writable
> TLB entries and the underlying PTE is still present so that a subsequent write
> does not necessarily propagate the dirty bit to the underlying PTE any more.
> Reclaim at some unknown time in the future may then see that the PTE is still
> clean and discard the page even though a write has happened in the meantime.
> I think this is possible but I could have missed some protection in madv_free
> that prevents it happening."
> This patch aims to solve both problems at once and also prepares for
> another problem involving KSM, MADV_FREE and soft-dirty[3].
> The TLB batch API (tlb_[gather|finish]_mmu) uses [inc|dec]_tlb_flush_pending
> and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can detect
> that parallel threads are in flight. In that case, forcefully flush the TLB
> to prevent the user from accessing memory via a stale TLB entry, even if this
> thread failed to gather any page table entries.
> I confirmed this patch works with the test program[4] Nadav gave, so this patch
> supersedes "mm: Always flush VMA ranges affected by zap_page_range v2"
> in current mmotm.
> This patch modifies the arch-specific TLB gathering interface (x86, ia64,
> s390, sh, um). Most architectures seem straightforward, but s390
> needs care because tlb_flush_mmu works only if mm->context.flush_mm
> is set to non-zero, which happens only when a pte entry is really cleared by
> ptep_get_and_clear and friends. However, this problem never changes the
> pte entries but still needs a flush to prevent memory access via a stale TLB entry.
> Any thoughts?
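
The nesting check described in the changelog can be sketched as a toy
userspace model. This is only an illustration under stated assumptions, not
the code from the patch: struct toy_mm, struct toy_gather and every toy_*
function are hypothetical stand-ins, the counter is a plain C11 atomic, and
a "flush" is just a counter increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mm state: only what the sketch needs. */
struct toy_mm {
	atomic_int tlb_flush_pending;	/* gather/finish pairs in flight */
	atomic_int flushes;		/* number of simulated TLB flushes */
};

/* Hypothetical stand-in for the per-thread gather state. */
struct toy_gather {
	struct toy_mm *mm;
	bool need_flush;		/* true if this thread cleared a PTE itself */
};

/* Models tlb_gather_mmu: announce that a batched operation has started. */
static void toy_tlb_gather_mmu(struct toy_gather *tlb, struct toy_mm *mm)
{
	tlb->mm = mm;
	tlb->need_flush = false;
	atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

/* True if some other gather on the same mm is still in flight. */
static bool toy_flush_nested(struct toy_mm *mm)
{
	return atomic_load(&mm->tlb_flush_pending) > 1;
}

/*
 * Models tlb_finish_mmu: flush if this thread gathered entries, or
 * forcefully if a parallel thread may have updated the PTEs first.
 * Returns whether a flush was issued.
 */
static bool toy_tlb_finish_mmu(struct toy_gather *tlb)
{
	bool flushed = tlb->need_flush || toy_flush_nested(tlb->mm);
	int prev;

	if (flushed)
		atomic_fetch_add(&tlb->mm->flushes, 1);

	/* Drop the pending count only after the flush decision. */
	prev = atomic_fetch_sub(&tlb->mm->tlb_flush_pending, 1);
	assert(prev > 0);
	return flushed;
}
```

In this model, a thread that found every PTE already updated by another
thread still flushes, because the pending count is above one when it
finishes. A thread that runs alone and gathers nothing does not flush.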

Acked-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

Mel Gorman