Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic

From: Andrea Arcangeli
Date: Wed Aug 30 2017 - 13:28:07 EST


On Tue, Aug 29, 2017 at 07:46:07PM -0700, Nadav Amit wrote:
> Therefore, IIUC, try_to_umap_one() should only call
> mmu_notifier_invalidate_range() after ptep_get_and_clear() and

That would trigger an unnecessarily double call to
->invalidate_range() both from mmu_notifier_invalidate_range() after
ptep_get_and_clear() and later at the end in
mmu_notifier_invalidate_range_end().

The only advantage of adding a mmu_notifier_invalidate_range() after
ptep_get_and_clear() is to flush the secondary MMU TLBs (keep in mind
the pagetables have to be shared with the primary MMU in order to use
the ->invalidate_range()) inside the PT lock.

So if mmu_notifier_invalidate_range() after ptep_get_and_clear() is
needed or not, again boils down to the issue if the old code calling
->invalidate_page outside the PT lock was always broken before or
not. I don't see why exactly it was broken, we even hold the page lock
there so I don't see a migration race possible either. Of course the
constraint to be safe is that the put_page in try_to_unmap_one cannot
be the last one, and that had to be enforced by the caller taking an
additional reference on it.

One can wonder if the primary MMU TLB flush in ptep_clear_flush
(should_defer_flush returning false) could be put after releasing the
PT lock too (because that's not different from deferring the secondary
MMU notifier TLB flush in ->invalidate_range down to
mmu_notifier_invalidate_range_end) even if TTU_BATCH_FLUSH isn't set,
which may be safe too for the same reasons.

When should_defer_flush returns true we already defer the primary MMU
TLB flush to much later to even mmu_notifier_invalidate_range_end, not
just after the PT lock release so at least when should_defer_flush is
true, it looks obviously safe to defer the secondary MMU TLB flush to
mmu_notifier_invalidate_range_end for the drivers implementing
->invalidate_range.

If I'm wrong and all TLB flushes must happen inside the PT lock, then
we should at least reconsider the implicit call to ->invalidate_range
method from mmu_notifier_invalidate_range_end or we would call it
twice unnecessarily which doesn't look optimal. Either ways this
doesn't look optimal. We would need to introduce a
mmu_notifier_invalidate_range_end_full that calls also
->invalidate_range in such case so we skip the ->invalidate_range call
in mmu_notifier_invalidate_range_end if we put an explicit
mmu_notifier_invalidate_range() after ptep_get_and_clear inside the PT
lock like you suggested above.

Thanks,
Andrea