some likely bugs in IOMMUv2 (in tlb_finish_mmu() nested flush and mremap())

From: Jann Horn
Date: Fri Sep 23 2022 - 11:39:03 EST


Hi!

I looked through some of the code related to IOMMUv2 (the thing where
the IOMMU walks the normal userspace page tables and TLB shootdowns
are replicated to the IOMMU through
mmu_notifier_ops::invalidate_range).

I think there's a bug in the interaction between tlb_finish_mmu() and
mmu_notifier_ops::invalidate_range: In the mm_tlb_flush_nested() case,
__tlb_reset_range() sets tlb->start and tlb->end *both* to ~0.
Afterwards, tlb_finish_mmu() calls
tlb_flush_mmu()->tlb_flush_mmu_tlbonly()->mmu_notifier_invalidate_range(),
which will pass those tlb->start and tlb->end values to
mmu_notifier_ops::invalidate_range callbacks. But those callbacks
don't know about this special case, so they basically only flush
virtual address ~0, making the flush useless. (However, pretty much
every place that calls tlb_finish_mmu() first calls
mmu_notifier_invalidate_range_end(), even though the appropriate thing
would probably be mmu_notifier_invalidate_range_only_end(); I think
the extra invalidate_range call made by the former probably masks the
useless flush?)
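
To make the nested-flush problem concrete, here is a minimal userspace
sketch, not kernel code. struct gather_model and all model_* functions
are hypothetical stand-ins; only the "start and end both become ~0"
behaviour is taken from the description above, and the assumption is
that a range-based callback treats its arguments as [start, end).

```c
#include <limits.h>

/* Hypothetical stand-in for the few mmu_gather fields that matter here. */
struct gather_model {
    unsigned long start;
    unsigned long end;
    int fullmm;
};

/* Models only the fullmm case: both bounds are set to ~0. */
static void model_reset_range(struct gather_model *tlb)
{
    if (tlb->fullmm) {
        tlb->start = ULONG_MAX;
        tlb->end = ULONG_MAX;
    }
}

/*
 * Assumed behaviour of a range-based callback: it covers [start, end)
 * and has no special case for start == end == ~0.
 * Returns the number of bytes the range covers.
 */
static unsigned long model_range_bytes(unsigned long start, unsigned long end)
{
    return end > start ? end - start : 0;
}

/*
 * Models the nested case: the gathered range is discarded, fullmm is
 * set, the range is reset, and the reset values reach the callback.
 */
static unsigned long model_nested_finish(unsigned long start, unsigned long end)
{
    struct gather_model tlb = { start, end, 0 };

    tlb.fullmm = 1;
    model_reset_range(&tlb);
    return model_range_bytes(tlb.start, tlb.end);
}
```

In this model, model_nested_finish(0x1000, 0x5000) covers zero bytes,
while model_range_bytes(0x1000, 0x5000) would have covered 0x4000.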

Also, from what I can tell, the mremap() code, in move_page_tables(),
only invokes mmu_notifier_ops::invalidate_range via the
mmu_notifier_invalidate_range_end() at the very end, long after TLB
flushes must have happened - sort of like the bug we had years ago
where mremap() was flushing the normal TLBs too late
(https://bugs.chromium.org/p/project-zero/issues/detail?id=1695).
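
The ordering concern can be sketched the same way. This is a
hypothetical userspace model, not the real move_page_tables() code:
the struct and function names are made up, and "the point by which
flushes must have happened" is modelled as an abstract deadline rather
than any specific lock or call.

```c
#include <stdbool.h>

/* Hypothetical state: may each TLB still hold the old translation? */
struct remap_model {
    bool cpu_tlb_stale;
    bool iommu_tlb_stale;
};

/* What the model observed at the deadline. */
struct remap_snapshot {
    bool cpu_stale;
    bool iommu_stale;
};

/* Replays the order of events described above and samples the deadline. */
static struct remap_snapshot model_mremap_order(void)
{
    struct remap_model m = { false, false };
    struct remap_snapshot at_deadline;

    /* Page table entries are moved: both TLBs may cache old entries. */
    m.cpu_tlb_stale = true;
    m.iommu_tlb_stale = true;

    /* The normal (CPU) TLB flush happens in time. */
    m.cpu_tlb_stale = false;

    /* Deadline: the point by which all flushes must have happened. */
    at_deadline.cpu_stale = m.cpu_tlb_stale;
    at_deadline.iommu_stale = m.iommu_tlb_stale;

    /* Only at the very end does the IOMMU get its invalidation. */
    m.iommu_tlb_stale = false;

    return at_deadline;
}
```

In this model the snapshot reports the CPU TLB as clean but the IOMMU
TLB as still stale at the deadline, which is the window in question.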