Re: [PATCH] mm/khugepaged: Fix ->anon_vma race

From: Jann Horn
Date: Mon Jan 16 2023 - 07:56:32 EST


+cc mmu-gather maintainers

On Mon, Jan 16, 2023 at 1:34 PM Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:
> On Mon, Jan 16, 2023 at 01:06:59PM +0100, Jann Horn wrote:
> > On Sun, Jan 15, 2023 at 8:07 PM Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:
> > > BTW, I've noticed that you recently added tlb_remove_table_sync_one().
> > > I'm not sure why it is needed. Why is the IPI in pmdp_collapse_flush()
> > > not good enough to serialize against GUP fast?
> >
> > If that sent an IPI, it would be good enough; but
> > pmdp_collapse_flush() is not guaranteed to send an IPI.
> > It does a TLB flush, but on some architectures (including arm64 and
> > also virtualized x86), a remote TLB flush can be done without an IPI.
> > For example, arm64 has some fancy hardware support for remote TLB
> > invalidation without IPIs ("broadcast TLB invalidation"), and
> > virtualized x86 has (depending on the hypervisor) things like TLB
> > shootdown hypercalls (under Hyper-V, see hyperv_flush_tlb_multi) or
> > TLB shootdown signalling for preempted CPUs through shared memory
> > (under KVM, see kvm_flush_tlb_multi).
>
> I think such architectures must provide proper pmdp_collapse_flush()
> with the required serialization.

FWIW, the IPI that I added is not unconditional;
tlb_remove_table_sync_one() is a no-op unless
CONFIG_MMU_GATHER_RCU_TABLE_FREE is enabled, which an architecture can
select to signal that it uses "Semi RCU freeing of the page directories". The
kernel has arch-independent support for these semantics in the normal
TLB flushing code. But yeah, I guess you could move the
tlb_remove_table_sync_one() calls into pmdp_collapse_flush()
(including the generic version)? I'm CC-ing the mmu-gather maintainers
in case they have an opinion.
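
To illustrate the compile-time switch described above, here is a rough
userspace model. It is not kernel code: every model_* function and the
MODEL_MMU_GATHER_RCU_TABLE_FREE macro are hypothetical stand-ins, and the
"IPI" is just a counter. It only shows the shape of the argument: the
collapse-time flush on its own interrupts nobody, and the extra sync call
sends an IPI only when the option is set.

```c
/*
 * Userspace model only; nothing here is kernel code. Every model_* name
 * and the MODEL_* macro are hypothetical stand-ins invented for this sketch.
 */
#ifndef MODEL_MMU_GATHER_RCU_TABLE_FREE
/* Set to 0 to model an architecture that does not select the option. */
#define MODEL_MMU_GATHER_RCU_TABLE_FREE 1
#endif

/* Counts the IPIs that the model "delivers". */
static int model_ipis_sent;

/*
 * Stand-in for a collapse-time TLB flush on hardware that can invalidate
 * remote TLBs without an IPI (for example broadcast invalidation).
 */
static void model_pmdp_collapse_flush(void)
{
	/* Remote TLB entries are gone, but no other CPU was interrupted. */
}

#if MODEL_MMU_GATHER_RCU_TABLE_FREE
/*
 * Models interrupting the other CPUs and waiting for them, so that any
 * page-table walker running with IRQs disabled has finished.
 */
static void model_tlb_remove_table_sync_one(void)
{
	model_ipis_sent++;
}
#else
/* Option not set: the call compiles down to nothing. */
static void model_tlb_remove_table_sync_one(void)
{
}
#endif

/* Returns how many IPIs a single modeled collapse sends. */
int model_collapse_pmd(void)
{
	int before = model_ipis_sent;

	model_pmdp_collapse_flush();       /* TLB flush, no IPI in this model */
	model_tlb_remove_table_sync_one(); /* IPI only if the option is set */
	return model_ipis_sent - before;
}
```

Building the model with -DMODEL_MMU_GATHER_RCU_TABLE_FREE=0 makes
model_collapse_pmd() report zero IPIs, which is the "no-op" case.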

Anyway, I'm not going to do that refactor; feel free to do that if you want.

> Power and S390 already do that.

What's the call graph from pmdp_collapse_flush() to IPI on powerpc and s390?