Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork

From: Peter Xu
Date: Tue Nov 03 2020 - 12:03:37 EST


On Tue, Nov 03, 2020 at 01:17:12AM +0100, Ahmed S. Darwish wrote:
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index c48f8df6e50268..294c2c3c4fe00d 100644
> > > > +++ b/mm/memory.c
> > > > @@ -1171,6 +1171,12 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
> > > > mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
> > > > 0, src_vma, src_mm, addr, end);
> > > > mmu_notifier_invalidate_range_start(&range);
> > > > + /*
> > > > + * The read side doesn't spin, it goes to the mmap_lock, so the
> > > > + * raw version is used to avoid disabling preemption here
> > > > + */
> > > > + mmap_assert_write_locked(src_mm);
> > > > + raw_write_seqcount_t_begin(&src_mm->write_protect_seq);
> > >
> > > Would raw_write_seqcount_begin() be better here?
> >
> > Hum..
> >
> > I felt no because it had the preempt stuff added into it, however it
> > would work - __seqcount_lock_preemptible() == false for the seqcount_t
> > case (see below)
> >
> > Looking more closely, maybe the right API to pick is
> > write_seqcount_t_begin() and write_seqcount_t_end() ??
> >
>
> No, that's not the right API: it is also internal to seqlock.h.
>
> Please stick with the official exported API: raw_write_seqcount_begin().
>
> It should satisfy your needs, and the raw_*() variant is created exactly
> for contexts wishing to avoid the lockdep checks (e.g. NMI handlers
> cannot invoke lockdep, etc.)

Ahmed, Jason, feel free to correct me - but I feel like what Jason wanted here
is indeed the version that does not require disabling of preemption, a.k.a.,
write_seqcount_t_begin() and write_seqcount_t_end(), since it's preempt-safe if
the read side does not retry. Not sure whether there's no "*_t_*" version of it.

Another idea is that maybe we can use the raw_write_seqcount_begin() version,
instead of in copy_page_range() but move it to copy_pte_range(). That would
not affect normal Linux on preemption I think, since when reach pte level we
should have disabled preemption already after all (by taking the pgtable spin
lock). But again there could be extra overhead since we'll need to take the
write seqcount very often (rather than once per fork(), so maybe there's some
perf influence), also that means it'll be an extra/real disable_preempt() for
the future RT code if it'll land some day (since again rt_spin_lock should not
need to disable preemption, iiuc, which is used here). So seems it's still
better to do in copy_page_range() as Jason proposed.

Thanks,

--
Peter Xu