Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned

From: Peter Xu
Date: Tue Sep 22 2020 - 13:54:26 EST


On Tue, Sep 22, 2020 at 01:10:46PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 22, 2020 at 11:17:36AM -0400, Peter Xu wrote:
>
> > > But it's admittedly a cosmetic point, combined with my perennial fear that
> > > I'm missing something when I look at a READ_ONCE()/WRITE_ONCE() pair. :)
> >
> > Yeah but I hope I'm using it right.. :) I used READ_ONCE/WRITE_ONCE explicitly
> > because I think they're cheaper than atomic operations, (which will, iiuc, lock
> > the bus).
>
> It is worth thinking a bit about racing fork with
> pin_user_pages(). The desired outcome is:
>
> If fork wins the page is write protected, and pin_user_pages_fast()
> will COW it.
>
> If pin_user_pages_fast() wins then fork must see the READ_ONCE and
> the pin.
>
> As get_user_pages_fast() is lockless it looks like the ordering has to
> be like this:
>
> pin_user_pages_fast() fork()
> atomic_set(has_pinned, 1);
> [..]
> atomic_add(page->_refcount)
> ordered check write protect()
> ordered set write protect()
> atomic_read(page->_refcount)
> atomic_read(has_pinned)
>
> Such that in all the degenerate racy cases the outcome is that both
> sides COW, never neither.
>
> Thus I think it does have to be atomics purely from an ordering
> perspective, observing an increased _refcount requires that has_pinned
> != 0 if we are pinning.
>
> So, to make this 100% this ordering will need to be touched up too.

Thanks for spotting this. So something like below should work, right?

diff --git a/mm/memory.c b/mm/memory.c
index 8f3521be80ca..6591f3f33299 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -888,8 +888,8 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* Because we'll need to release the locks before doing cow,
* pass this work to upper layer.
*/
- if (READ_ONCE(src_mm->has_pinned) && wp &&
- page_maybe_dma_pinned(page)) {
+ if (wp && page_maybe_dma_pinned(page) &&
+ READ_ONCE(src_mm->has_pinned)) {
/* We've got the page already; we're safe */
data->cow_old_page = page;
data->cow_oldpte = *src_pte;

I can also add some more comment to emphasize this.

I think the WRITE_ONCE/READ_ONCE can actually be kept, because atomic ops
should contain proper memory barriers already so the memory access orders
should be guaranteed (e.g., atomic_add() will have an implicit wmb(); rmb() for
the other side). However maybe it's even simpler to change has_pinned into
atomic as John suggested. Thanks,

--
Peter Xu