Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
From: Peter Xu
Date:  Wed Sep 16 2020 - 14:46:56 EST
On Wed, Sep 16, 2020 at 02:48:04PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 15, 2020 at 06:50:46PM -0700, John Hubbard wrote:
> > > 
> > > It seems very strange that a physical page exclusively owned by a
> > > process can become copied if pin_user_pages() is active and the
> > > process did fork() at some point.
> > > 
> > > Could the new pin_user_pages() logic help here? eg the
> > > GUP_PIN_COUNTING_BIAS stuff?
> > > 
> > > Could the COW code consider a refcount of GUP_PIN_COUNTING_BIAS + 1 as
> > > being owned by the current mm and not needing COW? The DMA pin would
> > > be 'invisible' for COW purposes?
> > 
> > 
> > Please do be careful to use the API, rather than the implementation. The
> > FOLL_PIN refcounting system results in being able to get a "maybe
> > DMA-pinned", or a "definitely not DMA-pinned", via this API call:
> 
> So, what I'm thinking is something like (untested):
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 469af373ae76e1..77f63183667e52 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2889,6 +2889,26 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf)
>  	return ret;
>  }
>  
> +static bool cow_needed(struct vm_fault *vmf)
> +{
> +	int total_map_swapcount;
> +
> +	if (!reuse_swap_page(vmf->page, &total_map_swapcount)) {
> +		unlock_page(vmf->page);
> +		return true;
> +	}
> +
> +	if (total_map_swapcount == 1) {
> +		/*
> +		 * The page is all ours. Move it to our anon_vma so the rmap
> +		 * code will not search our parent or siblings.  Protected
> +		 * against the rmap code by the page lock.
> +		 */
> +		page_move_anon_rmap(vmf->page, vmf->vma);
> +	}
> +	return false;
> +}
> +
>  /*
>   * This routine handles present pages, when users try to write
>   * to a shared page. It is done by copying the page to a new address
> @@ -2947,8 +2967,21 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>  		if (!trylock_page(page))
>  			goto copy;
>  		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
> +			bool cow = true;
> +
> +			/*
> +			 * If the page is DMA pinned we can't rely on the
> +			 * above to know if there are other CPU references as
> +			 * page_count() will be elevated by the
> +			 * pin. Needlessly copying the page will cause the DMA
> +			 * pin to break, try harder to avoid that.
> +			 */
> +			if (page_maybe_dma_pinned(page))
> +				cow = cow_needed(vmf);
> +
>  			unlock_page(page);
> -			goto copy;
> +			if (cow)
> +				goto copy;
>  		}
>  		/*
>  		 * Ok, we've got the only map reference, and the only
> 
> What do you think Peter? Is this remotely close?
My understanding is this may only work for the case when the fork()ed child
quitted before we reach here (so we still have mapcount==1 for the page).  What
if not?  Then mapcount will be greater than 1, and cow will still trigger.  Is
that what we want?
Another problem is that, aiui, one of the major change previous patch proposed
is to avoid using lock_page() so that we never block in this path.  However if
we need the page reuse logic to guarantee pinning pages, then it becomes a
correctness issue, so I'm afraid we'll start to make all things complicated
again. Maybe even more complicated, because "correctness" should be even harder
than "best effort reuse" since it can cause data corruption if we didn't do it
right...
-- 
Peter Xu