Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

From: Jerome Glisse
Date: Wed Jan 13 2021 - 21:37:35 EST


On Wed, Jan 13, 2021 at 07:39:36PM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 13, 2021 at 04:56:38PM -0500, Jerome Glisse wrote:
>
> > is a broken model and the way GPUs use GUP is less broken than RDMA. In
> > GPU drivers, the GUP contract with userspace is that the data the GPU can
> > access is a snapshot of what the process memory was at the time you
> > asked for the GUP. The process can start using different pages right after.
> > There is no constant coherency contract (i.e. CPU and GPU can be working
> > on different pages).
>
> Look at the Habana Labs "totally not a GPU" driver; it doesn't work
> that way, GPU compute operations do want coherency.
>
> The mmu notifier hackery some of the other GPU drivers use to get
> coherency requires putting the kernel between every single work
> submission, and has all kinds of wonky issues and limitations - I
> think it is a net worse approach than GUP, honestly.

Yes, what GPU drivers do today with GUP is wrong, but it is only
used for texture upload/download. So that is a very limited
scope (amdkfd being an exception here).

Also yes, waiting on a GPU fence from an mmu notifier
callback is bad. We are thinking about how to solve this.

But what does matter is that hardware is moving in the right direction
and we will no longer need GUP. So GUP is dying out in GPU
drivers.

Cheers,
Jérôme