Re: [PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI

From: Boris Brezillon
Date: Thu Mar 09 2023 - 04:49:19 EST


On Thu, 9 Mar 2023 10:12:43 +0100
Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:

> Hi Danilo,
>
> On Fri, 17 Feb 2023 14:44:06 +0100
> Danilo Krummrich <dakr@xxxxxxxxxx> wrote:
>
> > Changes in V2:
> > ==============
> > Nouveau:
> > - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
> > signalling critical sections. Updates to the VA space are split into three
> > separate stages, where only the second stage executes in a fence signalling
> > critical section:
> >
> > 1. update the VA space, allocate new structures and page tables
>
> Sorry for the silly question, but I couldn't find where the page-table
> pre-allocation happens. Mind pointing me to it? It's also unclear when
> this step happens. Is it at bind-job submission time, when the job is
> not necessarily ready to run and may still be waiting for other deps to
> be signaled? Or is it done once all deps are met, as an extra step
> before jumping to step 2? If it's the former, I don't see how the VA
> space update can happen, since the bind-job might depend on other
> bind-jobs modifying the same portion of the VA space (unbind ops might
> lead to intermediate page table levels disappearing while we were
> waiting for deps). If it's the latter, I wonder why this is not
> considered an allocation in the fence signaling path (for the
> bind-job out-fence to be signaled, you need these allocations to
> succeed, unless failing to allocate page tables is treated like a HW
> misbehavior and the fence is signaled with an error in that case).

Ok, so I just noticed you only have one bind queue per drm_file
(cli->sched_entity), and jobs are executed in order on a given queue,
so I guess that allows you to modify the VA space at submit time
without risking any modifications to the VA space coming from other
bind queues targeting the same VM. And, if I'm correct, synchronous
bind/unbind ops take the same path, so there's no risk of those
modifying the VA space either (I just wonder whether it's a good thing
to have synchronous bind/unbind operations wait on asynchronous ones,
but that's a different topic).
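
To make sure the model described above is clear, here is a minimal
userspace C sketch of it. This is only a toy under stated assumptions:
every name in it (toy_queue, toy_submit, toy_run_next, toy_cleanup,
in_signalling) is hypothetical, and none of it is Nouveau or DRM
scheduler code. It only illustrates the idea that, with a single
in-order queue, allocations can be done at submit time (stage 1), so
the run stage (stage 2) never allocates, and freeing is deferred to a
later cleanup step (stage 3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model; these names do not exist in Nouveau or DRM. */
#define TOY_MAX_JOBS 8

struct toy_pt {
	unsigned long va;
	bool mapped;
};

struct toy_job {
	unsigned long va;
	struct toy_pt *pt;	/* pre-allocated at submit time */
	bool done;
};

struct toy_queue {
	struct toy_job jobs[TOY_MAX_JOBS];
	size_t head;			/* next job to run */
	size_t tail;			/* next free slot */
	bool in_signalling;		/* true while stage 2 runs */
	size_t allocs_in_signalling;	/* must stay 0 */
};

static struct toy_pt *toy_alloc_pt(struct toy_queue *q, unsigned long va)
{
	struct toy_pt *pt;

	/* Record any allocation attempted from the "critical section". */
	if (q->in_signalling)
		q->allocs_in_signalling++;

	pt = calloc(1, sizeof(*pt));
	if (pt)
		pt->va = va;
	return pt;
}

/* Stage 1: submit time. Allowed to allocate, and allowed to fail. */
static int toy_submit(struct toy_queue *q, unsigned long va)
{
	struct toy_pt *pt;

	if (q->tail == TOY_MAX_JOBS)
		return -1;

	pt = toy_alloc_pt(q, va);
	if (!pt)
		return -1;

	q->jobs[q->tail] = (struct toy_job){ .va = va, .pt = pt };
	q->tail++;
	return 0;
}

/* Stage 2: run the oldest pending job; only uses pre-allocated memory. */
static int toy_run_next(struct toy_queue *q)
{
	struct toy_job *job;

	if (q->head == q->tail)
		return -1;

	job = &q->jobs[q->head];
	q->in_signalling = true;
	job->pt->mapped = true;		/* "map" without allocating */
	job->done = true;		/* stands in for signalling the fence */
	q->in_signalling = false;
	q->head++;
	return 0;
}

/* Stage 3: free the structures of completed jobs, outside of stage 2. */
static void toy_cleanup(struct toy_queue *q)
{
	for (size_t i = 0; i < q->head; i++) {
		free(q->jobs[i].pt);
		q->jobs[i].pt = NULL;
	}
}
```

Since the jobs only ever run from a single queue and in submission
order, nothing else in this toy can change the state set up by
toy_submit() before toy_run_next() gets to the job. That is the
property I'm assuming the single sched_entity gives you.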

>
> Note that I'm not familiar at all with Nouveau or TTM, and it might
> be something that's solved by another component, or I'm just
> misunderstanding how the whole thing is supposed to work. That being
> said, I'd really like to implement a VM_BIND-like uAPI in pancsf using
> the gpuva_manager infra you're proposing here, so please bear with me
> :-).
>
> > 2. (un-)map the requested memory bindings
> > 3. free structures and page tables
> >
> > - Separated generic job scheduler code from specific job implementations.
> > - Separated the EXEC and VM_BIND implementation of the UAPI.
> > - Reworked the locking parts of the nvkm/vmm RAW interface, such that
> > (un-)map operations can be executed in fence signalling critical sections.
> >
>
> Regards,
>
> Boris
>