Re: [PATCH v2 4/9] drm/msm/A6xx: Implement preemption for A7XX targets
From: Rob Clark
Date: Fri Aug 30 2024 - 16:19:44 EST
On Fri, Aug 30, 2024 at 12:09 PM Connor Abbott <cwabbott0@xxxxxxxxx> wrote:
>
> On Fri, Aug 30, 2024 at 8:00 PM Rob Clark <robdclark@xxxxxxxxx> wrote:
> >
> > On Fri, Aug 30, 2024 at 11:54 AM Connor Abbott <cwabbott0@xxxxxxxxx> wrote:
> > >
> > > On Fri, Aug 30, 2024 at 7:08 PM Rob Clark <robdclark@xxxxxxxxx> wrote:
> > > >
> > > > On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco
> > > > <antomani103@xxxxxxxxx> wrote:
> > > > >
> > > > > This patch implements preemption feature for A6xx targets, this allows
> > > > > the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> > > > > hardware as such supports multiple levels of preemption granularities,
> > > > > ranging from coarse grained(ringbuffer level) to a more fine grained
> > > > > such as draw-call level or a bin boundary level preemption. This patch
> > > > > enables the basic preemption level, with more fine grained preemption
> > > > > support to follow.
> > > > >
> > > > > Signed-off-by: Sharat Masetty <smasetty@xxxxxxxxxxxxxx>
> > > > > Signed-off-by: Antonino Maniscalco <antomani103@xxxxxxxxx>
> > > > > Tested-by: Neil Armstrong <neil.armstrong@xxxxxxxxxx> # on SM8650-QRD
> > > > > ---
> > > > > drivers/gpu/drm/msm/Makefile | 1 +
> > > > > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 +++++++++++++++++++++-
> > > > > drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 ++++++++++++
> > > > > drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 431 ++++++++++++++++++++++++++++++
> > > > > drivers/gpu/drm/msm/msm_ringbuffer.h | 7 +
> > > > > 5 files changed, 921 insertions(+), 9 deletions(-)
> > > > >
> > > >
> > > > [snip]
> > > >
> > > > > +
> > > > > +int a6xx_preempt_submitqueue_setup(struct msm_gpu *gpu,
> > > > > + struct msm_gpu_submitqueue *queue)
> > > > > +{
> > > > > + void *ptr;
> > > > > +
> > > > > + /*
> > > > > + * Create a per submitqueue buffer for the CP to save and restore user
> > > > > + * specific information such as the VPC streamout data.
> > > > > + */
> > > > > + ptr = msm_gem_kernel_new(gpu->dev, A6XX_PREEMPT_USER_RECORD_SIZE,
> > > > > + MSM_BO_WC, gpu->aspace, &queue->bo, &queue->bo_iova);
> > > >
> > > > Can this be MSM_BO_MAP_PRIV? Otherwise it is visible (and writeable)
> > > > by other proceess's userspace generated cmdstream.
> > > >
> > > > And a similar question for the scratch_bo.. I'd have to give some
> > > > thought to what sort of mischief could be had, but generall kernel
> > > > mappings that are not MAP_PRIV are a thing to be careful about.
> > > >
> > >
> > > It seems like the idea behind this is that it's supposed to be
> > > per-context. kgsl allocates it as part of the context, as part of the
> > > userspace address space, and then in order to know which user record
> > > to use when preempting, before each submit (although really it only
> > > needs to be done when setting the pagetable) it does a CP_MEM_WRITE of
> > > the user record address to a scratch buffer holding an array of the
> > > current user record for each ring. Then when preempting it reads the
> > > address for the next ring from the scratch buffer and sets it. I think
> > > we need to do that dance too.
> >
> > Moving it into userspace's address space (vm) would be better.
> >
> > I assume the preempt record is where state is saved/restored? So
> > would need to be in kernel aspace/vm? Or is the fw changing ttbr0
> > after saving state but before restoring?
> >
> > BR,
> > -R
>
> The preempt record is split into a number of pieces, each with their
> own address. One of those pieces is the SMMU record with ttbr0 and
> other SMMU things. Another piece is the "private" context record with
> sensitive things like RB address/rptr/wptr, although actually the bulk
> of the registers are saved here. Then the user or "non-private" record
> is its own piece, which is presumably saved before switching ttbr0 and
> restored after the SMMU record is restored and ttbr0 is switched.
>
Ok, and all these are offsets in the preempt record.. but that part is
allocated with MAP_PRIV, so that part should be ok.
Why is the VPC streamout state handled differently?
BR,
-R