Re: [PATCH] drm/msm: Fix fence rollover issue

From: Rob Clark
Date: Thu Jun 16 2022 - 10:04:15 EST


On Thu, Jun 16, 2022 at 1:27 AM Dmitry Baryshkov
<dmitry.baryshkov@xxxxxxxxxx> wrote:
>
> On 15/06/2022 19:24, Rob Clark wrote:
> > From: Rob Clark <robdclark@xxxxxxxxxxxx>
> >
> > And while we are at it, let's start the fence counter close to the
> > rollover point so that if issues slip in, they are more obvious.
> >
> > Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
>
> Should it also have
>
> Fixes: fde5de6cb461 ("drm/msm: move fence code to it's own file")
>
> Or maybe
>
> Fixes: 5f3aee4ceb5b ("drm/msm: Handle fence rollover")

arguably it fixes the first commit that added GPU support (and
finishes up a couple spots that the above commit missed)

I guess I could use the fixes tag just to indicate how far back it
would be reasonable to backport to stable branches.

> Otherwise:
>
> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@xxxxxxxxxx>
>
>
> > ---
> > drivers/gpu/drm/msm/msm_fence.c | 13 +++++++++++--
> > 1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> > index 3df255402a33..a35a6746c7cd 100644
> > --- a/drivers/gpu/drm/msm/msm_fence.c
> > +++ b/drivers/gpu/drm/msm/msm_fence.c
> > @@ -28,6 +28,14 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
> > fctx->fenceptr = fenceptr;
> > spin_lock_init(&fctx->spinlock);
> >
> > + /*
> > + * Start out close to the 32b fence rollover point, so we can
> > + * catch bugs with fence comparisons.
> > + */
> > + fctx->last_fence = 0xffffff00;
> > + fctx->completed_fence = fctx->last_fence;
> > + *fctx->fenceptr = fctx->last_fence;
>
> This looks like a debugging hack. But probably it's fine to have it, as
> it wouldn't cause any side effects.

I was originally going to add a modparam or kconfig to enable this..
but then thought, if there is a bug and things are going to go wrong,
it's best for that to happen ASAP rather than after 200-400 days of
uptime.. bugs in the latter case can be rather hard to reproduce ;-)

IIRC the kernel does something similar with jiffies (the initial value
is chosen so the 32b counter wraps a few minutes after boot) to ensure
the rollover point is hit quickly
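
For illustration, a minimal userspace sketch of why max() breaks at the
rollover point while the signed-difference comparison does not. It
assumes fence_after() is the usual wraparound-safe comparison;
update_with_max() and update_with_fence_after() are hypothetical
stand-ins for the old and new bodies of msm_update_fence(), not driver
functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Wraparound-safe "a is newer than b" for 32b sequence numbers.
 * Only valid while the two values are less than 2^31 apart.
 * (Assumed definition; relies on the usual two's-complement
 * conversion to int32_t, as gcc/clang provide.)
 */
static inline bool fence_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Hypothetical stand-in for the old update: plain unsigned max(). */
static inline uint32_t update_with_max(uint32_t completed, uint32_t fence)
{
	return fence > completed ? fence : completed;
}

/* Hypothetical stand-in for the patched update. */
static inline uint32_t update_with_fence_after(uint32_t completed,
					       uint32_t fence)
{
	return fence_after(fence, completed) ? fence : completed;
}

/* Small self-check: one step across the rollover point. */
static inline void fence_rollover_selftest(void)
{
	/* max() keeps the stale pre-rollover value forever */
	assert(update_with_max(0xffffffffu, 2u) == 0xffffffffu);
	/* the signed comparison accepts the post-rollover fence */
	assert(update_with_fence_after(0xffffffffu, 2u) == 2u);
	/* and still rejects a genuinely older fence */
	assert(update_with_fence_after(5u, 3u) == 5u);
}
```

With the counter starting at 0xffffff00, the first case above is
reached after only 256 fences instead of after 2^32 of them.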

BR,
-R

> > +
> > return fctx;
> > }
> >
> > @@ -46,11 +54,12 @@ bool msm_fence_completed(struct msm_fence_context *fctx, uint32_t fence)
> > (int32_t)(*fctx->fenceptr - fence) >= 0;
> > }
> >
> > -/* called from workqueue */
> > +/* called from irq handler */
> > void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
> > {
> > spin_lock(&fctx->spinlock);
> > - fctx->completed_fence = max(fence, fctx->completed_fence);
> > + if (fence_after(fence, fctx->completed_fence))
> > + fctx->completed_fence = fence;
> > spin_unlock(&fctx->spinlock);
> > }
> >
>
>
> --
> With best wishes
> Dmitry