Re: [PATCH v1 2/2] drm: Clear the fence pointer when writeback job signaled

From: Daniel Vetter
Date: Fri Aug 02 2019 - 05:50:35 EST


On Fri, Aug 2, 2019 at 11:43 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> On Fri, Aug 2, 2019 at 11:29 AM Brian Starkey <Brian.Starkey@xxxxxxx> wrote:
> >
> > Hi Lowry,
> >
> > On Thu, Aug 01, 2019 at 06:34:08AM +0000, Lowry Li (Arm Technology China) wrote:
> > > Hi Brian,
> > >
> > > On Wed, Jul 31, 2019 at 09:20:04PM +0800, Brian Starkey wrote:
> > > > Hi Lowry,
> > > >
> > > > Thanks for this cleanup.
> > > >
> > > > On Wed, Jul 31, 2019 at 11:04:45AM +0000, Lowry Li (Arm Technology China) wrote:
> > > > > > When signaling the completion of a writeback job, clear the pointer
> > > > > > after releasing the out_fence.
> > > > > >
> > > > > > In drm_writeback_cleanup_job(), release the fence if one is left over.
> > > > >
> > > > > Signed-off-by: Lowry Li (Arm Technology China) <lowry.li@xxxxxxx>
> > > > > ---
> > > > > drivers/gpu/drm/drm_writeback.c | 23 +++++++++++++++--------
> > > > > 1 file changed, 15 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c
> > > > > index ff138b6..43d9e3b 100644
> > > > > --- a/drivers/gpu/drm/drm_writeback.c
> > > > > +++ b/drivers/gpu/drm/drm_writeback.c
> > > > > @@ -324,6 +324,9 @@ void drm_writeback_cleanup_job(struct drm_writeback_job *job)
> > > > > >  	if (job->fb)
> > > > > >  		drm_framebuffer_put(job->fb);
> > > > > >
> > > > > > +	if (job->out_fence)
> > > >
> > > > I'm thinking it might be a good idea to signal the fence with an error
> > > > here, if it's not already signaled. Otherwise, if there's someone
> > > > waiting (which there shouldn't be), they're going to be waiting a very
> > > > long time :-)
> > > >
> > > > Thanks,
> > > > -Brian
> > > >
> > > Here it happens when atomic_check fails or on a test-only commit. In both
> > > cases the commit has been dropped and this is only a cleanup, so it is
> > > better not to treat it as an error case :)
> >
> > If anyone else has a reference on the fence, then IMO it absolutely is
> > an error to reach this point without the fence being signaled -
> > because it means that the fence will never be signaled.
> >
> > I don't think the API gives you a way to check if this is the last
> > reference, so it's safest to just make sure the fence is signalled
> > before dropping the reference.
> >
> > It just feels wrong to me to have the possibility of a dangling fence
> > which is never going to get signalled; and it's an easy defensive step
> > to make sure it can never happen.
> >
> > I know it _shouldn't_ happen, but we often put in handling for cases
> > which shouldn't happen, because they frequently do happen :-)
>
> We're not as paranoid with the vblank fences either, so not sure why
> we need to be this paranoid with writeback fences. If your driver
> grabs anything from the atomic state in ->atomic_check it's buggy
> anyway.
>
> If you want to fix this properly I think we need to move the call to
> prepare_signalling() in between atomic_check and atomic_commit. Then I
> think it makes sense to also force-complete the fence on error ...
>
> > > For userspace, this is either a failed or a test-only commit, so the
> > > writeback fence should not be signaled.
> >
> > It's not only userspace that can wait on fences (and in fact this
> > fence will never even reach userspace if the commit fails), the driver
> > may have taken a copy to use for "something".

I forgot to add: you can check this by looking at the fence reference
count. A WARN_ON if that's more than 1 on cleanup (but also for the
out fences) could be a nice addition.
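
Roughly what I mean, as a userspace mock rather than real kernel code.
Everything named mock_* below is made up for illustration. In the kernel
the count would presumably come from kref_read() on the fence's refcount,
and the fprintf() would be a WARN_ON; treat this as a sketch of the idea
only, not as the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct dma_fence: only the reference count. */
struct mock_fence {
	unsigned int refcount;	/* number of holders, including the job */
};

/* Hypothetical stand-in for struct drm_writeback_job. */
struct mock_job {
	struct mock_fence *out_fence;
};

/* Drop one reference. The mock fence is caller-owned, so nothing is freed. */
static void mock_fence_put(struct mock_fence *fence)
{
	if (fence->refcount > 0)
		fence->refcount--;
}

/*
 * Cleanup path for a job that was never committed. Returns true if the
 * fence was still shared with someone else at cleanup time, which is the
 * condition a WARN_ON would flag, and false otherwise.
 */
static bool mock_cleanup_job(struct mock_job *job)
{
	bool shared = false;

	if (job->out_fence) {
		shared = job->out_fence->refcount > 1;
		if (shared)
			fprintf(stderr,
				"warning: out_fence still has %u references\n",
				job->out_fence->refcount);
		mock_fence_put(job->out_fence);
		job->out_fence = NULL;
	}

	return shared;
}
```

With a count of 1 the job holds the only reference, so dropping it is
harmless. Anything above 1 means somebody else took a copy of a fence that
will never signal, which is the case worth shouting about.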
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch