Re: [PATCH] drm/scheduler: Fix mem leak when last_scheduled signaled

From: Matthew Brost
Date: Tue Feb 25 2025 - 00:45:37 EST


On Mon, Feb 24, 2025 at 10:52:56AM +0100, Philipp Stanner wrote:
> Hello,
>
> subject line: please write "drm/sched" instead of "drm/scheduler". It
> has become the norm
>
> On Fri, 2025-02-21 at 14:27 +0800, qianyi liu wrote:
> > Problem: If prev(last_scheduled) was already signaled I encountered a
>
> prev(last_scheduled) almost reads like a function call. Maybe write
> "prev / last_scheduled"?
>
> > memory leak in drm_sched_entity_fini. This is because the
> > prev(last_scheduled) fence is not free properly.
>
> s/free/freed
>
> >
> > Fix: Balance the prev(last_scheduled) fence refcnt when
> > dma_fence_add_callback failed.
> >
> > Signed-off-by: qianyi liu <liuqianyi125@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/scheduler/sched_entity.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index 69bcf0e99d57..1c0c14bcf726 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -259,9 +259,12 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
> >   struct drm_sched_fence *s_fence = job->s_fence;
> >  
> >   dma_fence_get(&s_fence->finished);
> > - if (!prev || dma_fence_add_callback(prev, &job->finish_cb,
> > -   drm_sched_entity_kill_jobs_cb))
> > + if (!prev ||
> > +     dma_fence_add_callback(prev, &job->finish_cb,
> > +   drm_sched_entity_kill_jobs_cb)) {
> > + dma_fence_put(prev);
>
> But now the fence will also be put when prev == NULL. Is that

dma_fence_put(NULL) is a NOP [1].

[1] https://elixir.bootlin.com/linux/v6.13.4/source/include/linux/dma-fence.h#L290
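
A rough userspace sketch of that NULL check, in case it helps. The
mock_fence type and mock_fence_put() are made-up stand-ins for
illustration, not kernel API; the real helper uses a kref and a release
callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence (illustration only). */
struct mock_fence {
	int refcount;
	int released;
};

/* Drop one reference; a NULL fence is silently ignored. */
static void mock_fence_put(struct mock_fence *f)
{
	if (!f)
		return;

	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->released = 1;
}
```

So calling the put unconditionally inside the branch is harmless when
prev == NULL.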

> intentional? It doesn't seem correct to me from looking at the commit
> message, which states "Balance […] refcnt when dma_fence_add_callback
> failed"
>
> It didn't become immediately clear to me which dma_fence_get() your new
> dma_fence_put() balances. Can you elaborate on that or maybe write a
> comment?


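When it runs as the fence callback, drm_sched_entity_kill_jobs_cb(prev, ...)
calls dma_fence_put() on 'prev'.

When it is called directly as drm_sched_entity_kill_jobs_cb(NULL, ...), it
does not, so the new dma_fence_put(prev) balances that missing put.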
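
To make the accounting concrete, here is a minimal userspace model of one
loop iteration. Everything named mock_* is hypothetical and only mimics the
behaviour discussed in this thread: add_callback fails with -ENOENT on an
already signaled fence, and the callback puts whatever fence it is handed.
The refcount only tracks the reference handed to the loop, which is a
simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence; not kernel API. */
struct mock_fence {
	int refcount;   /* references held by the kill loop or its callback */
	int signaled;
	int cb_pending; /* set while a callback is registered */
};

static void mock_get(struct mock_fence *f) { f->refcount++; }

static void mock_put(struct mock_fence *f)
{
	if (!f)
		return; /* NULL is a no-op, as with dma_fence_put() */
	assert(f->refcount > 0);
	f->refcount--;
}

/* Mimics drm_sched_entity_kill_jobs_cb(): puts the fence it was given. */
static void mock_kill_jobs_cb(struct mock_fence *f) { mock_put(f); }

/* Mimics dma_fence_add_callback(): fails if the fence already signaled. */
static int mock_add_callback(struct mock_fence *f)
{
	if (f->signaled)
		return -ENOENT;
	f->cb_pending = 1;
	return 0;
}

/* Signal the fence and run the registered callback, if any. */
static void mock_signal(struct mock_fence *f)
{
	f->signaled = 1;
	if (f->cb_pending) {
		f->cb_pending = 0;
		mock_kill_jobs_cb(f);
	}
}

/* One iteration of the kill loop; returns the next 'prev'. */
static struct mock_fence *mock_kill_one(struct mock_fence *prev,
					struct mock_fence *finished,
					int with_fix)
{
	mock_get(finished);
	if (!prev || mock_add_callback(prev)) {
		if (with_fix)
			mock_put(prev);  /* the put added by the patch */
		mock_kill_jobs_cb(NULL); /* direct call: puts nothing */
	}
	return finished;
}
```

With a signaled 'prev' holding one loop reference, the model leaves that
reference in place when with_fix is 0 and drops it when with_fix is 1. For
an unsignaled 'prev' the callback drops it later, so nothing changes there.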

>
> It would also be handy if you could share the kmemleak trace.
>

Agreed, a kmemleak trace would be good; please include it in the commit
message. But the patch looks correct to me.

I also think the commit message needs a bit of work, as Philipp suggests.

Matt

>
> Thanks
> P.
>
> >   drm_sched_entity_kill_jobs_cb(NULL, &job->finish_cb);
> > + }
> >  
> >   prev = &s_fence->finished;
> >   }
>