Re: [Nouveau] [PATCH 5.10 32/77] drm/ttm: fix memleak in ttm_transfered_destroy

From: Karol Herbst
Date: Wed Nov 03 2021 - 17:26:15 EST


On Wed, Nov 3, 2021 at 9:47 PM Sven Joachim <svenjoac@xxxxxx> wrote:
>
> On 2021-11-03 21:32 +0100, Karol Herbst wrote:
>
> > On Wed, Nov 3, 2021 at 9:29 PM Karol Herbst <kherbst@xxxxxxxxxx> wrote:
> >>
> >> On Wed, Nov 3, 2021 at 8:52 PM Sven Joachim <svenjoac@xxxxxx> wrote:
> >> >
> >> > On 2021-11-01 10:17 +0100, Greg Kroah-Hartman wrote:
> >> >
> >> > > From: Christian König <christian.koenig@xxxxxxx>
> >> > >
> >> > > commit 0db55f9a1bafbe3dac750ea669de9134922389b5 upstream.
> >> > >
> >> > > We need to cleanup the fences for ghost objects as well.
> >> > >
> >> > > Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> >> > > Reported-by: Erhard F. <erhard_f@xxxxxxxxxxx>
> >> > > Tested-by: Erhard F. <erhard_f@xxxxxxxxxxx>
> >> > > Reviewed-by: Huang Rui <ray.huang@xxxxxxx>
> >> > > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214029
> >> > > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214447
> >> > > CC: <stable@xxxxxxxxxxxxxxx>
> >> > > Link: https://patchwork.freedesktop.org/patch/msgid/20211020173211.2247-1-christian.koenig@xxxxxxx
> >> > > Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> >> > > ---
> >> > > drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
> >> > > 1 file changed, 1 insertion(+)
> >> > >
> >> > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> >> > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> >> > > @@ -322,6 +322,7 @@ static void ttm_transfered_destroy(struc
> >> > >  	struct ttm_transfer_obj *fbo;
> >> > >
> >> > >  	fbo = container_of(bo, struct ttm_transfer_obj, base);
> >> > > +	dma_resv_fini(&fbo->base.base._resv);
> >> > >  	ttm_bo_put(fbo->bo);
> >> > >  	kfree(fbo);
> >> > >  }
> >> >
> >> > Alas, this innocuous-looking commit causes one of my systems to lock up
> >> > as soon as I run startx. This happens with the nouveau driver; two other
> >> > systems with radeon and intel graphics are not affected. Also I only
> >> > noticed it in 5.10.77. Kernels 5.15 and 5.14.16 are not affected, and I
> >> > do not use 5.4 anymore.
> >> >
> >> > I am not familiar with nouveau's ttm management and what has changed
> >> > there between 5.10 and 5.14, but maybe one of their developers can shed
> >> > some light on this.
> >> >
> >> > Cheers,
> >> > Sven
> >> >
> >>
> >> could be related to 265ec0dd1a0d18f4114f62c0d4a794bb4e729bc1
> >
> > maybe not... but I do remember there being a few TTM-related patches
> > which only hurt nouveau :/ I guess one could do a git bisect to
> > figure out what change "fixes" it.
>
> Maybe, but since the memory leaks reported by Erhard only started to
> show up in 5.14 (if I read the bugzilla reports correctly), perhaps the
> patch should simply be reverted on earlier kernels?
>

Yeah, I think this is probably the right approach.

> > On which GPU do you see this problem?
>
> On an old GeForce 8500 GT; the whole PC is rather ancient.
>
> Cheers,
> Sven
>