Re: [Nouveau] [PATCH 5.10 32/77] drm/ttm: fix memleak in ttm_transfered_destroy

From: Greg Kroah-Hartman
Date: Thu Nov 04 2021 - 04:44:53 EST


On Thu, Nov 04, 2021 at 08:39:18AM +0100, Christian König wrote:
> On 03.11.21 at 22:25, Karol Herbst wrote:
> > On Wed, Nov 3, 2021 at 9:47 PM Sven Joachim <svenjoac@xxxxxx> wrote:
> > > On 2021-11-03 21:32 +0100, Karol Herbst wrote:
> > >
> > > > On Wed, Nov 3, 2021 at 9:29 PM Karol Herbst <kherbst@xxxxxxxxxx> wrote:
> > > > > On Wed, Nov 3, 2021 at 8:52 PM Sven Joachim <svenjoac@xxxxxx> wrote:
> > > > > > On 2021-11-01 10:17 +0100, Greg Kroah-Hartman wrote:
> > > > > >
> > > > > > > From: Christian König <christian.koenig@xxxxxxx>
> > > > > > >
> > > > > > > commit 0db55f9a1bafbe3dac750ea669de9134922389b5 upstream.
> > > > > > >
> > > > > > > We need to cleanup the fences for ghost objects as well.
> > > > > > >
> > > > > > > Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> > > > > > > Reported-by: Erhard F. <erhard_f@xxxxxxxxxxx>
> > > > > > > Tested-by: Erhard F. <erhard_f@xxxxxxxxxxx>
> > > > > > > Reviewed-by: Huang Rui <ray.huang@xxxxxxx>
> > > > > > > > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214029
> > > > > > > > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214447
> > > > > > > CC: <stable@xxxxxxxxxxxxxxx>
> > > > > > > > Link: https://patchwork.freedesktop.org/patch/msgid/20211020173211.2247-1-christian.koenig@amd.com
> > > > > > > Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > > > > > ---
> > > > > > > drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
> > > > > > > 1 file changed, 1 insertion(+)
> > > > > > >
> > > > > > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > > > @@ -322,6 +322,7 @@ static void ttm_transfered_destroy(struc
> > > > > > > >  	struct ttm_transfer_obj *fbo;
> > > > > > > >
> > > > > > > >  	fbo = container_of(bo, struct ttm_transfer_obj, base);
> > > > > > > > +	dma_resv_fini(&fbo->base.base._resv);
> > > > > > > >  	ttm_bo_put(fbo->bo);
> > > > > > > >  	kfree(fbo);
> > > > > > > >  }
> > > > > > Alas, this innocuous looking commit causes one of my systems to lock up
> > > > > > as soon as I run startx. This happens with the nouveau driver; two other
> > > > > > systems with radeon and intel graphics are not affected. Also I only
> > > > > > noticed it in 5.10.77. Kernels 5.15 and 5.14.16 are not affected, and I
> > > > > > do not use 5.4 anymore.
> > > > > >
> > > > > > I am not familiar with nouveau's ttm management and what has changed
> > > > > > there between 5.10 and 5.14, but maybe one of their developers can shed
> > > > > > some light on this.
> > > > > >
> > > > > > Cheers,
> > > > > > Sven
> > > > > >
> > > > > could be related to 265ec0dd1a0d18f4114f62c0d4a794bb4e729bc1
> > > > maybe not.. but I did remember there being a few ttm-related patches
> > > > which only hurt nouveau :/ I guess one could do a git bisect to
> > > > figure out what change "fixes" it.
> > > Maybe, but since the memory leaks reported by Erhard only started to
> > > show up in 5.14 (if I read the bugzilla reports correctly), perhaps the
> > > patch should simply be reverted on earlier kernels?
> > >
> > Yeah, I think this is probably the right approach.
>
> I agree. The problem is this memory leak could potentially happen with 5.10
> as well, just much much much less likely.
>
> But my guess is that 5.10 is so buggy that when the leak does NOT happen we
> double free, which obviously causes a crash.
>
> So for the sake of stability please don't apply this patch to 5.10. I'm
> going to comment on the original bug report as well.

Now reverted from 5.10 and 5.4 kernels, thanks,

greg k-h