Re: Oops in 3.10.99 -- NULL pointer dereference in radeon_fence_ref

From: Greg Kroah-Hartman
Date: Wed Mar 09 2016 - 11:39:15 EST


On Wed, Mar 09, 2016 at 11:31:54AM -0500, Nicolai Hähnle wrote:
> On 09.03.2016 08:56, Luis Henriques wrote:
> >On Mon, Mar 07, 2016 at 02:58:51PM -0800, Greg Kroah-Hartman wrote:
> >>On Mon, Mar 07, 2016 at 10:06:47PM +0100, Christian König wrote:
> >>>Am 07.03.2016 um 21:46 schrieb Greg Kroah-Hartman:
> >>>>On Sun, Mar 06, 2016 at 07:50:14PM -0700, Erik Andersen wrote:
> >>>>>The following patch to radeon_sa_bo_new that
> >>>>>went into 3.10.99
> >>>>>
> >>>>> commit 8d5e1e5af0c667545c202e8f4051f77aa3bf31b7
> >>>>> Author: Nicolai Hähnle <nicolai.haehnle@xxxxxxx>
> >>>>> Date: Fri Feb 5 14:35:53 2016 -0500
> >>>>> drm/radeon: hold reference to fences in radeon_sa_bo_new
> >>>>> commit f6ff4f67cdf8455d0a4226eeeaf5af17c37d05eb upstream.
> >>>>>
> >>>>>triggers an Oops for me as soon as xscreensaver
> >>>>>starts doing 3D stuff. After reverting this
> >>>>>patch, xscreensaver has been happily running 3D stuff.
> >>>>>
> >>>>>Mar 6 18:00:43 sage kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> >>>>>Mar 6 18:00:43 sage kernel: IP: [<ffffffffa010345d>] radeon_fence_ref+0xd/0x50 [radeon]
> >>>>>Mar 6 18:00:43 sage kernel: PGD 799e1d067 PUD 819186067 PMD 0
> >>>>>Mar 6 18:00:43 sage kernel: Oops: 0002 [#1] SMP
> >>>>>
> >>>>>Mar 6 18:00:43 sage kernel: Stack:
> >>>>>Mar 6 18:00:43 sage kernel: ffffffffa01607ec ffff88108a4e8000 ffff88108a4e8000 ffff880888fbc000
> >>>>>Mar 6 18:00:43 sage kernel: ffff880ecbf11c78 0000fe2001000006 0000000000000000 0020000000000100
> >>>>>Mar 6 18:00:43 sage kernel: 00000000000d1200 ffff880ecbf11c14 0000000000000000 0000000000000000
> >>>>>Mar 6 18:00:43 sage kernel: Call Trace:
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffffa01607ec>] ? radeon_sa_bo_new+0x2ac/0x4f0 [radeon]
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffffa005fc9d>] ? ttm_eu_list_ref_sub+0x3d/0x60 [ttm]
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffffa0117c49>] radeon_ib_get+0x39/0x110 [radeon]
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffffa011a4ea>] radeon_cs_ioctl+0x69a/0xa70 [radeon]
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffffa008e2d2>] drm_ioctl+0x512/0x650 [drm]
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffff810a46e1>] ? do_futex+0x111/0xc30
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffff81182a45>] do_vfs_ioctl+0x305/0x520
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffff8107cd39>] ? vtime_account_user+0x69/0x80
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffff81182ce1>] SyS_ioctl+0x81/0xa0
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffff8178210f>] tracesys+0xe1/0xe6
> >>>>>
> >>>>>$ lspci | grep VGA
> >>>>>03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]
> >>>>Next time, please cc: the people responsible for that patch as well...
> >>>>
> >>>>I can revert it, but maybe something else is going on here? Do you have
> >>>>this same problem on 3.14, and 4.5-rc7?
> >>>
> >>>Hi Greg,
> >>>
> >>>Yes, that's an already known issue. Feel free to revert that one for now.
> >>>
> >>>I have it on my TODO list to provide a fixed patch for older kernels, but
> >>>that can take a while.
> >>>
> >>>For background: Nicolai's patch is correct, but it assumes that
> >>>radeon_fence_unref() can safely take NULL as the fence, which is not the
> >>>case for older kernels.
>
> Actually, the call to radeon_fence_ref() is the culprit.
>
> >>
> >>Ok, thanks, now reverted.
> >>
> >
> >And looks like a few more kernels may be affected as well. I'll
> >revert it from 3.16 kernel, and I'm adding Kamal, Sasha and Jiri to
> >the CC list.
>
> Kernels that contain commit 954605ca "drm/radeon: use common fence
> implementation for fences, v4" are safe; older kernels require a
> NULL-pointer check around the call to radeon_fence_ref.
>
> This means kernels 3.17 and older are affected and need the additional NULL
> pointer check that I've sent out already on a different thread (I'm
> attaching it again, hoping that Erik gets a chance to test it).
>
> It would be nice to get a confirmation that this really does fix the
> observed bug, then I can prepare a fixed version of the patch for 3.17 and
> older (i.e. squash the original bad commit with the attached patch).

Don't "squash" anything together; just send the needed patches,
backported. We want to keep things matching Linus's tree as much as
possible.

thanks,

greg k-h
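
For readers following the thread, here is a minimal sketch in plain C of the
kind of NULL-pointer check discussed above, placed around a reference-taking
call. This is not the attached patch or the radeon driver code: struct
fence_sketch, fence_sketch_ref(), fence_sketch_ref_checked() and
collect_fences() are hypothetical names invented for illustration. The only
assumption taken from the thread is that, on older kernels, the ref function
touches the fence unconditionally, so the caller has to guard against NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted fence; NOT the real struct radeon_fence. */
struct fence_sketch {
    int refcount;
};

/*
 * Models the older-kernel behaviour described in the thread: the reference
 * is taken by writing to the object unconditionally, so passing NULL here
 * would be a NULL pointer dereference.
 */
static struct fence_sketch *fence_sketch_ref(struct fence_sketch *fence)
{
    fence->refcount++;
    return fence;
}

/* Caller-side guard: only take a reference when there is a fence at all. */
static struct fence_sketch *fence_sketch_ref_checked(struct fence_sketch *fence)
{
    if (!fence)
        return NULL;
    return fence_sketch_ref(fence);
}

/*
 * Take references on every slot that holds a fence. Empty (NULL) slots are
 * copied through as NULL instead of being handed to fence_sketch_ref().
 * Returns the number of references actually taken.
 */
static size_t collect_fences(struct fence_sketch *const *slots, size_t n,
                             struct fence_sketch **out)
{
    size_t taken = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        out[i] = fence_sketch_ref_checked(slots[i]);
        if (out[i])
            taken++;
    }
    return taken;
}
```

Calling collect_fences() on an array with some empty slots takes a reference
only on the populated ones; without the guard, the first empty slot would
reach fence_sketch_ref() with a NULL pointer.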