Re: Oops in 3.10.99 -- NULL pointer dereference in radeon_fence_ref

From: Nicolai Hähnle
Date: Wed Mar 09 2016 - 14:06:01 EST


On 09.03.2016 08:56, Luis Henriques wrote:
On Mon, Mar 07, 2016 at 02:58:51PM -0800, Greg Kroah-Hartman wrote:
On Mon, Mar 07, 2016 at 10:06:47PM +0100, Christian König wrote:
Am 07.03.2016 um 21:46 schrieb Greg Kroah-Hartman:
On Sun, Mar 06, 2016 at 07:50:14PM -0700, Erik Andersen wrote:
The following patch to radeon_sa_bo_new that
went into 3.10.99

commit 8d5e1e5af0c667545c202e8f4051f77aa3bf31b7
Author: Nicolai Hähnle <nicolai.haehnle@xxxxxxx>
Date: Fri Feb 5 14:35:53 2016 -0500
drm/radeon: hold reference to fences in radeon_sa_bo_new
commit f6ff4f67cdf8455d0a4226eeeaf5af17c37d05eb upstream.

is triggering an Oops for me right when xscreensaver
first began doing 3D stuff. After reverting this
patch, xscreensaver has been happily running 3D stuff.

Mar 6 18:00:43 sage kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Mar 6 18:00:43 sage kernel: IP: [<ffffffffa010345d>] radeon_fence_ref+0xd/0x50 [radeon]
Mar 6 18:00:43 sage kernel: PGD 799e1d067 PUD 819186067 PMD 0
Mar 6 18:00:43 sage kernel: Oops: 0002 [#1] SMP

Mar 6 18:00:43 sage kernel: Stack:
Mar 6 18:00:43 sage kernel: ffffffffa01607ec ffff88108a4e8000 ffff88108a4e8000 ffff880888fbc000
Mar 6 18:00:43 sage kernel: ffff880ecbf11c78 0000fe2001000006 0000000000000000 0020000000000100
Mar 6 18:00:43 sage kernel: 00000000000d1200 ffff880ecbf11c14 0000000000000000 0000000000000000
Mar 6 18:00:43 sage kernel: Call Trace:
Mar 6 18:00:43 sage kernel: [<ffffffffa01607ec>] ? radeon_sa_bo_new+0x2ac/0x4f0 [radeon]
Mar 6 18:00:43 sage kernel: [<ffffffffa005fc9d>] ? ttm_eu_list_ref_sub+0x3d/0x60 [ttm]
Mar 6 18:00:43 sage kernel: [<ffffffffa0117c49>] radeon_ib_get+0x39/0x110 [radeon]
Mar 6 18:00:43 sage kernel: [<ffffffffa011a4ea>] radeon_cs_ioctl+0x69a/0xa70 [radeon]
Mar 6 18:00:43 sage kernel: [<ffffffffa008e2d2>] drm_ioctl+0x512/0x650 [drm]
Mar 6 18:00:43 sage kernel: [<ffffffff810a46e1>] ? do_futex+0x111/0xc30
Mar 6 18:00:43 sage kernel: [<ffffffff81182a45>] do_vfs_ioctl+0x305/0x520
Mar 6 18:00:43 sage kernel: [<ffffffff8107cd39>] ? vtime_account_user+0x69/0x80
Mar 6 18:00:43 sage kernel: [<ffffffff81182ce1>] SyS_ioctl+0x81/0xa0
Mar 6 18:00:43 sage kernel: [<ffffffff8178210f>] tracesys+0xe1/0xe6

$ lspci | grep VGA
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]

Next time, please cc: the people responsible for that patch as well...

I can revert it, but maybe something else is going on here? Do you have
this same problem on 3.14, and 4.5-rc7?

Hi Greg,

yes that's an already known issue. Feel free to revert that one for now.

I have it on my TODO list to provide a fixed patch for older kernels, but that
can take a while.

For background: Nicolai's patch is correct, but it assumes that
radeon_fence_unref() can safely take NULL as the fence, which is not the case
for older kernels.

Actually, the call to radeon_fence_ref() is the culprit.


Ok, thanks, now reverted.


And it looks like a few more kernels may be affected as well. I'll
revert it from the 3.16 kernel, and I'm adding Kamal, Sasha and Jiri to
the CC list.

Kernels that contain commit 954605ca "drm/radeon: use common fence implementation for fences, v4" are safe; older kernels require a NULL-pointer check around the call to radeon_fence_ref().
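
To illustrate the kind of check meant here, the following is a minimal user-space sketch, not the attached patch and not kernel code. The names mock_fence, mock_fence_ref(), mock_fence_ref_checked() and collect_fences() are hypothetical stand-ins; the only assumption carried over from the thread is that the ref helper dereferences its argument and therefore must not be handed NULL.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for a refcounted fence.
 * This is NOT the kernel's struct radeon_fence. */
struct mock_fence {
    int refcount;
};

/* Mimics a ref helper that assumes a non-NULL argument:
 * calling it with NULL would dereference a NULL pointer. */
static struct mock_fence *mock_fence_ref(struct mock_fence *fence)
{
    fence->refcount++;
    return fence;
}

/* Guarded variant: only take a reference when there is a fence. */
static struct mock_fence *mock_fence_ref_checked(struct mock_fence *fence)
{
    if (fence == NULL)
        return NULL;
    return mock_fence_ref(fence);
}

/* Take references on a set of fences, some of which may be NULL.
 * Returns the number of references actually taken. */
static int collect_fences(struct mock_fence *const in[],
                          struct mock_fence *out[], size_t n)
{
    int taken = 0;

    for (size_t i = 0; i < n; ++i) {
        out[i] = mock_fence_ref_checked(in[i]);
        if (out[i] != NULL)
            taken++;
    }
    return taken;
}
```

The guard sits at the call site rather than inside the ref helper, which matches the description of a check "around the call"; whether the real fix is structured this way is not something this sketch can confirm.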

This means kernels 3.17 and older are affected and need the additional NULL pointer check that I've sent out already on a different thread (I'm attaching it again, hoping that Erik gets a chance to test it).

It would be nice to get a confirmation that this really does fix the observed bug, then I can prepare a fixed version of the patch for 3.17 and older (i.e. squash the original bad commit with the attached patch).

Cheers,
Nicolai


Cheers,
--
Luís

greg k-h