Re: [git pull] drm fixes for 5.14-rc4

From: Linus Torvalds
Date: Thu Aug 05 2021 - 14:14:44 EST


This may already have been fixed by the previous drm pull,
but I wanted to report it anyway, just in case.

It happened after an uptime of over a week, so it might not be trivial
to reproduce.

It's a NULL pointer dereference in dc_stream_retain() with the code being

lock xadd %eax,0x390(%rdi) <-- trapping instruction

and that's just the

kref_get(&stream->refcount);

with a NULL 'stream' argument.
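
As a rough illustration of why the 0x390 displacement points at a NULL
'stream', here is a minimal userspace sketch. The struct names and the
padding are hypothetical stand-ins, not the real amdgpu definitions; the
only thing taken from the oops is the assumption that the refcount sits
0x390 bytes into the structure. With a NULL base pointer, the address
touched by the locked add is simply that member offset.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct kref: just a counter. */
struct fake_kref {
	int refcount;
};

/*
 * Hypothetical stand-in for the stream structure. The real layout is not
 * shown in this report; the padding only places the refcount at the
 * 0x390 offset seen in the trapping instruction.
 */
struct fake_stream {
	char other_fields[0x390];
	struct fake_kref refcount;
};

/*
 * Address that an access to stream->refcount would touch if 'stream'
 * were NULL: the base is zero, so it equals the member offset.
 */
static size_t null_stream_fault_address(void)
{
	return offsetof(struct fake_stream, refcount);
}
```

Calling null_stream_fault_address() on this made-up layout returns 0x390,
which is the kind of small, near-zero fault address that usually indicates
a NULL structure pointer rather than a corrupted one.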

Call Trace:
dc_resource_state_copy_construct+0x13f/0x190 [amdgpu]
amdgpu_dm_atomic_commit_tail+0xd5/0x1540 [amdgpu]
commit_tail+0x97/0x180 [drm_kms_helper]
process_one_work+0x1df/0x3a0

The oops is followed by a stream of

[drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:55:crtc-1] hw_done or flip_done timed out

and the machine was not usable afterwards.

lspci says this is a

49:00.0 VGA compatible controller [0300]:
Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere
[Radeon RX 470/480/570/570X/580/580X/590]
[1002:67df] (rev e7) (prog-if 00 [VGA controller])

Full oops in the attachment, but I think the above covers all the really
salient details.

Linus

Attachment: amd-gpu-ooops