Re: [PATCH] drm/amd/display: Clear dm_state for fast updates

From: Christian KÃnig
Date: Mon Jul 27 2020 - 15:28:30 EST

Am 27.07.20 um 16:05 schrieb Kazlauskas, Nicholas:
On 2020-07-27 9:39 a.m., Christian KÃnig wrote:
Am 27.07.20 um 07:40 schrieb Mazin Rezk:
This patch fixes a race condition that causes a use-after-free during
amdgpu_dm_atomic_commit_tail. This can occur when 2 non-blocking commits
are requested and the second one finishes before the first. Essentially,
this bug occurs when the following sequence of events happens:

1. Non-blocking commit #1 is requested w/ a new dm_state #1 and is
deferred to the workqueue.

2. Non-blocking commit #2 is requested w/ a new dm_state #2 and is
deferred to the workqueue.

3. Commit #2 starts before commit #1, dm_state #1 is used in the
commit_tail and commit #2 completes, freeing dm_state #1.

4. Commit #1 starts after commit #2 completes, uses the freed dm_state
1 and dereferences a freelist pointer while setting the context.

Well I only have a one mile high view on this, but why don't you let the work items execute in order?

That would be better anyway cause this way we don't trigger a cache line ping pong between CPUs.


We use the DRM helpers for managing drm_atomic_commit_state and those helpers internally push non-blocking commit work into the system unbound work queue.

Mhm, well if you send those helper atomic commits in the order A,B and they execute it in the order B,A I would call that a bug :)

While we could duplicate a copy of that code with nothing but the workqueue changed that isn't something I'd really like to maintain going forward.

I'm not talking about duplicating the code, I'm talking about fixing the helpers. I don't know that code well, but from the outside it sounds like a bug there.

And executing work items in the order they are submitted is trivial.

Had anybody pinged Daniel or other people familiar with the helper code about it?


Nicholas Kazlauskas

Since this bug has only been spotted with fast commits, this patch fixes
the bug by clearing the dm_state instead of using the old dc_state for
fast updates. In addition, since dm_state is only used for its dc_state
and amdgpu_dm_atomic_commit_tail will retain the dc_state if none is found,
removing the dm_state should not have any consequences in fast updates.

This use-after-free bug has existed for a while now, but only caused a
noticeable issue starting from 5.7-rc1 due to 3202fa62f ("slub: relocate
freelist pointer to middle of object") moving the freelist pointer from
dm_state->base (which was unused) to dm_state->context (which is

Fixes: bd200d190f45 ("drm/amd/display: Don't replace the dc_state for fast updates")
Reported-by: Duncan <1i5t5.duncan@xxxxxxx>
Signed-off-by: Mazin Rezk <mnrzk@xxxxxxxxxxxxxx>
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 36 ++++++++++++++-----
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 86ffa0c2880f..710edc70e37e 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -8717,20 +8717,38 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
ÂÂÂÂÂÂÂÂÂÂ * the same resource. If we have a new DC context as part of
ÂÂÂÂÂÂÂÂÂÂ * the DM atomic state from validation we need to free it and
ÂÂÂÂÂÂÂÂÂÂ * retain the existing one instead.
+ÂÂÂÂÂÂÂÂ * Furthermore, since the DM atomic state only contains the DC
+ÂÂÂÂÂÂÂÂ * context and can safely be annulled, we can free the state
+ÂÂÂÂÂÂÂÂ * and clear the associated private object now to free
+ÂÂÂÂÂÂÂÂ * some memory and avoid a possible use-after-free later.
-ÂÂÂÂÂÂÂ struct dm_atomic_state *new_dm_state, *old_dm_state;

-ÂÂÂÂÂÂÂ new_dm_state = dm_atomic_get_new_state(state);
-ÂÂÂÂÂÂÂ old_dm_state = dm_atomic_get_old_state(state);
+ÂÂÂÂÂÂÂ for (i = 0; i < state->num_private_objs; i++) {
+ÂÂÂÂÂÂÂÂÂÂÂ struct drm_private_obj *obj = state->private_objs[i].ptr;

-ÂÂÂÂÂÂÂ if (new_dm_state && old_dm_state) {
-ÂÂÂÂÂÂÂÂÂÂÂ if (new_dm_state->context)
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ dc_release_state(new_dm_state->context);
+ÂÂÂÂÂÂÂÂÂÂÂ if (obj->funcs == adev->dm.atomic_obj.funcs) {
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ int j = state->num_private_objs-1;

-ÂÂÂÂÂÂÂÂÂÂÂ new_dm_state->context = old_dm_state->context;
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ dm_atomic_destroy_state(obj,
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ state->private_objs[i].state);
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ /* If i is not at the end of the array then the
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ * last element needs to be moved to where i was
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ * before the array can safely be truncated.
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ state->private_objs[i] =
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ state->private_objs[j];

-ÂÂÂÂÂÂÂÂÂÂÂ if (old_dm_state->context)
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ dc_retain_state(old_dm_state->context);
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ state->private_objs[j].ptr = NULL;
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ state->private_objs[j].state = NULL;
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ state->private_objs[j].old_state = NULL;
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ state->private_objs[j].new_state = NULL;
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ state->num_private_objs = j;