Re: [PATCH] drm/amd/display: Clear dm_state for fast updates

From: Daniel Vetter
Date: Mon Jul 27 2020 - 17:09:44 EST


On Mon, Jul 27, 2020 at 10:29 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> On Mon, Jul 27, 2020 at 9:28 PM Christian KÃnig
> <christian.koenig@xxxxxxx> wrote:
> >
> > Am 27.07.20 um 16:05 schrieb Kazlauskas, Nicholas:
> > > On 2020-07-27 9:39 a.m., Christian KÃnig wrote:
> > >> Am 27.07.20 um 07:40 schrieb Mazin Rezk:
> > >>> This patch fixes a race condition that causes a use-after-free during
> > >>> amdgpu_dm_atomic_commit_tail. This can occur when 2 non-blocking
> > >>> commits
> > >>> are requested and the second one finishes before the first.
> > >>> Essentially,
> > >>> this bug occurs when the following sequence of events happens:
> > >>>
> > >>> 1. Non-blocking commit #1 is requested w/ a new dm_state #1 and is
> > >>> deferred to the workqueue.
> > >>>
> > >>> 2. Non-blocking commit #2 is requested w/ a new dm_state #2 and is
> > >>> deferred to the workqueue.
> > >>>
> > >>> 3. Commit #2 starts before commit #1, dm_state #1 is used in the
> > >>> commit_tail and commit #2 completes, freeing dm_state #1.
> > >>>
> > >>> 4. Commit #1 starts after commit #2 completes, uses the freed dm_state
> > >>> 1 and dereferences a freelist pointer while setting the context.
> > >>
> > >> Well I only have a one mile high view on this, but why don't you let
> > >> the work items execute in order?
> > >>
> > >> That would be better anyway cause this way we don't trigger a cache
> > >> line ping pong between CPUs.
> > >>
> > >> Christian.
> > >
> > > We use the DRM helpers for managing drm_atomic_commit_state and those
> > > helpers internally push non-blocking commit work into the system
> > > unbound work queue.
> >
> > Mhm, well if you send those helper atomic commits in the order A,B and
> > they execute it in the order B,A I would call that a bug :)
>
> The way it works is it pushes all commits into unbound work queue, but
> then forces serialization as needed. We do _not_ want e.g. updates on
> different CRTC to be serialized, that would result in lots of judder.
> And hw is funny enough that there's all kinds of dependencies.
>
> The way you force synchronization is by adding other CRTC state
> objects. So if DC is busted and can only handle a single update per
> work item, then I guess you always need all CRTC states and everything
> will be run in order. But that also totally kills modern multi-screen
> compositors. Xorg isn't modern, just in case that's not clear :-)
>
> Lucking at the code it seems like you indeed have only a single dm
> state, so yeah global sync is what you'll need as immediate fix, and
> then maybe fix up DM to not be quite so silly ... or at least only do
> the dm state stuff when really needed.

Just looked a bit more at this struct dc_state, and that looks a lot
like an atomic side-wagon. I don't think that works as a private
state, this should probably be embedded into a subclass of
drm_atomic_state.

And probably a lot of these pointers moved to other places I think, or
I'm not entirely clear on what exactly this stuff is needed for ...

dc_state is also refcounted, which is definitely rather funny for a
state structure.

Feels like this entire thing (how the overall dc state machinery is
glued into atomic) isn't quite thought thru just yet :-/
-Daniel

> We could also sprinkle the drm_crtc_commit structure around a bit
> (it's the glue that provides the synchronization across commits), but
> since your dm state is global just grabbing all crtc states
> unconditionally as part of that is probably best.
>
> > > While we could duplicate a copy of that code with nothing but the
> > > workqueue changed that isn't something I'd really like to maintain
> > > going forward.
> >
> > I'm not talking about duplicating the code, I'm talking about fixing the
> > helpers. I don't know that code well, but from the outside it sounds
> > like a bug there.
> >
> > And executing work items in the order they are submitted is trivial.
> >
> > Had anybody pinged Daniel or other people familiar with the helper code
> > about it?
>
> Yeah something is wrong here, and the fix looks horrible :-)
>
> Aside, I've also seen some recent discussion flare up about
> drm_atomic_state_get/put used to paper over some other use-after-free,
> but this time related to interrupt handlers. Maybe a few rules about
> that:
> - dont
> - especially not when it's interrupt handlers, because you can't call
> drm_atomic_state_put from interrupt handlers.
>
> Instead have an spin_lock_irq to protect the shared date with your
> interrupt handler, and _copy_ the date over. This is e.g. what
> drm_crtc_arm_vblank_event does.
>
> Cheers, Daniel
>
> >
> > Regards,
> > Christian.
> >
> > >
> > > Regards,
> > > Nicholas Kazlauskas
> > >
> > >>
> > >>>
> > >>> Since this bug has only been spotted with fast commits, this patch
> > >>> fixes
> > >>> the bug by clearing the dm_state instead of using the old dc_state for
> > >>> fast updates. In addition, since dm_state is only used for its dc_state
> > >>> and amdgpu_dm_atomic_commit_tail will retain the dc_state if none is
> > >>> found,
> > >>> removing the dm_state should not have any consequences in fast updates.
> > >>>
> > >>> This use-after-free bug has existed for a while now, but only caused a
> > >>> noticeable issue starting from 5.7-rc1 due to 3202fa62f ("slub:
> > >>> relocate
> > >>> freelist pointer to middle of object") moving the freelist pointer from
> > >>> dm_state->base (which was unused) to dm_state->context (which is
> > >>> dereferenced).
> > >>>
> > >>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207383
> > >>> Fixes: bd200d190f45 ("drm/amd/display: Don't replace the dc_state
> > >>> for fast updates")
> > >>> Reported-by: Duncan <1i5t5.duncan@xxxxxxx>
> > >>> Signed-off-by: Mazin Rezk <mnrzk@xxxxxxxxxxxxxx>
> > >>> ---
> > >>> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 36
> > >>> ++++++++++++++-----
> > >>> 1 file changed, 27 insertions(+), 9 deletions(-)
> > >>>
> > >>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > >>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > >>> index 86ffa0c2880f..710edc70e37e 100644
> > >>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > >>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > >>> @@ -8717,20 +8717,38 @@ static int amdgpu_dm_atomic_check(struct
> > >>> drm_device *dev,
> > >>> * the same resource. If we have a new DC context as part of
> > >>> * the DM atomic state from validation we need to free it and
> > >>> * retain the existing one instead.
> > >>> + *
> > >>> + * Furthermore, since the DM atomic state only contains the DC
> > >>> + * context and can safely be annulled, we can free the state
> > >>> + * and clear the associated private object now to free
> > >>> + * some memory and avoid a possible use-after-free later.
> > >>> */
> > >>> - struct dm_atomic_state *new_dm_state, *old_dm_state;
> > >>>
> > >>> - new_dm_state = dm_atomic_get_new_state(state);
> > >>> - old_dm_state = dm_atomic_get_old_state(state);
> > >>> + for (i = 0; i < state->num_private_objs; i++) {
> > >>> + struct drm_private_obj *obj = state->private_objs[i].ptr;
> > >>>
> > >>> - if (new_dm_state && old_dm_state) {
> > >>> - if (new_dm_state->context)
> > >>> - dc_release_state(new_dm_state->context);
> > >>> + if (obj->funcs == adev->dm.atomic_obj.funcs) {
> > >>> + int j = state->num_private_objs-1;
> > >>>
> > >>> - new_dm_state->context = old_dm_state->context;
> > >>> + dm_atomic_destroy_state(obj,
> > >>> + state->private_objs[i].state);
> > >>> +
> > >>> + /* If i is not at the end of the array then the
> > >>> + * last element needs to be moved to where i was
> > >>> + * before the array can safely be truncated.
> > >>> + */
> > >>> + if (i != j)
> > >>> + state->private_objs[i] =
> > >>> + state->private_objs[j];
> > >>>
> > >>> - if (old_dm_state->context)
> > >>> - dc_retain_state(old_dm_state->context);
> > >>> + state->private_objs[j].ptr = NULL;
> > >>> + state->private_objs[j].state = NULL;
> > >>> + state->private_objs[j].old_state = NULL;
> > >>> + state->private_objs[j].new_state = NULL;
> > >>> +
> > >>> + state->num_private_objs = j;
> > >>> + break;
> > >>> + }
> > >>> }
> > >>> }
> > >>>
> > >>> --
> > >>> 2.27.0
> > >>>
> > >>
> > >
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch