Re: [PATCH] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

From: Daniel Vetter
Date: Thu Apr 06 2023 - 04:44:56 EST


On Thu, Apr 06, 2023 at 10:29:57AM +0200, Christian König wrote:
> Am 05.04.23 um 18:34 schrieb Asahi Lina:
> > A signaled scheduler fence can outlive its scheduler, since fences are
> > independently reference counted.
>
> Well that is actually not correct. Schedulers are supposed to stay around
> until the hw they have been driving is no longer present.
>
> E.g. the reference chain was scheduler_fence->hw_fence->driver->scheduler.
>
> Your use case is now completely different to that and this won't work any
> more.

This is why I'm a bit of a broken record suggesting that for the fw scheduler
case, where drm_sched_entity:drm_scheduler is 1:1 and created at runtime, we
really should rework the interface exposed to drivers:

- drm_scheduler stays the thing that's per-engine and sticks around for as
long as the driver does

- We split out a drm_sched_internal, which is either tied to drm_scheduler
(ringbuffer scheduler mode) or drm_sched_entity (fw ctx scheduling
mode).

- drm/sched internals are updated to do the right thing in all these cases.
And there's a lot of it: stuff like drm_sched_job is quite tricky if each
driver needs to protect against concurrent ctx/entity creation/destruction,
and I really don't like the idea of drivers hand-rolling this kind of tricky
state transition code, which is only exercised in the exceptional
tdr/gpu/fw-death situations, all by themselves. (Rough sketch of the split
below.)
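
To make that split a bit more concrete, here is a rough and purely
hypothetical C sketch. It is not the current include/drm/gpu_scheduler.h
layout; apart from the drm_sched_internal idea named above, all struct and
field names below are made up for illustration only:

	/* Hypothetical sketch only; all names/fields are illustrative. */
	#include <linux/kref.h>
	#include <linux/list.h>
	#include <linux/spinlock.h>

	/*
	 * State whose lifetime follows the jobs/fences rather than the
	 * engine. Refcounted, so a signaled fence can keep it alive past
	 * entity (fw ctx) teardown.
	 */
	struct drm_sched_internal {
		struct kref refcount;
		spinlock_t job_list_lock;
		struct list_head pending_list;	/* jobs pushed to hw/fw */
		char timeline_name[16];		/* safe for get_timeline_name() */
	};

	/* Per-engine, lives exactly as long as the driver/hw (sketch). */
	struct drm_gpu_scheduler_sketch {
		const char *name;
		/* ringbuffer scheduling mode: internals hang off the scheduler */
		struct drm_sched_internal *internal;
	};

	/* fw ctx scheduling mode: internals hang off the 1:1 entity (sketch). */
	struct drm_sched_entity_sketch {
		struct drm_sched_internal *internal;
	};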

> This here might just be the first case where that breaks.

Yeah I expect there's going to be a solid stream of these, and we're just
going to random-walk in circles if this effort doesn't come with at least
some amount of design.

Thus far no one has really commented on the above plan though, so I'm not
sure what the consensus plan is among all the various fw-scheduling driver
efforts ...
-Daniel

>
> Regards,
> Christian.
>
> > Therefore, we can't reference the
> > scheduler in the get_timeline_name() implementation.
> >
> > Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
> > dma-bufs reference fences from GPU schedulers that no longer exist.
> >
> > Signed-off-by: Asahi Lina <lina@xxxxxxxxxxxxx>
> > ---
> > drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
> > drivers/gpu/drm/scheduler/sched_fence.c | 4 +++-
> > include/drm/gpu_scheduler.h | 5 +++++
> > 3 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index 15d04a0ec623..8b3b949b2ce8 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -368,7 +368,12 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
> > /*
> > * Fence is from the same scheduler, only need to wait for
> > - * it to be scheduled
> > + * it to be scheduled.
> > + *
> > + * Note: s_fence->sched could have been freed and reallocated
> > + * as another scheduler. This false positive case is okay, as if
> > + * the old scheduler was freed all of its jobs must have
> > + * signaled their completion fences.
> > */
> > fence = dma_fence_get(&s_fence->scheduled);
> > dma_fence_put(entity->dependency);
> > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> > index 7fd869520ef2..33b145dfa38c 100644
> > --- a/drivers/gpu/drm/scheduler/sched_fence.c
> > +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> > @@ -66,7 +66,7 @@ static const char *drm_sched_fence_get_driver_name(struct dma_fence *fence)
> > static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
> > {
> > struct drm_sched_fence *fence = to_drm_sched_fence(f);
> > - return (const char *)fence->sched->name;
> > + return (const char *)fence->sched_name;
> > }
> > static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
> > @@ -168,6 +168,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
> > unsigned seq;
> > fence->sched = entity->rq->sched;
> > + strlcpy(fence->sched_name, entity->rq->sched->name,
> > + sizeof(fence->sched_name));
> > seq = atomic_inc_return(&entity->fence_seq);
> > dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> > &fence->lock, entity->fence_context, seq);
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 9db9e5e504ee..49f019731891 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -295,6 +295,11 @@ struct drm_sched_fence {
> > * @lock: the lock used by the scheduled and the finished fences.
> > */
> > spinlock_t lock;
> > + /**
> > + * @sched_name: the name of the scheduler that owns this fence. We
> > + * keep a copy here since fences can outlive their scheduler.
> > + */
> > + char sched_name[16];
> > /**
> > * @owner: job owner for debugging
> > */
> >
> > ---
> > base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
> > change-id: 20230406-scheduler-uaf-1-994ec34cac93
> >
> > Thank you,
> > ~~ Lina
> >
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch