Re: [PATCH v2 1/2] drm/sched: memset() 'job' in drm_sched_job_init()

From: Philipp Stanner
Date: Fri Sep 13 2024 - 08:30:33 EST


On Fri, 2024-09-13 at 12:56 +0100, Tvrtko Ursulin wrote:
>
> Hi,
>
> On 28/08/2024 10:41, Philipp Stanner wrote:
> > drm_sched_job_init() has no control over how users allocate struct
> > drm_sched_job. Unfortunately, the function can also not set some
> > struct
> > members such as job->sched.
>
> job->sched usage from within looks like a bug. But not related to the
> memset you add.
>
> For this one something like this looks easiest for a start:
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index ab53ab486fe6..877113b01af2 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -788,7 +788,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>                   * or worse--a blank screen--leave a trail in the
>                   * logs, so this can be debugged easier.
>                   */
> -               drm_err(job->sched, "%s: entity has no rq!\n",
> __func__);
> +               pr_err("%s: entity has no rq!\n", __func__);
>                  return -ENOENT;
>          }
>
> Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to
> variable
> number of run-queues")
> Cc: <stable@xxxxxxxxxxxxxxx> # v6.7+

Danilo and I already solved that:

https://lore.kernel.org/all/20240827074521.12828-2-pstanner@xxxxxxxxxx/


>
> > This could theoretically lead to UB by users dereferencing the
> > struct's
> > pointer members too early.
>
> Hmm if drm_sched_job_init returned an error callers should not
> dereference anything. What was actually the issue you were debugging?

I was learning about the scheduler, wrote a dummy driver and had
awkward behavior. Turned out it was this pointer not being initialized.
I would have seen it immediately if it were NULL.

The actual issue was and is IMO that a function called
drm_sched_job_init() initializes the job. But it doesn't, it only
partially initializes it. Only after drm_sched_job_arm() ran you're
actually ready to go.

>
> Adding a memset is I think not the best solution since it is very
> likely
> redundant to someone doing a kzalloc in the first place.

It is redundant in most cases, but it is effectively for free. I
measured the runtime with 1e6 jobs with and without memset and there
was no difference.


P.

>
> Regards,
>
> Tvrtko
>
> > It is easier to debug such issues if these pointers are initialized
> > to
> > NULL, so dereferencing them causes a NULL pointer exception.
> > Accordingly, drm_sched_entity_init() does precisely that and
> > initializes
> > its struct with memset().
> >
> > Initialize parameter "job" to 0 in drm_sched_job_init().
> >
> > Signed-off-by: Philipp Stanner <pstanner@xxxxxxxxxx>
> > ---
> > No changes in v2.
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 8 ++++++++
> >   1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 356c30fa24a8..b0c8ad10b419 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -806,6 +806,14 @@ int drm_sched_job_init(struct drm_sched_job
> > *job,
> >    return -EINVAL;
> >    }
> >  
> > + /*
> > + * We don't know for sure how the user has allocated.
> > Thus, zero the
> > + * struct so that unallowed (i.e., too early) usage of
> > pointers that
> > + * this function does not set is guaranteed to lead to a
> > NULL pointer
> > + * exception instead of UB.
> > + */
> > + memset(job, 0, sizeof(*job));
> > +
> >    job->entity = entity;
> >    job->credits = credits;
> >    job->s_fence = drm_sched_fence_alloc(entity, owner);
>