On Fri, 2024-09-13 at 12:56 +0100, Tvrtko Ursulin wrote:
Hi,
On 28/08/2024 10:41, Philipp Stanner wrote:
drm_sched_job_init() has no control over how users allocate struct
drm_sched_job. Unfortunately, the function can also not set some
struct
members such as job->sched.
job->sched usage from within looks like a bug. But not related to the
memset you add.
For this one something like this looks easiest for a start:
diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
index ab53ab486fe6..877113b01af2 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -788,7 +788,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
* or worse--a blank screen--leave a trail in the
* logs, so this can be debugged easier.
*/
- drm_err(job->sched, "%s: entity has no rq!\n",
__func__);
+ pr_err("%s: entity has no rq!\n", __func__);
return -ENOENT;
}
Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to
variable
number of run-queues")
Cc: <stable@xxxxxxxxxxxxxxx> # v6.7+
Danilo and I already solved that:
https://lore.kernel.org/all/20240827074521.12828-2-pstanner@xxxxxxxxxx/
This could theoretically lead to UB by users dereferencing the
struct's
pointer members too early.
Hmm if drm_sched_job_init returned an error callers should not
dereference anything. What was actually the issue you were debugging?
I was learning about the scheduler, wrote a dummy driver and had
awkward behavior. Turned out it was this pointer not being initialized.
I would have seen it immediately if it were NULL.
The actual issue was and is IMO that a function called
drm_sched_job_init() initializes the job. But it doesn't, it only
partially initializes it. Only after drm_sched_job_arm() ran you're
actually ready to go.
Adding a memset is I think not the best solution since it is very
likely
redundant to someone doing a kzalloc in the first place.
It is redundant in most cases, but it is effectively for free. I
measured the runtime with 1e6 jobs with and without memset and there
was no difference.
It is easier to debug such issues if these pointers are initialized
to
NULL, so dereferencing them causes a NULL pointer exception.
Accordingly, drm_sched_entity_init() does precisely that and
initializes
its struct with memset().
Initialize parameter "job" to 0 in drm_sched_job_init().
Signed-off-by: Philipp Stanner <pstanner@xxxxxxxxxx>
---
No changes in v2.
---
drivers/gpu/drm/scheduler/sched_main.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
index 356c30fa24a8..b0c8ad10b419 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -806,6 +806,14 @@ int drm_sched_job_init(struct drm_sched_job
*job,
return -EINVAL;
}
+ /*
+ * We don't know for sure how the user has allocated.
Thus, zero the
+ * struct so that unallowed (i.e., too early) usage of
pointers that
+ * this function does not set is guaranteed to lead to a
NULL pointer
+ * exception instead of UB.
+ */
+ memset(job, 0, sizeof(*job));
+
job->entity = entity;
job->credits = credits;
job->s_fence = drm_sched_fence_alloc(entity, owner);