Re: [PATCH v2 2/2] drm/sched: warn about drm_sched_job_init()'s partial init

From: Tvrtko Ursulin
Date: Fri Sep 13 2024 - 08:21:49 EST



On 28/08/2024 10:41, Philipp Stanner wrote:
drm_sched_job_init()'s name suggests that after the function succeeded,
parameter "job" will be fully initialized. This is not the case; some
members are only later set, notably "job->sched" by drm_sched_job_arm().

Document that drm_sched_job_init() does not set all struct members.

Document that job->sched in particular is uninitialized before
drm_sched_job_arm().

Signed-off-by: Philipp Stanner <pstanner@xxxxxxxxxx>
---
Changes in v2:
- Change grammar in the new comments a bit.
---
drivers/gpu/drm/scheduler/sched_main.c | 4 ++++
include/drm/gpu_scheduler.h | 7 +++++++
2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index b0c8ad10b419..721373938c1e 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -781,6 +781,10 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
* Drivers must make sure drm_sched_job_cleanup() if this function returns
* successfully, even when @job is aborted before drm_sched_job_arm() is called.
*
+ * Note that this function does not assign a valid value to each struct member
+ * of struct drm_sched_job. Take a look at that struct's documentation to see
+ * who sets which struct member with what lifetime.

The first sentence is fine, but as for the second, I don't see those details in struct drm_sched_job's documentation. (And I am not saying that they must be listed. IMO at some point it is better to have a high-level overview than to describe the lifetime rules member by member.)

+ *
* WARNING: amdgpu abuses &drm_sched.ready to signal when the hardware
* has died, which can mean that there's no valid runqueue for a @entity.
* This function returns -ENOENT in this case (which probably should be -EIO as
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 5acc64954a88..04a268cd22f1 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -337,6 +337,13 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
struct drm_sched_job {
struct spsc_node queue_node;
struct list_head list;
+
+ /*
+ * The scheduler this job is or will be scheduled on.
+ *
+ * Gets set by drm_sched_arm(). Valid until the scheduler's backend_ops
+ * callback "free_job()" has been called.

This is interesting - I was not sure where the lifetime of job->sched is defined and couldn't find it while browsing around. Where did you find the clues to tie it to the free_job() callback?

Regards,

Tvrtko

+ */
struct drm_gpu_scheduler *sched;
struct drm_sched_fence *s_fence;
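
For readers following the thread, the init/arm split discussed above can be modelled in a few lines of userspace C. This is only a rough sketch under stated assumptions, not the kernel code: every model_* name is hypothetical, the structs are reduced to the members needed for the illustration, and the job is zeroed by the caller so that the "not yet set" check is well defined (the patch describes the real member as simply uninitialized at that point).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's real definitions. */
struct model_sched {
	const char *name;
};

struct model_entity {
	struct model_sched *sched;	/* NULL models "no valid runqueue" */
};

struct model_job {
	struct model_entity *entity;
	struct model_sched *sched;	/* only meaningful after model_job_arm() */
};

/* Models the partial init: binds the entity, but does not touch job->sched. */
static int model_job_init(struct model_job *job, struct model_entity *entity)
{
	if (!entity->sched)
		return -ENOENT;
	job->entity = entity;
	return 0;
}

/* Models the arm step: this is where job->sched gets its value. */
static void model_job_arm(struct model_job *job)
{
	job->sched = job->entity->sched;
}

/* Walks through the two phases and checks job->sched at each point. */
int model_demo(void)
{
	struct model_sched sched = { .name = "ring0" };
	struct model_entity entity = { .sched = &sched };
	struct model_job job = { 0 };	/* zeroed by the caller (assumption) */

	if (model_job_init(&job, &entity) != 0)
		return 1;
	assert(job.sched == NULL);	/* init succeeded, sched still unset */

	model_job_arm(&job);
	assert(job.sched == &sched);	/* set only by the arm step */
	return 0;
}
```

In this model, anything that dereferences job.sched between model_job_init() and model_job_arm() would hit a NULL pointer, which is the kind of misuse the proposed comments are meant to warn about.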