[SNIP]
"further"? There was no discussion at all,
you just started off like
that. If you think somebody misses that connection, you can point out
to documentation/videos whatever so the contributor can understand
what's wrong with an approach. You did that, so that's fine. It's just
starting off _any_ discussion with a "Well complete NAK" is terrible
style. I'd feel uncomfortable if that happened to me and I'm sure
there are enough people like that that we should be more reasonable
with our replies. Just.. don't.
We are all humans here and people react negatively to such things. And
if people do it on purpose it just makes it worse.
fyi:Yeah, that's the problematic part. We have documented this veryThis is clearly going against the idea of having jobs only depend onI'm sure it's all documented and there is a design document on how
fences and nothing else which is mandatory for correct memory management.
things have to look like you can point out? Might help to get a better
understanding on how things should be.
extensively:
https://www.kernel.org/doc/html/v5.9/driver-api/dma-buf.html#indefinite-dma-fences
And both Jason and Daniel gave talks about the underlying problem and
s/Jason/Faith/g
try to come up with patches to raise warnings when that happens, butYes, and we'll have to tell them over and over again. Nothing wrong
people still keep coming up with the same idea over and over again.
with that. That's just part of maintaining such a big subsystem. And
that's definitely not a valid reason to phrase things like above.
It's just that the technical relationship between preventing jobs fromSure, but that's just part of the job. And pointing out fundamental
running and with that preventing dma_fences from signaling and the core
memory management with page faults and shrinkers waiting for those
fences is absolutely not obvious.
We had at least 10 different teams from different companies falling into
the same trap already and either the patches were rejected of hand or
had to painfully reverted or mitigated later on.
mistakes early on is important, but the situation won't get any better
by being like that. Yes, we'll have to repeat the same words over and
over again, and yes that might be annoying, but that's just how it is.
Regards,
Christian.
If the hw is busy with something you need to return the fence for this
from the prepare_job callback so that the scheduler can be notified when
the hw is available again.
Regards,
Christian.
Signed-off-by: Asahi Lina <lina@xxxxxxxxxxxxx>
---
drivers/gpu/drm/scheduler/sched_main.c | 10 ++++++++++
include/drm/gpu_scheduler.h | 8 ++++++++
2 files changed, 18 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 4e6ad6e122bc..5c0add2c7546 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1001,6 +1001,16 @@ static int drm_sched_main(void *param)
if (!entity)
continue;
+ if (sched->ops->can_run_job) {
+ sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
+ if (!sched_job) {
+ complete_all(&entity->entity_idle);
+ continue;
+ }
+ if (!sched->ops->can_run_job(sched_job))
+ continue;
+ }
+
sched_job = drm_sched_entity_pop_job(entity);
if (!sched_job) {
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 9db9e5e504ee..bd89ea9507b9 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -396,6 +396,14 @@ struct drm_sched_backend_ops {
struct dma_fence *(*prepare_job)(struct drm_sched_job *sched_job,
struct drm_sched_entity *s_entity);
+ /**
+ * @can_run_job: Called before job execution to check whether the
+ * hardware is free enough to run the job. This can be used to
+ * implement more complex hardware resource policies than the
+ * hw_submission limit.
+ */
+ bool (*can_run_job)(struct drm_sched_job *sched_job);
+
/**
* @run_job: Called to execute the job once all of the dependencies
* have been resolved. This may be called multiple times, if