Re: [PATCH v3 4/9] drm/scheduler: Add fence deadline support

From: Christian König
Date: Tue Sep 21 2021 - 12:45:30 EST


Am 21.09.21 um 18:35 schrieb Rob Clark:
On Tue, Sep 21, 2021 at 8:57 AM Rob Clark <robdclark@xxxxxxxxx> wrote:
On Wed, Sep 8, 2021 at 10:45 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
On Fri, Sep 03, 2021 at 11:47:55AM -0700, Rob Clark wrote:
From: Rob Clark <robdclark@xxxxxxxxxxxx>

As the finished fence is the one that is exposed to userspace, and
therefore the one that other operations, like atomic update, would
block on, we need to propagate the deadline from from the finished
fence to the actual hw fence.

v2: Split into drm_sched_fence_set_parent() (ckoenig)

Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
---
drivers/gpu/drm/scheduler/sched_fence.c | 34 +++++++++++++++++++++++++
drivers/gpu/drm/scheduler/sched_main.c | 2 +-
include/drm/gpu_scheduler.h | 8 ++++++
3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index bcea035cf4c6..4fc41a71d1c7 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -128,6 +128,30 @@ static void drm_sched_fence_release_finished(struct dma_fence *f)
dma_fence_put(&fence->scheduled);
}

+static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
+ ktime_t deadline)
+{
+ struct drm_sched_fence *fence = to_drm_sched_fence(f);
+ unsigned long flags;
+
+ spin_lock_irqsave(&fence->lock, flags);
+
+ /* If we already have an earlier deadline, keep it: */
+ if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
+ ktime_before(fence->deadline, deadline)) {
+ spin_unlock_irqrestore(&fence->lock, flags);
+ return;
+ }
+
+ fence->deadline = deadline;
+ set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
+
+ spin_unlock_irqrestore(&fence->lock, flags);
+
+ if (fence->parent)
+ dma_fence_set_deadline(fence->parent, deadline);
+}
+
static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
.get_driver_name = drm_sched_fence_get_driver_name,
.get_timeline_name = drm_sched_fence_get_timeline_name,
@@ -138,6 +162,7 @@ static const struct dma_fence_ops drm_sched_fence_ops_finished = {
.get_driver_name = drm_sched_fence_get_driver_name,
.get_timeline_name = drm_sched_fence_get_timeline_name,
.release = drm_sched_fence_release_finished,
+ .set_deadline = drm_sched_fence_set_deadline_finished,
};

struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
@@ -152,6 +177,15 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
}
EXPORT_SYMBOL(to_drm_sched_fence);

+void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence,
+ struct dma_fence *fence)
+{
+ s_fence->parent = dma_fence_get(fence);
+ if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
+ &s_fence->finished.flags))
Don't you need the spinlock here too to avoid races? test_bit is very
unordered, so guarantees nothing. Spinlock would need to be both around
->parent = and the test_bit.

Entirely aside, but there's discussions going on to preallocate the hw
fence somehow. If we do that we could make the deadline forwarding
lockless here. Having a spinlock just to set the parent is a bit annoying
...

Alternative is that you do this locklessly with barriers and a _lot_ of
comments. Would be good to benchmark whether the overhead matters though
first.
So, my thinking is that very few (well no) guarantees are made to the
fence implementor that their ->set_deadline() is not called multiple
times, from multiple threads, etc. And no guarantee that a later
deadline is set after an earlier deadline has been set. It is all up
to the set_deadline() implementation to deal with these cases.

So that means we just need the appropriate barrier-fu to ensure
another thread calling drm_sched_fence_set_deadline_finished() sees
fence->parent set before the test_bit. It could mean that the backend
implementation sees the same deadline set twice, but that is fine.

something like:

Of hand I think that this should work.

Or rather say I can't see anything wrong with that.

Christian.


-----
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c
b/drivers/gpu/drm/scheduler/sched_fence.c
index 4fc41a71d1c7..7f2af6d1777c 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -132,6 +132,7 @@ static void
drm_sched_fence_set_deadline_finished(struct dma_fence *f,
ktime_t deadline)
{
struct drm_sched_fence *fence = to_drm_sched_fence(f);
+ struct dma_fence *parent;
unsigned long flags;

spin_lock_irqsave(&fence->lock, flags);
@@ -148,8 +149,9 @@ static void
drm_sched_fence_set_deadline_finished(struct dma_fence *f,

spin_unlock_irqrestore(&fence->lock, flags);

- if (fence->parent)
- dma_fence_set_deadline(fence->parent, deadline);
+ parent = smp_load_acquire(&fence->parent);
+ if (parent)
+ dma_fence_set_deadline(parent, deadline);
}

static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
@@ -180,7 +182,7 @@ EXPORT_SYMBOL(to_drm_sched_fence);
void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence,
struct dma_fence *fence)
{
- s_fence->parent = dma_fence_get(fence);
+ smp_store_release(&s_fence->parent, dma_fence_get(fence));
if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
&s_fence->finished.flags))
dma_fence_set_deadline(fence, s_fence->deadline);
-----

BR,
-R