Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
From: Peter Zijlstra
Date: Fri Oct 06 2017 - 07:11:04 EST
On Fri, Oct 06, 2017 at 11:06:37AM +0200, Daniel Vetter wrote:
> I pondered whether we should annotate engine->submit_request as __rcu
> and use rcu_assign_pointer and rcu_dereference on it. But the reason
> behind those is to make sure the compiler/cpu barriers are there for
> when you have an actual data structure you point at, to make sure all
> the writes are seen correctly on the read side. But we just have a
> function pointer, and .text isn't changed, so no need for these
> barriers and hence no need for annotations.
synchronize_*() provides an smp_mb() on the calling CPU and ensures an
smp_mb() on all other CPUs before completion, such that everybody agrees
on the state prior to calling syncrhonize_rcu(). So yes, no additional
ordering requirements.
> drivers/gpu/drm/i915/i915_gem.c | 31 +++++++++--------------
> drivers/gpu/drm/i915/i915_gem_request.c | 2 ++
> drivers/gpu/drm/i915/selftests/i915_gem_request.c | 2 ++
> 3 files changed, 16 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ab8c6946fea4..e79a6ca60265 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
> intel_engine_init_global_seqno(request->engine, request->global_seqno);
> }
>
> +static void engine_complete_requests(struct intel_engine_cs *engine)
> {
> /* Mark all executing requests as skipped */
> engine->cancel_requests(engine);
>
> @@ -3041,24 +3033,25 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
> intel_engine_last_submit(engine));
> }
>
> +void i915_gem_set_wedged(struct drm_i915_private *i915)
> {
> struct intel_engine_cs *engine;
> enum intel_engine_id id;
>
> for_each_engine(engine, i915, id)
> + engine->submit_request = nop_submit_request;
>
> + /* Make sure no one is running the old callback before we proceed with
> + * cancelling requests and resetting the completion tracking. Otherwise
> + * we might submit a request to the hardware which never completes.
> + */
ARGH @ horrid comment style..
http://lkml.iu.edu/hypermail/linux/kernel/1607.1/00627.html
:-)
> + synchronize_rcu();
>
> + for_each_engine(engine, i915, id)
> + engine_complete_requests(engine);
>
> + set_bit(I915_WEDGED, &i915->gpu_error.flags);
> + wake_up_all(&i915->gpu_error.reset_queue);
> }
>
> bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b100b38f1dd2..ef78a85cb845 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> switch (state) {
> case FENCE_COMPLETE:
> trace_i915_gem_request_submit(request);
> + rcu_read_lock();
> request->engine->submit_request(request);
> + rcu_read_unlock();
> break;
>
> case FENCE_FREE:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> index 78b9f811707f..a999161e8db1 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
> }
> i915_gem_request_get(vip);
> i915_add_request(vip);
> + rcu_read_lock();
> request->engine->submit_request(request);
> + rcu_read_unlock();
>
> mutex_unlock(&i915->drm.struct_mutex);
Yes, this is a correct and good replacement, however, you said:
> Chris said parts of the reasons for going with stop_machine() was that
> it's no overhead for the fast-path. But these callbacks use irqsave
> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
This means that you could simply do synchronize_sched() without the
addition of rcu_read_lock()s and still be fine.