Re: [PATCH v2 1/2] drm/panfrost: Handle resetting on timeout better

From: Rob Herring
Date: Wed Oct 09 2019 - 14:40:11 EST

On Wed, Oct 9, 2019 at 4:45 AM Steven Price <steven.price@xxxxxxx> wrote:
> Panfrost uses multiple schedulers (one for each slot, so 2 in reality),
> and on a timeout has to stop all the schedulers to safely perform a
> reset. However, more than one scheduler can trigger a timeout at the same
> time. This race condition results in jobs being freed while they are
> still in use.
>
> When stopping other slots, use cancel_delayed_work_sync() to ensure that
> any timeout started for that slot has completed. Also use
> mutex_trylock() to obtain reset_lock. This means that only one thread
> attempts the reset; the other threads will simply complete without doing
> anything (the first thread will wait for this in the call to
> cancel_delayed_work_sync()).
>
> While we're here, and since the function is already dependent on
> sched_job not being NULL, let's remove the unnecessary checks.
>
> Fixes: aa20236784ab ("drm/panfrost: Prevent concurrent resets")
> Tested-by: Neil Armstrong <narmstrong@xxxxxxxxxxxx>
> Signed-off-by: Steven Price <steven.price@xxxxxxx>
> ---
> v2:
> * Added fixes and tested-by tags
> * Moved cleanup of panfrost_core_dump() comment to separate patch
>
>  drivers/gpu/drm/panfrost/panfrost_job.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)

Both patches applied.