Re: [PATCH 1/4] remoteproc: re-check state in rproc_trigger_recovery()

From: Mathieu Poirier
Date: Mon Mar 09 2020 - 16:56:39 EST


On Fri, Feb 28, 2020 at 12:33:56PM -0600, Alex Elder wrote:
> Two places call rproc_trigger_recovery():
> - rproc_crash_handler_work() sets rproc->state to CRASHED under
> protection of the mutex, then calls it if recovery is not
> disabled. This function is called in workqueue context when
> scheduled in rproc_report_crash().
> - rproc_recovery_write() calls it in two spots, both of which
> the only call it if the rproc->state is CRASHED.
>
> The mutex is taken right away in rproc_trigger_recovery(). However,
> by the time the mutex is acquired, something else might have changed
> rproc->state to something other than CRASHED.

I'm interested in the "something might have changed" part. The only thing I can
see is if rproc_trigger_recovery() has been called from debugfs between the time
the mutex is released but just before rproc_trigger_recovery() is called in
rproc_crash_handler_work(). In this case we would be done twice, something your
patch prevents. Have you found other scenarios?

Thanks,
Mathieu

>
> The work that follows that is only appropriate for a remoteproc in
> CRASHED state. So check the state after acquiring the mutex, and
> only proceed with the recovery work if the remoteproc is still in
> CRASHED state.
>
> Delay reporting that recovering has begun until after we hold the
> mutex and we know the remote processor is in CRASHED state.
>
> Signed-off-by: Alex Elder <elder@xxxxxxxxxx>
> ---
> drivers/remoteproc/remoteproc_core.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 097f33e4f1f3..d327cb31d5c8 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1653,12 +1653,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
> struct device *dev = &rproc->dev;
> int ret;
>
> + ret = mutex_lock_interruptible(&rproc->lock);
> + if (ret)
> + return ret;
> +
> + /* State could have changed before we got the mutex */
> + if (rproc->state != RPROC_CRASHED)
> + goto unlock_mutex;
> +
> dev_err(dev, "recovering %s\n", rproc->name);
>
> - ret = mutex_lock_interruptible(&rproc->lock);
> - if (ret)
> - return ret;
> -
> ret = rproc_stop(rproc, true);
> if (ret)
> goto unlock_mutex;
> --
> 2.20.1
>