Re: [PATCH v3 09/14] remoteproc: Deal with synchronisation when crashing
From: Arnaud POULIQUEN
Date: Wed Apr 29 2020 - 03:44:15 EST
Hi Mathieu,
On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> Refactor function rproc_trigger_recovery() in order to avoid
> reloading the firmware image when synchronising with a remote
> processor rather than booting it. Also part of the process,
> properly set the synchronisation flag in order to properly
> recover the system.
>
> Signed-off-by: Mathieu Poirier <mathieu.poirier@xxxxxxxxxx>
> ---
> drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------
> drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++
> 2 files changed, 43 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index ef88d3e84bfb..3a84a38ba37b 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc)
> */
> int rproc_trigger_recovery(struct rproc *rproc)
> {
> - const struct firmware *firmware_p;
> + const struct firmware *firmware_p = NULL;
> struct device *dev = &rproc->dev;
> int ret;
>
> @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
> /* generate coredump */
> rproc_coredump(rproc);
>
> - /* load firmware */
> - ret = request_firmware(&firmware_p, rproc->firmware, dev);
> - if (ret < 0) {
> - dev_err(dev, "request_firmware failed: %d\n", ret);
> - goto unlock_mutex;
> + /* load firmware if need be */
> + if (!rproc_needs_syncing(rproc)) {
> + ret = request_firmware(&firmware_p, rproc->firmware, dev);
> + if (ret < 0) {
> + dev_err(dev, "request_firmware failed: %d\n", ret);
> + goto unlock_mutex;
> + }
If we started in syncing mode then rpoc->firmware is null
rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED) can make rproc_needs_syncing(rproc)
false.
In this case here we fail the recovery an leave in RPROC_STOP state.
As you proposed in Loic RFC[1], what about adding a more explicit message to inform that the recovery
failed.
[1]https://lkml.org/lkml/2020/3/11/402
Regards,
Arnaud
> }
>
> - /* boot the remote processor up again */
> + /* boot up or synchronise with the remote processor again */
> ret = rproc_start(rproc, firmware_p);
>
> release_firmware(firmware_p);
> @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work)
> dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt,
> rproc->name);
>
> + /*
> + * The remote processor has crashed - tell the core what operation
> + * to use from hereon, i.e whether an external entity will reboot
> + * the MCU or it is now the remoteproc core's responsability.
> + */
> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED);
> +
> mutex_unlock(&rproc->lock);
>
> if (!rproc->recovery_disabled)
> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> index 3985c084b184..61500981155c 100644
> --- a/drivers/remoteproc/remoteproc_internal.h
> +++ b/drivers/remoteproc/remoteproc_internal.h
> @@ -24,6 +24,33 @@ struct rproc_debug_trace {
> struct rproc_mem_entry trace_mem;
> };
>
> +/*
> + * enum rproc_sync_states - remote processsor sync states
> + *
> + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
> + * has crashed but has not been recovered by
> + * the remoteproc core yet.
> + *
> + * Keeping these separate from the enum rproc_state in order to avoid
> + * introducing coupling between the state of the MCU and the synchronisation
> + * operation to use.
> + */
> +enum rproc_sync_states {
> + RPROC_SYNC_STATE_CRASHED,
> +};
> +
> +static inline void rproc_set_sync_flag(struct rproc *rproc,
> + enum rproc_sync_states state)
> +{
> + switch (state) {
> + case RPROC_SYNC_STATE_CRASHED:
> + rproc->sync_with_rproc = rproc->sync_flags.after_crash;
> + break;
> + default:
> + break;
> + }
> +}
> +
> /* from remoteproc_core.c */
> void rproc_release(struct kref *kref);
> irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id);
>