Re: [PATCH v3 09/14] remoteproc: Deal with synchronisation when crashing
From: Mathieu Poirier
Date: Fri May 08 2020 - 17:48:03 EST
On Tue, May 05, 2020 at 06:01:56PM -0700, Bjorn Andersson wrote:
> On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
>
> > Refactor function rproc_trigger_recovery() in order to avoid
> > reloading the firmware image when synchronising with a remote
> > processor rather than booting it. Also part of the process,
> > properly set the synchronisation flag in order to properly
> > recover the system.
> >
> > Signed-off-by: Mathieu Poirier <mathieu.poirier@xxxxxxxxxx>
> > ---
> > drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------
> > drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++
> > 2 files changed, 43 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index ef88d3e84bfb..3a84a38ba37b 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc)
> > */
> > int rproc_trigger_recovery(struct rproc *rproc)
> > {
> > - const struct firmware *firmware_p;
> > + const struct firmware *firmware_p = NULL;
> > struct device *dev = &rproc->dev;
> > int ret;
> >
> > @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
> > /* generate coredump */
> > rproc_coredump(rproc);
> >
> > - /* load firmware */
> > - ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > - if (ret < 0) {
> > - dev_err(dev, "request_firmware failed: %d\n", ret);
> > - goto unlock_mutex;
> > + /* load firmware if need be */
> > + if (!rproc_needs_syncing(rproc)) {
> > + ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > + if (ret < 0) {
> > + dev_err(dev, "request_firmware failed: %d\n", ret);
> > + goto unlock_mutex;
> > + }
> > }
> >
> > - /* boot the remote processor up again */
> > + /* boot up or synchronise with the remote processor again */
> > ret = rproc_start(rproc, firmware_p);
> >
> > release_firmware(firmware_p);
> > @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work)
> > dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt,
> > rproc->name);
> >
> > + /*
> > + * The remote processor has crashed - tell the core what operation
> > + * to use from hereon, i.e whether an external entity will reboot
> > + * the MCU or it is now the remoteproc core's responsability.
> > + */
> > + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED);
>
> If I follow the logic correctly, you're essentially using
> rproc->sync_with_rproc to pass an additional parameter down through
> rproc_trigger_recovery() to tell everyone below to "load firmware and
> boot the core or not".
I am using the value of rproc::sync_flags::after_crash to set
rproc->sync_with_rproc. That way the core can know whether it should boot the
remote processor or synchronise with it.
>
> And given that the comment alludes to some unknown logic determining the
> continuation I think it would be much preferable to essentially just
> pass rproc->sync_flags.after_crash down through these functions.
>
The only thing we need to do is set the value of rproc->sync_with_rproc
properly, which rproc_set_sync_flag() does. I have decided to use a wrapper
function to allow us to change how the rproc->sync_with_rproc is handled without
touching anything else in the code.
>
> And per my comment on a previous patch, is there any synchronization
> with the remote controller when this happens?
I can't seem to find that comment - can you indicate which patch that was? As
it is today the core doesn't provide synchronisation, it is up to the platform
driver to probe the remote processor to make sure it is up. I suppose
sync_ops::start() would be a perfect candidate for that.
>
> Regards,
> Bjorn
>
> > +
> > mutex_unlock(&rproc->lock);
> >
> > if (!rproc->recovery_disabled)
> > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> > index 3985c084b184..61500981155c 100644
> > --- a/drivers/remoteproc/remoteproc_internal.h
> > +++ b/drivers/remoteproc/remoteproc_internal.h
> > @@ -24,6 +24,33 @@ struct rproc_debug_trace {
> > struct rproc_mem_entry trace_mem;
> > };
> >
> > +/*
> > + * enum rproc_sync_states - remote processsor sync states
> > + *
> > + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
> > + * has crashed but has not been recovered by
> > + * the remoteproc core yet.
> > + *
> > + * Keeping these separate from the enum rproc_state in order to avoid
> > + * introducing coupling between the state of the MCU and the synchronisation
> > + * operation to use.
> > + */
> > +enum rproc_sync_states {
> > + RPROC_SYNC_STATE_CRASHED,
> > +};
> > +
> > +static inline void rproc_set_sync_flag(struct rproc *rproc,
> > + enum rproc_sync_states state)
> > +{
> > + switch (state) {
> > + case RPROC_SYNC_STATE_CRASHED:
> > + rproc->sync_with_rproc = rproc->sync_flags.after_crash;
> > + break;
> > + default:
> > + break;
> > + }
> > +}
> > +
> > /* from remoteproc_core.c */
> > void rproc_release(struct kref *kref);
> > irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id);
> > --
> > 2.20.1
> >