RE: [PATCH V2 2/2] remoteproc: support attach recovery after rproc crash

From: Peng Fan
Date: Mon Mar 14 2022 - 03:32:01 EST


> Subject: Re: [PATCH V2 2/2] remoteproc: support attach recovery after rproc
> crash
>
> On Tue 08 Mar 00:48 CST 2022, Peng Fan (OSS) wrote:
>
> > From: Peng Fan <peng.fan@xxxxxxx>
> >
> > Current logic only support main processor to stop/start the remote
> > processor after rproc crash. However to SoC, such as i.MX8QM/QXP, the
> > remote processor could do attach recovery after crash and trigger
> > watchdog
>
> Does it really do something called "attach recovery and trigger watchdog
> reboot"? Doesn't it just reboot itself and Linux needs to detach and reattach
> to get something (what?) reset?

I mean the remote processor could re-run without linux to load firmware/stop/
start. Linux side needs to detach/attach to communicate with remote processor.

>
> > reboot. It does not need main processor to load image, or stop/start
> > M4 core.
> >
> > Introduce two functions: rproc_attach_recovery,
> > rproc_firmware_recovery for the two cases. Firmware recovery is as
> > before, let main processor to help recovery, while attach recovery is recover
> itself withou help.
> > To attach recovery, we only do detach and attach.
> >
> > Signed-off-by: Peng Fan <peng.fan@xxxxxxx>
> > ---
> >
> > V2:
> > use rproc_has_feature in patch 1/2
> >
> > drivers/remoteproc/remoteproc_core.c | 67
> > ++++++++++++++++++++--------
> > 1 file changed, 48 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c
> > b/drivers/remoteproc/remoteproc_core.c
> > index 69f51acf235e..366fad475898 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1887,6 +1887,50 @@ static int __rproc_detach(struct rproc *rproc)
> > return 0;
> > }
> >
> > +static int rproc_attach_recovery(struct rproc *rproc) {
> > + int ret;
> > +
> > + mutex_unlock(&rproc->lock);
> > + ret = rproc_detach(rproc);
> > + mutex_lock(&rproc->lock);
> > + if (ret)
> > + return ret;
> > +
> > + if (atomic_inc_return(&rproc->power) > 1)
>
> In the stop/coredump/start path the code _will_ attempt to recover the
> remote processor. With rproc_detach() and rproc_attach() fiddling with the
> rproc->power refcount this might do something, or it might not do something.
> And with the mutex_unlock() it's likely that you're opening of up for various
> race conditions inbetween.

Rproc_boot will inc rproc->power.
Rproc_detach will decrease rproc->power
Rproc_attach not touch rproc->power.

When do attach recovery, the logic is detach->attach. So I add one
inc rproc->power check to avoid count mis-usage.

>
>
> PS. Does anyone actually use this refcount, or are we just all holding our
> breath for it never going beyond 1?

I think latter usage.

Thanks,
Peng.

>
> Regards,
> Bjorn
>
> > + return 0;
> > +
> > + return rproc_attach(rproc);
> > +}
> > +
> > +static int rproc_firmware_recovery(struct rproc *rproc) {
> > + const struct firmware *firmware_p;
> > + struct device *dev = &rproc->dev;
> > + int ret;
> > +
> > + ret = rproc_stop(rproc, true);
> > + if (ret)
> > + return ret;
> > +
> > + /* generate coredump */
> > + rproc->ops->coredump(rproc);
> > +
> > + /* load firmware */
> > + ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > + if (ret < 0) {
> > + dev_err(dev, "request_firmware failed: %d\n", ret);
> > + return ret;
> > + }
> > +
> > + /* boot the remote processor up again */
> > + ret = rproc_start(rproc, firmware_p);
> > +
> > + release_firmware(firmware_p);
> > +
> > + return ret;
> > +}
> > +
> > /**
> > * rproc_trigger_recovery() - recover a remoteproc
> > * @rproc: the remote processor
> > @@ -1901,7 +1945,6 @@ static int __rproc_detach(struct rproc *rproc)
> > */
> > int rproc_trigger_recovery(struct rproc *rproc) {
> > - const struct firmware *firmware_p;
> > struct device *dev = &rproc->dev;
> > int ret;
> >
> > @@ -1915,24 +1958,10 @@ int rproc_trigger_recovery(struct rproc
> > *rproc)
> >
> > dev_err(dev, "recovering %s\n", rproc->name);
> >
> > - ret = rproc_stop(rproc, true);
> > - if (ret)
> > - goto unlock_mutex;
> > -
> > - /* generate coredump */
> > - rproc->ops->coredump(rproc);
> > -
> > - /* load firmware */
> > - ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > - if (ret < 0) {
> > - dev_err(dev, "request_firmware failed: %d\n", ret);
> > - goto unlock_mutex;
> > - }
> > -
> > - /* boot the remote processor up again */
> > - ret = rproc_start(rproc, firmware_p);
> > -
> > - release_firmware(firmware_p);
> > + if (rproc_has_feature(rproc, RPROC_FEAT_ATTACH_RECOVERY))
> > + ret = rproc_attach_recovery(rproc);
> > + else
> > + ret = rproc_firmware_recovery(rproc);
> >
> > unlock_mutex:
> > mutex_unlock(&rproc->lock);
> > --
> > 2.30.0
> >