Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc crash

From: Arnaud POULIQUEN
Date: Tue Sep 27 2022 - 04:14:48 EST


Hi,

On 9/27/22 05:03, Peng Fan wrote:
> Hi Mathieu,
>
>> Subject: Re: [PATCH V7 2/2] remoteproc: support attach recovery after rproc
>> crash
>>
>> On Tue, Jul 05, 2022 at 09:15:27AM +0800, Peng Fan (OSS) wrote:
>>> From: Peng Fan <peng.fan@xxxxxxx>
>>>
>>> Current logic only support main processor to stop/start the remote
>>> processor after crash. However to SoC, such as i.MX8QM/QXP, the remote
>>> processor could do attach recovery after crash and trigger watchdog to
>>> reboot itself. It does not need main processor to load image, or
>>> stop/start remote processor.
>>>
>>> Introduce two functions: rproc_attach_recovery, rproc_boot_recovery
>>> for the two cases. Boot recovery is as before, let main processor to
>>> help recovery, while attach recovery is to recover itself without help.
>>> To attach recovery, we only do detach and attach.
>>>
>>> Acked-by: Arnaud Pouliquen <arnaud.pouliquen@xxxxxxxxxxx>
>>> Signed-off-by: Peng Fan <peng.fan@xxxxxxx>
>>> ---
>>> drivers/remoteproc/remoteproc_core.c | 62
>>> +++++++++++++++++++---------
>>> 1 file changed, 43 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c
>>> b/drivers/remoteproc/remoteproc_core.c
>>> index ed374c8bf14a..ef5b9310bc83 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -1884,6 +1884,45 @@ static int __rproc_detach(struct rproc *rproc)
>>> return 0;
>>> }
>>>
>>> +static int rproc_attach_recovery(struct rproc *rproc) {
>>> + int ret;
>>> +
>>> + ret = __rproc_detach(rproc);
>>> + if (ret)
>>> + return ret;
>>
>> I thought there was a specific reason to _not_ call rproc->ops->coredump()
>> for processors that have been attached to but looking at the STM32 and
>> IMX_DSP now, it would seem logical to do so. Am I missing something?
>
> ATTACH RECOVERY is to support recovery without help from Linux,
>
> STM32 and IMX_DSP, both require linux to load image and start remote
> core. So the two cases should not enable feature:
> RPROC_FEAT_ATTACH_ON_RECOVERY
>
> Also considering the recovery is out of linux control, actually when linux
> start dump, remote core may already recovered.

I asked myself the same question. Indeed how to manage a core dump if the
remote processor restarts autonomously.
The answer doesn't seem obvious because it seems to be platform specific.

For time being on STM32 we consider that the remoteproc memory can be corrupted
so we don't plan to enable the feature by default even if the hardware allows it.

If we implement it, I would see 2 use cases:
- no core dump, the remote processor restart autonomously (need to manage the
VIRTIO_CONFIG_S_NEEDS_RESET in resource table vdev for resynchronization)
- core dump and the Linux stm32 driver handle the reset of the remote
processor core to be able to perform the core dump (no firmware loading)

What about dealing with the coredump in a separate thread, based on a concrete
use case/need?

Regards,
Arnaud

>
>>
>> And this set will need a rebase.
>
> I'll do the rebase and send V8 if the upper explanation could eliminate
> your concern.
>
> Thanks,
> Peng.
>
>>
>> Thanks,
>> Mathieu
>>
>>> +
>>> + return __rproc_attach(rproc);
>>> +}
>>> +
>>> +static int rproc_boot_recovery(struct rproc *rproc) {
>>> + const struct firmware *firmware_p;
>>> + struct device *dev = &rproc->dev;
>>> + int ret;
>>> +
>>> + ret = rproc_stop(rproc, true);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + /* generate coredump */
>>> + rproc->ops->coredump(rproc);
>>> +
>>> + /* load firmware */
>>> + ret = request_firmware(&firmware_p, rproc->firmware, dev);
>>> + if (ret < 0) {
>>> + dev_err(dev, "request_firmware failed: %d\n", ret);
>>> + return ret;
>>> + }
>>> +
>>> + /* boot the remote processor up again */
>>> + ret = rproc_start(rproc, firmware_p);
>>> +
>>> + release_firmware(firmware_p);
>>> +
>>> + return ret;
>>> +}
>>> +
>>> /**
>>> * rproc_trigger_recovery() - recover a remoteproc
>>> * @rproc: the remote processor
>>> @@ -1898,7 +1937,6 @@ static int __rproc_detach(struct rproc *rproc)
>>> */
>>> int rproc_trigger_recovery(struct rproc *rproc) {
>>> - const struct firmware *firmware_p;
>>> struct device *dev = &rproc->dev;
>>> int ret;
>>>
>>> @@ -1912,24 +1950,10 @@ int rproc_trigger_recovery(struct rproc
>>> *rproc)
>>>
>>> dev_err(dev, "recovering %s\n", rproc->name);
>>>
>>> - ret = rproc_stop(rproc, true);
>>> - if (ret)
>>> - goto unlock_mutex;
>>> -
>>> - /* generate coredump */
>>> - rproc->ops->coredump(rproc);
>>> -
>>> - /* load firmware */
>>> - ret = request_firmware(&firmware_p, rproc->firmware, dev);
>>> - if (ret < 0) {
>>> - dev_err(dev, "request_firmware failed: %d\n", ret);
>>> - goto unlock_mutex;
>>> - }
>>> -
>>> - /* boot the remote processor up again */
>>> - ret = rproc_start(rproc, firmware_p);
>>> -
>>> - release_firmware(firmware_p);
>>> + if (rproc_has_feature(rproc, RPROC_FEAT_ATTACH_ON_RECOVERY))
>>> + ret = rproc_attach_recovery(rproc);
>>> + else
>>> + ret = rproc_boot_recovery(rproc);
>>>
>>> unlock_mutex:
>>> mutex_unlock(&rproc->lock);
>>> --
>>> 2.25.1
>>>