[PATCH v2] remoteproc: Add a new remoteproc state RPROC_DEFUNCT
From: Mukesh Ojha
Date: Mon Oct 14 2024 - 16:31:51 EST
glink_subdev_stop() can be called more than once for the same remoteproc.
If rproc_stop() fails in Process-A, the rproc is left in the RPROC_CRASHED
state. A later write to the recovery sysfs file from user space in
Process-B reaches recovery_store(), which calls rproc_trigger_recovery()
on the same remoteproc, and the second stop attempt results in a NULL
pointer dereference in qcom_glink_smem_unregister().
Adding a NULL check on glink->edge would avoid the dereference, but it
does not guarantee that the remoteproc recovers on the second attempt from
Process-B: the first attempt in Process-A failed during the SMC shutdown
call, and the second may fail at the same call, in which case the rproc
still cannot recover.
Add a new rproc state, RPROC_DEFUNCT, i.e. a non-recoverable state of the
remoteproc from which the only way to recover is a system restart.
	Process-A                                 Process-B

  fatal error interrupt happens

  rproc_crash_handler_work()
    mutex_lock_interruptible(&rproc->lock);
    ...
       rproc->state = RPROC_CRASHED;
    ...
    mutex_unlock(&rproc->lock);

    rproc_trigger_recovery()
     mutex_lock_interruptible(&rproc->lock);

      adsp_stop()
      qcom_q6v5_pas 20c00000.remoteproc: failed to shutdown: -22
      remoteproc remoteproc3: can't stop rproc: -22
     mutex_unlock(&rproc->lock);

                                          echo enabled > /sys/class/remoteproc/remoteprocX/recovery
                                          recovery_store()
                                           rproc_trigger_recovery()
                                            mutex_lock_interruptible(&rproc->lock);
                                             rproc_stop()
                                              glink_subdev_stop()
                                               qcom_glink_smem_unregister() ==|
                                                                              |
                                                                              V
                                               Unable to handle kernel NULL pointer dereference
                                               at virtual address 0000000000000358

Signed-off-by: Mukesh Ojha <quic_mojha@xxxxxxxxxxx>
---
Changes in v2:
- Removed the NULL pointer check; instead, added a new state to signify
  the non-recoverable state of the remoteproc.
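
For reviewers, the intended behaviour can be sketched as a small userspace
model. This is an illustration only, not kernel code: every model_* name
below is hypothetical, locking is omitted, and returning 0 from an ignored
recovery request is an assumption based on the existing "goto unlock_mutex"
path. It shows a failed stop latching the defunct state, so that a second
recovery request never reaches the stop path again.

```c
/*
 * Userspace model of the proposed state handling. All model_* names are
 * hypothetical; they only mirror the shape of the kernel code.
 */
enum model_state {
	MODEL_OFFLINE = 0,
	MODEL_RUNNING,
	MODEL_CRASHED,
	MODEL_DEFUNCT,
	MODEL_LAST,		/* keep this one at the end */
};

/* State values are used as indices, so the table must cover every state. */
static const char * const model_state_string[] = {
	[MODEL_OFFLINE]	= "offline",
	[MODEL_RUNNING]	= "running",
	[MODEL_CRASHED]	= "crashed",
	[MODEL_DEFUNCT]	= "defunct",
	[MODEL_LAST]	= "invalid",
};

struct model_rproc {
	enum model_state state;
	int stop_ret;		/* value the simulated ->stop() returns */
	int stop_calls;		/* number of stop attempts made so far */
};

/* Clamp out-of-range values to the "invalid" entry before indexing. */
static const char *model_state_name(enum model_state s)
{
	return model_state_string[s > MODEL_LAST ? MODEL_LAST : s];
}

/* A failed stop marks the processor defunct instead of leaving it crashed. */
static int model_stop(struct model_rproc *r)
{
	r->stop_calls++;
	if (r->stop_ret) {
		r->state = MODEL_DEFUNCT;
		return r->stop_ret;
	}
	r->state = MODEL_OFFLINE;
	return 0;
}

/* Recovery only proceeds from the crashed state; defunct is ignored. */
static int model_trigger_recovery(struct model_rproc *r)
{
	int ret;

	if (r->state != MODEL_CRASHED)
		return 0;	/* assumption: ignored requests return 0 */

	ret = model_stop(r);
	if (ret)
		return ret;

	r->state = MODEL_RUNNING;
	return 0;
}
```

With stop_ret set to -22, as in the log above, the first
model_trigger_recovery() call returns -22 and leaves the model in the
defunct state; a second call returns without attempting another stop.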
drivers/remoteproc/remoteproc_core.c | 3 ++-
drivers/remoteproc/remoteproc_sysfs.c | 1 +
include/linux/remoteproc.h | 5 ++++-
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index f276956f2c5c..494c8fcc63ca 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1727,6 +1727,7 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
/* power off the remote processor */
ret = rproc->ops->stop(rproc);
if (ret) {
+ rproc->state = RPROC_DEFUNCT;
dev_err(dev, "can't stop rproc: %d\n", ret);
return ret;
}
@@ -1839,7 +1840,7 @@ int rproc_trigger_recovery(struct rproc *rproc)
return ret;
/* State could have changed before we got the mutex */
- if (rproc->state != RPROC_CRASHED)
+ if (rproc->state == RPROC_DEFUNCT || rproc->state != RPROC_CRASHED)
goto unlock_mutex;
dev_err(dev, "recovering %s\n", rproc->name);
diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
index 138e752c5e4e..5f722b4576b2 100644
--- a/drivers/remoteproc/remoteproc_sysfs.c
+++ b/drivers/remoteproc/remoteproc_sysfs.c
@@ -171,6 +171,7 @@ static const char * const rproc_state_string[] = {
[RPROC_DELETED] = "deleted",
[RPROC_ATTACHED] = "attached",
[RPROC_DETACHED] = "detached",
+ [RPROC_DEFUNCT] = "defunct",
[RPROC_LAST] = "invalid",
};
diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index b4795698d8c2..3e4ba06c6a9a 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -417,6 +417,8 @@ struct rproc_ops {
* has attached to it
* @RPROC_DETACHED: device has been booted by another entity and waiting
* for the core to attach to it
+ * @RPROC_DEFUNCT: device failed to stop and no longer responds to any
+ * requests; it can only recover on system restart
* @RPROC_LAST: just keep this one at the end
*
* Please note that the values of these states are used as indices
@@ -433,7 +435,8 @@ enum rproc_state {
RPROC_DELETED = 4,
RPROC_ATTACHED = 5,
RPROC_DETACHED = 6,
- RPROC_LAST = 7,
+ RPROC_DEFUNCT = 7,
+ RPROC_LAST = 8,
};
/**
--
2.34.1