Re: [PATCH] remoteproc: k3-r5: Decouple firmware booting from probe routine

From: Beleswar Prasad Padhi
Date: Mon Sep 09 2024 - 00:16:32 EST



On 07/09/24 15:41, Kumar, Udit wrote:

On 9/6/2024 3:10 PM, Beleswar Padhi wrote:
The current implementation of the waiting mechanism in probe() waits for
the 'released_from_reset' flag to be set which is done in
k3_r5_rproc_prepare() as part of rproc_fw_boot(). This causes unexpected
failures in cases where the firmware is unavailable at boot time,
resulting in probe failure and removal of the remoteproc handles in the
sysfs paths.

I won't say failure, I will say this is behavior of driver.

Driver expect firmware to be available , but I agree driver should be able to execute

with/without firmware availability.


To address this, the waiting mechanism is refactored out of the probe
routine into the appropriate k3_r5_rproc_prepare/unprepare() and
k3_r5_rproc_start/stop() functions. This allows the probe routine to
complete without depending on firmware booting, while still maintaining
the required power-synchronization between cores.

Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up before powering up core1")
Signed-off-by: Beleswar Padhi <b-padhi@xxxxxx>
---
Posted this as a Fix as this was breaking usecases where we wanted to load a
firmware by writing to sysfs handles in userspace.

  drivers/remoteproc/ti_k3_r5_remoteproc.c | 170 ++++++++++++++++-------
  1 file changed, 118 insertions(+), 52 deletions(-)

diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
[..]
+    core0 = list_first_entry(&cluster->cores, struct k3_r5_core, elem);
+    core1 = list_last_entry(&cluster->cores, struct k3_r5_core, elem);
+    if (cluster->mode == CLUSTER_MODE_SPLIT && core == core1 &&
+        core0->released_from_reset == false) {
+        ret = wait_event_interruptible_timeout(cluster->core_transition,
+                               core0->released_from_reset,
+                               msecs_to_jiffies(2000));
only one wait in start should be good enough,


Won't that cause race conditions, where prepare for core1 can be called before core0?

+        if (ret <= 0) {
+            dev_err(dev, "can not power up core1 before core0");
+            return -EPERM;
+        }
+    }
+
      ret = ti_sci_proc_get_status(core->tsp, &boot_vec, &cfg, &ctrl, &stat);
      if (ret < 0)
          return ret;
@@ -470,6 +492,12 @@ static int k3_r5_rproc_prepare(struct rproc *rproc)
          return ret;
      }
  +    /* Notify all threads in the wait queue when core state has changed so
+     * that threads waiting for this condition can be executed.
+     */
+    core->released_from_reset = true;
+    wake_up_interruptible(&cluster->core_transition);
+
      /*
       * Newer IP revisions like on J7200 SoCs support h/w auto-initialization
       * of TCMs, so there is no need to perform the s/w memzero. This bit is
@@ -515,14 +543,46 @@ static int k3_r5_rproc_unprepare(struct rproc *rproc)
  {
      struct k3_r5_rproc *kproc = rproc->priv;
      struct k3_r5_cluster *cluster = kproc->cluster;
-    struct k3_r5_core *core = kproc->core;
+    struct k3_r5_core *core0, *core1, *core = kproc->core;


why you need wait in unprepare/stop,

In case you are failing during firmware load or so then already we are in bad state.

if this is call from user land then, i don't except anyone will auto trigger stopping of core-0

IMO, in unprepare/stop, if action is attempted on core1 with core-0 ON, simply return error.


Yes, you are correct. Will include this change in revision. Thanks!



[..]
@@ -629,7 +702,7 @@ static int k3_r5_rproc_stop(struct rproc *rproc)
      struct k3_r5_rproc *kproc = rproc->priv;
      struct k3_r5_cluster *cluster = kproc->cluster;
      struct device *dev = kproc->dev;