Re: [PATCH 1/2] firmware: smccc: add timeout, touch wdt
From: Andre Przywara
Date: Tue Feb 24 2026 - 05:58:42 EST
Hi Veda,
On 2/10/26 23:40, Vedashree Vidwans wrote:
Enhance PRIME/ACTIVATION functions to touch watchdog and implement
timeout mechanism. This update ensures that any potential hangs are
detected promptly and that the LFA process is allocated sufficient
execution time before the watchdog timer expires. These changes improve
overall system reliability by reducing the risk of undetected process
stalls and unexpected watchdog resets.
Many thanks for that, I think it's a very good idea to take care of the watchdog and to avoid an infinite loop in the AGAIN case.
I have some comments about some details below ....
Signed-off-by: Vedashree Vidwans <vvidwans@xxxxxxxxxx>
---
drivers/firmware/smccc/lfa_fw.c | 40 +++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/drivers/firmware/smccc/lfa_fw.c b/drivers/firmware/smccc/lfa_fw.c
index da6b54fe1685..b0ace6fc8dac 100644
--- a/drivers/firmware/smccc/lfa_fw.c
+++ b/drivers/firmware/smccc/lfa_fw.c
@@ -17,6 +17,9 @@
#include <linux/array_size.h>
#include <linux/list.h>
#include <linux/mutex.h>
+#include <linux/nmi.h>
+#include <linux/ktime.h>
+#include <linux/delay.h>
#undef pr_fmt
#define pr_fmt(fmt) "Arm LFA: " fmt
@@ -37,6 +40,14 @@
#define LFA_PRIME_CALL_AGAIN BIT(0)
#define LFA_ACTIVATE_CALL_AGAIN BIT(0)
+/* Prime loop limits, TODO: tune after testing */
+#define LFA_PRIME_BUDGET_US 30000000 /* 30s cap */
+#define LFA_PRIME_POLL_DELAY_US 10 /* 10us between polls */
+
+/* Activation loop limits, TODO: tune after testing */
+#define LFA_ACTIVATE_BUDGET_US 20000000 /* 20s cap */
+#define LFA_ACTIVATE_POLL_DELAY_US 10 /* 10us between polls */
+
/* LFA return values */
#define LFA_SUCCESS 0
#define LFA_NOT_SUPPORTED 1
@@ -219,6 +230,7 @@ static int call_lfa_activate(void *data)
struct image_props *attrs = data;
struct arm_smccc_1_2_regs args = { 0 };
struct arm_smccc_1_2_regs res = { 0 };
+ ktime_t end = ktime_add_us(ktime_get(), LFA_ACTIVATE_BUDGET_US);
args.a0 = LFA_1_0_FN_ACTIVATE;
args.a1 = attrs->fw_seq_id; /* fw_seq_id under consideration */
@@ -232,6 +244,8 @@ static int call_lfa_activate(void *data)
args.a2 = !(attrs->cpu_rendezvous_forced || attrs->cpu_rendezvous);
for (;;) {
+ /* Touch watchdog, ACTIVATE shouldn't take longer than watchdog_thresh */
+ touch_nmi_watchdog();
arm_smccc_1_2_invoke(&args, &res);
if ((long)res.a0 < 0) {
@@ -241,6 +255,15 @@ static int call_lfa_activate(void *data)
}
if (!(res.a1 & LFA_ACTIVATE_CALL_AGAIN))
break; /* ACTIVATE successful */
+
+ /* SMC returned with call_again flag set */
+ if (ktime_before(ktime_get(), end)) {
+ udelay(LFA_ACTIVATE_POLL_DELAY_US);
I don't think we should wait here at all, and definitely not with udelay: https://docs.kernel.org/timers/delay_sleep_functions.html
Instead we should move the "call again" (and timeout) mechanism out of this function, into activate_fw_image(), so that we exit the stop_machine(). Otherwise we would still block everything. Doing it there, where we should be preemptible, would give the kernel a chance to do some housekeeping. If there is nothing for the kernel to do, then I think it's fine to immediately call lfa_activate() again, after a cond_resched(), for instance.
+ continue;
+ }
+
+ pr_err("ACTIVATE for image %s timed out", attrs->image_name);
+ return -ETIMEDOUT;
}
return res.a0;
@@ -290,6 +313,7 @@ static int prime_fw_image(struct image_props *attrs)
{
struct arm_smccc_1_2_regs args = { 0 };
struct arm_smccc_1_2_regs res = { 0 };
+ ktime_t end = ktime_add_us(ktime_get(), LFA_PRIME_BUDGET_US);
int ret;
mutex_lock(&lfa_lock);
@@ -317,6 +341,8 @@ static int prime_fw_image(struct image_props *attrs)
args.a0 = LFA_1_0_FN_PRIME;
args.a1 = attrs->fw_seq_id; /* fw_seq_id under consideration */
for (;;) {
+ /* Touch watchdog, PRIME shouldn't take longer than watchdog_thresh */
+ touch_nmi_watchdog();
arm_smccc_1_2_invoke(&args, &res);
if ((long)res.a0 < 0) {
@@ -328,6 +354,20 @@ static int prime_fw_image(struct image_props *attrs)
}
if (!(res.a1 & LFA_PRIME_CALL_AGAIN))
break; /* PRIME successful */
+
+ /* SMC returned with call_again flag set */
+ if (ktime_before(ktime_get(), end)) {
+ udelay(LFA_PRIME_POLL_DELAY_US);
same comment here, please no udelay().
This should also avoid the discussion about the exact values of the sleep periods.
I'd just have one generous timeout (a few seconds, basically what your BUDGET values do above), to avoid looping forever in case of a firmware bug, for instance.
Cheers,
Andre
+ continue;
+ }
+
+ pr_err("LFA_PRIME for image %s timed out", attrs->image_name);
+ mutex_unlock(&lfa_lock);
+
+ ret = lfa_cancel(attrs);
+ if (ret != 0)
+ return ret;
+ return -ETIMEDOUT;
}
mutex_unlock(&lfa_lock);