[RFC PATCH v2 02/10] rv/da: fix per-task da_monitor_destroy() ordering and sync

From: wen . yang

Date: Mon May 11 2026 - 14:25:41 EST


From: Wen Yang <wen.yang@xxxxxxxxx>

The following two paths race:

  CPU 0 (disable_stall/__rv_disable_monitor)  CPU 1 (wwnr probe handler)
  ------------------------------------------  -----------------------------
  disable_stall()
    da_monitor_destroy()
      da_monitor_reset_all() <------ [task T: monitoring=0]
                                              da_monitor_start(&T->rv[n])
                                                /* no timer_setup */
                                                monitoring=1 <----
  tracepoint_synchronize_unregister()
    // CPU 1 probe has already returned; sync returns

Later, enable_stall() acquires the same slot and calls da_monitor_init():

  da_monitor_reset_all()
    da_monitor_reset(&T->rv[slot])     // monitoring=1, timer.function==0
      ha_monitor_reset_env()
        ha_cancel_timer()
          timer_delete(&ha_mon->timer) // ODEBUG: timer never initialised

ODEBUG: assert_init not available (active state 0)
object type: timer_list
Call trace: timer_delete <- da_monitor_reset_all <- enable_stall

Call tracepoint_synchronize_unregister() inside da_monitor_destroy()
before da_monitor_reset_all(). The unregister_trace_xxx() calls in the
monitor's disable() have already disconnected the tracepoints; the sync
here drains any handler still in flight, so no new monitoring=1 can
appear after da_monitor_reset_all() clears the slot.
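
The ordering argument can be sketched as a small userspace model. This is
only an illustration, not kernel code: struct slot_model and every model_*
and stale_* name are hypothetical, and the in-flight handler is replayed at
the worst possible point instead of running on a second CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one task's per-monitor slot. */
struct slot_model {
	bool monitoring;      /* stands in for da_mon->monitoring */
	bool timer_init;      /* true only if timer_setup() ran for this slot */
	bool handler_pending; /* a non-HA probe handler is still in flight */
};

/* In-flight non-HA handler: starts monitoring, never sets up a timer. */
static void model_run_pending_handler(struct slot_model *s)
{
	assert(s != NULL);
	if (s->handler_pending) {
		s->monitoring = true;
		s->handler_pending = false;
	}
}

/* Models the sync: it returns only once in-flight handlers are done. */
static void model_sync(struct slot_model *s)
{
	model_run_pending_handler(s);
}

/* Models da_monitor_reset_all() for this single slot. */
static void model_reset_all(struct slot_model *s)
{
	s->monitoring = false;
}

/* Old order: reset first; the handler finishes before the caller's sync. */
static bool stale_after_old_destroy(struct slot_model *s)
{
	model_reset_all(s);
	model_run_pending_handler(s); /* lands after the reset */
	model_sync(s);                /* caller's sync, nothing left to drain */
	return s->monitoring && !s->timer_init;
}

/* New order: drain handlers first, then reset. */
static bool stale_after_new_destroy(struct slot_model *s)
{
	model_sync(s);
	model_reset_all(s);
	return s->monitoring && !s->timer_init;
}
```

In this model the old order leaves the slot with monitoring set and no
timer initialised, which is the state that later trips timer_delete(); the
new order leaves the slot clean.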

Also fix the slot release ordering: call da_monitor_reset_all() before
releasing the slot. The old code set task_mon_slot back to
RV_PER_TASK_MONITOR_INIT first, so reset_all() then accessed rv[] with
an out-of-bounds index.
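
The slot-ordering problem can be sketched in isolation as follows. This is
a userspace model under stated assumptions, not the kernel code: NR_SLOTS,
SLOT_INIT, cur_slot and the model_* helpers are hypothetical, and the "no
slot" sentinel is assumed to be one past the last valid index. The model
returns -1 where the real code would index rv[] out of bounds.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS  2         /* hypothetical number of per-task slots */
#define SLOT_INIT NR_SLOTS  /* assumed "no slot" sentinel, not a valid index */

struct mon_model  { int monitoring; };
struct task_model { struct mon_model rv[NR_SLOTS]; };

static int cur_slot = SLOT_INIT;

/* Claim a slot for the monitor (always slot 0 in this model). */
static void model_init(void)
{
	cur_slot = 0;
}

/*
 * Clear this monitor's state in every task. Returns -1 without touching
 * rv[] when cur_slot is not a valid index.
 */
static int model_reset_all(struct task_model *tasks, size_t n)
{
	if (cur_slot < 0 || cur_slot >= NR_SLOTS)
		return -1;
	for (size_t i = 0; i < n; i++)
		tasks[i].rv[cur_slot].monitoring = 0;
	return 0;
}

/* Old order: release the slot first, then reset. */
static int model_destroy_old(struct task_model *tasks, size_t n)
{
	assert(cur_slot != SLOT_INIT);
	cur_slot = SLOT_INIT;
	return model_reset_all(tasks, n);
}

/* New order: reset while the index is still valid, then release. */
static int model_destroy_new(struct task_model *tasks, size_t n)
{
	int ret;

	assert(cur_slot != SLOT_INIT);
	ret = model_reset_all(tasks, n);
	cur_slot = SLOT_INIT;
	return ret;
}
```

With the old order the reset sees the sentinel and never clears the tasks'
state; with the new order it runs against the slot the monitor actually
owned.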

Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
Signed-off-by: Wen Yang <wen.yang@xxxxxxxxx>
---
include/rv/da_monitor.h | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 00ded3d5ab3f..d04bb3229c75 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -304,6 +304,20 @@ static int da_monitor_init(void)

/*
* da_monitor_destroy - return the allocated slot
+ *
+ * Call tracepoint_synchronize_unregister() before reset_all() to close
+ * the race where an in-flight non-HA probe handler sets monitoring=1
+ * (without calling timer_setup()) after da_monitor_reset_all() has
+ * already cleared the slot but before the caller's own sync completes.
+ * Without this barrier, an HA_TIMER_WHEEL monitor that later acquires
+ * the same slot would call timer_delete() on a never-initialised
+ * timer_list, triggering ODEBUG warnings.
+ *
+ * Note: tracepoint_synchronize_unregister() is a system-wide barrier
+ * that waits for all CPUs to finish any in-flight tracepoint handlers.
+ * The caller's own __rv_disable_monitor() issues a second sync after
+ * returning from disable(); that redundant call is harmless on the
+ * infrequent admin (enable/disable) path.
*/
static inline void da_monitor_destroy(void)
{
@@ -311,10 +325,10 @@ static inline void da_monitor_destroy(void)
WARN_ONCE(1, "Disabling a disabled monitor: " __stringify(MONITOR_NAME));
return;
}
+ tracepoint_synchronize_unregister();
+ da_monitor_reset_all();
rv_put_task_monitor_slot(task_mon_slot);
task_mon_slot = RV_PER_TASK_MONITOR_INIT;
-
- da_monitor_reset_all();
}

#elif RV_MON_TYPE == RV_MON_PER_OBJ
--
2.25.1