Re: [PATCH V2] hrtimer: Check running timer state
From: enlin.mu
Date: Sun Nov 16 2025 - 18:59:27 EST
On 2025/11/15 02:34, Thomas Gleixner wrote:
On Fri, Nov 14 2025 at 19:42, Enlin Mu wrote:
When the running timer is not NULL, print debugging information.
What for?
The test code is roughly as follows:
static struct hrtimer serial_timer;
enum hrtimer_restart serial_timer_handler(struct hrtimer * timer)
{
local_irq_disable();
What is this for?
Hi Thomas
This code comes from my colleauge who is not familiar with the
running machanism of hrtimer, which is why there is this logical error.
......
do_someting();
copy_data_with_dma();
......
hrtimer_forward_now(*serial_timer, ns_to_ktime(1000*2000));
local_irq_enable();
And this?
return HRTIMER_RESTART;
}
static int serial_start(struct uart_port *port)
{
......
......
hrtime_init(&serial_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
That function does not exist.
ktime = no_to_ktime(1000*2000);
serial_timer.function = serial_timer_handler;
hrtimer_start(&serial_timer, ktime, HRTIMER_MODE_REL);
......
return 0;
}
static void serial_shutdown(struct uart_port *port)
{
......
hrtimer_cancle(&serial_timer);
......
serial_release_dma(port);
......
}
The cpu6 canceled serial_timer and released dma,but the
serial_timer still ran many times on CPU7 until a panic occurred.
The reason for the panic is that serial_timer accessed the
released dma,But the serial_timer had been canceled for
some time now on cpu6.
You still fail to explain how the timer can still run after being
canceled.
Hi Thomas
The serial_timer on cpu6 has not been properly cancelled.
The serial_timer is once again entered into the active queue on cpu7.
---------------------------
on cpu7
step 1: preempted by hrtimer_usb
static void __run_hrtimer(.....)
{
......
base->running = timer; //timer is serial_timer;
.....
.....
restart = fn(timer); // fn is serial_timer_handler
.... ==> irq is enable. serial_timer is preempted by hrtimer_usb
....
}
step 2: hrtimer_usb return
static void __run_hrtimer(.....)
{
......
base->running = timer; //timer is hrtimer_usb;
.....
.....
restart = fn(timer);
....
}
step 3: serial_timer continues to execute
static void __run_hrtimer(.....)
{
......
(restart != HRTIMER_NORESTART &&
timer->state & HRTIMER_STATE_ENQUEUED))
(timer, base, HRTIMER_MODE_ABS); //timer is serial_timer
}
step 4: serial_timer run again.
.......
........
step n: serial_timer run again.
.......
step n+1: serial_timer run again.
.......
panic from serial_timer;
---------------------------
on cpu6
step 1:
hrtimer_cancel
step 2:
hrtimer_try_to_cancel
step 3:
hrtimer_active return false, because running timer is hrtimer_usb
on cpu7(step 2 on cpu7)
step 4:
hrtimer_cancel returns 0
step 5:
serial_timer is added active queue on cpu 7
---------------------------
Due to English not being my native language, I am unable to
explain this exception with a more complex description, and
can only provide a simple step-by-step explanation.
The cpu6 can successfully cancel the serial_timer because the
running timer has changed and it is another timer(such as
hrtimer_usb).
After that the timer _cannot_ be running anymore unless some other code
re-arms it afterwards.
When the serial_timer is enable to interrupt, the next hrtimer
(such as hrtimer_usb) on cpu7 preempts the return of ther serial_timer,
causing a change in the running timer.
Then fix your timer callback. The callback is invoked in hard interrupt
context and the callback enables interrupts, which is a NONO. You
clearly never ran your code with lockdep enabled. It would have told you
so.
Signed-off-by: Enlin Mu <enlin.mu@xxxxxxxxx>
Signed-off-by: Enlin Mu <enlin.mu@xxxxxxxxxx>
Signed-off-by: Enlin Mu <enlin.mu@xxxxxxxxx>
Interesting Signed-off-by chain. Seems you're co-developing this patch
with your Alter ego.
Thanks,
tglx