Re: ~90s shutdown delay with v6.19 and PREEMPT_RT

From: Steven Rostedt

Date: Mon Feb 23 2026 - 03:23:16 EST


On Mon, 23 Feb 2026 01:35:36 +0100
Bert Karwatzki <spasswolf@xxxxxx> wrote:

> So the time to shut down is ~120s with PREEMPT_RT and 7s without.
>
> The interesting difference between these two traces is that the second one only
> contains messages with "status" d..2. while the first also contains some with different status
> (191 of 265126). Could these be the reason for the delay?
>
> $ grep -v d..2. trace.txt
>
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 265126/265126 #P:16
> #
> #                                _-----=> irqs-off/BH-disabled
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
> <...>-1584 [011] D..22 62.779670: sched_switch: prev_comm=ntpd prev_pid=0x630 (1584) prev_prio=0x78 (120) prev_state=0x100 (256)
> next_comm=mt76-tx phy0 next_pid=0x5fb (1531) next_prio=0x62 (98)

The 'D' means that both interrupts ('d') and softirqs ('b') are disabled.

The last number is the migrate-disable depth. When it is non-zero, the
task is pinned to its current CPU. That may be an issue if the system is
trying to take down a CPU and there's a task pinned to it.
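To make the flag column easier to read, here is a minimal sketch in
Python that splits a five-character field such as "D..22" into the
pieces named in the trace header. The function name and the dictionary
keys are made up for illustration. It assumes the usual ftrace text
format, where '.' means "zero/not set" and the two depth columns are
single hex digits.

```python
def decode_latency_flags(flags: str) -> dict:
    """Decode a 5-character ftrace latency field such as 'D..22'.

    Hypothetical helper for reading trace.txt by hand; not part of
    any kernel tooling.
    """
    if len(flags) != 5:
        raise ValueError(f"expected 5 flag characters, got {flags!r}")

    irq, resched, ctx, preempt, migrate = flags

    def depth(ch: str) -> int:
        # '.' is printed for zero; otherwise the column is a hex digit.
        return 0 if ch == "." else int(ch, 16)

    return {
        # 'd' = irqs off, 'b' = BH (softirqs) off, 'D' = both
        "irqs_off": irq in ("d", "D"),
        "bh_off": irq in ("b", "D"),
        # any character other than '.' means some resched flag is set
        "need_resched": resched != ".",
        # raw context character ('.', 'h', 's', 'H', ...)
        "context": ctx,
        "preempt_depth": depth(preempt),
        "migrate_disable": depth(migrate),
    }


if __name__ == "__main__":
    print(decode_latency_flags("D..22"))
    print(decode_latency_flags("d..2."))
```

For the quoted entry, "D..22" decodes to interrupts and softirqs off,
preempt depth 2 and migrate-disable depth 2, while the common "d..2."
entries have migrate-disable at 0.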

Now that we know that the persistent ring buffer works, we can add even
more debugging. We could see where things are stuck...

cd /sys/kernel/tracing/instances/boot_map
echo 'stacktrace if prev_state & 3' > events/sched/sched_switch/trigger

That will do a stacktrace at every location that schedules out in a
non-running state. That way we can see what is waiting for something to
finish.
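As a rough illustration of what the "prev_state & 3" filter selects,
the sketch below mirrors the test in Python. The constants are the
kernel's TASK_INTERRUPTIBLE (0x1) and TASK_UNINTERRUPTIBLE (0x2). The
function name is hypothetical, and this only models the mask check, not
how the kernel evaluates trigger filters.

```python
TASK_INTERRUPTIBLE = 0x1    # sleeping, can be woken by signals
TASK_UNINTERRUPTIBLE = 0x2  # sleeping, ignores signals

SLEEP_MASK = TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE  # == 3


def trigger_fires(prev_state: int) -> bool:
    """Return True when 'prev_state & 3' would be non-zero."""
    return (prev_state & SLEEP_MASK) != 0


if __name__ == "__main__":
    # 0x100 is the prev_state from the quoted sched_switch line.
    for state in (0x0, 0x1, 0x2, 0x100):
        print(hex(state), trigger_fires(state))
```

The quoted entry has prev_state=0x100, which has neither low bit set,
so it would not produce a stacktrace. Only tasks that go to sleep
waiting on something would.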

Then in a separate boot, we may want to see where things are pinned.

echo 'stacktrace if common_preempt_count & 0xf0' > events/sched/sched_switch/trigger

That will do a stacktrace every time a task schedules out with
migration disabled.

-- Steve