Re: ~90s reboot delay with v6.19 and PREEMPT_RT

From: Sebastian Andrzej Siewior

Date: Wed Feb 25 2026 - 17:31:54 EST


On 2026-02-25 17:59:55 [+0100], To Bert Karwatzki wrote:
> On 2026-02-25 17:37:56 [+0100], Bert Karwatzki wrote:
> > I've done two testruns with this debug patch (The persistant log buffer works now, thanks
> > again to Steven Rostedt):
>
>
> > This if from PREEMPT_RT log, there*s a long pause in which cgroup_task_dead() is not called
>
> Yeah, I don't know why. The irq-work is invoked slightly delayed so that
> part is working.

In the good case I have
| systemd-1818 3....2 605751us : cgroup_notify_populated: root=0 id=2382 level=5 path=/user.slice/user-0.slice/user@0.service/app.slice/ssh-agent.socket val=1
|systemct-1911 1....1 620046us : cgroup_attach_task: dst_root=0 dst_id=2382 dst_level=5 dst_path=/user.slice/user-0.slice/user@0.service/app.slice/ssh-agent.socket pid=1911 comm=(ystemctl)
|systemct-1911 1..... 642680us : sched_process_exit: comm=systemctl pid=1911 prio=120 group_dead=true
|systemct-1911 1....2 643423us : signal_generate: sig=17 errno=0 code=1 comm=systemd pid=1818 grp=1 res=0
|systemct-1911 1d..2. 643432us : sched_switch: prev_comm=systemctl prev_pid=1911 prev_prio=120 prev_state=Z ==> next_comm=systemd next_pid=1818 next_prio=120

|irq_work-29 1....2 643450us : cgroup_notify_populated: root=0 id=2382 level=5 path=/user.slice/user-0.slice/user@0.service/app.slice/ssh-agent.socket val=0
|irq_work-29 1....2 643457us : cgroup_notify_populated: root=0 id=2229 level=4 path=/user.slice/user-0.slice/user@0.service/app.slice val=0
| systemd-1818 1....1 644548us : cgroup_rmdir: root=0 id=2382 level=5 path=/user.slice/user-0.slice/user@0.service/app.slice/ssh-agent.socket
| systemd-1818 1..... 644784us : sched_process_wait: comm=systemd pid=1911 prio=120

and in the bad case

| systemd-1828 3....2 312877us : cgroup_notify_populated: root=0 id=2419 level=5 path=/user.slice/user-0.slice/user@0.service/app.slice/ssh-agent.socket val=1
|systemct-1929 2....1 321916us : cgroup_attach_task: dst_root=0 dst_id=2419 dst_level=5 dst_path=/user.slice/user-0.slice/user@0.service/app.slice/ssh-agent.socket pid=1929 comm =(ystemctl)
|systemct-1929 1..... 341432us : sched_process_exit: comm=systemctl pid=1929 prio=120 group_dead=true
|systemct-1929 1.l..2 342623us : signal_generate: sig=17 errno=0 code=1 comm=systemd pid=1828 grp=1 res=0
|systemct-1929 1d..2. 342637us : sched_switch: prev_comm=systemctl prev_pid=1929 prev_prio=120 prev_state=Z ==> next_comm=systemd next_pid=1828 next_prio=120

| systemd-1828 1....1 343099us : signal_generate: sig=15 errno=0 code=0 comm=systemctl pid=1929 grp=1 res=1
| systemd-1828 1....1 343102us : signal_generate: sig=18 errno=0 code=0 comm=systemctl pid=1929 grp=1 res=1
| systemd-1828 1..... 343292us : sched_process_wait: comm=systemd pid=1929 prio=120
| systemd-1828 1..... 343442us : sched_process_wait: comm=systemd pid=0 prio=120

|irq_work-29 1....2 343725us : cgroup_notify_populated: root=0 id=2419 level=5 path=/user.slice/user-0.slice/user@0.service/app.slice/ssh-agent.socket val=0
| systemd-1828 1....1 90457054us : cgroup_rmdir: root=0 id=2419 level=5 path=/user.slice/user-0.slice/user@0.service/app.slice/ssh-agent.socket

Until the sched_switch, everything is the same. But then systemd-1828
(the one with the cgroup_notify_populated event) seems to get impatient
and sends a SIGTERM+SIGCONT. It gets the exit code, the
cgroup_notify_populated event is there later and just once. The
app.slice notify is missing. And the rmdir gets in much later.

Did systemd kill the app.slice at level=4? Is this relevant? I don't see
any immediate wake up or signal from within irq_work or
kernfs_notify_workfn() later on.

Sebastian