Re: [patch V2 00/12] rseq: Implement time slice extension mechanism

From: Thomas Gleixner

Date: Mon Oct 27 2025 - 14:53:28 EST


On Mon, Oct 27 2025 at 18:30, Sebastian Andrzej Siewior wrote:

> | slice_test-2903 [001] d..2. 2313.285484: hrtimer_cancel: hrtimer=0000000030a688cc
> extension granted, timer started and revoked and set need resched.
>
> | slice_test-2903 [001] dN.2. 2313.285487: sched_stat_runtime: comm=slice_test pid=2903 runtime=36886 [ns]
> This is coming from schedule() already. It took me a while since I was
> hunting a missing clear of need-resched.
>
> | slice_test-2903 [001] d..2. 2313.285489: sched_switch: prev_comm=slice_test prev_pid=2903 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/1 next_pid=32 next_prio=120
> | ksoftirqd/1-32 [001] ..s.1 2313.285490: softirq_entry: vec=7 [action=SCHED]
> | ksoftirqd/1-32 [001] ..s.1 2313.285501: softirq_exit: vec=7 [action=SCHED]
> | ksoftirqd/1-32 [001] d..2. 2313.285502: sched_stat_runtime: comm=ksoftirqd/1 pid=32 runtime=16438 [ns]
> | ksoftirqd/1-32 [001] d..2. 2313.285503: sched_switch: prev_comm=ksoftirqd/1 prev_pid=32 prev_prio=120 prev_state=S ==> next_comm=slice_test next_pid=2904 next_prio=120
> | slice_test-2904 [001] ..... 2313.285507: sys_enter: NR 230 (1, 0, 7f4692c7baa0, 0, 0, 0)
> | slice_test-2904 [001] ..... 2313.285507: hrtimer_setup: hrtimer=00000000f2d53899 clockid=CLOCK_MONOTONIC mode=REL
> | slice_test-2904 [001] d..1. 2313.285507: hrtimer_start: hrtimer=00000000f2d53899 function=hrtimer_wakeup expires=2313208168792 softexpires=2313208118792 mode=REL
> | slice_test-2904 [001] d..2. 2313.285508: sched_stat_runtime: comm=slice_test pid=2904 runtime=6149 [ns]
> | slice_test-2904 [001] d..2. 2313.285510: sched_switch: prev_comm=slice_test prev_pid=2904 prev_prio=120 prev_state=S ==> next_comm=slice_test next_pid=2903 next_prio=120
> | slice_test-2903 [001] ..... 2313.285510: sys_enter: NR 470 (7fffc04f1ff0, c350, 11a0e0, 0, 7f4692e99000, 0)
>
> slice_test-2903 enters _now_ rseq_slice_yield() so it must have been in
> userland during the suppressed wake up at 2313.285457.
> But a few iterations later it turns out this trace event is recorded
> _after_ the rseq magic happens at sys_enter time. We entered
> rseq_slice_yield() a few cycles after the extension was granted. Buh.
> So it seems to work as intended but it is not obvious to tell from
> tracing why it does not work.

The sys_enter tracepoint is emitted _after_ syscall_trace_enter() has
invoked rseq_syscall_enter_work(), which cancels the timer and sets
NEED_RESCHED. The task then reschedules immediately at preempt_enable():

  syscall()
    do_syscall_64()
      syscall_enter_from_user_mode()
        syscall_enter_from_user_mode_work()
          syscall_trace_enter()
            rseq_syscall_enter_work()
              preempt_disable()
              hrtimer_try_to_cancel()
                remove_hrtimer()        <- tracepoint
              set_need_resched()
              preempt_enable()
                schedule()
            ...
            trace_sys_enter()           <- tracepoint

Even if it did not reschedule immediately, the ordering would still be
reversed: hrtimer_cancel is traced before sys_enter.
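
To make the ordering argument concrete, here is a toy user-space model of
the call chain above. It is only a sketch and not kernel code: the
model_*() functions, struct event_log and the resched_immediately flag are
made up for illustration, and the real scheduling in between is reduced to
a single "sched_switch" entry. It shows that sys_enter ends up last in the
log whether or not the reschedule happens right away.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: these names are hypothetical, not kernel interfaces. */
#define MAX_EVENTS 8

struct event_log {
	const char *names[MAX_EVENTS];
	int count;
};

/* Append one trace event name to the log. */
void log_event(struct event_log *log, const char *name)
{
	assert(log->count < MAX_EVENTS);
	log->names[log->count++] = name;
}

/*
 * Models rseq_syscall_enter_work(): cancelling the timer emits
 * hrtimer_cancel, then NEED_RESCHED is set and preempt_enable()
 * may call schedule() right away.
 */
void model_rseq_syscall_enter_work(struct event_log *log,
				   int resched_immediately)
{
	log_event(log, "hrtimer_cancel");
	if (resched_immediately)
		log_event(log, "sched_switch");
}

/*
 * Models syscall_trace_enter(): the rseq work runs first, the
 * sys_enter tracepoint is emitted afterwards.
 */
void model_syscall_trace_enter(struct event_log *log,
			       int resched_immediately)
{
	model_rseq_syscall_enter_work(log, resched_immediately);
	log_event(log, "sys_enter");
}

/* Return the position of an event in the log, or -1 if absent. */
int event_index(const struct event_log *log, const char *name)
{
	for (int i = 0; i < log->count; i++) {
		if (strcmp(log->names[i], name) == 0)
			return i;
	}
	return -1;
}

/* Print the recorded events in order, one per line. */
void print_events(const struct event_log *log)
{
	for (int i = 0; i < log->count; i++)
		printf("%d: %s\n", i, log->names[i]);
}
```

Calling model_syscall_trace_enter() on a zero-initialized struct event_log
and then print_events() lists the events in the order a trace would show
them. With resched_immediately set to 0 the "sched_switch" entry is
missing, but hrtimer_cancel still precedes sys_enter.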

Thanks,

tglx