Re: [PATCH] sched/fork: Fix timer_slack_ns inheritance for RT tasks

From: K Prateek Nayak

Date: Wed May 20 2026 - 11:09:34 EST


Hello Xiao,

On 5/20/2026 6:37 PM, Xiao Feng via B4 Relay wrote:
> Both problems prevent timer coalescing for these CFS tasks, causing
> unnecessary wakeups and increased power consumption. Writing 0 to
> /proc/pid/timerslack_ns also cannot restore a proper default.
>
> Fix both issues:
>
> 1. In copy_process(), inherit default_timer_slack_ns from the parent's
> default_timer_slack_ns (which is preserved across RT transitions)
> instead of timer_slack_ns (which is 0 for RT tasks).

The man page for fork() [1] reads:

    The default timer slack value is set to the parent's current
    timer slack value. See the description of PR_SET_TIMERSLACK in
    prctl(2).

And that description in man page for PR_SET_TIMERSLACK [2] reads:

    When a new thread is created, the two timer slack values are made
    the same as the "current" value of the creating thread.


The two timer slack values that the man page refers to above are the
"default" and the "current" values, as described in its opening
statement.

From a documentation standpoint, the current behavior is correct. An
RT thread returns 0 for PR_GET_TIMERSLACK, and that same value is set
as both "default" and "current" for its children.

[1] https://man7.org/linux/man-pages/man2/fork.2.html
[2] https://man7.org/linux/man-pages/man2/pr_set_timerslack.2const.html
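
To make the difference concrete, here is a minimal sketch in Python. It is
a toy model of the two slack values, not kernel code: the Task class and
both function names are hypothetical, and the 50000 ns default is only an
assumed value for illustration. fork_documented() follows the man page text
quoted above, while fork_with_patch_item_1() follows item 1 of the patch
description as I read it.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for a thread's two timer slack values."""
    timer_slack_ns: int          # the "current" value
    default_timer_slack_ns: int  # the "default" value


def fork_documented(parent: Task) -> Task:
    # prctl(2): both child values are made the same as the parent's
    # "current" value.
    return Task(timer_slack_ns=parent.timer_slack_ns,
                default_timer_slack_ns=parent.timer_slack_ns)


def fork_with_patch_item_1(parent: Task) -> Task:
    # Item 1 of the patch description: the child's "default" comes from
    # the parent's "default" instead of the parent's "current".
    # The child's "current" value is left as in the documented rule,
    # since item 1 does not mention it.
    return Task(timer_slack_ns=parent.timer_slack_ns,
                default_timer_slack_ns=parent.default_timer_slack_ns)


# An RT parent: "current" is 0, "default" is assumed to be 50000 ns.
rt_parent = Task(timer_slack_ns=0, default_timer_slack_ns=50_000)

documented_child = fork_documented(rt_parent)
patched_child = fork_with_patch_item_1(rt_parent)
```

In this model the documented rule gives the child a "default" of 0, while
item 1 gives it 50000, which is the behavior change in question.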

>
> 2. In sched_fork(), when sched_reset_on_fork demotes RT/DL to CFS,
> explicitly restore timer_slack_ns from the parent's
> default_timer_slack_ns, falling back to 50us if it is also 0.

As for SCHED_FLAG_RESET_ON_FORK, the man page for sched(7) [3] reads:

    More precisely, if the reset-on-fork flag is set, the following
    rules apply for subsequently created children:

    - If the calling thread has a scheduling policy of SCHED_FIFO or
      SCHED_RR, the policy is reset to SCHED_OTHER in child
      processes.

    - If the calling process has a negative nice value, the nice
      value is reset to zero in child processes.

    After the reset-on-fork flag has been enabled, it can be reset
    only if the thread has the CAP_SYS_NICE capability. This flag is
    disabled in child processes created by fork(2).


Nowhere does it say that anything other than the scheduling policy is
affected by this flag. What makes timer_slack special?

[3] https://man7.org/linux/man-pages/man7/sched.7.html
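
As a rough illustration of the reset-on-fork rules quoted above, here is a
small Python sketch. It is a toy model and not kernel code: the Thread class
and fork_child() are hypothetical names, and the policies are plain strings.
It applies only the rules that sched(7) lists, so the timer slack field is
copied through unchanged.

```python
from dataclasses import dataclass, replace

SCHED_OTHER, SCHED_FIFO, SCHED_RR = "SCHED_OTHER", "SCHED_FIFO", "SCHED_RR"


@dataclass(frozen=True)
class Thread:
    """Hypothetical stand-in for the state discussed in this thread."""
    policy: str
    nice: int
    reset_on_fork: bool
    timer_slack_ns: int


def fork_child(parent: Thread) -> Thread:
    child = replace(parent)  # start from a plain copy of the parent
    if parent.reset_on_fork:
        # Rule 1: SCHED_FIFO / SCHED_RR is reset to SCHED_OTHER.
        if parent.policy in (SCHED_FIFO, SCHED_RR):
            child = replace(child, policy=SCHED_OTHER)
        # Rule 2: a negative nice value is reset to zero.
        if parent.nice < 0:
            child = replace(child, nice=0)
        # The flag itself is disabled in children created by fork(2).
        child = replace(child, reset_on_fork=False)
    # Nothing in the quoted text touches timer slack, so it is inherited.
    return child


rt_parent = Thread(policy=SCHED_FIFO, nice=0,
                   reset_on_fork=True, timer_slack_ns=0)
child = fork_child(rt_parent)
```

Under these rules alone, the child ends up as SCHED_OTHER with the flag
cleared and a timer slack of 0 inherited from the RT parent.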

>
> Fixes: ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values for realtime tasks")

Based on my reading, this is not fixing anything but is instead
introducing a behavior change contrary to what is currently documented.

If the change is acceptable, then at the very least the man pages need
to be updated to describe the new behavior and the kernel version that
introduces it.

I'll let others comment since they know these bits better than me.

--
Thanks and Regards,
Prateek