Re: [PATCH] sched: Add a new version sysctl to control child runs first

From: CGEL
Date: Tue Sep 14 2021 - 00:05:31 EST


esOn Mon, Sep 13, 2021 at 03:42:45PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 13, 2021 at 11:37:31AM +0000, CGEL wrote:
> > On Mon, Sep 13, 2021 at 10:13:54AM +0200, Peter Zijlstra wrote:
> > > On Sun, Sep 12, 2021 at 04:12:23AM +0000, cgel.zte@xxxxxxxxx wrote:
> > > > From: Yang Yang <yang.yang29@xxxxxxxxxx>
> > > >
> > > > The old version sysctl has some problems. First, it allows set value
> > > > bigger than 1, which is unnecessary. Second, it didn't follow the
> > > > rule of capabilities. Thirdly, it didn't use static key. This new
> > > > version fixes all the problems.
> > >
> > > Does any of that actually matter?
> >
> > For the first problem, I think the reason why sysctl_schedstats() only
> > accepts 0 or 1, is suitbale for sysctl_child_runs_first(). Since
> > task_fork_fair() only need sysctl_sched_child_runs_first to be
> > zero or non-zero.
>
> This could potentially break people that already write a larger value in
> it -- by accident or otherwise.

Thanks for reply!

You mean it's right to set sched_child_runs_first 0 or 1, but consider about
compatibility, just leave it?
Should stable/longterm branches keep compatibility, but linux-next fixes it?

Let's take a look at negative influence about unnecessary values of sysctl.
Some tune tools will automatic to set different values of sysctl to see
performance impact. So invalid values may waste tune tools's time, specially
when the range of values is big.

For example A-Tune, see below:
https://docs.openeuler.org/zh/docs/20.03_LTS/docs/A-Tune/%E8%AE%A4%E8%AF%86A-Tune.html
Since it's wroten in Chinese, I try to explain it in short.
A-Tune modeling sysctls first(what values sysctls accept), then automatic to iterate
different values to find the best combination of sysctl values for the workload.

>
> > For the second problem, I remember there is a rule: try to
> > administration system through capilities but not depends on
> > root identity. Just like sysctl_schedstats() or other
> > sysctl_xx().
>
> It seems entirely daft to me; those files are already 644, if root opens
> the file and passes it along, it gets to keep the pieces.
>

I think it's indeed a little tricky: root may drop it's own capabilites.
Let's see another example of netdev_store(), root can't modify netdev
attribute without CAP_NET_ADMIN, even it pass the 644 DAC check.

> > For the thirdly problem, sysctl_child_runs_first maynot changes
> > often, but may accessed often, like static_key delayacct_key
> > controlled by sysctl_delayacct().
>
> Can you actually show it makes a performance difference in a fork
> micro-bench? Given the amount of gunk fork() already does, I don't think
> it'll matter one way or the other, and in that case, simpler is better.

With 5.14-rc6 and gcc6.2.0, this patch will reduce test instruct in
task_fork_fair() as Documentation/staging/static-keys.rst said.
Since task_fork_fair() may called often, I think it's OK to use static
key, actually there are quit a lot static keys in kernel/xx.

When talk about simply, maybe keep in consistent with other sysctls like
task_delayacct() is also a kind of simply in code style.

Before this patch:
ffff810a5c60 <task_fork_fair>:
..
ffffffff810a5cf3: e8 a8 b3 ff ff callq ffffffff810a10a0 <place_entity>
ffffffff810a5cf8: 8b 05 e2 b5 5d 01 mov 0x15db5e2(%rip),%eax # ffffffff826812e0 <sysctl_sched_child_runs_first>
ffffffff810a5cfe: 85 c0 test %eax,%eax
ffffffff810a5d00: 74 5b je ffffffff810a5d5d <task_fork_fair+0xfd>
ffffffff810a5d02: 49 8b 55 50 mov 0x50(%r13),%rdx
ffffffff810a5d06: 49 8b 84 24 10 01 00 mov 0x110(%r12),%rax
ffffffff810a5d0d: 00
ffffffff810a5d0e: 48 39 c2 cmp %rax,%rdx
ffffffff810a5d11: 78 36 js ffffffff810a5d49 <task_fork_fair+0xe9>
ffffffff810a5d13: 48 2b 45 28 sub 0x28(%rbp),%rax

After this patch:
ffffffff810a5c60 <task_fork_fair>:
..
ffffffff810a5cf3: e8 a8 b3 ff ff callq ffffffff810a10a0 <place_entity>
ffffffff810a5cf8: 66 90 xchg %ax,%ax
ffffffff810a5cfa: 49 8b 84 24 10 01 00 mov 0x110(%r12),%rax
ffffffff810a5d01: 00
ffffffff810a5d02: 48 2b 45 28 sub 0x28(%rbp),%rax

Thanks!