Re: [PATCH v4 6/8] sched/fair: Add sched group latency support
From: Vincent Guittot
Date: Wed Sep 21 2022 - 03:48:49 EST
On Tue, 20 Sept 2022 at 20:17, Dietmar Eggemann
<dietmar.eggemann@xxxxxxx> wrote:
>
> On 19/09/2022 17:49, Vincent Guittot wrote:
> > On Mon, 19 Sept 2022 at 13:55, Dietmar Eggemann
> > <dietmar.eggemann@xxxxxxx> wrote:
> >>
> >> s/valentin.schneider@xxxxxxx//
> >>
> >> On 16/09/2022 10:03, Vincent Guittot wrote:
> >>> Task can set its latency priority, which is then used to decide to preempt
> >>> the current running entity of the cfs, but sched group entities still have
> >>> the default latency offset.
> >>>
> >>> Add a latency field in task group to set the latency offset of the
> >>> sched_eneities of the group, which will be used against other entities in
> >>
> >> s/sched_eneities/sched_entity
> >>
> >>> the parent cfs when deciding which entity to schedule first.
> >>
> >> So latency for cgroups does not follow any (existing) Resource
> >> Distribution Model/Scheme (Documentation/admin-guide/cgroup-v2.rst)?
> >> Latency values are only used to compare sched entities at the same level.
> >
> > Just like the shares/cpu.weight value does for time sharing.
>
> But for this we define it as following the `Weights` scheme. That's why
> I was asking.
>
> >> [...]
> >>
> >>> +static int cpu_latency_write_s64(struct cgroup_subsys_state *css,
> >>> + struct cftype *cft, s64 latency)
> >>> +{
> >>
> >> There is no [MIN, MAX] checking?
> >
> > This is done in sched_group_set_latency(), which checks that
> > abs(latency) < sysctl_sched_latency.
>
> I see. Nit-picking: Wouldn't this allow specifying a latency offset
> value for the non-existent `nice = 20`? The highest nice value 19 maps
> to `973/1024 * sysctl_sched_latency`.
yes, but the same applies to tg->shares and cpu.weight: we can set a
tg->shares of 104,857,600 whereas the max shares for nice -20 is
90,891,264. Furthermore, I don't see a real problem with the ability to
set a latency offset up to sysctl_sched_latency, because it only means
being even more nice to other tasks, not the opposite.
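For reference, the check is roughly the following (a sketch based on the
description above, not the exact v4 code; the field name and locking are
assumptions):

static int sched_group_set_latency(struct task_group *tg, s64 latency)
{
        /* accept offsets in [-sysctl_sched_latency, sysctl_sched_latency] */
        if (abs(latency) > sysctl_sched_latency)
                return -EINVAL;

        tg->latency_offset = latency;
        /* ... then propagate the new offset to the group's sched_entities ... */
        return 0;
}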
>
> >
> >>
> >> min_weight = sched_latency_to_weight[0] = -1024
> >> max_weight = sched_latency_to_weight[39] = 973
> >>
> >> [MIN, MAX] = [sysctl_sched_latency * min_weight >> NICE_LATENCY_SHIFT,
> >> sysctl_sched_latency * max_weight >> NICE_LATENCY_SHIFT]
> >>
> >>
> >> With the `cpu.latency` knob a user would have to know, for example,
> >> that the value is -24,000,000ns to get the same behaviour as for a task
> >> latency nice = -20 (latency prio = 0) (w/ sysctl_sched_latency = 24ms)?
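To make those bounds concrete with sysctl_sched_latency = 24ms (and
NICE_LATENCY_SHIFT = 10, matching the 1024-based weights above):

        MIN = 24,000,000 * -1024 >> 10 = -24,000,000 ns
        MAX = 24,000,000 *   973 >> 10 =  22,804,687 ns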
> >
> > Yes, Tejun raised some concerns about adding an interface like nice in
> > the task group in v2 so I have removed it.
> >
> >>
> >> For `nice` we have `cpu.weight.nice` next to `cpu.weight` in cgroup v2?
> >
> > If everybody is ok, I can add back the cpu.latency.nice interface in
> > the v5 in addition to the cpu.latency
>
> cpu.weight/cpu.weight.nice interface:
>
> echo X > cpu.weight         tg->shares
>
>      1                          10,240
>    100                       1,048,576
>  10000                     104,857,600
>
> echo X > cpu.weight.nice    tg->shares
>
>    -20                      90,891,264
>      0                       1,048,576
>     19                          15,360
>
> Wouldn't then a similar interface for cpu.latency [1..100..10000] and
> cpu.latency.nice [-20..0..19] make the most sense?
We need at least a signed value for cpu.latency to be able to
distinguish between sensitivity to latency and indifference to it.
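A cpu.latency.nice handler could then derive the signed offset from the
nice value via the same table, along these lines (a hypothetical sketch,
not the actual v5 code; MIN/MAX_LATENCY_NICE are assumed to be -20/19):

static int cpu_latency_nice_write_s64(struct cgroup_subsys_state *css,
                                      struct cftype *cft, s64 nice)
{
        s64 latency;

        if (nice < MIN_LATENCY_NICE || nice > MAX_LATENCY_NICE)
                return -ERANGE;

        /* nice -20..19 indexes sched_latency_to_weight[0..39] */
        latency = (s64)sysctl_sched_latency *
                  sched_latency_to_weight[nice + 20] >> NICE_LATENCY_SHIFT;

        return sched_group_set_latency(css_tg(css), latency);
}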
>
> Raw latency_offset values at interface level are not portable.
I can use [-1000:1000] but I'm not sure it's better than the raw value
in the end.
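It would simply scale linearly into the offset, e.g. (sketch):

        /* map a [-1000, 1000] knob value to a latency offset in ns */
        latency_offset = value * (s64)sysctl_sched_latency / 1000;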