Re: [PATCH] sched: cgroup SCHED_IDLE support

From: Josh Don
Date: Fri Jun 11 2021 - 19:36:00 EST


On Fri, Jun 11, 2021 at 9:43 AM Dietmar Eggemann
<dietmar.eggemann@xxxxxxx> wrote:
>
> On 10/06/2021 21:14, Josh Don wrote:
> > Hey Dietmar,
> >
> > On Thu, Jun 10, 2021 at 5:53 AM Dietmar Eggemann
> > <dietmar.eggemann@xxxxxxx> wrote:
> >>
> >> Any reason why this should only work on cgroup-v2?
> >
> > My (perhaps incorrect) assumption that new development should not
> > extend v1. I'd actually prefer making this work on v1 as well; I'll
> > add that support.
> >
> >> struct cftype cpu_legacy_files[] vs. cpu_files[]
> >>
> >> [...]
> >>
> >>> @@ -11340,10 +11408,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
> >>>
> >>> static DEFINE_MUTEX(shares_mutex);
> >>>
> >>> -int sched_group_set_shares(struct task_group *tg, unsigned long shares)
> >>> +#define IDLE_WEIGHT sched_prio_to_weight[ARRAY_SIZE(sched_prio_to_weight) - 1]
> >>
> >> Why not 3 ? Like for tasks (WEIGHT_IDLEPRIO)?
> >>
> >> [...]
> >
> > Went back and forth on this; on second look, I do think it makes sense
> > to use the IDLEPRIO weight of 3 here. This gets converted to a 0,
> > rather than a 1 for display of cpu.weight, which is also actually a
> > nice property.
>
> I'm struggling to see the benefit here.
>
> For a taskgroup A: Why setting A/cpu.idle=1 to force a minimum A->shares
> when you can set it directly via A/cpu.weight (to 1 (minimum))?
>
>   WEIGHT   cpu.weight   tg->shares
>
>        3            0         3072
>       15            1        15360
>                     1        10240
>
> `A/cpu.weight` follows cgroup-v2's `weights` `resource distribution
> model`* but I can only see `A/cpu.idle` as a layer on top of it forcing
> `A/cpu.weight` to get its minimum value?
>
> *Documentation/admin-guide/cgroup-v2.rst

Setting cpu.idle carries properties beyond just the weight. Currently,
these are primarily (a) special wakeup preemption handling, and (b)
contribution to idle_h_nr_running for the purpose of marking a cpu as
a sched_idle_cpu(); essentially, the current SCHED_IDLE mechanics.
I've also discussed with Peter a potential extension to SCHED_IDLE to
manipulate vruntime.
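
To make (b) concrete, here is a rough userspace model of the check
behind sched_idle_cpu(): a runqueue counts as "sched idle" when it has
runnable entities and all of them are accounted in idle_h_nr_running.
The struct and function names ending in _model are hypothetical
stand-ins for illustration, not the kernel's definitions, and the
sketch leaves out everything else the real runqueue tracks.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-ins for the kernel's cfs_rq / rq. */
struct cfs_rq_model {
	unsigned int idle_h_nr_running;	/* hierarchical count of idle entities */
};

struct rq_model {
	unsigned int nr_running;	/* all runnable tasks on this cpu */
	struct cfs_rq_model cfs;
};

/*
 * A runqueue is "sched idle" when it is not empty and every runnable
 * task on it is an idle entity.
 */
bool sched_idle_rq_model(const struct rq_model *rq)
{
	return rq->nr_running != 0 &&
	       rq->nr_running == rq->cfs.idle_h_nr_running;
}
```

Under this model, tasks in a cpu.idle cgroup would bump
idle_h_nr_running, so a cpu running only such tasks would still look
like a good wakeup target for non-idle work.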

We set the cgroup weight here, since by definition SCHED_IDLE entities
have the least scheduling weight. From the perspective of your
question, the analogous statement for tasks would be that we set task
weight to the min when doing sched_setscheduler(SCHED_IDLE), even
though we already have a renice mechanism.
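
For reference, the arithmetic behind the table quoted above can be
sketched in userspace C. This is a simplified model, assuming a 10-bit
fixed-point scale for tg->shares (as the table's 3 -> 3072 implies)
and a default cpu.weight of 100; the helper names are made up for
illustration and the kernel's own clamping and locking are left out.

```c
#include <stdint.h>

#define CGROUP_WEIGHT_DFL	100	/* default cpu.weight */
#define FIXEDPOINT_SHIFT	10	/* assumed load scale: weight << 10 */

/* Integer division rounded to the nearest value. */
static uint64_t div_round_closest(uint64_t x, uint64_t d)
{
	return (x + d / 2) / d;
}

/* Write path: cpu.weight -> scheduler weight -> tg->shares. */
uint64_t cpu_weight_to_shares(uint64_t cpu_weight)
{
	uint64_t weight = div_round_closest(cpu_weight * 1024,
					    CGROUP_WEIGHT_DFL);

	return weight << FIXEDPOINT_SHIFT;
}

/* Read path: tg->shares -> scheduler weight -> displayed cpu.weight. */
uint64_t shares_to_cpu_weight(uint64_t shares)
{
	uint64_t weight = shares >> FIXEDPOINT_SHIFT;

	return div_round_closest(weight * CGROUP_WEIGHT_DFL, 1024);
}
```

With these helpers, shares of 3072 (weight 3) read back as cpu.weight
0, shares of 15360 (weight 15) read back as 1, and writing cpu.weight
1 yields shares of 10240, matching the three rows of the table.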