Re: [PATCH 2/2] sched: adjust SCHED_IDLE interactions

From: Josh Don
Date: Fri Aug 13 2021 - 19:55:50 EST


On Fri, Aug 13, 2021 at 5:43 AM Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
[snip]
> > >
> > > The 1ms of your test comes from the tick, which could be a good
> > > candidate for a min value, or from
> > > normalized_sysctl_sched_min_granularity, which has the advantage of not
> > > increasing with the number of CPUs
> >
> > Fair point, this shouldn't completely ignore min granularity. Something like
> >
> > unsigned int sysctl_sched_idle_min_granularity = NSEC_PER_MSEC;
> >
> > (and still only using this value instead of the default
> > min_granularity when the SCHED_IDLE entity is competing with normal
> > entities)
>
> Yes that looks like a good option
>
> Also note that with a NSEC_PER_MSEC default value, the sched_idle
> entity will most probably run 2 ticks instead of the 1 tick (HZ=1000)
> that you have with your proposal, because a bit less than a full tick
> is accounted to the running thread (the time spent in interrupts is not
> accounted, for example). So a sysctl_sched_idle_min_granularity of 1ms
> with HZ=1000 will most probably run 2 ticks. Instead you could reuse
> the default 750000ULL value of sched_idle_min_granularity

Yes, great point. That's a better value here, with sufficient margin.
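
To make the tick arithmetic concrete, here is a rough userspace sketch
(not kernel code). The helper name ticks_to_reach() is made up for
illustration, and the 950000ns "accounted per tick" figure used in the
note below is an assumed example of "a bit less than a full tick", not
a measured value.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): smallest number of ticks after
 * which the accounted runtime reaches min_gran_ns, when each tick only
 * accounts accounted_per_tick_ns to the running task.
 */
static unsigned int ticks_to_reach(uint64_t min_gran_ns,
				   uint64_t accounted_per_tick_ns)
{
	if (accounted_per_tick_ns == 0)
		return 0;	/* nothing is accounted; avoid dividing by zero */

	/* Ceiling division: first tick count n with n * accounted >= min_gran. */
	return (unsigned int)((min_gran_ns + accounted_per_tick_ns - 1) /
			      accounted_per_tick_ns);
}
```

With HZ=1000 and an assumed 950000ns accounted per tick,
ticks_to_reach(1000000, 950000) gives 2, while
ticks_to_reach(750000, 950000) gives 1, which is the margin being
discussed.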

> That being said sysctl_sched_idle_min_granularity =
> normalized_sysctl_sched_min_granularity * scale_factor which means
> that normalized_sysctl_sched_min_granularity stays the same
> (750000ULL) whatever the number of cpus
>
> >
> > > > @@ -4216,7 +4228,15 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> > > > if (sched_feat(GENTLE_FAIR_SLEEPERS))
> > > > thresh >>= 1;
> > > >
> > > > - vruntime -= thresh;
> > > > + /*
> > > > + * Don't give sleep credit to a SCHED_IDLE entity if we're
> > > > + * placing it onto a cfs_rq with non SCHED_IDLE entities.
> > > > + */
> > > > + if (!se_is_idle(se) ||
> > > > + cfs_rq->h_nr_running == cfs_rq->idle_h_nr_running)
> > >
> > > Can't this condition above create unfairness between idle entities?
> > > idle thread 1 wakes up while a normal thread is running
> > > the normal thread sleeps immediately after
> > > idle thread 2 wakes up just after and gets some credit compared to the 1st one.
> >
> > Yes, this sacrifices some idle<->idle fairness when there is a normal
> > thread that comes and goes. One alternative is to simply further
> > reduce thresh for idle entities. That will interfere with idle<->idle
> > fairness when there are no normal threads, which is why I opted for
> > the former. On second thought though, the former fairness issue seems
> > more problematic. Thoughts on applying a smaller sleep credit
> > threshold universally to idle entities?
>
> This one is a bit more complex to set.
> By adding 1, you favor the already runnable tasks by ensuring that
> they have run, or will run, a slice during this period before the
> sched_idle task.
> But as soon as you subtract something from min_vruntime, the task will
> most probably be scheduled at the next tick if other tasks have already
> run for a while (at least a sched period). If we use
> sysctl_sched_min_granularity for sched_idle tasks that wake up instead
> of sysctl_sched_latency, we will ensure that a sched_idle task will
> not preempt a normal task which woke up a few ms before, and we will
> keep some fairness for a sched_idle task that sleeps compared to the others.
>
> So a thresh of sysctl_sched_min_granularity (3.75ms with 16 cpus)
> should not disturb your use case and keeps some benefit for a newly
> woken sched_idle task

If the normal task has already been running for at least a period, it
should be ok to preempt.
A thresh around the min_granularity seems like a good order of
magnitude; I'll experiment a bit.
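
As a rough sketch of what that threshold selection could look like
(userspace C, not the actual patch): the function names below are
hypothetical, and the scaling factor of 1 + ilog2(ncpus) is an
assumption chosen to match the 3.75ms figure for 16 cpus quoted above;
the real kernel may scale or cap the CPU count differently. Skipping
the GENTLE_FAIR_SLEEPERS halving for idle entities is also just an
assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

#define NORMALIZED_MIN_GRANULARITY_NS	750000ULL	/* value quoted in this thread */

/* floor(log2(n)) for n >= 1; stand-in for the kernel's ilog2(). */
static unsigned int ilog2_u32(unsigned int n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* ASSUMPTION: log scaling with factor = 1 + ilog2(ncpus), no cap applied. */
static uint64_t scaled_min_granularity_ns(unsigned int ncpus)
{
	if (ncpus == 0)
		ncpus = 1;
	return NORMALIZED_MIN_GRANULARITY_NS * (1 + ilog2_u32(ncpus));
}

/*
 * Hypothetical sleep-credit threshold: idle entities get the (smaller)
 * min granularity, everything else gets the latency-based credit,
 * halved when gentle fair sleepers is enabled.
 */
static uint64_t sleeper_thresh_ns(bool se_is_idle, bool gentle_fair_sleepers,
				  uint64_t latency_ns, uint64_t min_gran_ns)
{
	uint64_t thresh;

	if (se_is_idle)
		return min_gran_ns;

	thresh = latency_ns;
	if (gentle_fair_sleepers)
		thresh >>= 1;
	return thresh;
}
```

Under these assumptions, scaled_min_granularity_ns(16) is 3750000ns, so
a waking SCHED_IDLE entity would be placed at most 3.75ms behind
min_vruntime, versus half the scheduling latency for a normal entity.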