Re: [PATCH 2/2] sched/fair: Scale wakeup granularity relative to nr_running

From: Mel Gorman
Date: Wed Sep 22 2021 - 14:57:31 EST


On Wed, Sep 22, 2021 at 08:22:43PM +0200, Vincent Guittot wrote:
> > > > > In
> > > > > your case, you want hackbench threads to not preempt each other
> > > > > because they try to use the same resources, so it's probably better to
> > > > > let the current one move forward, but that's not a universal policy.
> > > > >
> > > >
> > > > No, but have you a better suggestion? hackbench might be stupid but it's
> > > > an example of where a workload can excessively preempt itself. While
> > > > overloading an entire machine is stupid, it could also potentially occur
> > > > for applications running within a constrained cpumask.
> > >
> > > But this is a property that is specific to each application. Some can
> > > have a lot of running threads but few wakeups, which have to preempt
> > > current threads quickly, but others just want the opposite.
> > > So because it is an application-specific property, we should define it
> > > this way instead of trying to guess.
> >
> > I'm not seeing an alternative suggestion that could be turned into
> > an implementation. The current value for sched_wakeup_granularity
> > was set 12 years ago and was exposed for tuning, which is no longer
> > the case. The intent was to allow some dynamic adjustment between
> > sysctl_sched_wakeup_granularity and sysctl_sched_latency to reduce
> > over-scheduling in the worst case without disabling preemption entirely
> > (which the first version did).
> >
> > Should we just ignore this problem and hope it goes away or just let
> > people keep poking silly values into debugfs via tuned?
>
> We should certainly not add a bandaid, because people will continue to
> poke silly values in the end. And increasing
> sysctl_sched_wakeup_granularity based on the number of running threads
> is not the right solution. According to your description of the
> problem, that the current task doesn't get enough time to move forward,
> sysctl_sched_min_granularity should be part of the solution. Something
> like below will ensure that current gets a chance to move forward
>

That's a very interesting idea! I've queued it up for further testing
and as a comparison to the bandaid.
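
To make the two approaches easier to compare, here is a minimal user-space sketch of the decision each one makes on wakeup. It is not the patch from this series and not the change Vincent proposed (neither is reproduced here); struct task, scaled_wakeup_gran(), should_preempt_scaled() and should_preempt_min_gran() are hypothetical names, the nanosecond values are assumed for illustration, and the vruntime comparison is only loosely modeled on CFS wakeup preemption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative tunables in nanoseconds (assumed values, not read from a kernel). */
static const uint64_t sched_latency_ns   = 6000000ULL; /* 6 ms latency target */
static const uint64_t min_granularity_ns =  750000ULL; /* 0.75 ms minimum slice */
static const uint64_t wakeup_gran_ns     = 1000000ULL; /* 1 ms wakeup granularity */

/* Hypothetical task model: only the fields this comparison needs. */
struct task {
	int64_t  vruntime_ns; /* virtual runtime */
	uint64_t ran_ns;      /* time run since it was last picked */
};

/* "Bandaid" idea (hypothetical linear scaling): grow the wakeup granularity
 * with nr_running, capped at the latency target. nr_running counts the
 * current task, so it is at least 1. */
static uint64_t scaled_wakeup_gran(unsigned int nr_running)
{
	assert(nr_running >= 1);
	uint64_t gran = wakeup_gran_ns * nr_running;
	return gran < sched_latency_ns ? gran : sched_latency_ns;
}

/* Preempt if the wakee's vruntime lags current's by more than the scaled granularity. */
static bool should_preempt_scaled(const struct task *curr,
				  const struct task *wakee,
				  unsigned int nr_running)
{
	int64_t vdiff = curr->vruntime_ns - wakee->vruntime_ns;
	return vdiff > (int64_t)scaled_wakeup_gran(nr_running);
}

/* min_granularity idea: keep the fixed wakeup granularity, but never
 * preempt current before it has run for at least min_granularity. */
static bool should_preempt_min_gran(const struct task *curr,
				    const struct task *wakee)
{
	if (curr->ran_ns < min_granularity_ns)
		return false;
	int64_t vdiff = curr->vruntime_ns - wakee->vruntime_ns;
	return vdiff > (int64_t)wakeup_gran_ns;
}
```

The difference in shape: the scaled variant makes every wakeup preemption harder as the runqueue grows, while the min_granularity variant leaves the wakeup threshold alone and only guarantees current a short slice before it can be preempted.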

--
Mel Gorman
SUSE Labs