Re: [PATCH 4/4] sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS

From: Srinivas Pandruvada
Date: Mon Feb 05 2018 - 12:04:35 EST


On Mon, 2018-02-05 at 11:10 +0000, Mel Gorman wrote:
> On Fri, Feb 02, 2018 at 12:01:37PM -0800, Srinivas Pandruvada wrote:
> > > Sure, but the lack of detection when tasks are low utilisation
> > > but
> > > still
> > > latency/throughput sensitive is problematic. Users shouldn't have
> > > to
> > > know they need to disable HWP or set performance governor out of
> > > the
> > > box.
> > > It's only going to get worse as sockets get larger.
> >
> > I am not saying that we shouldn't do anything. Can you give me some
> > workloads which you care the most?
> >
>
> The proprietary workloads I'm aware of are useless to the discussion
> as they cannot be trivially reproduced and are typically only
> available
> under NDA. However, hints can be gotten by looking at the number of
> cases
> where recommended tunings limit C-states, set the performance
> governor,
> alter intel_pstate setpoint (if not HWP) etc.
>
> For the purposes of illustration, dbench at low thread counts does
> a reasonable job even though it's not that interesting a workload in
> general. With ext4 in particular, the journalling thread interactions
> bounce tasks around the machine and the short sleeps for IO both
> combine
> to have relatively low utilisation on individual CPUs. It's less
> pronounced
> on xfs as it bounces less due to using kworkers instead of kthreads.
>
> > >
> > > > There are totally different ways HWP is handled in client and
> > > > servers.
> > > > If you set desired all heuristics they collected will be
> > > > dumped, so
> > > > they suggest don't set desired when you are in autonomous mode.
> > > > If
> > > > we
> > > > really want a boost set the EPP. We know that EPP makes lots of
> > > > measurable difference.
> > > >
> > >
> > > Sure boosting EPP makes a difference -- it's essentially what the
> > > performance
> > > governor does and I know that can be done by a user but it's still
> > > basically a
> > > cop-out. Default performance for low utilisation or lightly
> > > loaded
> > > machines
> > > is poor. Maybe it should be set based on the ACPI preferred
> > > profile
> > > but
> > > that information is not always available. It would be nice if
> > > *some*
> > > sort of hint about new migrations or tasks waking from IO would
> > > be
> > > desirable.
> >
> > EPP is a range not a single value. So you don't need to make EPP=0
> > as a
> > performance governor. PeterZ gave me some scheduler change to
> > experiment, which can be used as hint to play with EPP.
> >
>
> I know EPP is a range, default from bios usually appear to be 6 or 7
> but
> I didn't do much experimentation to see if there is another value
> that
> works better. Even if there is, the default may need to change as not
> many
> people even know what EPP is or how it should be tuned.
I think you are talking about EPB, not EPP, given the range you
mentioned here. EPP is a value from 0 to 255 and is part of the
HWP_REQUEST MSR.
EPB with HWP is used only on Broadwell servers. I think you are using
Skylake here.
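
To make the distinction concrete, here is a rough sketch of how the two
fields could be pulled out of raw MSR values. It assumes the layout from the
Intel SDM as I understand it: EPP in bits 31:24 of IA32_HWP_REQUEST (0x774),
and EPB in bits 3:0 of IA32_ENERGY_PERF_BIAS (0x1B0). The helper names are
made up for illustration and are not kernel or tool APIs.

```python
# Illustrative only: helper names are hypothetical; the bit positions
# are assumed from the Intel SDM description of these MSRs.
HWP_REQUEST_MSR = 0x774       # IA32_HWP_REQUEST
ENERGY_PERF_BIAS_MSR = 0x1B0  # IA32_ENERGY_PERF_BIAS


def decode_hwp_request(raw):
    """Split a raw IA32_HWP_REQUEST value into its low four byte fields."""
    return {
        "min_perf": raw & 0xFF,              # bits 7:0
        "max_perf": (raw >> 8) & 0xFF,       # bits 15:8
        "desired_perf": (raw >> 16) & 0xFF,  # bits 23:16, 0 = autonomous
        "epp": (raw >> 24) & 0xFF,           # bits 31:24, range 0..255
    }


def decode_epb(raw):
    """Extract the 4-bit EPB hint (range 0..15) from a raw MSR value."""
    return raw & 0xF


def with_epp(raw, epp):
    """Return raw with only the EPP field replaced; other bits untouched."""
    if not 0 <= epp <= 255:
        raise ValueError("EPP must be in 0..255")
    return (raw & ~(0xFF << 24)) | (epp << 24)
```

A value of 6 or 7 fits the 0..15 EPB range, which is why it looks like EPB
rather than EPP. Note that with_epp() leaves the desired field alone, in
line with the earlier point about not setting desired in autonomous mode.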

Thanks,
Srinivas