Re: [PATCH 2/2] sched/fair: Scale wakeup granularity relative to nr_running

From: Mike Galbraith
Date: Mon Sep 27 2021 - 10:18:02 EST


On Mon, 2021-09-27 at 12:17 +0100, Mel Gorman wrote:
> On Thu, Sep 23, 2021 at 02:41:06PM +0200, Vincent Guittot wrote:
> > On Thu, 23 Sept 2021 at 11:22, Mike Galbraith <efault@xxxxxx> wrote:
> > >
> > > On Thu, 2021-09-23 at 10:40 +0200, Vincent Guittot wrote:
> > > >
> > > > a 100us value should even be enough to fix Mel's problem without
> > > > impacting common wakeup preemption cases.
> > >
> > > It'd be nice if it turned out to be something that simple, but color me
> > > skeptical.  I've tried various preemption throttling schemes, and while
> >
> > Let's see what the results will show. I tend to agree that this will
> > not be enough to cover all use cases and I don't see any other way to
> > cover all cases than getting some inputs from the threads about their
> > latency fairness which brings us back to some kind of latency niceness
> > value
> >
>
> Unfortunately, I didn't get a complete set of results but enough to work
> with. The missing tests have been requeued. The figures below are based
> on a single-socket Skylake machine with 8 CPUs as it had the most complete
> set of results and is the basic case.
> results and is the basic case.

There's something missing, namely how whatever load you measure performs
when facing dissimilar competition. Instead of only scaling loads running
solo from underutilized to heavily over-committed, give them competition,
e.g. something switch heavy (a latency-bound load, say tbench or TCP_RR)
at pairs=CPUS vs something hefty like make -j CPUS or such.

With your "pick a load number and roll preemption down" approach and
competing (modest) loads running on some headless enterprise test array
box, there will likely be little impact, but do the same on a desktop
box from the GUI, according to my box, you're likely to see something
entirely different.

Seems the "if current hasn't consumed at least 100us, go away" approach
should impact fast movers facing competition both a lot more and more
consistently (given modest load for both methods).
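
FWIW, here's a rough userspace model of the rule as I read it: a waking
task may not preempt current until current has accumulated at least
100us of runtime since it was last switched in. The field names are
meant to echo the fair-class accounting (sum_exec_runtime vs
prev_sum_exec_runtime), but the standalone framing and helper names are
mine; it's an illustration of the idea, not the actual
kernel/sched/fair.c change.

/* Standalone model of a "min runtime before wakeup preemption" rule. */
#include <stdbool.h>
#include <stdio.h>

#define NSEC_PER_USEC		1000ULL
#define MIN_PREEMPT_RUNTIME	(100 * NSEC_PER_USEC)	/* the 100us floor */

struct task_model {
	unsigned long long sum_exec_runtime;		/* total runtime, ns */
	unsigned long long prev_sum_exec_runtime;	/* snapshot at switch-in, ns */
};

/* Would a wakeup be allowed to preempt 'curr' under the 100us rule? */
static bool wakeup_may_preempt(const struct task_model *curr)
{
	unsigned long long ran = curr->sum_exec_runtime -
				 curr->prev_sum_exec_runtime;

	return ran >= MIN_PREEMPT_RUNTIME;
}

int main(void)
{
	struct task_model curr = {
		.prev_sum_exec_runtime = 5000 * NSEC_PER_USEC,
		.sum_exec_runtime      = 5040 * NSEC_PER_USEC,	/* ran 40us */
	};

	/* 40us < 100us, so the waker has to wait: prints "no". */
	printf("preempt allowed: %s\n",
	       wakeup_may_preempt(&curr) ? "yes" : "no");
	return 0;
}

The point is only how blunt that cutoff is: anything that wakes and
sleeps faster than the floor never gets to preempt, no matter how
little CPU it actually wants.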

Below is my box nfs mounting itself for no other reason than I was
curious to see what running my repo update script from an nfs mount
would look like (hey, it's still more realistic than hackbench ;). It's
sorted by switches, but those at the top are also where the most cycles
landed. I wouldn't expect throughput of a load that switch-happy to
hold up well at all when faced with enforced 100us wakeup latency.

Switch-happy loads don't switch just to be mean, nor do they matter less
than whatever beefier loads they may encounter in a box.

-----------------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at |
-----------------------------------------------------------------------------------------------------------------
nfsd:2246 | 31297.564 ms | 1514806 | avg: 0.002 ms | max: 555.543 ms | max at: 9130.660504 s
nfsd:2245 | 9536.364 ms | 1475992 | avg: 0.002 ms | max: 902.341 ms | max at: 9083.907053 s
kworker/u17:2-x:21933 | 3629.267 ms | 768463 | avg: 0.009 ms | max: 5536.206 ms | max at: 9088.540916 s
kworker/u17:3:7082 | 3426.631 ms | 701947 | avg: 0.010 ms | max: 5543.659 ms | max at: 9088.540901 s
git:7100 | 12708.278 ms | 573828 | avg: 0.001 ms | max: 3.520 ms | max at: 9066.125757 s
git:7704 | 11620.355 ms | 517010 | avg: 0.001 ms | max: 4.070 ms | max at: 9113.894832 s
kworker/u17:0:7075 | 1812.581 ms | 397601 | avg: 0.002 ms | max: 620.321 ms | max at: 9114.655685 s
nfsd:2244 | 4930.826 ms | 370473 | avg: 0.008 ms | max: 910.646 ms | max at: 9083.915441 s
kworker/u16:6:7094 | 2870.424 ms | 335848 | avg: 0.005 ms | max: 580.871 ms | max at: 9114.616479 s
nfsd:2243 | 3424.996 ms | 257274 | avg: 0.033 ms | max: 3843.339 ms | max at: 9086.848829 s
kworker/u17:1-x:30183 | 1310.614 ms | 255990 | avg: 0.001 ms | max: 1.817 ms | max at: 9089.173217 s
kworker/u16:60-:6124 | 2253.058 ms | 225931 | avg: 0.050 ms | max:10128.140 ms | max at: 9108.375040 s
kworker/u16:5:7092 | 1831.385 ms | 211923 | avg: 0.007 ms | max: 905.513 ms | max at: 9083.911630 s
kworker/u16:7:7101 | 1606.258 ms | 194944 | avg: 0.002 ms | max: 11.576 ms | max at: 9082.789700 s
kworker/u16:4:7090 | 1484.687 ms | 189197 | avg: 0.100 ms | max:12112.172 ms | max at: 9110.360308 s
kworker/u16:59-:6123 | 1707.858 ms | 183464 | avg: 0.073 ms | max: 6135.816 ms | max at: 9120.173398 s
kworker/u16:3:7074 | 1528.375 ms | 173089 | avg: 0.098 ms | max:15196.567 ms | max at: 9098.202355 s
kworker/u16:0-r:7009 | 1336.814 ms | 166043 | avg: 0.002 ms | max: 12.381 ms | max at: 9082.839130 s
nfsd:2242 | 1876.802 ms | 154855 | avg: 0.073 ms | max: 3844.877 ms | max at: 9086.848848 s
kworker/u16:1:7072 | 1214.642 ms | 151420 | avg: 0.002 ms | max: 6.433 ms | max at: 9075.581713 s
kworker/u16:2:7073 | 1302.996 ms | 150863 | avg: 0.002 ms | max: 12.119 ms | max at: 9082.839133 s