Re: [RFC PATCH v3 00/16] Core scheduling v3

From: Aubrey Li
Date: Tue Sep 17 2019 - 21:34:38 EST


On Sun, Sep 15, 2019 at 10:14 PM Aaron Lu <aaron.lu@xxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Sep 13, 2019 at 07:12:52AM +0800, Aubrey Li wrote:
> > On Thu, Sep 12, 2019 at 8:04 PM Aaron Lu <aaron.lu@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Wed, Sep 11, 2019 at 09:19:02AM -0700, Tim Chen wrote:
> > > > On 9/11/19 7:02 AM, Aaron Lu wrote:
> > > > I think Julien's result show that my patches did not do as well as
> > > > your patches for fairness. Aubrey did some other testing with the same
> > > > conclusion. So I think keeping the forced idle time balanced is not
> > > > enough for maintaining fairness.
> > >
> > > Well, I have done following tests:
> > > 1 Julien's test script: https://paste.debian.net/plainh/834cf45c
> > > 2 start two tagged will-it-scale/page_fault1, see how each performs;
> > > 3 Aubrey's mysql test: https://github.com/aubreyli/coresched_bench.git
> > >
> > > They all show your patchset performs equally well...And consider what
> > > the patch does, I think they are really doing the same thing in
> > > different ways.
> >
> > It looks like we are not on the same page, if you don't mind, can both of
> > you rebase your patchset onto v5.3-rc8 and provide a public branch so I
> > can fetch and test it at least by my benchmark?
>
> I'm using the following branch as base which is v5.1.5 based:
> https://github.com/digitalocean/linux-coresched coresched-v3-v5.1.5-test
>
> And I have pushed Tim's branch to:
> https://github.com/aaronlu/linux coresched-v3-v5.1.5-test-tim
>
> Mine:
> https://github.com/aaronlu/linux coresched-v3-v5.1.5-test-core_vruntime
>
> The two branches both have two patches I have sent previouslly:
> https://lore.kernel.org/lkml/20190810141556.GA73644@aaronlu/
> Although it has some potential performance loss as pointed out by
> Vineeth, I haven't got time to rework it yet.

In terms of these two branches, we tested two cases:

1) 32 AVX threads and 32 mysql threads on one core(2 HT)
2) 192 AVX threads and 192 mysql threads on 96 cores(192 HTs)

For case 1), we saw two branches is on par

Branch: coresched-v3-v5.1.5-test-core_vruntime
-Avg throughput:: 1865.62 (std: 20.6%)
-Avg latency: 26.43 (std: 8.3%)

Branch: coresched-v3-v5.1.5-test-tim
- Avg throughput: 1804.88 (std: 20.1%)
- Avg latency: 29.78 (std: 11.8%)

For case 2), we saw core vruntime performs better than counting forced
idle time

Branch: coresched-v3-v5.1.5-test-core_vruntime
- Avg throughput: 5528.56 (std: 44.2%)
- Avg latency: 165.99 (std: 45.2%)

Branch: coresched-v3-v5.1.5-test-tim
-Avg throughput:: 3842.33 (std: 35.1%)
-Avg latency: 306.99 (std: 72.9%)

As Aaron pointed out, vruntime is with se's weight, which could be a reason
for the difference.

So should we go with core vruntime approach?
Or Tim - do you want to improve forced idle time approach?

Thanks,
-Aubrey