Re: [PATCH v4 11/11] sched/fair: rework find_idlest_group

From: Vincent Guittot
Date: Wed Nov 20 2019 - 11:54:03 EST


On Wed, 20 Nov 2019 at 14:21, Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
>
> Hi Qais,
>
> On Wed, 20 Nov 2019 at 12:58, Qais Yousef <qais.yousef@xxxxxxx> wrote:
> >
> > Hi Vincent
> >
> > On 10/18/19 15:26, Vincent Guittot wrote:
> > > The slow wake up path computes per sched_group statistics to select the
> > > idlest group, which is quite similar to what load_balance() is doing
> > > for selecting busiest group. Rework find_idlest_group() to classify the
> > > sched_group and select the idlest one following the same steps as
> > > load_balance().
> > >
> > > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > > ---
> >
> > LTP test has caught a regression in perf_event_open02 test on linux-next and I
> > bisected it to this patch.
> >
> > That is, after checking out the next-20191119 tag and reverting this patch on
> > top, the test passes. Without the revert the test fails.

I haven't tried linux-next yet, but the LTP test passes with
tip/sched/core, which includes this patch, on hikey960, which is arm64
too.

Have you tried tip/sched/core on your Juno? This could help to
understand whether the failure is specific to Juno or whether this patch
interacts with another branch merged in linux-next.

Thanks
Vincent

> >
> > I think this patch disturbs this part of the test:
> >
> > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/perf_event_open/perf_event_open02.c#L209
> >
> > When I revert this patch count_hardware_counters() returns a non zero value.
> > But with it applied it returns 0 which indicates that the condition terminates
> > earlier than what the test expects.
>
> Thanks for the report and for starting to analyse it.
>
> >
> > I'm failing to see the connection yet, but since I spent enough time bisecting
> > it I thought I'll throw this out before I continue to bottom it out in hope it
> > rings a bell for you or someone else.
>
> I will try to reproduce the problem and understand why it's failing,
> because I don't have any clue about the relation between the two for now.
>
> >
> > The problem was consistently reproducible on Juno-r2.
> >
> > LTP was compiled from 20190930 tag using
> >
> > ./configure --host=aarch64-linux-gnu --prefix=~/arm64-ltp/
> > make && make install
> >
> >
> >
> > *** Output of the test when it fails ***
> >
> > # ./perf_event_open02 -v
> > at iteration:0 value:254410384 time_enabled:195570320 time_running:156044100
> > perf_event_open02 0 TINFO : overall task clock: 166935520
> > perf_event_open02 0 TINFO : hw sum: 1200812256, task clock sum: 667703360
> > hw counters: 300202518 300202881 300203246 300203611
> > task clock counters: 166927400 166926780 166925660 166923520
> > perf_event_open02 0 TINFO : ratio: 3.999768
> > perf_event_open02 0 TINFO : nhw: 0.000100 /* I added this extra line for debug */
> > perf_event_open02 1 TFAIL : perf_event_open02.c:370: test failed (ratio was greater than )
> >
> >
> >
> > *** Output of the test when it passes (this patch reverted) ***
> >
> > # ./perf_event_open02 -v
> > at iteration:0 value:300271482 time_enabled:177756080 time_running:177756080
> > at iteration:1 value:300252655 time_enabled:166939100 time_running:166939100
> > at iteration:2 value:300252877 time_enabled:166924920 time_running:166924920
> > at iteration:3 value:300242545 time_enabled:166909620 time_running:166909620
> > at iteration:4 value:300250779 time_enabled:166918540 time_running:166918540
> > at iteration:5 value:300250660 time_enabled:166922180 time_running:166922180
> > at iteration:6 value:258369655 time_enabled:167388920 time_running:143996600
> > perf_event_open02 0 TINFO : overall task clock: 167540640
> > perf_event_open02 0 TINFO : hw sum: 1801473873, task clock sum: 1005046160
> > hw counters: 177971955 185132938 185488818 185488199 185480943 185477118 179657001 172499668 172137672 172139561
> > task clock counters: 99299900 103293440 103503840 103502040 103499020 103496160 100224320 96227620 95999400 96000420
> > perf_event_open02 0 TINFO : ratio: 5.998820
> > perf_event_open02 0 TINFO : nhw: 6.000100 /* I added this extra line for debug */
> > perf_event_open02 1 TPASS : test passed
> >
> > Thanks
> >
> > --
> > Qais Yousef
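
For reference, the ratios in the two logs above can be recomputed from the
printed sums. This is only a sketch under two assumptions inferred from the
numbers, not taken from the test source: that "ratio" is the task clock sum
divided by the overall task clock, and that the extra "nhw" debug line prints
the threshold the ratio is compared against. The function and variable names
are made up for illustration.

```python
# Values copied from the two perf_event_open02 logs quoted above.
FAIL_RUN = {"task_clock_sum": 667703360, "overall_task_clock": 166935520,
            "threshold": 0.000100}   # "nhw" debug line, patch applied
PASS_RUN = {"task_clock_sum": 1005046160, "overall_task_clock": 167540640,
            "threshold": 6.000100}   # "nhw" debug line, patch reverted


def ratio(run):
    # Assumption: ratio = task clock sum / overall task clock.
    return run["task_clock_sum"] / run["overall_task_clock"]


def verdict(run):
    # Assumption: the test fails when the ratio exceeds the printed threshold.
    return "TFAIL" if ratio(run) > run["threshold"] else "TPASS"


if __name__ == "__main__":
    for name, run in (("patch applied", FAIL_RUN), ("patch reverted", PASS_RUN)):
        print(f"{name}: ratio {ratio(run):.6f} -> {verdict(run)}")
```

Under those assumptions the recomputed ratios match the logged 3.999768 and
5.998820, so the failure comes from the threshold side (a hardware counter
count of 0) rather than from the ratio itself.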