Re: [PATCH] sched: select 'idle' cfs_rq per task-group to prevent tg-internal imbalance
From: Michael wang
Date: Mon Jun 23 2014 - 23:35:27 EST
On 06/23/2014 05:42 PM, Peter Zijlstra wrote:
[snip]
>> +}
>
> Still completely hate this, it doesn't make conceptual sense what
> so ever.
Yeah... after all the testing these days, I now agree with your opinion
that this cannot address all the cases...
Just wondering, could we make this another scheduler feature?
I mean, logically it makes tasks spread across the CPUs inside a
task-group while still following the load-balance decisions, and the
testing shows that the patch achieves that goal well.

Currently the scheduler doesn't provide a good way to achieve that,
correct?
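
Just to illustrate the idea (a rough sketch only, not the actual patch;
select_idle_tg_cpu() is a made-up name, while task_group(),
tsk_cpus_allowed() and tg->cfs_rq[] are the existing structures under
CONFIG_FAIR_GROUP_SCHED):

/*
 * At wakeup, prefer a CPU whose cfs_rq of this task's group is idle,
 * so tasks of the same group spread out instead of piling up on one
 * CPU, while the regular load-balance decisions stay untouched.
 */
static int select_idle_tg_cpu(struct task_struct *p, int prev_cpu)
{
	struct task_group *tg = task_group(p);
	int cpu;

	for_each_cpu(cpu, tsk_cpus_allowed(p)) {
		/* this group has nothing queued on that CPU */
		if (!tg->cfs_rq[cpu]->nr_running)
			return cpu;
	}

	return prev_cpu;	/* fall back to the normal decision */
}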
And it does help a lot in our testing for workloads like dbench and
transaction workloads when they are fighting with stress-like
workloads; combined with GENTLE_FAIR_SLEEPERS, we can make cpu-shares
work again. Here are some real numbers of 'dbench 6 -t 60' from our
testing:
Without the patch:
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    1281241     0.036    62.872
 Close         941274     0.002    13.298
 Rename         54249     0.120    19.340
 Unlink        258686     0.156    37.155
 Deltree           36     8.514    41.904
 Mkdir             18     0.003     0.003
 Qpathinfo    1161327     0.016    40.130
 Qfileinfo     203648     0.001     7.118
 Qfsinfo       212896     0.004    11.084
 Sfileinfo     104385     0.067    55.990
 Find          448958     0.033    23.150
 WriteX        639464     0.069    55.452
 ReadX        2008086     0.009    24.466
 LockX           4174     0.012    14.127
 UnlockX         4174     0.006     7.357
 Flush          89787     1.533    56.925

Throughput 666.318 MB/sec  6 clients  6 procs  max_latency=62.875 ms
With the patch applied:
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    2601876     0.025    52.339
 Close        1911248     0.001     0.133
 Rename        110195     0.080     6.739
 Unlink        525476     0.070    52.359
 Deltree           62     6.143    19.919
 Mkdir             31     0.003     0.003
 Qpathinfo    2358482     0.009    52.355
 Qfileinfo     413190     0.001     0.092
 Qfsinfo       432513     0.003     0.790
 Sfileinfo     211934     0.027    13.830
 Find          911874     0.021     5.969
 WriteX       1296646     0.038    52.348
 ReadX        4079453     0.006    52.247
 LockX           8476     0.003     0.050
 UnlockX         8476     0.001     0.045
 Flush         182342     0.536    55.953

Throughput 1360.74 MB/sec  6 clients  6 procs  max_latency=55.970 ms
And the shares work normally again; the CPU% resources are managed well.
So could we provide a feature like:

SCHED_FEAT(TG_INTERNAL_BALANCE, false)

I do believe there are more cases that could benefit from it. For those
who don't want too much wake-affine and want group tasks more balanced
across the CPUs, the scheduler could provide this as an option (a rough
sketch of the wiring is below), shall we?
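
Something along these lines, just as a sketch: SCHED_FEAT() and
sched_feat() are the existing mechanism in kernel/sched/features.h and
kernel/sched/fair.c (toggleable at runtime via
/sys/kernel/debug/sched_features), while select_idle_tg_cpu() is the
made-up helper from the sketch above:

/* kernel/sched/features.h -- default off, current behaviour is kept */
SCHED_FEAT(TG_INTERNAL_BALANCE, false)

/* kernel/sched/fair.c -- gate the tg-internal spreading on the feature */
if (sched_feat(TG_INTERNAL_BALANCE))
	new_cpu = select_idle_tg_cpu(p, prev_cpu);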
Regards,
Michael Wang