Re: [RFC V2 0/2] sched/fair: Fallback to sched-idle CPU for better performance

From: Viresh Kumar
Date: Wed May 15 2019 - 07:19:41 EST


On 25-04-19, 15:07, Viresh Kumar wrote:
> Hi,
>
> Here is another attempt to get some benefit out of the sched-idle
> policy. The previous version [1] focused on getting better power numbers
> and this version tries to get better performance or lower response time
> for the tasks.
>
> The first patch is unchanged from v1 and accumulates
> information about sched-idle tasks per CPU.
>
> The second patch changes the way the target CPU is selected in the fast
> path. Currently, we look for an idle CPU in select_idle_sibling() to
> run the next task, but if we don't find any idle CPU it is better, for
> performance reasons, to pick a CPU which will run the task the soonest.
> A CPU which isn't idle but has only SCHED_IDLE activity queued on it
> should be a good target by this criterion, as any normal fair task will
> most likely preempt the currently running SCHED_IDLE task immediately.
> In fact, choosing such a CPU should give even better results than
> picking an idle CPU, since it can run the task sooner (an idle CPU
> first needs to be woken up from its idle state).
>
> Basic testing has been done with rt-app for now, to make sure the
> tasks are getting placed correctly.
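
To make the first quoted patch description a bit more concrete, here is a
rough sketch of the kind of per-CPU accounting it needs (the field and
helper names are illustrative, not necessarily the ones used in the actual
patch): keep a count of runnable SCHED_IDLE tasks on each cfs_rq, updated
at enqueue/dequeue time.

/* Illustrative only: track how many runnable fair tasks are SCHED_IDLE. */
static void account_idle_enqueue(struct cfs_rq *cfs_rq, struct task_struct *p)
{
	if (task_has_idle_policy(p))
		cfs_rq->idle_h_nr_running++;
}

static void account_idle_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p)
{
	if (task_has_idle_policy(p))
		cfs_rq->idle_h_nr_running--;
}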
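
The second patch can then use that count in the wakeup fast path. A rough
sketch of the check it relies on (again illustrative, not the exact code
from the patch): a CPU whose runnable tasks are all SCHED_IDLE is treated
as an "almost idle" fallback target, since a newly woken fair task will
preempt those tasks immediately and doesn't have to wait for the CPU to
exit an idle state.

/* Illustrative only: everything queued on this CPU is SCHED_IDLE. */
static int sched_idle_cpu(int cpu)
{
	struct rq *rq = cpu_rq(cpu);

	return unlikely(rq->nr_running &&
			rq->nr_running == rq->cfs.idle_h_nr_running);
}

select_idle_sibling() can remember such a CPU while scanning the LLC
domain and fall back to it when no fully idle CPU is found.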

More results here:

- Tested on the octa-core HiKey platform (all CPUs change frequency
  together).

- rt-app json attached here. It creates a few tasks and we monitor their
  scheduling latency by looking at the "wu_lat" field (usec).

- The histograms are created using
  https://github.com/adkein/textogram: textogram -a 0 -z 1000 -n 10

- The stats are accumulated using: https://github.com/nferraz/st

- NOTE: The % values shown don't add up to 100; look at the absolute
  counts instead.


Test 1: Create 8 CFS tasks (no SCHED_IDLE tasks) without this
patchset:

0 - 100 : ################################################## 72% (3688)
100 - 200 : ################ 24% (1253)
200 - 300 : ## 2% (149)
300 - 400 : 0% (22)
400 - 500 : 0% (1)
500 - 600 : 0% (3)
600 - 700 : 0% (1)
700 - 800 : 0% (1)
800 - 900 :
900 - 1000 : 0% (1)
>1000 : 0% (17)

N min max sum mean stddev
5136 0 2452 535985 104.358 104.585


Test 2: Create 8 CFS tasks and 5 SCHED_IDLE tasks:

A. Without sched-idle patchset:

0 - 100 : ################################################## 88% (3102)
100 - 200 : ## 4% (148)
200 - 300 : 1% (41)
300 - 400 : 0% (27)
400 - 500 : 0% (33)
500 - 600 : 0% (32)
600 - 700 : 1% (36)
700 - 800 : 0% (27)
800 - 900 : 0% (19)
900 - 1000 : 0% (26)
>1000 : 34% (1218)

N min max sum mean stddev
4710 0 67664 5.25956e+06 1116.68 2315.09


B. With sched-idle patchset:

0 - 100 : ################################################## 99% (5042)
100 - 200 : 0% (8)
200 - 300 :
300 - 400 :
400 - 500 : 0% (2)
500 - 600 : 0% (1)
600 - 700 :
700 - 800 : 0% (1)
800 - 900 : 0% (1)
900 - 1000 :
>1000 : 0% (40)

N min max sum mean stddev
5095 0 7773 523170 102.683 475.482


With this patchset the mean latency dropped to roughly 10% of its previous
value (102.683 vs 1116.68 usec) and the stddev to roughly 20% (475.482 vs
2315.09 usec).

I have tried more combinations of CFS and SCHED_IDLE tasks and see the
expected improvement in scheduling latency for all of them.

--
viresh

Attachment: sched-idle.json
Description: application/json