PROBLEM: Scheduling latency not respected in CFS with HRTICK enabled

From: Tomasz Grabiec
Date: Fri May 06 2016 - 11:43:19 EST


Hi,

System details:

Kernel: 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
CPU: 2 sockets, 8 cores per socket, 2 threads per core
CPU Model: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz

I was trying to reduce scheduling delays below 1/HZ, which is 4ms
for our kernel. I enabled HRTICK scheduler feature and set
scheduler tunables to:

sched_latency_ns: 1000000
sched_min_granularity_ns: 400000
sched_wakeup_granularity_ns: 400000
sched_autogroup_enabled: 0
sched_tunable_scaling: 1
sched_cfs_bandwidth_slice_us: 5000
sched_child_runs_first: 0
sched_migration_cost_ns: 500000
sched_nr_migrate: 32
sched_rr_timeslice_ms: 25
sched_rt_period_us: 1000000
sched_rt_runtime_us: 950000
sched_shares_window_ns: 1000000
sched_time_avg_ms: 1000

Note that scheduling latency is set to 1ms and min_granularity to 0.4ms.

The test case I used was to run two processes competing for CPU
pinned to a single core, like this:

python -c "import math; math.factorial(100000000)" &
pid1=$!
python -c "import math; math.factorial(100000000)" &
pid2=$!
sleep 1
sudo taskset -cap 2 $pid1
sudo taskset -cap 2 $pid2

Using perf I've found that scheduling delays do not stay around
0.5ms as could be expected from scheduling latency, but switch
between modes of 0.5ms and 4ms back and forth.

For example (perf script):

python 147208 [002] 3272129.329011: sched:sched_stat_wait:
comm=python pid=147207 delay=504875 [ns]
python 147207 [002] 3272129.329515: sched:sched_stat_wait:
comm=python pid=147208 delay=504096 [ns]
python 147208 [002] 3272129.330019: sched:sched_stat_wait:
comm=python pid=147207 delay=503936 [ns]
python 147207 [002] 3272129.330523: sched:sched_stat_wait:
comm=python pid=147208 delay=503888 [ns]
python 147208 [002] 3272129.331067: sched:sched_stat_wait:
comm=python pid=147207 delay=543902 [ns]
python 147207 [002] 3272129.334047: sched:sched_stat_wait:
comm=python pid=147208 delay=2980114 [ns]
python 147208 [002] 3272129.338048: sched:sched_stat_wait:
comm=python pid=147207 delay=4000695 [ns]
python 147207 [002] 3272129.342051: sched:sched_stat_wait:
comm=python pid=147208 delay=4002851 [ns]
python 147208 [002] 3272129.346055: sched:sched_stat_wait:
comm=python pid=147207 delay=4004720 [ns]
python 147207 [002] 3272129.350058: sched:sched_stat_wait:
comm=python pid=147208 delay=4002403 [ns]
python 147208 [002] 3272129.354064: sched:sched_stat_wait:
comm=python pid=147207 delay=4006037 [ns]
python 147207 [002] 3272129.358064: sched:sched_stat_wait:
comm=python pid=147208 delay=3999730 [ns]

More detailed trace around such transition point looks like this:

perf record \
-e sched:sched_stat_runtime \
-e sched:sched_stat_wait \
-e sched:sched_switch \
-e probe:hrtick \
-e probe:hrtick_start \
-e probe:scheduler_tick -C 2

python 873 [002] 3273334.927836: probe:hrtick: (ffffffff8109b430)
python 873 [002] 3273334.927837: sched:sched_stat_runtime:
comm=python pid=873 runtime=502915 [ns] vruntime=1549788940847739 [ns]
python 873 [002] 3273334.927837: sched:sched_stat_runtime:
comm=python pid=873 runtime=880 [ns] vruntime=1549788940848619 [ns]
python 873 [002] 3273334.927838: sched:sched_stat_wait:
comm=python pid=874 delay=503795 [ns]
python 873 [002] 3273334.927838: probe:hrtick_start: (ffffffff8109f4f0)
python 873 [002] 3273334.927839: sched:sched_switch:
prev_comm=python prev_pid=873 prev_prio=120 prev_state=R ==>
next_comm=python next_pid=874 next_prio=120
python 874 [002] 3273334.928233: probe:scheduler_tick: (ffffffff810a1f30)
python 874 [002] 3273334.928233: sched:sched_stat_runtime:
comm=python pid=874 runtime=395992 [ns] vruntime=1549788941220502 [ns]
python 874 [002] 3273334.928236: sched:sched_stat_runtime:
comm=python pid=874 runtime=2278 [ns] vruntime=1549788941222780 [ns]
python 874 [002] 3273334.928238: sched:sched_stat_wait:
comm=kworker/2:0 pid=93149 delay=0 [ns]
python 874 [002] 3273334.928238: probe:hrtick_start: (ffffffff8109f4f0)
python 874 [002] 3273334.928239: sched:sched_switch:
prev_comm=python prev_pid=874 prev_prio=120 prev_state=R ==>
next_comm=kworker/2:0 next_pid=93149 next_prio=120
kworker/2:0 93149 [002] 3273334.928242: sched:sched_stat_runtime:
comm=kworker/2:0 pid=93149 runtime=5827 [ns] vruntime=1549788940354446
[ns]
kworker/2:0 93149 [002] 3273334.928242: probe:hrtick_start: (ffffffff8109f4f0)
kworker/2:0 93149 [002] 3273334.928243: sched:sched_stat_wait:
comm=python pid=874 delay=5827 [ns]
kworker/2:0 93149 [002] 3273334.928244: probe:hrtick_start: (ffffffff8109f4f0)
kworker/2:0 93149 [002] 3273334.928244: sched:sched_switch:
prev_comm=kworker/2:0 prev_pid=93149 prev_prio=120 prev_state=S ==>
next_comm=python next_pid=874 next_prio=120
python 874 [002] 3273334.928745: probe:hrtick: (ffffffff8109b430)
python 874 [002] 3273334.928745: sched:sched_stat_runtime:
comm=python pid=874 runtime=503888 [ns] vruntime=1549788941726668 [ns]
python 874 [002] 3273334.928746: sched:sched_stat_runtime:
comm=python pid=874 runtime=910 [ns] vruntime=1549788941727578 [ns]
python 874 [002] 3273334.928747: sched:sched_stat_wait:
comm=python pid=873 delay=908895 [ns]
python 874 [002] 3273334.928747: probe:hrtick_start: (ffffffff8109f4f0)
python 874 [002] 3273334.928748: sched:sched_switch:
prev_comm=python prev_pid=874 prev_prio=120 prev_state=R ==>
next_comm=python next_pid=873 next_prio=120
python 873 [002] 3273334.929249: probe:hrtick: (ffffffff8109b430)
python 873 [002] 3273334.929249: sched:sched_stat_runtime:
comm=python pid=873 runtime=502850 [ns] vruntime=1549788941351469 [ns]
python 873 [002] 3273334.929250: sched:sched_stat_runtime:
comm=python pid=873 runtime=918 [ns] vruntime=1549788941352387 [ns]
python 873 [002] 3273334.932236: probe:scheduler_tick: (ffffffff810a1f30)
python 873 [002] 3273334.932237: sched:sched_stat_runtime:
comm=python pid=873 runtime=2986981 [ns] vruntime=1549788944339368
[ns]
python 873 [002] 3273334.932239: sched:sched_stat_runtime:
comm=python pid=873 runtime=1685 [ns] vruntime=1549788944341053 [ns]
python 873 [002] 3273334.932239: sched:sched_stat_wait:
comm=python pid=874 delay=3492434 [ns]
python 873 [002] 3273334.932240: probe:hrtick_start: (ffffffff8109f4f0)
python 873 [002] 3273334.932240: sched:sched_switch:
prev_comm=python prev_pid=873 prev_prio=120 prev_state=R ==>
next_comm=python next_pid=874 next_prio=120
python 874 [002] 3273334.932741: probe:hrtick: (ffffffff8109b430)
python 874 [002] 3273334.932742: sched:sched_stat_runtime:
comm=python pid=874 runtime=503174 [ns] vruntime=1549788942230752 [ns]
python 874 [002] 3273334.932743: sched:sched_stat_runtime:
comm=python pid=874 runtime=948 [ns] vruntime=1549788942231700 [ns]
python 874 [002] 3273334.936240: probe:scheduler_tick: (ffffffff810a1f30)
python 874 [002] 3273334.936241: sched:sched_stat_runtime:
comm=python pid=874 runtime=3497530 [ns] vruntime=1549788945729230
[ns]
python 874 [002] 3273334.936249: sched:sched_stat_runtime:
comm=python pid=874 runtime=8553 [ns] vruntime=1549788945737783 [ns]
python 874 [002] 3273334.936250: sched:sched_stat_wait:
comm=python pid=873 delay=4010205 [ns]
python 874 [002] 3273334.936250: probe:hrtick_start: (ffffffff8109f4f0)
python 874 [002] 3273334.936251: sched:sched_switch:
prev_comm=python prev_pid=874 prev_prio=120 prev_state=R ==>
next_comm=python next_pid=873 next_prio=120


Please advise, is this a bug?

Regards,
Tomasz Grabiec