Re: PROBLEM: Scheduling latency not respected in CFS with HRTICK enabled

From: Tomasz Grabiec
Date: Mon May 09 2016 - 04:54:15 EST


On Sun, May 8, 2016 at 2:19 AM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>
> 2016-05-06 23:43 GMT+08:00 Tomasz Grabiec <tgrabiec@xxxxxxxxxxxx>:
> > Hi,
> >
> > System details:
> >
> > Kernel: 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26
> > UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> > CPU: 2 sockets, 8 cores per socket, 2 threads per core
> > CPU Model: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
> >
> > I was trying to reduce scheduling delays below 1/HZ, which is 4ms
> > for our kernel. I enabled HRTICK scheduler feature and set
> > scheduler tunables to:
> >
> > sched_latency_ns: 1000000
> > sched_min_granularity_ns: 400000
> > sched_wakeup_granularity_ns: 400000
> > sched_autogroup_enabled: 0
> > sched_tunable_scaling: 1
> > sched_cfs_bandwidth_slice_us: 5000
> > sched_child_runs_first: 0
> > sched_migration_cost_ns: 500000
> > sched_nr_migrate: 32
> > sched_rr_timeslice_ms: 25
> > sched_rt_period_us: 1000000
> > sched_rt_runtime_us: 950000
> > sched_shares_window_ns: 1000000
> > sched_time_avg_ms: 1000
> >
> > Note that scheduling latency is set to 1ms and min_granularity to 0.4ms.
> >
> > The test case I used was to run two processes competing for CPU
> > pinned to a single core, like this:
> >
> > python -c "import math; math.factorial(100000000)" &
> > pid1=$!
> > python -c "import math; math.factorial(100000000)" &
> > pid2=$!
> > sleep 1
> > sudo taskset -cap 2 $pid1
> > sudo taskset -cap 2 $pid2
> >
> > Using perf I've found that scheduling delays do not stay around
> > 0.5ms as could be expected from scheduling latency, but switch
> > between modes of 0.5ms and 4ms back and forth.
> >
> > For example (perf script):
> >
> > python 147208 [002] 3272129.329011: sched:sched_stat_wait:
> > comm=python pid=147207 delay=504875 [ns]
> > python 147207 [002] 3272129.329515: sched:sched_stat_wait:
> > comm=python pid=147208 delay=504096 [ns]
> > python 147208 [002] 3272129.330019: sched:sched_stat_wait:
> > comm=python pid=147207 delay=503936 [ns]
> > python 147207 [002] 3272129.330523: sched:sched_stat_wait:
> > comm=python pid=147208 delay=503888 [ns]
> > python 147208 [002] 3272129.331067: sched:sched_stat_wait:
> > comm=python pid=147207 delay=543902 [ns]
> > python 147207 [002] 3272129.334047: sched:sched_stat_wait:
> > comm=python pid=147208 delay=2980114 [ns]
> > python 147208 [002] 3272129.338048: sched:sched_stat_wait:
> > comm=python pid=147207 delay=4000695 [ns]
> > python 147207 [002] 3272129.342051: sched:sched_stat_wait:
> > comm=python pid=147208 delay=4002851 [ns]
> > python 147208 [002] 3272129.346055: sched:sched_stat_wait:
> > comm=python pid=147207 delay=4004720 [ns]
> > python 147207 [002] 3272129.350058: sched:sched_stat_wait:
> > comm=python pid=147208 delay=4002403 [ns]
> > python 147208 [002] 3272129.354064: sched:sched_stat_wait:
> > comm=python pid=147207 delay=4006037 [ns]
> > python 147207 [002] 3272129.358064: sched:sched_stat_wait:
> > comm=python pid=147208 delay=3999730 [ns]
> >
> > More detailed trace around such transition point looks like this:
> >
> > perf record \
> > -e sched:sched_stat_runtime \
> > -e sched:sched_stat_wait \
> > -e sched:sched_switch \
> > -e probe:hrtick \
> > -e probe:hrtick_start \
> > -e probe:scheduler_tick -C 2
> >
> > python 873 [002] 3273334.927836: probe:hrtick: (ffffffff8109b430)
> > python 873 [002] 3273334.927837: sched:sched_stat_runtime:
> > comm=python pid=873 runtime=502915 [ns] vruntime=1549788940847739 [ns]
> > python 873 [002] 3273334.927837: sched:sched_stat_runtime:
> > comm=python pid=873 runtime=880 [ns] vruntime=1549788940848619 [ns]
> > python 873 [002] 3273334.927838: sched:sched_stat_wait:
> > comm=python pid=874 delay=503795 [ns]
> > python 873 [002] 3273334.927838: probe:hrtick_start: (ffffffff8109f4f0)
> > python 873 [002] 3273334.927839: sched:sched_switch:
> > prev_comm=python prev_pid=873 prev_prio=120 prev_state=R ==>
> > next_comm=python next_pid=874 next_prio=120
> > python 874 [002] 3273334.928233: probe:scheduler_tick: (ffffffff810a1f30)
> > python 874 [002] 3273334.928233: sched:sched_stat_runtime:
> > comm=python pid=874 runtime=395992 [ns] vruntime=1549788941220502 [ns]
> > python 874 [002] 3273334.928236: sched:sched_stat_runtime:
> > comm=python pid=874 runtime=2278 [ns] vruntime=1549788941222780 [ns]
> > python 874 [002] 3273334.928238: sched:sched_stat_wait:
> > comm=kworker/2:0 pid=93149 delay=0 [ns]
> > python 874 [002] 3273334.928238: probe:hrtick_start: (ffffffff8109f4f0)
> > python 874 [002] 3273334.928239: sched:sched_switch:
> > prev_comm=python prev_pid=874 prev_prio=120 prev_state=R ==>
> > next_comm=kworker/2:0 next_pid=93149 next_prio=120
> > kworker/2:0 93149 [002] 3273334.928242: sched:sched_stat_runtime:
> > comm=kworker/2:0 pid=93149 runtime=5827 [ns] vruntime=1549788940354446
> > [ns]
> > kworker/2:0 93149 [002] 3273334.928242: probe:hrtick_start: (ffffffff8109f4f0)
> > kworker/2:0 93149 [002] 3273334.928243: sched:sched_stat_wait:
> > comm=python pid=874 delay=5827 [ns]
> > kworker/2:0 93149 [002] 3273334.928244: probe:hrtick_start: (ffffffff8109f4f0)
> > kworker/2:0 93149 [002] 3273334.928244: sched:sched_switch:
> > prev_comm=kworker/2:0 prev_pid=93149 prev_prio=120 prev_state=S ==>
> > next_comm=python next_pid=874 next_prio=120
> > python 874 [002] 3273334.928745: probe:hrtick: (ffffffff8109b430)
> > python 874 [002] 3273334.928745: sched:sched_stat_runtime:
> > comm=python pid=874 runtime=503888 [ns] vruntime=1549788941726668 [ns]
> > python 874 [002] 3273334.928746: sched:sched_stat_runtime:
> > comm=python pid=874 runtime=910 [ns] vruntime=1549788941727578 [ns]
> > python 874 [002] 3273334.928747: sched:sched_stat_wait:
> > comm=python pid=873 delay=908895 [ns]
> > python 874 [002] 3273334.928747: probe:hrtick_start: (ffffffff8109f4f0)
> > python 874 [002] 3273334.928748: sched:sched_switch:
> > prev_comm=python prev_pid=874 prev_prio=120 prev_state=R ==>
> > next_comm=python next_pid=873 next_prio=120
> > python 873 [002] 3273334.929249: probe:hrtick: (ffffffff8109b430)
> > python 873 [002] 3273334.929249: sched:sched_stat_runtime:
> > comm=python pid=873 runtime=502850 [ns] vruntime=1549788941351469 [ns]
> > python 873 [002] 3273334.929250: sched:sched_stat_runtime:
> > comm=python pid=873 runtime=918 [ns] vruntime=1549788941352387 [ns]
> > python 873 [002] 3273334.932236: probe:scheduler_tick: (ffffffff810a1f30)
> > python 873 [002] 3273334.932237: sched:sched_stat_runtime:
> > comm=python pid=873 runtime=2986981 [ns] vruntime=1549788944339368
> > [ns]
> > python 873 [002] 3273334.932239: sched:sched_stat_runtime:
> > comm=python pid=873 runtime=1685 [ns] vruntime=1549788944341053 [ns]
> > python 873 [002] 3273334.932239: sched:sched_stat_wait:
> > comm=python pid=874 delay=3492434 [ns]
> > python 873 [002] 3273334.932240: probe:hrtick_start: (ffffffff8109f4f0)
> > python 873 [002] 3273334.932240: sched:sched_switch:
> > prev_comm=python prev_pid=873 prev_prio=120 prev_state=R ==>
> > next_comm=python next_pid=874 next_prio=120
> > python 874 [002] 3273334.932741: probe:hrtick: (ffffffff8109b430)
> > python 874 [002] 3273334.932742: sched:sched_stat_runtime:
> > comm=python pid=874 runtime=503174 [ns] vruntime=1549788942230752 [ns]
> > python 874 [002] 3273334.932743: sched:sched_stat_runtime:
> > comm=python pid=874 runtime=948 [ns] vruntime=1549788942231700 [ns]
> > python 874 [002] 3273334.936240: probe:scheduler_tick: (ffffffff810a1f30)
> > python 874 [002] 3273334.936241: sched:sched_stat_runtime:
> > comm=python pid=874 runtime=3497530 [ns] vruntime=1549788945729230
> > [ns]
> > python 874 [002] 3273334.936249: sched:sched_stat_runtime:
> > comm=python pid=874 runtime=8553 [ns] vruntime=1549788945737783 [ns]
> > python 874 [002] 3273334.936250: sched:sched_stat_wait:
> > comm=python pid=873 delay=4010205 [ns]
> > python 874 [002] 3273334.936250: probe:hrtick_start: (ffffffff8109f4f0)
> > python 874 [002] 3273334.936251: sched:sched_switch:
> > prev_comm=python prev_pid=874 prev_prio=120 prev_state=R ==>
> > next_comm=python next_pid=873 next_prio=120
> >
> >
> > Please advise, is this a bug?
> >
>
> Could you cat /sys/kernel/debug/sched_features?

Here it goes:

$ sudo cat /sys/kernel/debug/sched_features | xargs -n 1 echo
GENTLE_FAIR_SLEEPERS
START_DEBIT
NO_NEXT_BUDDY
LAST_BUDDY
CACHE_HOT_BUDDY
WAKEUP_PREEMPTION
ARCH_CAPACITY
HRTICK
NO_DOUBLE_TICK
LB_BIAS
NONTASK_CAPACITY
TTWU_QUEUE
RT_PUSH_IPI
NO_FORCE_SD_OVERLAP
RT_RUNTIME_SHARE
NO_LB_MIN
NUMA
NUMA_FAVOUR_HIGHER
NO_NUMA_RESIST_LOWER