Re: [PATCH] fix share rt runtime with offline rq

From: Steven Rostedt
Date: Mon Dec 23 2019 - 11:40:36 EST


On Sat, 21 Dec 2019 10:20:12 +0800
chenying <chen.ying153@xxxxxxxxxx> wrote:

> In my environment, CPU0-11 are online, CPU12-15 are offline, CPU2 is isolated,
> and sched_rt_runtime_us is 950000. I then bind an RT process running a dead
> loop to CPU2. CPU usage on CPU2 reaches 100%, but only one CPU is isolated,
> so it can be inferred that CPU2 shares the RT runtime of the offline CPUs.
>
> / # cat /sys/devices/system/cpu/online
> 0-11
> / # cat /sys/devices/system/cpu/offline
> 12-15
> / # cat /sys/devices/system/cpu/isolated
> 2
> / # cat /proc/sys/kernel/sched_rt_runtime_us
> 950000
> / # chrt -p 357
> pid 357's current scheduling policy: SCHED_FIFO
> pid 357's current scheduling priority: 1

I'm guessing that you took the CPUs offline via a kernel command line
parameter, because when I tried this with just:

# echo 0 > /sys/devices/system/cpu/cpu${cpu}/online

I could not reproduce it. But when I booted with maxcpus=X set, I could.


>
> top - 15:52:12 up 4 min, 0 users, load average: 0.92, 0.41, 0.16
> Tasks: 201 total, 2 running, 199 sleeping, 0 stopped, 0 zombie
> %Cpu0 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu2 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> %Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 357 root -2 0 4044 172 136 R 100.0 0.0 2:32.99 deadloop
> 366 root 20 0 22060 2404 2128 R 0.7 0.0 0:00.06 top
> 1 root 20 0 2624 20 0 S 0.0 0.0 0:05.93 init
> 2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
> 3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
> 4 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0
>
> Signed-off-by: chenying <chen.ying153@xxxxxxxxxx>
> ---
> kernel/sched/rt.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index a532558..d20dc86 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -648,8 +648,12 @@ static void do_balance_runtime(struct rt_rq *rt_rq)
>  	rt_period = ktime_to_ns(rt_b->rt_period);
>  	for_each_cpu(i, rd->span) {
>  		struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
> +		struct rq *rq = rq_of_rt_rq(iter);
>  		s64 diff;
>
> +		if (!rq->online)
> +			continue;
> +

I think this might be papering over the real issue. Perhaps
rq_offline_rt() needs to be called for CPUs not being brought online?

-- Steve


>  		if (iter == rt_rq)
>  			continue;
>
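
To make the suspected borrowing concrete, here is a minimal userspace sketch of
the arithmetic, not kernel code. It loosely follows the loop in
do_balance_runtime(). The toy_* names and struct are hypothetical, and the
span membership (CPU2 plus the never-onlined CPUs 12-15) is an assumption
taken from the report, not something verified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 16

/* Toy stand-in for the bandwidth fields of an rt_rq (hypothetical). */
struct toy_rt_rq {
	int64_t rt_runtime;	/* RT budget per period, in ns */
	int64_t rt_time;	/* budget already consumed, in ns */
	bool online;
	bool in_span;		/* assumed member of the borrower's rd->span */
};

/* Assumed setup from the report: 950 ms budget, CPUs 12-15 never onlined. */
static void toy_setup(struct toy_rt_rq *rq)
{
	for (int i = 0; i < NR_CPUS; i++) {
		rq[i].rt_runtime = 950000000LL;
		rq[i].rt_time = 0;
		rq[i].online = i < 12;
		rq[i].in_span = (i == 2) || (i >= 12);
	}
}

/*
 * Borrow spare runtime for rq[self] from the other runqueues in the span.
 * With skip_offline set, offline runqueues are never used as lenders, which
 * is what the quoted patch proposes. Returns the borrower's new runtime.
 */
static int64_t toy_balance_runtime(struct toy_rt_rq *rq, int self,
				   int64_t rt_period, bool skip_offline)
{
	int weight = 0;

	assert(rq[self].in_span);
	for (int i = 0; i < NR_CPUS; i++)
		if (rq[i].in_span)
			weight++;

	for (int i = 0; i < NR_CPUS; i++) {
		struct toy_rt_rq *iter = &rq[i];
		int64_t diff;

		if (!iter->in_span || i == self)
			continue;
		if (skip_offline && !iter->online)
			continue;

		/* Lend at most 1/weight of the lender's unused budget. */
		diff = iter->rt_runtime - iter->rt_time;
		if (diff <= 0)
			continue;
		diff /= weight;

		/* Never let the borrower exceed one full period. */
		if (rq[self].rt_runtime + diff > rt_period)
			diff = rt_period - rq[self].rt_runtime;

		iter->rt_runtime -= diff;
		rq[self].rt_runtime += diff;
		if (rq[self].rt_runtime == rt_period)
			break;
	}
	return rq[self].rt_runtime;
}
```

Under these assumptions, calling toy_balance_runtime(rq, 2, 1000000000LL, false)
after toy_setup() lets CPU2 take the missing 50 ms from CPU12 and reach the
full 1 s period, which would match the 100% seen in top. With skip_offline set
to true there is no lender left and CPU2 stays at 950 ms. The sketch does not
address whether skipping is the right fix or whether the offline runqueues
should have had their runtime disabled in the first place.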