RT_GROUP_SCHED throttling blocks unthrottled RT tasks?

From: Doug Anderson
Date: Fri Nov 05 2021 - 13:44:38 EST


Hi,

I'm seeing a strange behavior that I _think_ is a bug. I'm hoping that
some of the scheduling experts can tell me if I'm just
misunderstanding or if this is truly a bug. To see it, I do this:

--

# Allow 1000 us more of RT at system and top cgroup
old_rt=$(cat /proc/sys/kernel/sched_rt_runtime_us)
echo $((old_rt + 1000)) > /proc/sys/kernel/sched_rt_runtime_us
old_rt=$(cat /sys/fs/cgroup/cpu/cpu.rt_runtime_us)
echo $((old_rt + 1000)) > /sys/fs/cgroup/cpu/cpu.rt_runtime_us

# Give the 1000 us to my own group
mkdir /sys/fs/cgroup/cpu/doug
echo 1000 > /sys/fs/cgroup/cpu/doug/cpu.rt_runtime_us

# Fork off a bunch of spinny things
for i in $(seq 13); do
python -c "while True: pass"&
done

# Make my spinny things RT and put in my group
# (assumes no other python is running!)
for pid in $(ps aux | grep python | grep -v grep | awk '{print $2}'); do
echo $pid >> /sys/fs/cgroup/cpu/doug/tasks
chrt -p -f 99 $pid
done

--

As expected, the spinny python tasks are pretty much throttled down to
0 in the above (they get 1 ms out of 1 second).

However, _the bug_ is that the above basically causes all _other_ RT
things in my system to stop functioning. I'm on an ARM Chromebook
(sc7180-trogdor) and we communicate to our EC on a "realtime" thread
due to SPI timing requirements. The above commands appear to starve
the EC's communication task and (as far as I can tell) other RT tasks
in the system.

Notably:

a) My EC comms slow to a crawl (eventually one gets through).

b) "top" shows stuff like this:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
COMMAND
179 root -51 0 0 0 0 R 100.0 0.0 0:31.79
cros_ec_spi_hig
180 root -51 0 0 0 0 R 95.2 0.0 0:50.19
irq/169-chromeo
184 root -51 0 0 0 0 R 95.2 0.0 0:13.24
spi10
221 root -2 0 0 0 0 R 95.2 0.0 0:50.57
ring0

c) If I give my spinny tasks just a little bit more time than 1 ms
then I get a hung task.


When I'm testing the above, the non-RT stuff in the system continues
to work OK though. I can even go in and kill all my python tasks and
the system returns to normal.

I tried gathering some tracing. One bit that might (?) be relevant:

cros_ec_spi_hig-179 [000] d.h5 1495.305919: sched_waking:
comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
cros_ec_spi_hig-179 [000] d.h6 1495.305926: sched_wakeup:
comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
<idle>-0 [001] d.H5 1495.309113: sched_waking:
comm=sugov:6 pid=2658 prio=-1 target_cpu=006
<idle>-0 [001] d.H6 1495.309119: sched_wakeup:
comm=sugov:6 pid=2658 prio=-1 target_cpu=006
cros_ec_spi_hig-179 [000] d.h5 1495.309336: sched_waking:
comm=sugov:6 pid=2658 prio=-1 target_cpu=006
cros_ec_spi_hig-179 [000] d.h6 1495.309341: sched_wakeup:
comm=sugov:6 pid=2658 prio=-1 target_cpu=006
<idle>-0 [001] d.H5 1495.312137: sched_waking:
comm=sugov:6 pid=2658 prio=-1 target_cpu=006
<idle>-0 [001] d.H6 1495.312142: sched_wakeup:
comm=sugov:6 pid=2658 prio=-1 target_cpu=006
cros_ec_spi_hig-179 [000] d.h5 1495.312859: sched_waking:
comm=sugov:6 pid=2658 prio=-1 target_cpu=006
cros_ec_spi_hig-179 [000] d.h6 1495.312870: sched_wakeup:
comm=sugov:6 pid=2658 prio=-1 target_cpu=006

My best guess is that there's some bug in the scheduler where it just
loops constantly picking an unthrottled RT task but then incorrectly
decides that it's throttled and thus doesn't run it.

Most of my testing has been on the chromeos-5.4 kernel, but just in
case I tried a vanilla v5.15 kernel and I could reproduce the same
problems.


Anyway, if I'm just doing something stupid then I appologize of the
noise. If the above should work and you need me to gather more logging
/ try any experiments, I'm happy to do so.

Thanks!

-Doug