Re: [External] Re: [PATCH 0/2] Fix nohz_full vs rt bandwidth

From: Hao Jia
Date: Sun Sep 10 2023 - 23:39:14 EST




On 2023/9/8 Phil Auld wrote:
On Fri, Sep 08, 2023 at 10:57:26AM +0800 Hao Jia wrote:


On 2023/9/7 Phil Auld wrote:
Hi Hao,

On Wed, Sep 06, 2023 at 02:45:39PM +0800 Hao Jia wrote:

Friendly ping...

On 2023/8/21 Hao Jia wrote:
Since commit 88c56cfeaec4 ("sched/fair: Block nohz tick_stop
when cfs bandwidth in use") was merged, conflicts between NOHZ full
and cfs_bandwidth have been handled well, and the scheduler feature
HZ_BW allows us to choose which one to prefer.

This conflict also exists between NOHZ full and rt_bandwidth.
These two patches try to handle it in a similar way.


Are you actually hitting this in the real world?

We, for example, no longer enable RT_GROUP_SCHED so this is a non-issue
for our use cases. I'd recommend considering that. (Does it even
work with cgroup2?)


Yes, it has always been there. Regardless of whether RT_GROUP_SCHED is
enabled or not, rt bandwidth is always enabled. If RT_GROUP_SCHED is not
enabled, all rt tasks in the system form a single group, with rt_runtime
set to 950000 and rt_period set to 1000000. So rt bandwidth is always
enabled by default.
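
To make the default numbers concrete, here is a minimal userspace sketch of the arithmetic. It assumes both values are in microseconds (as the "0.95 seconds" figure later in this thread implies) and that a negative runtime means throttling is disabled. The helper names rt_share() and non_rt_slack_us() are made up for illustration and are not kernel functions.

```c
#include <assert.h>

/* Default values quoted in the message above (assumed to be microseconds). */
#define RT_RUNTIME_US  950000L
#define RT_PERIOD_US  1000000L

/*
 * Fraction of each period that rt tasks may consume, in the range 0.0..1.0.
 * Assumption: a negative runtime means "no limit", so the share is 1.0.
 */
double rt_share(long runtime_us, long period_us)
{
    assert(period_us > 0);
    if (runtime_us < 0 || runtime_us > period_us)
        return 1.0;
    return (double)runtime_us / (double)period_us;
}

/* Time per period left for non-rt tasks once rt tasks are throttled. */
long non_rt_slack_us(long runtime_us, long period_us)
{
    assert(period_us > 0);
    if (runtime_us < 0 || runtime_us > period_us)
        return 0;
    return period_us - runtime_us;
}
```

With the defaults, rt_share(RT_RUNTIME_US, RT_PERIOD_US) is 0.95 and non_rt_slack_us() is 50000, i.e. 50 ms of every second is reserved for non-rt tasks.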

Sure, there is that. But I think Daniel is actively trying to remove it.


Thank you for your reply. Maybe I'm missing something. Can you give me some links to discussions about it?

Also I'm not sure you answered my question. Are you actually hitting this
in the real world? I'd be tempted to think this is a mis-configuration or
mis-use of RT. Plus you can disable that throttling and use stalld to catch
cases where the rt task goes out of control.


> Are you actually hitting this in the real world?

I tested on my machine using the default settings (rt_runtime is 950000 and rt_period is 1000000). The rt task is supposed to be throttled after running for 0.95 seconds, but under NO_HZ_FULL it may not be throttled until it has run for about 1.4 seconds. This only delays the rt_bandwidth throttle; no warning is triggered.
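
For scale, a small sketch of how far the observed figure overshoots the budget. It takes the "about 1.4 seconds" above as 1400000 microseconds, which is an approximation of the reported measurement rather than an exact value, and the helper names are hypothetical.

```c
#include <assert.h>

/* How long the task ran past its budget before being throttled (microseconds). */
long throttle_overrun_us(long observed_us, long runtime_us)
{
    assert(runtime_us > 0);
    return observed_us > runtime_us ? observed_us - runtime_us : 0;
}

/* The same overrun as a whole-number percentage of the budget (truncated). */
long overrun_percent(long observed_us, long runtime_us)
{
    return throttle_overrun_us(observed_us, runtime_us) * 100 / runtime_us;
}
```

With observed_us = 1400000 and runtime_us = 950000, the overrun is 450000 microseconds, roughly 47% of the budget, and the task runs past the end of the 1000000 microsecond period before it is throttled.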


> Plus you can disable that throttling and use stalld to catch cases where the rt task goes out of control.

IIRC, if we disable rt_bandwidth, the rt task keeps running indefinitely, which may cause cfs task starvation and hung_task warnings. This may be the reason why rt_bandwidth is enabled by default (rt_runtime is 950000 and rt_period is 1000000).


Thanks,
Hao

I'm not totally against doing this (for what my vote counts...), I just
wonder if it's really needed. It seems it may be over-engineering something
that is soon to be a non-problem.


Cheers,
Phil



Thanks,
Hao

In some ways what you have is a simplification of code, but it also
obfuscates the stop_tick conditions by hiding them all in the class
specific functions. It was easier to see why the tick didn't stop
looking at the original code.

It would be better to do this only if it is really needed, in my opinion.


Cheers,
Phil

patch1: Extracts a can_stop_tick() callback function for each
sched_class from sched_can_stop_tick(). This makes things clearer
and also makes it convenient to handle the conflict between NOHZ full
and rt_bandwidth.
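
As a rough illustration of the shape patch1 describes, here is a userspace toy model of a per-class callback. Everything in it is an assumption made for illustration: the struct layouts, the callback signature, the per-class policies, and the choice to consult every class are not taken from the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a runqueue; fields are illustrative only. */
struct rq_model {
    unsigned int rt_nr_running;   /* runnable rt tasks on this CPU */
    unsigned int cfs_nr_running;  /* runnable cfs tasks on this CPU */
};

/* Toy stand-in for a scheduling class with the new callback. */
struct class_model {
    const char *name;
    /* Returns true if this class does not need the tick on rq. */
    bool (*can_stop_tick)(const struct rq_model *rq);
};

/* Simplified policy: more than one runnable task needs the tick. */
static bool rt_can_stop_tick(const struct rq_model *rq)
{
    return rq->rt_nr_running <= 1;
}

static bool fair_can_stop_tick(const struct rq_model *rq)
{
    return rq->cfs_nr_running <= 1;
}

static const struct class_model model_classes[] = {
    { "rt",   rt_can_stop_tick },
    { "fair", fair_can_stop_tick },
};

/* The generic code only asks each class; class-specific rules live in the class. */
bool model_can_stop_tick(const struct class_model *classes, size_t n,
                         const struct rq_model *rq)
{
    assert(classes != NULL && rq != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!classes[i].can_stop_tick(rq))
            return false;
    }
    return true;
}
```

The trade-off raised in the review is visible even in this toy: the generic function becomes short, but the reason a tick was kept is now spread across the per-class functions.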

patch2: If the HZ_BW scheduler feature is enabled and the RT task
to be run is constrained by rt_bandwidth runtime, prevent
NO_HZ full from stopping the tick.

Hao Jia (2):
  sched/core: Introduce sched_class::can_stop_tick()
  sched/rt: Block nohz tick_stop when rt bandwidth in use

 kernel/sched/core.c     | 67 +++++--------------------------
 kernel/sched/deadline.c | 16 ++++++++
 kernel/sched/fair.c     | 56 +++++++++++++++++++++++---
 kernel/sched/rt.c       | 89 ++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h    |  5 ++-
 5 files changed, 168 insertions(+), 65 deletions(-)