Re: [PATCH 0/2] Fix nohz_full vs rt bandwidth

From: Phil Auld
Date: Thu Sep 07 2023 - 12:36:35 EST


Hi Hao,

On Wed, Sep 06, 2023 at 02:45:39PM +0800 Hao Jia wrote:
>
> Friendly ping...
>
> On 2023/8/21 Hao Jia wrote:
> > Since commit 88c56cfeaec4 ("sched/fair: Block nohz tick_stop
> > when cfs bandwidth in use") was merged, conflicts between NOHZ full
> > and cfs_bandwidth are handled well, and the scheduler feature HZ_BW
> > allows us to choose which one to prefer.
> >
> > The same conflict exists between NOHZ full and rt_bandwidth.
> > These two patches try to handle it in a similar way.
> >

Are you actually hitting this in the real world?

We, for example, no longer enable RT_GROUP_SCHED, so this is a non-issue
for our use cases. I'd recommend considering doing the same. (Does it
even work with cgroup2?)

In some ways what you have is a simplification of the code, but it also
obfuscates the stop_tick conditions by hiding them all in the
class-specific functions. With the original code it was easier to see
why the tick didn't stop.

It would be better to do this only if it is really needed, in my opinion.


Cheers,
Phil

> > patch1: Extracts a can_stop_tick() callback function for each
> > sched_class from sched_can_stop_tick(). This makes things clearer
> > and makes it convenient to handle the conflict between NOHZ full
> > and rt_bandwidth.
> >
> > patch2: If the HZ_BW scheduler feature is enabled and the RT task
> > to be run is constrained by rt_bandwidth runtime, prevent NO_HZ
> > full from stopping the tick.
> >
> > Hao Jia (2):
> >   sched/core: Introduce sched_class::can_stop_tick()
> >   sched/rt: Block nohz tick_stop when rt bandwidth in use
> >
> >  kernel/sched/core.c     | 67 +++++--------------------------
> >  kernel/sched/deadline.c | 16 ++++++++
> >  kernel/sched/fair.c     | 56 +++++++++++++++++++++++---
> >  kernel/sched/rt.c       | 89 ++++++++++++++++++++++++++++++++++++++++-
> >  kernel/sched/sched.h    |  5 ++-
> >  5 files changed, 168 insertions(+), 65 deletions(-)
> >
>
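
For illustration only, the per-class callback shape described in the
quoted patch1 and patch2 summaries can be sketched as a small userspace
model. This is not the code from the series and not kernel code:
struct rq_model, struct sched_class_model, tick_blocker() and
model_can_stop_tick() are hypothetical names, the counters are
simplified stand-ins for runqueue state, and the "any class may veto"
loop is an assumed simplification rather than the exact ordering
semantics of sched_can_stop_tick().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for per-CPU runqueue state. */
struct rq_model {
	unsigned int nr_dl;         /* runnable deadline tasks */
	unsigned int nr_rt;         /* runnable RT tasks (FIFO + RR) */
	unsigned int nr_rr;         /* of those, SCHED_RR tasks */
	unsigned int nr_cfs;        /* runnable fair tasks */
	bool rt_bandwidth_limited;  /* assumed: RT task subject to rt_bandwidth */
	bool cfs_bandwidth_limited; /* assumed: fair task subject to cfs_bandwidth */
	bool hz_bw;                 /* models the HZ_BW scheduler feature */
};

/* Hypothetical model of a class with a can_stop_tick() callback. */
struct sched_class_model {
	const char *name;
	bool (*can_stop_tick)(const struct rq_model *rq);
};

static bool dl_can_stop_tick(const struct rq_model *rq)
{
	assert(rq != NULL);
	/* Any runnable deadline task keeps the tick. */
	return rq->nr_dl == 0;
}

static bool rt_can_stop_tick(const struct rq_model *rq)
{
	assert(rq != NULL);
	/* More than one SCHED_RR task needs the tick for round-robin. */
	if (rq->nr_rr > 1)
		return false;
	/* Modelled patch2 rule: HZ_BW on and RT task bandwidth-limited. */
	if (rq->hz_bw && rq->nr_rt > 0 && rq->rt_bandwidth_limited)
		return false;
	return true;
}

static bool fair_can_stop_tick(const struct rq_model *rq)
{
	assert(rq != NULL);
	/* More than one fair task needs the tick for preemption. */
	if (rq->nr_cfs > 1)
		return false;
	/* Modelled HZ_BW rule for a cfs_bandwidth-limited task. */
	if (rq->hz_bw && rq->nr_cfs == 1 && rq->cfs_bandwidth_limited)
		return false;
	return true;
}

static const struct sched_class_model classes[] = {
	{ "dl",   dl_can_stop_tick },
	{ "rt",   rt_can_stop_tick },
	{ "fair", fair_can_stop_tick },
};

/* Name of the first class that vetoes stopping the tick, or NULL. */
const char *tick_blocker(const struct rq_model *rq)
{
	for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++)
		if (!classes[i].can_stop_tick(rq))
			return classes[i].name;
	return NULL;
}

/* The tick may stop only if no class vetoes it. */
bool model_can_stop_tick(const struct rq_model *rq)
{
	return tick_blocker(rq) == NULL;
}
```

In this model, a runqueue with a single bandwidth-limited RT task may
stop the tick while hz_bw is false and may not once hz_bw is true.
Having tick_blocker() report which class refused is one possible way to
keep the reason visible when the conditions live in per-class
functions.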
