Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API

From: Marcelo Tosatti
Date: Wed Mar 29 2023 - 10:37:43 EST


On Tue, Mar 28, 2023 at 01:48:02PM +0200, Michal Hocko wrote:
> On Mon 27-03-23 07:24:54, Marcelo Tosatti wrote:
> > On Fri, Mar 24, 2023 at 11:35:35PM +0100, Frederic Weisbecker wrote:
> > > Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit :
> > > > On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote:
> [...]
> > > > > Actually introducing cpu_is_isolated() seems fine, but it can call
> > > > > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS.
> > > >
> > > > This is not really my area. Frederic, could you have a look please?
> > >
> > > The point is to have a function that tells if either nohz_full= or
> > > isolcpus=[domain] has been passed for the given CPU.
> > >
> > > Because I assumed that both would be interested in avoiding that flush
> > > noise, wouldn't it be the case?
> >
> > Yes, that is the case. But as a note: for the two main types of
> > configuration performed (one uses isolcpus=[domain] and the other
> > cgroups, for isolating processes) nohz_full= is always set.
> >
> > So just testing for nohz_full= would be sufficient (which perhaps would
> > make the code simpler).
>
> I do not see any mention about that assumption under Documentation/.

Documentation/admin-guide/kernel-per-CPU-kthreads.rst

SCHED_SOFTIRQ
-------------

Do all of the following:

1. Avoid sending scheduler IPIs to the CPU to be de-jittered,
   for example, ensure that at most one runnable kthread is present
   on that CPU. If a thread that expects to run on the de-jittered
   CPU awakens, the scheduler will send an IPI that can result in
   a subsequent SCHED_SOFTIRQ.
2. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered
   is marked as an adaptive-ticks CPU using the "nohz_full="
   boot parameter. This reduces the number of scheduler-clock
   interrupts that the de-jittered CPU receives, minimizing its
   chances of being selected to do the load balancing work that
   runs in SCHED_SOFTIRQ context.

> Is this a best practice documented anywhere or it just happens to be
> the case with workloads you deal with?

The latter: it just happens to be the case with the workloads I deal
with. However, Frederic seems interested in matching the exported
toggles with the known classes of use cases.

For example, for this guide:
http://www.comfilewiki.co.kr/en/doku.php?id=comfilepi:improving_real-time_performance:index

Using nohz_full= would be a benefit (and it's not currently being set,
perhaps because not all the options are known?).
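
For anyone following such a guide, one way to check whether a CPU actually
ended up covered by nohz_full= (or isolcpus=) on a running system is to read
the cpulists the kernel exports under /sys/devices/system/cpu/. A minimal
userspace sketch follows; the helper names cpulist_has_cpu() and
sysfs_cpulist_has_cpu() are made up for illustration, and the parser only
handles the plain "a-b,c" form that these files print:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: true if 'cpu' appears in a cpulist such as "1-3,5". */
bool cpulist_has_cpu(const char *list, int cpu)
{
	const char *p = list;

	while (*p) {
		char *end;
		long lo = strtol(p, &end, 10);
		long hi = lo;

		if (end == p)		/* no number: empty or unparsable list */
			return false;
		if (*end == '-') {	/* range "lo-hi" */
			p = end + 1;
			hi = strtol(p, &end, 10);
			if (end == p)
				return false;
		}
		if (cpu >= lo && cpu <= hi)
			return true;
		p = end;
		while (*p == ',' || *p == ' ' || *p == '\n')	/* separators */
			p++;
	}
	return false;
}

/* Hypothetical helper: read one cpulist line from 'path' and test 'cpu'. */
bool sysfs_cpulist_has_cpu(const char *path, int cpu)
{
	char buf[4096];
	bool ret = false;
	FILE *f = fopen(path, "r");

	if (!f)			/* file absent: treat as "not listed" */
		return false;
	if (fgets(buf, sizeof(buf), f))
		ret = cpulist_has_cpu(buf, cpu);
	fclose(f);
	return ret;
}
```

Calling sysfs_cpulist_has_cpu("/sys/devices/system/cpu/nohz_full", cpu) and
the same for ".../isolated" tells which of the two boot parameters covers
the CPU, assuming the running kernel exports both files.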


AFAIU the workloads that benefit from disabling nohz_full= are those
where switching between nohz_full mode and tick-enabled mode (in either
direction, which involves reprogramming the local timer) happens often,
so that the switching cost is worth avoiding? For example, switching
between 1 runnable task and more than 1 runnable task (and vice versa).
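
To make the two candidate definitions of cpu_is_isolated() discussed above
concrete, here is a userspace sketch, not kernel code.
housekeeping_test_cpu(), HK_TYPE_DOMAIN and HK_TYPE_TICK are the kernel
names, but the bitmask model behind them, hk_isolate(), and the name
cpu_is_isolated_tick_only() are stand-ins made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's housekeeping types; only the two discussed. */
enum hk_type { HK_TYPE_DOMAIN, HK_TYPE_TICK, HK_TYPE_MAX };

/* Toy model: per type, bit N set means CPU N is a housekeeping CPU. */
uint64_t hk_mask[HK_TYPE_MAX] = { ~0ULL, ~0ULL };

/* Hypothetical: model isolcpus=domain,<cpu> or nohz_full=<cpu> for one CPU. */
void hk_isolate(int cpu, enum hk_type type)
{
	hk_mask[type] &= ~(1ULL << cpu);
}

/* Mock of the kernel helper: is 'cpu' a housekeeping CPU for 'type'? */
bool housekeeping_test_cpu(int cpu, enum hk_type type)
{
	return (hk_mask[type] & (1ULL << cpu)) != 0;
}

/* Variant 1: isolated if either isolcpus=domain or nohz_full= covers it. */
bool cpu_is_isolated(int cpu)
{
	return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
	       !housekeeping_test_cpu(cpu, HK_TYPE_TICK);
}

/* Variant 2 (hypothetical name): only nohz_full= counts. */
bool cpu_is_isolated_tick_only(int cpu)
{
	return !housekeeping_test_cpu(cpu, HK_TYPE_TICK);
}
```

The two variants only disagree for a CPU that is in isolcpus=domain but not
in nohz_full=, which is exactly the configuration that the "nohz_full= is
always set" assumption rules out.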