Re: [PATCH v2] isolcpus: affine kernel threads to specified cpumask

From: Marcelo Tosatti
Date: Wed Mar 25 2020 - 07:48:07 EST



Hi Frederic,

On Wed, Mar 25, 2020 at 01:30:00AM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 24, 2020 at 12:20:16PM -0300, Marcelo Tosatti wrote:
> >
> > This is a kernel enhancement to configure the cpu affinity of kernel
> > threads via kernel boot option isolcpus=no_kthreads,<isolcpus_params>,<cpulist>
> >
> > When this option is specified, the cpumask is immediately applied upon
> > thread launch. This does not affect kernel threads that specify cpu
> > and node.
> >
> > This allows CPU isolation (that is not allowing certain threads
> > to execute on certain CPUs) without using the isolcpus=domain parameter,
> > making it possible to enable load balancing on such CPUs
> > during runtime (see
> >
> > Note-1: this is based off on Wind River's patch at
> > https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch
> >
> > Difference being that this patch is limited to modifying
> > kernel thread cpumask: Behaviour of other threads can
> > be controlled via cgroups or sched_setaffinity.
> >
> > Note-2: MontaVista's patch was based off Christoph Lameter's patch at
> > https://lwn.net/Articles/565932/ with the only difference being
> > the kernel parameter changed from kthread to kthread_cpus.
> >
> > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
>
> I'm wondering, why do you need such a boot shift at all when you
> can actually affine kthreads on runtime?

New, unbound kernel threads inherit the cpumask of kthreadd.

Therefore there is a race between kernel thread creation
and affine.

If you know of a solution to that problem, that can be used instead.

> >
> > ---
> >
> > v2: use isolcpus= subcommand (Thomas Gleixner)
> >
> > Documentation/admin-guide/kernel-parameters.txt | 8 ++++++++
> > include/linux/cpumask.h | 5 +++++
> > include/linux/sched/isolation.h | 1 +
> > init/main.c | 1 +
> > kernel/cpu.c | 13 +++++++++++++
> > kernel/kthread.c | 4 ++--
> > kernel/sched/isolation.c | 6 ++++++
> > 7 files changed, 36 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index c07815d230bc..7318e3057383 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1959,6 +1959,14 @@
> > the CPU affinity syscalls or cpuset.
> > <cpu number> begins at 0 and the maximum value is
> > "number of CPUs in system - 1".
> > + When using cpusets, use the isolcpus option no_kthreads
> > + to avoid creation of kernel threads on isolated CPUs.
> > +
> > + no_kthreads
> > + Adjust the CPU affinity mask of unbound kernel threads to
> > + not contain CPUs on the isolated list. This complements
> > + the isolation provided by the cpusets mechanism described
> > + above.
>
> Actually that should be "kthread" instead of no_kthreads. A flag of isolcpus
> describes what we want a set of CPUs to be isolated from. Well, at least that's
> how we started with "domain" and "managed_irq".

Sure.

>
> >
> > managed_irq
> >
> > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > index 0fbcbacd1b29..d002332d00eb 100644
> > --- a/include/linux/sched/isolation.h
> > +++ b/include/linux/sched/isolation.h
> > @@ -14,6 +14,7 @@ enum hk_flags {
> > HK_FLAG_DOMAIN = (1 << 5),
> > HK_FLAG_WQ = (1 << 6),
> > HK_FLAG_MANAGED_IRQ = (1 << 7),
> > + HK_FLAG_NO_KTHREADS = (1 << 8),
>
> Similarly that should be HK_FLAG_KTHREAD.

Sure.

>
> > };
> >
> > #ifdef CONFIG_CPU_ISOLATION
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index b262f47046ca..be9c8d53a986 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -347,7 +347,7 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
> > * The kernel thread should not inherit these properties.
> > */
> > sched_setscheduler_nocheck(task, SCHED_NORMAL, &param);
> > - set_cpus_allowed_ptr(task, cpu_all_mask);
> > + set_cpus_allowed_ptr(task, cpu_kthread_mask);
>
> I'm wondering, why are we using cpu_all_mask and not cpu_possible_mask here?
> If we used the latter, you wouldn't need to create cpu_kthread_mask and
> you could directly rely on housekeeping_cpumask(HK_FLAG_KTHREAD).

I suppose that either work: CPUs can only be online from
cpu_possible_mask (and is contained in cpu_possible_mask).

Nice cleanup, thanks.

>
> > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > index 008d6ac2342b..e9d48729efd4 100644
> > --- a/kernel/sched/isolation.c
> > +++ b/kernel/sched/isolation.c
> > @@ -169,6 +169,12 @@ static int __init housekeeping_isolcpus_setup(char *str)
> > continue;
> > }
> >
> > + if (!strncmp(str, "no_kthreads,", 12)) {
> > + str += 12;
> > + flags |= HK_FLAG_NO_KTHREADS;
>
> You will certainly want HK_FLAG_WQ as well since workqueue has its own
> way to deal with unbound affinity.

Yep. HK_FLAG_WQ is simply a convenience so that the user does not have
to configure this separately: OK.

>
> > + continue;
> > + }
> > +
> > pr_warn("isolcpus: Error, unknown flag\n");
> > return 0;
> > }
> >
>
> Thanks.