Re: [PATCH] smp,cpumask: Don't call functions on offline CPUs
From: Andrew Murray
Date: Wed May 22 2019 - 10:39:56 EST
On Wed, May 22, 2019 at 04:09:21PM +0200, Peter Zijlstra wrote:
> On Wed, May 22, 2019 at 12:15:37PM +0100, Andrew Murray wrote:
> > When we are able to allocate a cpumask in on_each_cpu_cond_mask
> > we call functions with on_each_cpu_mask - this masks out offline
> > cpus via smp_call_function_many.
> >
> > However when we fail to allocate a cpumask in on_each_cpu_cond_mask
> > we call functions with smp_call_function_single - this will return
> > -ENXIO from generic_exec_single if a CPU is offline which will
> > result in a WARN_ON_ONCE.
> >
> > Let's avoid the WARN by only calling smp_call_function_single when
> > the CPU is online and thus making both paths consistent with each
> > other.
>
> I'm confused, why are you feeding it offline CPUs to begin with? @mask
> shouldn't include them.
There are only two users of this function: on_each_cpu_cond which passes
cpu_online_mask, and native_flush_tlb_others from arch/x86/mm/tlb.c. But
I haven't followed all callers to native_flush_tlb_others to determine if
they are only passing in online CPUs.
Whilst trying to understand the code I couldn't help but notice that the
behaviour changes depending on the ability of memory allocation at the
time. This seemed odd. And of course there may be future callers of this
function that for some strange reason do pass in offline CPUs. (Maybe this
is OK and it's something they shouldn't be doing).
I felt that this function should be consistent and document the behaviour
via the @mask doc like some of the other functions in the file.
>
> Is perhaps the problem that on_each_cpu_cond() uses cpu_onlne_mask
> without protection?
Does this prevent racing with a CPU going offline? I guess this prevents
the warning at the expense of a lock - but is only beneficial in the
unlikely path. (In the likely path this prevents new CPUs going offline
but we don't care because we don't WARN if they aren't they when we
attempt to call functions).
At least this is my limited understanding.
Thanks,
Andrew Murray
>
> Something like so?
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index f4cf1b0bb3b8..a493b3dfa67f 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -705,8 +707,10 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
> smp_call_func_t func, void *info, bool wait,
> gfp_t gfp_flags)
> {
> + cpus_read_lock();
> on_each_cpu_cond_mask(cond_func, func, info, wait, gfp_flags,
> cpu_online_mask);
> + cpus_read_unlock();
> }
> EXPORT_SYMBOL(on_each_cpu_cond);
>
>