Re: [v7 4/8] smp: add func to IPI cpus based on parameter func

From: Gilad Ben-Yossef
Date: Sun Jan 29 2012 - 07:04:35 EST


On Sat, Jan 28, 2012 at 1:57 AM, Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 26 Jan 2012 12:01:57 +0200
> Gilad Ben-Yossef <gilad@xxxxxxxxxxxxx> wrote:
...
>>
>> @@ -153,6 +162,16 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
>>                       local_irq_enable();             \
>>               }                                       \
>>       } while (0)
>> +#define on_each_cpu_cond(cond_func, func, info, wait, gfpflags) \
>> +     do {                                            \
>> +             preempt_disable();                      \
>> +             if (cond_func(0, info)) {               \
>> +                     local_irq_disable();            \
>> +                     (func)(info);                   \
>> +                     local_irq_enable();             \
>
> Ordinarily, local_irq_enable() in such a low-level thing is dangerous,
> because it can cause horrid bugs when called from local_irq_disable()d
> code.
>
> However I think we're OK here because it is a bug to call on_each_cpu()
> and friends with local irqs disabled, yes?

Yes, that is my understanding, and this way the function gets called
under the same conditions on UP and SMP.

> Do we have any warnings printks if someone calls the ipi-sending
> functions with local interrupts disabled?  I didn't see any, but didn't
> look very hard.

There is this check in smp_call_function_many():

	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
		     && !oops_in_progress && !early_boot_irqs_disabled);

It only catches SMP offenders, though.


> If my above claims are correct then why does on_each_cpu() use
> local_irq_save()?  hrm.

The comment in on_each_cpu() in kernel/smp.c says: "May be
used during early boot while early_boot_irqs_disabled is set. Use
local_irq_save/restore() instead of local_irq_disable/enable()."
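
To make the difference concrete, here is a minimal userspace sketch. It
is not kernel code: the sim_* names and the boolean flag are hypothetical
stand-ins for the real per-CPU interrupt state. It only shows why a
disable/enable pair clobbers the caller's state when entered with
interrupts already off, while save/restore preserves it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool sim_irqs_enabled = true;

static void sim_local_irq_disable(void) { sim_irqs_enabled = false; }
static void sim_local_irq_enable(void)  { sim_irqs_enabled = true; }

/* Remember the current state, then disable. */
static bool sim_local_irq_save(void)
{
	bool flags = sim_irqs_enabled;

	sim_irqs_enabled = false;
	return flags;
}

/* Put back whatever state the caller had on entry. */
static void sim_local_irq_restore(bool flags)
{
	sim_irqs_enabled = flags;
}

/* Unconditionally re-enables on exit, whatever the entry state was. */
static void call_with_disable_enable(void (*func)(void *), void *info)
{
	sim_local_irq_disable();
	func(info);
	sim_local_irq_enable();
}

/* Leaves the interrupt state exactly as it found it. */
static void call_with_save_restore(void (*func)(void *), void *info)
{
	bool flags = sim_local_irq_save();

	func(info);
	sim_local_irq_restore(flags);
}

/* Trivial callback: counts how many times it ran. */
static void bump(void *info)
{
	++*(int *)info;
}
```

Calling call_with_disable_enable() while sim_irqs_enabled is false
returns with it set to true, which is the early-boot hazard the comment
warns about; call_with_save_restore() returns with it still false.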


>
>> +             }                                       \
>> +             preempt_enable();                       \
>> +     } while (0)
>>
>>  static inline void smp_send_reschedule(int cpu) { }
>>  #define num_booting_cpus()                   1
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index a081e6c..fa0912a 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -730,3 +730,61 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
>>       put_cpu();
>>  }
>>  EXPORT_SYMBOL(on_each_cpu_mask);
>> +
>> +/*
>> + * on_each_cpu_cond(): Call a function on each processor for which
>> + * the supplied function cond_func returns true, optionally waiting
>> + * for all the required CPUs to finish. This may include the local
>> + * processor.
>> + * @cond_func:       A callback function that is passed a cpu id and
>> + *           the info parameter. The function is called
>> + *           with preemption disabled. The function should
>> + *           return a boolean value indicating whether to IPI
>> + *           the specified CPU.
>> + * @func:    The function to run on all applicable CPUs.
>> + *           This must be fast and non-blocking.
>> + * @info:    An arbitrary pointer to pass to both functions.
>> + * @wait:    If true, wait (atomically) until function has
>> + *           completed on other CPUs.
>> + * @gfpflags:        GFP flags to use when allocating the cpumask
>> + *           used internally by the function.
>> + *
>> + * The function might sleep if the GFP flags indicate that a
>> + * non-atomic allocation is allowed.
>> + *
>> + * You must not call this function with disabled interrupts or
>> + * from a hardware interrupt handler or from a bottom half handler.
>> + */
>> +void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
>> +                     smp_call_func_t func, void *info, bool wait,
>> +                     gfp_t gfpflags)
>
> bah.
>
> z:/usr/src/linux-3.3-rc1> grep -r gfpflags . | wc -l
> 78
> z:/usr/src/linux-3.3-rc1> grep -r gfp_flags . | wc -l
> 548
>

I have no specific preference. Should I switch?

>> +{
>> +     cpumask_var_t cpus;
>> +     int cpu, ret;
>> +
>> +     might_sleep_if(gfpflags & __GFP_WAIT);
>
> For the zalloc_cpumask_var(), it seems.  I expect there are
> might_sleep() elsewhere in the memory allocation paths, but putting one
> here will detect bugs even if CONFIG_CPUMASK_OFFSTACK=n.

Well, yes, although I didn't think about that :-)
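
A small userspace sketch of that point, under stated assumptions: the
sim_* names and SIM_GFP_WAIT are hypothetical, and the model assumes
that the only other sleep check lives inside the allocator, which is
not entered at all when the mask lives on the stack. With those
assumptions, only the explicit check at the top of the function catches
a caller that passes a sleeping GFP flag from atomic context.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_GFP_WAIT 0x1u	/* hypothetical stand-in for __GFP_WAIT */

static bool sim_atomic_context;	/* true while the caller may not sleep */
static int sim_sleep_bugs;	/* number of sleep-in-atomic reports */

/* Report a bug if sleeping would be allowed but the context is atomic. */
static void sim_might_sleep_if(bool cond)
{
	if (cond && sim_atomic_context)
		sim_sleep_bugs++;
}

/*
 * Models the mask allocation: the off-stack case enters the allocator,
 * which does its own check; the on-stack case allocates nothing, so no
 * check runs there.
 */
static bool sim_zalloc_mask(bool offstack, unsigned int gfp)
{
	if (offstack)
		sim_might_sleep_if(gfp & SIM_GFP_WAIT);
	return true;
}

/* Models the entry of the function, with or without the explicit check. */
static void sim_on_each_cpu_cond(bool offstack, unsigned int gfp,
				 bool explicit_check)
{
	if (explicit_check)
		sim_might_sleep_if(gfp & SIM_GFP_WAIT);
	(void)sim_zalloc_mask(offstack, gfp);
}
```

In this model, an atomic caller with an on-stack mask goes unreported
unless explicit_check is set, which mirrors the case Andrew describes
for CONFIG_CPUMASK_OFFSTACK=n.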

>
>> +     if (likely(zalloc_cpumask_var(&cpus, (gfpflags|__GFP_NOWARN)))) {
>> +             preempt_disable();
>> +             for_each_online_cpu(cpu)
>> +                     if (cond_func(cpu, info))
>> +                             cpumask_set_cpu(cpu, cpus);
>> +             on_each_cpu_mask(cpus, func, info, wait);
>> +             preempt_enable();
>> +             free_cpumask_var(cpus);
>> +     } else {
>> +             /*
>> +              * No free cpumask, bother. No matter, we'll
>> +              * just have to IPI them one by one.
>> +              */
>> +             preempt_disable();
>> +             for_each_online_cpu(cpu)
>> +                     if (cond_func(cpu, info)) {
>> +                             ret = smp_call_function_single(cpu, func,
>> +                                                             info, wait);
>> +                             WARN_ON_ONCE(!ret);
>> +                     }
>> +             preempt_enable();
>> +     }
>> +}
>> +EXPORT_SYMBOL(on_each_cpu_cond);
>
> I assume the preempt_disable()s here are to suspend CPU hotplug?

Yes. Also, I figured that since the original code disabled
preemption for the entire on_each_cpu run time, including waiting for all
the CPUs to ack the IPI, and since we (hopefully) wait for fewer CPUs, the
overall runtime with preemption disabled will usually be lower than with
the original code, and we'll get a more robust interface.
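
For readers following the control flow, here is a rough userspace model
of the two paths in the patch. Everything in it is an assumption made
for illustration: the sim_* names, the fixed SIM_NR_CPUS, the alloc_ok
switch that forces the allocation-failure path, and the plain function
calls standing in for IPIs. It shows only the shape of the logic: build
a mask from the predicate and make one batched call, or, if the mask
cannot be allocated, call the selected CPUs one by one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SIM_NR_CPUS 8	/* hypothetical number of online CPUs */

typedef bool (*sim_cond_func_t)(int cpu, void *info);
typedef void (*sim_call_func_t)(void *info);

struct sim_info {
	int hits;	/* how many times the target function ran */
};

static int sim_batched_calls;	/* batched "IPI" rounds issued */
static int sim_single_calls;	/* per-CPU "IPIs" issued in the fallback */

static void sim_on_each_cpu_cond(sim_cond_func_t cond_func,
				 sim_call_func_t func, void *info,
				 bool alloc_ok)
{
	/* alloc_ok == false simulates a failed mask allocation. */
	uint32_t *mask = alloc_ok ? calloc(1, sizeof(*mask)) : NULL;
	int cpu;

	if (mask) {
		/* Collect every CPU the predicate selects. */
		for (cpu = 0; cpu < SIM_NR_CPUS; cpu++)
			if (cond_func(cpu, info))
				*mask |= UINT32_C(1) << cpu;

		/* One batched round covering the whole mask. */
		sim_batched_calls++;
		for (cpu = 0; cpu < SIM_NR_CPUS; cpu++)
			if (*mask & (UINT32_C(1) << cpu))
				func(info);
		free(mask);
	} else {
		/* No mask available: signal selected CPUs one by one. */
		for (cpu = 0; cpu < SIM_NR_CPUS; cpu++)
			if (cond_func(cpu, info)) {
				sim_single_calls++;
				func(info);
			}
	}
}

/* Example predicate: select odd-numbered CPUs only. */
static bool sim_cpu_is_odd(int cpu, void *info)
{
	(void)info;
	return cpu & 1;
}

/* Example target function: count invocations. */
static void sim_count_hit(void *info)
{
	((struct sim_info *)info)->hits++;
}
```

Both paths run the function on the same set of CPUs; the fallback just
pays for one call per selected CPU instead of a single batched round.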

Thanks,
Gilad

--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@xxxxxxxxxxxxx
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML