Re: [PATCH 1/4] generic-smp: remove single ipi fallback for smp_call_function_many()

From: Oleg Nesterov
Date: Mon Feb 16 2009 - 14:14:37 EST


On 02/16, Peter Zijlstra wrote:
>
> @@ -347,31 +384,32 @@ void smp_call_function_many(const struct
>  		return;
>  	}
>
> -	data = kmalloc(sizeof(*data) + cpumask_size(), GFP_ATOMIC);
> -	if (unlikely(!data)) {
> -		/* Slow path. */
> -		for_each_online_cpu(cpu) {
> -			if (cpu == smp_processor_id())
> -				continue;
> -			if (cpumask_test_cpu(cpu, mask))
> -				smp_call_function_single(cpu, func, info, wait);
> -		}
> -		return;
> +	data = kmalloc(sizeof(*data), GFP_ATOMIC);
> +	if (data)
> +		data->csd.flags = CSD_FLAG_ALLOC;
> +	else {
> +		data = &per_cpu(cfd_data, me);
> +		/*
> +		 * We need to wait for all previous users to go away.
> +		 */
> +		while (data->csd.flags & CSD_FLAG_LOCK)
> +			cpu_relax();
> +		data->csd.flags = CSD_FLAG_LOCK;
>  	}
>
>  	spin_lock_init(&data->lock);
> -	data->csd.flags = CSD_FLAG_ALLOC;
>  	if (wait)
>  		data->csd.flags |= CSD_FLAG_WAIT;
>  	data->csd.func = func;
>  	data->csd.info = info;
> -	cpumask_and(to_cpumask(data->cpumask_bits), mask, cpu_online_mask);
> -	cpumask_clear_cpu(smp_processor_id(), to_cpumask(data->cpumask_bits));
> -	data->refs = cpumask_weight(to_cpumask(data->cpumask_bits));
> -
> -	spin_lock_irqsave(&call_function_lock, flags);
> -	list_add_tail_rcu(&data->csd.list, &call_function_queue);
> -	spin_unlock_irqrestore(&call_function_lock, flags);
> +	cpumask_and(&data->cpumask, mask, cpu_online_mask);
> +	cpumask_clear_cpu(smp_processor_id(), &data->cpumask);

(perhaps it makes sense to use "me" instead of smp_processor_id())

> +	data->refs = cpumask_weight(&data->cpumask);
> +
> +	spin_lock_irqsave(&call_function.lock, flags);
> +	call_function.counter++;
> +	list_add_tail_rcu(&data->csd.list, &call_function.queue);
> +	spin_unlock_irqrestore(&call_function.lock, flags);

What if the initialization above leaks into the critical section?

I mean, generic_smp_call_function_interrupt() running on another CPU
could see the result of list_add_tail_rcu() and of the cpumask_and()
into data->cpumask, but not (say) the "data->refs = ..." store.
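
IOW, something like this interleaving (just an illustration of what I mean,
not taken from the patch; func/info are the arguments as in the code above):

	CPU A (smp_call_function_many):
		data->csd.func = func;
		data->csd.info = info;
		cpumask_and(&data->cpumask, mask, cpu_online_mask);
		data->refs = cpumask_weight(&data->cpumask);
		list_add_tail_rcu(&data->csd.list, &call_function.queue);

	CPU B (generic_smp_call_function_interrupt), at about the same time:
		sees the new entry on call_function.queue,
		sees the new data->cpumask (cpumask_test_cpu(cpu, ...) succeeds),
		but reads a stale value of data->refs.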


Actually I don't understand the counter/free_list logic.

generic_smp_call_function_interrupt:

	/*
	 * When the global queue is empty, its guaranteed that no cpu
	 * is still observing any entry on the free_list, therefore
	 * we can go ahead and unlock them.
	 */
	if (!--call_function.counter)
		list_splice_init(&call_function.free_list, &free_list);

I can't see why "its guaranteed that no cpu ...". Let's suppose CPU 0
"hangs" for some reason in generic_smp_call_function_interrupt() right
before the "if (!cpumask_test_cpu(cpu, data->cpumask))" test. Then it is
possible that another CPU removes the single entry (which doesn't have
CPU 0 in data->cpumask) from call_function.queue.

Now, if that entry is re-added, we can have a problem if CPU 0 sees
the partly initialized entry.
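
Spelled out, the case I'm worried about (as I read the patch, so I could
well be wrong):

	1. CPU 0 stalls in generic_smp_call_function_interrupt() right
	   before the cpumask_test_cpu() check, still holding a pointer
	   to the entry.

	2. The CPUs that are in data->cpumask finish with the entry, it
	   is removed from call_function.queue and put on the free_list.

	3. The queue is now empty, call_function.counter drops to zero,
	   so the free_list is spliced and the entry is unlocked for reuse.

	4. smp_call_function_many() reuses this per-cpu entry and starts
	   re-initializing it.

	5. CPU 0 resumes and reads the partly re-initialized entry.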

But why do we need counter/free_list at all?
Can't we do the following:

	- initialize call_function_data.lock at boot time

	- change smp_call_function_many() to initialize cfd_data
	  under ->lock (see the writer-side sketch after the loop below)

	- change generic_smp_call_function_interrupt() to do

	list_for_each_entry_rcu(data) {

		if (!cpumask_test_cpu(cpu, data->cpumask))
			continue;

		spin_lock(&data->lock);
		if (!cpumask_test_cpu(cpu, data->cpumask)) {
			spin_unlock(&data->lock);
			continue;
		}

		cpumask_clear_cpu(cpu, data->cpumask);
		refs = --data->refs;

		func = data->func;
		info = data->info;
		spin_unlock(&data->lock);

		func(info);

		if (refs)
			continue;

		...
	}
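
And then smp_call_function_many() would initialize the per-cpu cfd_data
under that same ->lock, something like (only a sketch, not even compiled;
"me" as in the patch above):

	data = &per_cpu(cfd_data, me);

	/* ... wait for the previous user of this entry to go away ... */

	spin_lock(&data->lock);
	data->csd.func = func;
	data->csd.info = info;
	cpumask_and(&data->cpumask, mask, cpu_online_mask);
	cpumask_clear_cpu(me, &data->cpumask);
	data->refs = cpumask_weight(&data->cpumask);
	spin_unlock(&data->lock);

	/* then add it to call_function.queue under call_function.lock, as now */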

Afaics, it is OK if smp_call_function_many() sees the "unneeded" entry on
the list; the only thing we must ensure is that we can't "misunderstand"
this entry.

No?


Off-topic question: it looks like smp_call_function_single() must not be
called from an interrupt/bh handler, right? But the comment says nothing
about that.

And,
smp_call_function_single:

	/* Can deadlock when called with interrupts disabled */
	WARN_ON(irqs_disabled());

Just curious, we can only deadlock if wait = T, right?
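
The deadlock I can see is two CPUs cross-calling each other with interrupts
disabled and wait set (just an illustration):

	CPU 0 (irqs disabled): smp_call_function_single(1, func, info, 1);
		sends an IPI to CPU 1, then spins until CPU 1 runs func.

	CPU 1 (irqs disabled): smp_call_function_single(0, func, info, 1);
		sends an IPI to CPU 0, then spins until CPU 0 runs func.

	Neither CPU ever services the other's IPI, so both spin forever.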

Oleg.
