Re: [PATCH 1/4] generic-smp: remove single ipi fallback for smp_call_function_many()

From: Peter Zijlstra
Date: Mon Feb 16 2009 - 14:42:45 EST


On Mon, 2009-02-16 at 20:10 +0100, Oleg Nesterov wrote:
> On 02/16, Peter Zijlstra wrote:
> >
> > @@ -347,31 +384,32 @@ void smp_call_function_many(const struct
> >  		return;
> >  	}
> >  
> > -	data = kmalloc(sizeof(*data) + cpumask_size(), GFP_ATOMIC);
> > -	if (unlikely(!data)) {
> > -		/* Slow path. */
> > -		for_each_online_cpu(cpu) {
> > -			if (cpu == smp_processor_id())
> > -				continue;
> > -			if (cpumask_test_cpu(cpu, mask))
> > -				smp_call_function_single(cpu, func, info, wait);
> > -		}
> > -		return;
> > +	data = kmalloc(sizeof(*data), GFP_ATOMIC);
> > +	if (data)
> > +		data->csd.flags = CSD_FLAG_ALLOC;
> > +	else {
> > +		data = &per_cpu(cfd_data, me);
> > +		/*
> > +		 * We need to wait for all previous users to go away.
> > +		 */
> > +		while (data->csd.flags & CSD_FLAG_LOCK)
> > +			cpu_relax();
> > +		data->csd.flags = CSD_FLAG_LOCK;
> >  	}
> >  
> >  	spin_lock_init(&data->lock);
> > -	data->csd.flags = CSD_FLAG_ALLOC;
> >  	if (wait)
> > 		data->csd.flags |= CSD_FLAG_WAIT;
> >  	data->csd.func = func;
> >  	data->csd.info = info;
> > -	cpumask_and(to_cpumask(data->cpumask_bits), mask, cpu_online_mask);
> > -	cpumask_clear_cpu(smp_processor_id(), to_cpumask(data->cpumask_bits));
> > -	data->refs = cpumask_weight(to_cpumask(data->cpumask_bits));
> > -
> > -	spin_lock_irqsave(&call_function_lock, flags);
> > -	list_add_tail_rcu(&data->csd.list, &call_function_queue);
> > -	spin_unlock_irqrestore(&call_function_lock, flags);
> > +	cpumask_and(&data->cpumask, mask, cpu_online_mask);
> > +	cpumask_clear_cpu(smp_processor_id(), &data->cpumask);
>
> (perhaps it makes sense to use "me" instead of smp_processor_id())

Ah, missed one it seems, thanks ;-)

> > +	data->refs = cpumask_weight(&data->cpumask);
> > +
> > +	spin_lock_irqsave(&call_function.lock, flags);
> > +	call_function.counter++;
> > +	list_add_tail_rcu(&data->csd.list, &call_function.queue);
> > +	spin_unlock_irqrestore(&call_function.lock, flags);
>
> What if the initialization above leaks into the critical section?
>
> I mean, generic_smp_call_function_interrupt() running on another CPU
> can see the result of list_add_tail_rcu() and cpumask_and(data->cpumask)
> but not (say) "data->refs = ...".

Ho humm, nice :-)

So the best fix would be to put that data initialization under
data->lock. This would be a bug in the original code too.
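
I.e. something like this (a sketch only, not a final patch; it assumes
the interrupt side takes data->lock before reading ->refs and ->cpumask,
so the lock release/acquire publishes the initialization):

	spin_lock_irqsave(&data->lock, flags);
	data->csd.func = func;
	data->csd.info = info;
	cpumask_and(&data->cpumask, mask, cpu_online_mask);
	cpumask_clear_cpu(me, &data->cpumask);
	data->refs = cpumask_weight(&data->cpumask);

	spin_lock(&call_function.lock);
	call_function.counter++;
	list_add_tail_rcu(&data->csd.list, &call_function.queue);
	spin_unlock(&call_function.lock);
	spin_unlock_irqrestore(&data->lock, flags);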

> Actually I don't understand the counter/free_list logic.
>
> generic_smp_call_function_interrupt:
>
> 	/*
> 	 * When the global queue is empty, it's guaranteed that no cpu
> 	 * is still observing any entry on the free_list, therefore
> 	 * we can go ahead and unlock them.
> 	 */
> 	if (!--call_function.counter)
> 		list_splice_init(&call_function.free_list, &free_list);
>
> I can't see why "it's guaranteed that no cpu ...". Let's suppose CPU 0
> "hangs" for some reason in generic_smp_call_function_interrupt() right
> before the "if (!cpumask_test_cpu(cpu, &data->cpumask))" test. Then it is
> possible that another CPU removes the single entry (which doesn't have
> CPU 0 in data->cpumask) from call_function.queue.

Then call_function.counter wouldn't be 0, and we would not release the
list entries.

> Now, if that entry is re-added, we can have a problem if CPU 0 sees
> the partly initialized entry.

Right, so the scenario is that a cpu observes the list entry after we do
list_del_rcu() -- most likely a cpu not in the original mask, traversing
the list for the next entry -- so we have to wait for some quiescent
state to ensure nobody is still referencing it.

We cannot use regular RCU, because its quiescent state takes forever to
happen; therefore this implements a simple counting RCU for the queue
domain only.

When the list is empty, there's nobody seeing any elements, ergo we can
release the entries and re-use them.
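
On the interrupt side that ends up looking roughly like this (a sketch,
most of the handler elided):

	list_for_each_entry_rcu(data, &call_function.queue, csd.list) {
		...
		if (refs)
			continue;

		spin_lock(&call_function.lock);
		list_del_rcu(&data->csd.list);
		/*
		 * Park the entry on the free_list; only once the queue
		 * drains completely (counter hits zero) can nobody be
		 * referencing a parked entry anymore, and only then do
		 * we unlock (CSD_FLAG_LOCK) them for re-use.
		 */
		list_add(&data->csd.list, &call_function.free_list);
		if (!--call_function.counter)
			list_splice_init(&call_function.free_list, &free_list);
		spin_unlock(&call_function.lock);
	}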

> But why do we need counter/free_list at all?
> Can't we do the following:
>
> - initialize call_function_data.lock at boot time
>
> - change smp_call_function_many() to initialize cfd_data
>   under ->lock
>
> - change generic_smp_call_function_interrupt() to do
>
> 	list_for_each_entry_rcu(data, &call_function.queue, csd.list) {
>
> 		if (!cpumask_test_cpu(cpu, &data->cpumask))
> 			continue;
>
> 		spin_lock(&data->lock);
> 		if (!cpumask_test_cpu(cpu, &data->cpumask)) {
> 			spin_unlock(&data->lock);
> 			continue;
> 		}
>
> 		cpumask_clear_cpu(cpu, &data->cpumask);
> 		refs = --data->refs;
>
> 		func = data->csd.func;
> 		info = data->csd.info;
> 		spin_unlock(&data->lock);
>
> 		func(info);
>
> 		if (refs)
> 			continue;
>
> 		...
> 	}
>
> Afaics, it is OK if smp_call_function_many() sees the "unneeded" entry
> on the list; the only thing we must ensure is that we can't
> "misunderstand" this entry.
>
> No?

Hmm, could that not lead to loops and/or missed entries in the
list iteration?

The safe approach seemed to be to wait until we're sure nobody still
observes the entry before re-using it.
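
E.g. this (purely hypothetical) interleaving:

	CPU 0 (interrupt)			CPU 1 (smp_call_function_many)
	-----------------			------------------------------
	iterating, at entry E
						list_del_rcu(&E->csd.list);
						/* re-initialize E */
						list_add_tail_rcu(&E->csd.list, ...);
	continues via E's ->next, which
	now points back into the tail of
	the queue -- entries get re-visited,
	or E is seen half-initialized.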

>
> Off-topic question: it looks like smp_call_function_single() must not be
> called from an interrupt/bh handler, right? But the comment says nothing.
>
> And,
> smp_call_function_single:
>
> 	/* Can deadlock when called with interrupts disabled */
> 	WARN_ON(irqs_disabled());
>
> Just curious, we can only deadlock if wait = T, right?

Right.
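
That is, sketching the wait = 1 case (f/g and x/y just placeholders):

	CPU A (irqs disabled)			CPU B (irqs disabled)
	smp_call_function_single(B, f, x, 1);	smp_call_function_single(A, g, y, 1);
	  sends IPI to B			  sends IPI to A
	  spins on CSD_FLAG_WAIT		  spins on CSD_FLAG_WAIT

Neither cpu can service the other's IPI with interrupts disabled, so
neither csd ever completes. With wait = 0 we just queue the csd and
return, so there is nothing to spin on.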
