Re: [PATCH 1/3] generic-ipi: simplify the barriers

From: Paul E. McKenney
Date: Tue Feb 17 2009 - 19:27:19 EST


On Tue, Feb 17, 2009 at 10:59:05PM +0100, Peter Zijlstra wrote:
> From: Nick Piggin <npiggin@xxxxxxx>
>
> Firstly, just unconditionally take the lock and check the list in the
> generic_call_function_single_interrupt IPI handler. As we've just taken
> an IPI here, the chances are fairly high that there will be work on the
> list for us, so do the locking unconditionally. This removes the tricky
> lockless list_empty check and dubious barriers. The change looks bigger
> than it is because it is just removing an outer loop.
>
> Secondly, clarify architecture specific IPI locking rules. Generic code
> has no tools to impose any sane ordering on IPIs if they go outside
> normal cache coherency, ergo the arch code must make them appear to
> obey cache coherency as a "memory operation" to initiate an IPI, and
> a "memory operation" to receive one. This way at least they can be
> reasoned about in generic code, and smp_mb used to provide ordering.
>
> The combination of these two changes means that explict barriers can
> be taken out of queue handling for the single case -- shared data is
> explicitly locked, and ipi ordering must conform to that, so no
> barriers needed. An extra barrier is needed in the many handler, so
> as to ensure we load the list element after the IPI is received.
>
> Does any architecture actually needs barriers? For the initiator I
> could see it, but for the handler I would be surprised. The other
> thing we could do for simplicity is just to require that a full
> barrier is required before generating an IPI, and after receiving an
> IPI. We can't just do that in generic code without auditing
> architectures. There have been subtle hangs here on some archs in
> the past.

While I sympathize with pushing memory barriers down into the
arch-specific functions, you -are- running this by the various
arch maintainers so that they have an opportunity to adjust, right?

Thanx, Paul

> Signed-off-by: Nick Piggin <npiggin@xxxxxxx>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> ---
> kernel/smp.c | 83 +++++++++++++++++++++++++++++++----------------------------
> 1 file changed, 44 insertions(+), 39 deletions(-)
>
> Index: linux-2.6/kernel/smp.c
> ===================================================================
> --- linux-2.6.orig/kernel/smp.c
> +++ linux-2.6/kernel/smp.c
> @@ -74,9 +74,16 @@ static void generic_exec_single(int cpu,
> spin_unlock_irqrestore(&dst->lock, flags);
>
> /*
> - * Make the list addition visible before sending the ipi.
> + * The list addition should be visible before sending the IPI
> + * handler locks the list to pull the entry off it because of
> + * normal cache coherency rules implied by spinlocks.
> + *
> + * If IPIs can go out of order to the cache coherency protocol
> + * in an architecture, sufficient synchronisation should be added
> + * to arch code to make it appear to obey cache coherency WRT
> + * locking and barrier primitives. Generic code isn't really equipped
> + * to do the right thing...
> */
> - smp_mb();

While I sympathize with the above, you -are- running this by the various
arch maintainers so that they have an opportunity to adjust, right?

>
> if (ipi)
> arch_send_call_function_single_ipi(cpu);
> @@ -104,6 +111,14 @@ void generic_smp_call_function_interrupt
> int cpu = get_cpu();
>
> /*
> + * Ensure entry is visible on call_function_queue after we have
> + * entered the IPI. See comment in smp_call_function_many.
> + * If we don't have this, then we may miss an entry on the list
> + * and never get another IPI to process it.
> + */
> + smp_mb();
> +
> + /*
> * It's ok to use list_for_each_rcu() here even though we may delete
> * 'pos', since list_del_rcu() doesn't clear ->next
> */
> @@ -154,49 +169,37 @@ void generic_smp_call_function_single_in
> {
> struct call_single_queue *q = &__get_cpu_var(call_single_queue);
> LIST_HEAD(list);
> + unsigned int data_flags;
>
> - /*
> - * Need to see other stores to list head for checking whether
> - * list is empty without holding q->lock
> - */
> - smp_read_barrier_depends();
> - while (!list_empty(&q->list)) {
> - unsigned int data_flags;
> -
> - spin_lock(&q->lock);
> - list_replace_init(&q->list, &list);
> - spin_unlock(&q->lock);
> -
> - while (!list_empty(&list)) {
> - struct call_single_data *data;
> -
> - data = list_entry(list.next, struct call_single_data,
> - list);
> - list_del(&data->list);
> + spin_lock(&q->lock);
> + list_replace_init(&q->list, &list);
> + spin_unlock(&q->lock);
>
> - /*
> - * 'data' can be invalid after this call if
> - * flags == 0 (when called through
> - * generic_exec_single(), so save them away before
> - * making the call.
> - */
> - data_flags = data->flags;
> + while (!list_empty(&list)) {
> + struct call_single_data *data;
>
> - data->func(data->info);
> + data = list_entry(list.next, struct call_single_data,
> + list);
> + list_del(&data->list);
>
> - if (data_flags & CSD_FLAG_WAIT) {
> - smp_wmb();
> - data->flags &= ~CSD_FLAG_WAIT;
> - } else if (data_flags & CSD_FLAG_LOCK) {
> - smp_wmb();
> - data->flags &= ~CSD_FLAG_LOCK;
> - } else if (data_flags & CSD_FLAG_ALLOC)
> - kfree(data);
> - }
> /*
> - * See comment on outer loop
> + * 'data' can be invalid after this call if
> + * flags == 0 (when called through
> + * generic_exec_single(), so save them away before
> + * making the call.
> */
> - smp_read_barrier_depends();
> + data_flags = data->flags;
> +
> + data->func(data->info);
> +
> + if (data_flags & CSD_FLAG_WAIT) {
> + smp_wmb();
> + data->flags &= ~CSD_FLAG_WAIT;
> + } else if (data_flags & CSD_FLAG_LOCK) {
> + smp_wmb();
> + data->flags &= ~CSD_FLAG_LOCK;
> + } else if (data_flags & CSD_FLAG_ALLOC)
> + kfree(data);
> }
> }
>
> @@ -375,6 +378,8 @@ void smp_call_function_many(const struct
>
> /*
> * Make the list addition visible before sending the ipi.
> + * (IPIs must obey or appear to obey normal Linux cache coherency
> + * rules -- see comment in generic_exec_single).
> */
> smp_mb();
>
>
> --
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/