Re: [PATCH 1/2] smp: Non busy-waiting IPI queue

From: Paul E. McKenney
Date: Wed Apr 02 2014 - 14:05:34 EST


On Wed, Apr 02, 2014 at 06:26:05PM +0200, Frederic Weisbecker wrote:
> Some IPI users, such as the nohz subsystem, need to be able to send
> an async IPI (async = non waiting for any other IPI completion) on
> contexts with disabled interrupts. And we want the IPI subsystem to handle
> concurrent calls by itself.
>
> Currently the nohz machinery uses the scheduler IPI for this purpose
> because it can be triggered from any context and doesn't need any
> serialization from the caller. But this is an abuse of a scheduler
> fast path. We are bloating it with a job that should use its own IPI.
>
> The current set of IPI functions can't be called when interrupts are
> disabled otherwise we risk a deadlock when two CPUs wait for each
> other's IPI completion.
>
> OTOH smp_call_function_single_async() can be called when interrupts
> are disabled. But then it's up to the caller to serialize the given
> IPI. This can't be called concurrently without special care.
>
> So we need a version of the async IPI that takes care of concurrent
> calls.
>
> The proposed solution is to synchronize the IPI with a specific flag
> that prevents the IPI from being sent if it is already pending but not
> yet executed. Ordering is maintained such that, if the IPI is not sent
> because it's already pending, we guarantee it will see the new state of
> the data we expect it to when it will execute.
>
> This model is close to the irq_work design. It's also partly inspired by
> suggestions from Peter Zijlstra.
>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Jens Axboe <jens.axboe@xxxxxxxxxx>
> Cc: Kevin Hilman <khilman@xxxxxxxxxx>
> Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>

Nice, but one question below.

Thanx, Paul

> ---
> include/linux/smp.h | 12 ++++++++++++
> kernel/smp.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 56 insertions(+)
>
> diff --git a/include/linux/smp.h b/include/linux/smp.h
> index 633f5ed..155dc86 100644
> --- a/include/linux/smp.h
> +++ b/include/linux/smp.h
> @@ -29,6 +29,18 @@ extern unsigned int total_cpus;
> int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
> int wait);
>
> +struct queue_single_data;
> +typedef void (*smp_queue_func_t)(struct queue_single_data *qsd);
> +
> +struct queue_single_data {
> + struct call_single_data data;
> + smp_queue_func_t func;
> + int pending;
> +};
> +
> +int smp_queue_function_single(int cpuid, smp_queue_func_t func,
> + struct queue_single_data *qsd);
> +
> /*
> * Call a function on all processors
> */
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 06d574e..bfe7b36 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -265,6 +265,50 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
> }
> EXPORT_SYMBOL_GPL(smp_call_function_single_async);
>
> +void generic_smp_queue_function_single_interrupt(void *info)
> +{
> + struct queue_single_data *qsd = info;
> +
> + WARN_ON_ONCE(xchg(&qsd->pending, 0) != 1);

I am probably missing something here, but shouldn't this function copy
*qsd to a local on-stack variable before doing the above xchg()? What
prevents the following from happening?

o CPU 0 does smp_queue_function_single(), which sets ->pending
and fills in ->func and ->data.

o CPU 1 takes IPI, invoking generic_smp_queue_function_single_interrupt().

o CPU 1 does xchg(), so that ->pending is now zero.

o An attempt to reuse the queue_single_data sees ->pending equal
to zero, so the ->func and ->data is overwritten.

o CPU 1 calls the new ->func with the new ->data (or any of the other
two possible unexpected outcomes), which might not be helpful to
the kernel's actuarial statistics.

So what am I missing?

> + qsd->func(qsd);
> +}
> +
> +/**
> + * smp_queue_function_single - Queue an asynchronous function to run on a
> + * specific CPU unless it's already pending.
> + * @func: The function to run. This must be fast and non-blocking.
> + * @qsd: The data contained in the interested object if any
> + *
> + * Like smp_call_function_single_async() but the call to @func is serialized
> + * and won't be queued if it is already pending. In the latter case, ordering
> + * is still guaranteed such that the pending call will sees the new data we
> + * expect it to.
> + *
> + * This must not be called on offline CPUs.
> + *
> + * Returns 0 when @func is successfully queued or already pending, else a negative
> + * status code.
> + */
> +int smp_queue_function_single(int cpu, smp_queue_func_t func, struct queue_single_data *qsd)
> +{
> + int err;
> +
> + if (cmpxchg(&qsd->pending, 0, 1))
> + return 0;
> +
> + qsd->func = func;
> +
> + preempt_disable();
> + err = generic_exec_single(cpu, &qsd->data, generic_smp_queue_function_single_interrupt, qsd, 0);
> + preempt_enable();
> +
> + /* Reset is case of error. This must not be called on offline CPUs */
> + if (err)
> + qsd->pending = 0;
> +
> + return err;
> +}
> +
> /*
> * smp_call_function_any - Run a function on any of the given cpus
> * @mask: The mask of cpus it can run on.
> --
> 1.8.3.1
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/