Re: [PATCH v2] IPI performance benchmark

From: Andrew Morton
Date: Tue Dec 19 2017 - 18:51:53 EST


On Tue, 19 Dec 2017 11:50:10 +0300 Yury Norov <ynorov@xxxxxxxxxxxxxxxxxx> wrote:

> This benchmark sends many IPIs in different modes and measures
> time for IPI delivery (first column), and total time, ie including
> time to acknowledge the receive by sender (second column).
>
> The scenarios are:
> Dry-run: do everything except actually sending IPI. Useful
> to estimate system overhead.
> Self-IPI: Send IPI to self CPU.
> Normal IPI: Send IPI to some other CPU.
> Broadcast IPI: Send broadcast IPI to all online CPUs.
> Broadcast lock: Send broadcast IPI to all online CPUs and force them
> acquire/release spinlock.
>
> The raw output looks like this:
> [ 155.363374] Dry-run: 0, 2999696 ns
> [ 155.429162] Self-IPI: 30385328, 65589392 ns
> [ 156.060821] Normal IPI: 566914128, 631453008 ns
> [ 158.384427] Broadcast IPI: 0, 2323368720 ns
> [ 160.831850] Broadcast lock: 0, 2447000544 ns
>
> For virtualized guests, sending and reveiving IPIs causes guest exit.
> I used this test to measure performance impact on KVM subsystem of
> Christoffer Dall's series "Optimize KVM/ARM for VHE systems" [1].
>
> Test machine is ThunderX2, 112 online CPUs. Below the results normalized
> to host dry-run time, broadcast lock results omitted. Smaller - better.
>
> Host, v4.14:
> Dry-run: 0 1
> Self-IPI: 9 18
> Normal IPI: 81 110
> Broadcast IPI: 0 2106
>
> Guest, v4.14:
> Dry-run: 0 1
> Self-IPI: 10 18
> Normal IPI: 305 525
> Broadcast IPI: 0 9729
>
> Guest, v4.14 + [1]:
> Dry-run: 0 1
> Self-IPI: 9 18
> Normal IPI: 176 343
> Broadcast IPI: 0 9885
>

That looks handy. Peter and Ingo might be interested.

I wonder if it should be in kernel/. Perhaps it's better to accumulate
these things in lib/test_*.c, rather than cluttering up other top-level
directories.

> +static ktime_t __init send_ipi(int flags)
> +{
> + ktime_t time = 0;
> + DEFINE_SPINLOCK(lock);

I have some vague historical memory that an on-stack spinlock can cause
problems, perhaps with debugging code. Can't remember, maybe I dreamed it.