Re: [PATCH] IPI performance benchmark

From: Christian Borntraeger
Date: Wed Dec 13 2017 - 06:32:14 EST




On 12/13/2017 12:23 PM, Yury Norov wrote:
> On Mon, Dec 11, 2017 at 05:30:25PM +0100, Christian Borntraeger wrote:
>>
>>
>> On 12/11/2017 03:55 PM, Yury Norov wrote:
>>> On Mon, Dec 11, 2017 at 03:35:02PM +0100, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 12/11/2017 03:16 PM, Yury Norov wrote:
>>>>> This benchmark sends many IPIs in different modes and measures
>>>>> the time for IPI delivery (first column), and the total time, i.e. including
>>>>> the time until the sender sees the receipt acknowledged (second column).
>>>>>
>>>>> The scenarios are:
>>>>> Dry-run: do everything except actually sending IPI. Useful
>>>>> to estimate system overhead.
>>>>> Self-IPI: Send IPI to self CPU.
>>>>> Normal IPI: Send IPI to some other CPU.
>>>>> Broadcast IPI: Send broadcast IPI to all online CPUs.
>>>>>
>>>>> For virtualized guests, sending and receiving IPIs causes a guest exit.
>>>>> I used this test to measure performance impact on KVM subsystem of
>>>>> Christoffer Dall's series "Optimize KVM/ARM for VHE systems".
>>>>>
>>>>> https://www.spinics.net/lists/kvm/msg156755.html
>>>>>
>>>>> Test machine is ThunderX2, 112 online CPUs. Below are the results, normalized
>>>>> to host dry-run time. Smaller is better.
>>>>>
>>>>> Host, v4.14:
>>>>> Dry-run:         0    1
>>>>> Self-IPI:        9   18
>>>>> Normal IPI:     81  110
>>>>> Broadcast IPI:   0 2106
>>>>>
>>>>> Guest, v4.14:
>>>>> Dry-run:         0    1
>>>>> Self-IPI:       10   18
>>>>> Normal IPI:    305  525
>>>>> Broadcast IPI:   0 9729
>>>>>
>>>>> Guest, v4.14 + VHE:
>>>>> Dry-run:         0    1
>>>>> Self-IPI:        9   18
>>>>> Normal IPI:    176  343
>>>>> Broadcast IPI:   0 9885
>> [...]
>>>>> +static int __init init_bench_ipi(void)
>>>>> +{
>>>>> +	ktime_t ipi, total;
>>>>> +	int ret;
>>>>> +
>>>>> +	ret = bench_ipi(NTIMES, DRY_RUN, &ipi, &total);
>>>>> +	if (ret)
>>>>> +		pr_err("Dry-run FAILED: %d\n", ret);
>>>>> +	else
>>>>> +		pr_err("Dry-run: %18llu, %18llu ns\n", ipi, total);
>>>>
>>>> You do not use NTIMES here to calculate the average value. Is that intended?
>>>
>>> I think it is clearer to present all results as multiples of the dry-run
>>> time, as I did in the patch description. So on the kernel side I expose raw data
>>> and calculate the final values after the tests finish.
>>
>> I think it is highly confusing that the output from the patch description does not
>> match the output from the real module. So can you make that match at least?
>
> I think so. That's why I noted that the results are normalized to host dry-run
> time; moreover, the numbers are small and easier for a human to read.
>
> I was advised not to publish raw data; you'd understand. If this is
> a blocker, I can post results from a QEMU-hosted kernel.

You could just post some example data from any random x86 laptop. I think it
would just be good to have the patch description output match the real output.
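
For reference, the post-processing discussed above (raw nanoseconds from the
module turned into per-iteration averages or into multiples of the dry-run
time) could look roughly like the userspace sketch below. The helper names
normalize() and per_iteration() are made up for illustration and are not part
of the patch; rounding to the nearest integer is also an assumption, since the
thread does not say how the numbers in the description were rounded.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): express a raw time in ns as a
 * multiple of the dry-run total time, rounded to the nearest integer.
 * Returns 0 if the dry-run time is 0, to avoid dividing by zero.
 */
static unsigned long long normalize(unsigned long long raw_ns,
				    unsigned long long dry_total_ns)
{
	if (dry_total_ns == 0)
		return 0;
	return (raw_ns + dry_total_ns / 2) / dry_total_ns;
}

/*
 * Hypothetical helper (not from the patch): average time per iteration,
 * i.e. the raw total divided by the number of iterations (NTIMES in the
 * module). Integer division truncates; returns 0 if ntimes is 0.
 */
static unsigned long long per_iteration(unsigned long long raw_ns,
					unsigned long long ntimes)
{
	return ntimes ? raw_ns / ntimes : 0;
}

/* Print one result row in the two-column style of the patch description. */
static void print_row(const char *name, unsigned long long ipi_ns,
		      unsigned long long total_ns,
		      unsigned long long dry_total_ns)
{
	printf("%-15s %4llu %5llu\n", name,
	       normalize(ipi_ns, dry_total_ns),
	       normalize(total_ns, dry_total_ns));
}
```

Calling print_row() once per scenario with the raw values from dmesg would
reproduce the layout of the description; printing the same raw values next to
it would let readers match the two outputs.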