Re: [PATCH 3/3] IPI: Avoid to use 2 cache lines for one call_single_data
From: Huang, Ying
Date: Thu Aug 03 2017 - 04:35:49 EST
Eric Dumazet <eric.dumazet@xxxxxxxxx> writes:
> On Wed, 2017-08-02 at 16:52 +0800, Huang, Ying wrote:
>> From: Huang Ying <ying.huang@xxxxxxxxx>
>>
>> struct call_single_data is used in IPI to transfer information between
>> CPUs. Its size is bigger than sizeof(unsigned long) and less than
>> cache line size. Currently it is allocated with no alignment
>> requirement, so an allocated call_single_data may cross 2 cache
>> lines, which doubles the number of cache lines that need to be
>> transferred among CPUs. This is resolved by aligning the allocated
>> call_single_data to the cache line size.
>>
>> To test the effect of the patch, we use the vm-scalability multiple
>> thread swap test case (swap-w-seq-mt). The test creates multiple
>> threads, and each thread eats memory until all RAM and part of swap
>> is used, so that a huge number of IPIs are triggered when unmapping
>> memory. In the test, the throughput of memory writing improves by ~5%
>> compared with misaligned call_single_data because of faster IPIs.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
>> Cc: Borislav Petkov <bp@xxxxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Juergen Gross <jgross@xxxxxxxx>
>> Cc: Aaron Lu <aaron.lu@xxxxxxxxx>
>> ---
>> kernel/smp.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index 3061483cb3ad..81d9ae08eb6e 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -51,7 +51,7 @@ int smpcfd_prepare_cpu(unsigned int cpu)
>> free_cpumask_var(cfd->cpumask);
>> return -ENOMEM;
>> }
>> - cfd->csd = alloc_percpu(struct call_single_data);
>> + cfd->csd = alloc_percpu_aligned(struct call_single_data);
>
> I do not believe allocating 64 bytes (per cpu) for this structure is
> needed. That would increase the number of cache lines used.
>
> What we can do instead is to force an alignment of 4*sizeof(void *)
> (32 bytes on 64bit arches, 16 bytes on 32bit arches).
>
> Maybe something like this :
>
> diff --git a/include/linux/smp.h b/include/linux/smp.h
> index 68123c1fe54918c051292eb5ba3427df09f31c2f..f7072bf173c5456e38e958d6af85a4793bced96e 100644
> --- a/include/linux/smp.h
> +++ b/include/linux/smp.h
> @@ -19,7 +19,7 @@ struct call_single_data {
> smp_call_func_t func;
> void *info;
> unsigned int flags;
> -};
> +} __attribute__((aligned(4 * sizeof(void *))));
>
> /* total number of cpus in this system (may exceed NR_CPUS) */
> extern unsigned int total_cpus;
OK. And if sizeof(struct call_single_data) changes, we need to change
the alignment accordingly too, so I have added some BUILD_BUG_ON()s for
that.
Best Regards,
Huang, Ying
------------------>8------------------