[PATCH -v2] IPI: Avoid using 2 cache lines for one call_single_data

From: Huang Ying
Date: Thu Jul 27 2017 - 04:43:20 EST

struct call_single_data is used in IPI to transfer information between
CPUs. Its size is bigger than sizeof(unsigned long) and less than the
cache line size. Currently it is allocated with no alignment
requirement, so an allocated call_single_data may cross 2 cache lines.
That doubles the number of cache lines that need to be transferred
among CPUs.
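
To illustrate the point, here is a minimal user-space sketch that is not
part of the patch. It assumes a 64-byte cache line and a 32-byte
structure (the 64-bit case); CACHE_LINE_SIZE, cache_lines_spanned() and
cache_line_examples() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; the patch does not state a size. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: number of cache lines touched by [addr, addr + size). */
unsigned int cache_lines_spanned(uintptr_t addr, size_t size)
{
	uintptr_t first = addr / CACHE_LINE_SIZE;
	uintptr_t last = (addr + size - 1) / CACHE_LINE_SIZE;

	return (unsigned int)(last - first + 1);
}

/* Usage sketch: a 32-byte object at two different start addresses. */
void cache_line_examples(void)
{
	/* 32-byte aligned start: the object stays inside one cache line. */
	assert(cache_lines_spanned(0x1000, 32) == 1);
	/* Start at offset 48 within a line: the object crosses into the next line. */
	assert(cache_lines_spanned(0x1030, 32) == 2);
}
```

The second case is the one the patch aims to rule out: the start address
is a multiple of 16 but not of 32, so the object straddles a cache line
boundary.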

This is resolved by aligning the allocated call_single_data to 4 *
sizeof(void *). If the size of struct call_single_data changes in the
future, the alignment should be changed accordingly. It must be a
power of 2 and no less than the size of struct call_single_data.
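
The alignment rule above can be sketched in user space as follows. This
is an illustration under assumptions, not the kernel code: struct
csd_demo is a hypothetical stand-in for struct call_single_data (a
single pointer replaces struct llist_node), csd_demo_is_aligned() is a
hypothetical helper, and the GCC/Clang aligned attribute together with
C11 static_assert take the place of the kernel's __aligned() and
BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Same expression as CSD_ALIGNMENT in the patch. */
#define CSD_ALIGNMENT (4 * sizeof(void *))

/* Hypothetical stand-in for struct call_single_data. */
struct csd_demo {
	void *next;			/* stands in for struct llist_node */
	void (*func)(void *info);
	void *info;
	unsigned int flags;
} __attribute__((aligned(CSD_ALIGNMENT)));

/* The alignment must be a power of 2 ... */
static_assert((CSD_ALIGNMENT & (CSD_ALIGNMENT - 1)) == 0,
	      "CSD_ALIGNMENT must be a power of 2");
/* ... and the structure must not outgrow it. */
static_assert(sizeof(struct csd_demo) <= CSD_ALIGNMENT,
	      "struct csd_demo is larger than CSD_ALIGNMENT");
static_assert(alignof(struct csd_demo) == CSD_ALIGNMENT,
	      "unexpected alignment of struct csd_demo");

/* Returns nonzero when csd starts on a CSD_ALIGNMENT boundary. */
int csd_demo_is_aligned(const struct csd_demo *csd)
{
	return ((uintptr_t)csd % CSD_ALIGNMENT) == 0;
}
```

If a new member pushed the structure past CSD_ALIGNMENT, the second
assertion would fail at compile time, which is the reminder to raise the
alignment.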

To test the effect of the patch, we use the vm-scalability multiple
thread swap test case (swap-w-seq-mt). The test creates multiple
threads, and each thread eats memory until all RAM and part of swap
is used, so that a huge number of IPIs are triggered when unmapping
memory. In the test, the throughput of memory writing improves ~5%
compared with misaligned call_single_data because of faster IPIs.

[Align with 4 * sizeof(void*) instead of cache line size]
Suggested-by: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Juergen Gross <jgross@xxxxxxxx>
Cc: Aaron Lu <aaron.lu@xxxxxxxxx>
include/linux/smp.h | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 68123c1fe549..4d3b372d50b0 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -13,13 +13,22 @@
 #include <linux/init.h>
 #include <linux/llist.h>

+#define CSD_ALIGNMENT	(4 * sizeof(void *))
 typedef void (*smp_call_func_t)(void *info);
 struct call_single_data {
 	struct llist_node llist;
 	smp_call_func_t func;
 	void *info;
 	unsigned int flags;
-};
+} __aligned(CSD_ALIGNMENT);
+/* To avoid allocating csd across 2 cache lines */
+static inline void check_alignment_of_csd(void)
+{
+	BUILD_BUG_ON(sizeof(struct call_single_data) > CSD_ALIGNMENT);
+}

 /* total number of cpus in this system (may exceed NR_CPUS) */
 extern unsigned int total_cpus;