[PATCH] timers: Optimize get_timer_cpu_base() to reduce potentially redundant per_cpu_ptr() calls

From: Zhongqiu Han
Date: Tue Dec 31 2024 - 10:02:21 EST


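If the timer is deferrable and NO_HZ_COMMON is enabled,
get_timer_cpu_base() calls per_cpu_ptr() twice: once for the local or
global base, and again for the deferrable base, discarding the first
result. Compute the base index first and call per_cpu_ptr() only once,
avoiding the potentially redundant call. get_timer_this_cpu_base() has
the same pattern with this_cpu_ptr() and is changed in the same way.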
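The difference between the two shapes of the function can be modeled in user space. The sketch below is hypothetical. sim_per_cpu_ptr() stands in for per_cpu_ptr() by indexing a two-dimensional array and counting its calls. The SIM_TIMER_* bit values are made up, and SIM_NO_HZ_COMMON plays the role of IS_ENABLED(CONFIG_NO_HZ_COMMON). Only the BASE_* names mirror the kernel. Both versions return the same base, and only the number of pointer computations differs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; these bit values are NOT the kernel's. */
#define SIM_NR_CPUS           4
#define SIM_TIMER_PINNED      (1u << 0)
#define SIM_TIMER_DEFERRABLE  (1u << 1)
#define SIM_NO_HZ_COMMON      1  /* pretend CONFIG_NO_HZ_COMMON=y */

enum { BASE_LOCAL, BASE_GLOBAL, BASE_DEF, NR_BASES };

struct sim_timer_base { int id; };

/* Stand-in for the per-CPU timer_bases[NR_BASES] array. */
static struct sim_timer_base sim_timer_bases[SIM_NR_CPUS][NR_BASES];
static unsigned int sim_ptr_calls;  /* counts simulated per_cpu_ptr() calls */

/* Hypothetical stand-in for per_cpu_ptr(&timer_bases[index], cpu). */
static struct sim_timer_base *sim_per_cpu_ptr(int index, uint32_t cpu)
{
	assert(cpu < SIM_NR_CPUS && index >= 0 && index < NR_BASES);
	sim_ptr_calls++;
	return &sim_timer_bases[cpu][index];
}

/* Shape before the patch: a deferrable timer computes the pointer twice. */
static struct sim_timer_base *base_before(uint32_t tflags, uint32_t cpu)
{
	int index = tflags & SIM_TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;
	struct sim_timer_base *base = sim_per_cpu_ptr(index, cpu);

	if (SIM_NO_HZ_COMMON && (tflags & SIM_TIMER_DEFERRABLE))
		base = sim_per_cpu_ptr(BASE_DEF, cpu);
	return base;
}

/* Shape after the patch: choose the index, then compute the pointer once. */
static struct sim_timer_base *base_after(uint32_t tflags, uint32_t cpu)
{
	int index = tflags & SIM_TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;

	if (SIM_NO_HZ_COMMON && (tflags & SIM_TIMER_DEFERRABLE))
		index = BASE_DEF;
	return sim_per_cpu_ptr(index, cpu);
}
```

In the kernel, per_cpu_ptr() expands to an address calculation rather than a real function call. Whether the compiler had already merged the two computations therefore depends on the build. The call counter here only makes the difference visible.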

One of the call paths to get_timer_cpu_base() goes through
lock_timer_base(), which contains a loop. Within this loop,
get_timer_base() is called, and it in turn calls get_timer_cpu_base().
On this path get_timer_cpu_base() is a hot function: it is called
approximately 13,000 times in 12 seconds on the test x86 KVM machines.

lock_timer_base() {
	for (;;) {
		...
		--> get_timer_base() [inline]
			--> get_timer_cpu_base() [inline]
		...
	}
}
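The retry pattern in lock_timer_base() can be modeled in user space to show why the base lookup sits on a hot path: it runs again on every pass through the loop. This is a simplified, hypothetical sketch. It assumes that, as in the kernel, each iteration rereads the timer flags, looks up the base from them, takes that base's lock, and retries if the flags changed in the meantime. The sim_* names, the SIM_TIMER_* bit values and the spinlock built from a C11 atomic_flag are illustrative stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit values and sizes; NOT the kernel's definitions. */
#define SIM_TIMER_BASEMASK  0x1u       /* selects one of two bases */
#define SIM_TIMER_MIGRATING (1u << 2)  /* "timer is being moved" bit */
#define SIM_NR_BASES        2

struct sim_base  { atomic_flag lock; };
struct sim_timer { _Atomic uint32_t flags; };

static struct sim_base sim_bases[SIM_NR_BASES] = {
	{ ATOMIC_FLAG_INIT }, { ATOMIC_FLAG_INIT },
};
static unsigned int sim_base_lookups;  /* how often the lookup ran */

/* Stand-in for get_timer_base() -> get_timer_cpu_base(). */
static struct sim_base *sim_get_base(uint32_t tf)
{
	uint32_t idx = tf & SIM_TIMER_BASEMASK;

	assert(idx < SIM_NR_BASES);
	sim_base_lookups++;
	return &sim_bases[idx];
}

/* Simplified model of the lock_timer_base() retry loop. */
static struct sim_base *sim_lock_timer_base(struct sim_timer *t)
{
	for (;;) {
		uint32_t tf = atomic_load(&t->flags);

		if (!(tf & SIM_TIMER_MIGRATING)) {
			/* The base lookup is repeated on every iteration. */
			struct sim_base *base = sim_get_base(tf);

			while (atomic_flag_test_and_set(&base->lock))
				;  /* spin until the base lock is free */
			/* Flags unchanged while locking: this base is correct. */
			if (atomic_load(&t->flags) == tf)
				return base;
			atomic_flag_clear(&base->lock);
		}
		/* The kernel version is assumed to call cpu_relax() here. */
	}
}
```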

With this patch, the assembly executed inside the loop is shorter on both
x86 and ARM64. Comparative tests on x86 KVM virtual machines, measuring
the runtime in nanoseconds before and after the optimization, show that
the runtime distribution shifts toward shorter intervals:

Runtime (ns)   Before   After
[0-19]              0       0
[20-39]             6    1014
[40-59]            41    2198
[60-79]            93    2073
[80-99]           814    3081
[100-119]        5262    3268
[120-139]        4510     671
[140-159]        2202     468
[160-179]          81     158
[180-199]          15     160
[200-219]           3      54
[220-239]           2       7
[240-259]           2       3
[260-279]           0       0
[280-299]           0       1
[300-319]           0       0
total           13031   13156
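The two histograms in the table above have slightly different sample counts (13031 and 13156), so raw bucket counts are not directly comparable. One rough way to summarize them is a midpoint-weighted mean, which treats every sample as if it sat at the center of its 20 ns bucket. The sketch below computes that estimate from the bucketed counts alone. It is an approximation, not a figure reported with the patch, and the array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS  16
#define BUCKET_NS 20

/* Counts copied from the table: bucket i covers [20*i, 20*i + 19] ns. */
static const unsigned int hist_before[NBUCKETS] = {
	0, 6, 41, 93, 814, 5262, 4510, 2202, 81, 15, 3, 2, 2, 0, 0, 0
};
static const unsigned int hist_after[NBUCKETS] = {
	0, 1014, 2198, 2073, 3081, 3268, 671, 468, 158, 160, 54, 7, 3, 0, 1, 0
};

/* Total number of samples in a histogram. */
static unsigned long hist_total(const unsigned int *h)
{
	unsigned long n = 0;

	for (size_t i = 0; i < NBUCKETS; i++)
		n += h[i];
	return n;
}

/* Approximate mean: place every sample at its bucket's midpoint. */
static double hist_mean_ns(const unsigned int *h)
{
	unsigned long n = hist_total(h);
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < NBUCKETS; i++)
		sum += h[i] * (BUCKET_NS * i + (BUCKET_NS - 1) / 2.0);
	return sum / n;
}
```

With these counts, the estimate comes out at roughly 122 ns before the patch and roughly 87 ns after.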

Signed-off-by: Zhongqiu Han <quic_zhonhan@xxxxxxxxxxx>
---
kernel/time/timer.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index a5860bf6d16f..40706cb36920 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -956,33 +956,29 @@ static int detach_if_pending(struct timer_list *timer, struct timer_base *base,
 static inline struct timer_base *get_timer_cpu_base(u32 tflags, u32 cpu)
 {
 	int index = tflags & TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;
-	struct timer_base *base;
-
-	base = per_cpu_ptr(&timer_bases[index], cpu);
 
 	/*
 	 * If the timer is deferrable and NO_HZ_COMMON is set then we need
 	 * to use the deferrable base.
 	 */
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
-		base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu);
-	return base;
+		index = BASE_DEF;
+
+	return per_cpu_ptr(&timer_bases[index], cpu);
 }
 
 static inline struct timer_base *get_timer_this_cpu_base(u32 tflags)
 {
 	int index = tflags & TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;
-	struct timer_base *base;
-
-	base = this_cpu_ptr(&timer_bases[index]);
 
 	/*
 	 * If the timer is deferrable and NO_HZ_COMMON is set then we need
 	 * to use the deferrable base.
 	 */
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
-		base = this_cpu_ptr(&timer_bases[BASE_DEF]);
-	return base;
+		index = BASE_DEF;
+
+	return this_cpu_ptr(&timer_bases[index]);
 }
 
 static inline struct timer_base *get_timer_base(u32 tflags)
--
2.25.1