Re: [PATCH] timers: Optimize get_timer_cpu_base() to reduce potentially redundant per_cpu_ptr() calls
From: Frederic Weisbecker
Date: Tue Dec 31 2024 - 11:08:13 EST
On Tue, Dec 31, 2024 at 11:01:15PM +0800, Zhongqiu Han wrote:
> If the timer is deferrable and NO_HZ_COMMON is enabled, the function
> get_timer_cpu_base() will call per_cpu_ptr() twice. Optimize the function
> to avoid potentially redundant per_cpu_ptr() calls.
>
> One of the call paths to get_timer_cpu_base() goes through
> lock_timer_base(), which contains a loop. Within this loop,
> get_timer_base() is called, and it in turn calls get_timer_cpu_base().
> On this path, get_timer_cpu_base() is a hotspot: it is called
> approximately 13,000 times in 12 seconds on test x86 KVM machines.
>
> lock_timer_base() {
>     for (;;) {
>         ...
>         --> get_timer_base() [inline]
>             --> get_timer_cpu_base() [inline]
>         ...
>     }
> }
>
> With the patch, the assembly code executed in the loop (on x86 and ARM64)
> is reduced. Comparative tests on x86 KVM virtual machines, measuring the
> runtime before and after the optimization (in nanoseconds), show that the
> runtime distribution shifts toward shorter intervals.
>
> Before               After
> [0-19]:         0    [0-19]:         0
> [20-39]:        6    [20-39]:     1014
> [40-59]:       41    [40-59]:     2198
> [60-79]:       93    [60-79]:     2073
> [80-99]:      814    [80-99]:     3081
> [100-119]:   5262    [100-119]:   3268
> [120-139]:   4510    [120-139]:    671
> [140-159]:   2202    [140-159]:    468
> [160-179]:     81    [160-179]:    158
> [180-199]:     15    [180-199]:    160
> [200-219]:      3    [200-219]:     54
> [220-239]:      2    [220-239]:      7
> [240-259]:      2    [240-259]:      3
> [260-279]:      0    [260-279]:      0
> [280-299]:      0    [280-299]:      1
> [300-319]:      0    [300-319]:      0
> total:      13031    total:      13156
>
> Signed-off-by: Zhongqiu Han <quic_zhonhan@xxxxxxxxxxx>
Reviewed-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
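
For readers following along, the idea in the quoted commit message can be
illustrated with a small userspace sketch. This is not the kernel code and
not the patch itself: every model_* name, the array layout, and the lookup
counter are assumptions made up for illustration, and model_per_cpu_ptr()
only stands in for per_cpu_ptr(). It shows a "before" shape that resolves
the per-CPU base twice for a deferrable timer and an "after" shape that
picks the base index first and resolves once.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */
enum { MODEL_BASE_STD, MODEL_BASE_DEF, MODEL_NR_BASES };
enum { MODEL_NR_CPUS = 4 };
#define MODEL_TIMER_DEFERRABLE 0x1u

struct model_timer_base {
	int placeholder;	/* contents are irrelevant to the sketch */
};

static struct model_timer_base model_bases[MODEL_NR_CPUS][MODEL_NR_BASES];
static unsigned long model_lookups;	/* counts simulated per-CPU lookups */

/* Stand-in for per_cpu_ptr(): resolve one base of one CPU. */
static struct model_timer_base *model_per_cpu_ptr(int index, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	assert(index >= 0 && index < MODEL_NR_BASES);
	model_lookups++;
	return &model_bases[cpu][index];
}

/* "Before" shape: resolve the standard base, then again if deferrable. */
static struct model_timer_base *model_get_base_before(unsigned int tflags,
						      unsigned int cpu)
{
	struct model_timer_base *base = model_per_cpu_ptr(MODEL_BASE_STD, cpu);

	if (tflags & MODEL_TIMER_DEFERRABLE)
		base = model_per_cpu_ptr(MODEL_BASE_DEF, cpu);

	return base;
}

/* "After" shape: choose the index first, then resolve exactly once. */
static struct model_timer_base *model_get_base_after(unsigned int tflags,
						     unsigned int cpu)
{
	int index = MODEL_BASE_STD;

	if (tflags & MODEL_TIMER_DEFERRABLE)
		index = MODEL_BASE_DEF;

	return model_per_cpu_ptr(index, cpu);
}
```

Both helpers return the same base for the same flags and CPU. In this model
the only difference is how many lookups a deferrable timer costs, which is
the redundancy the commit message describes.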