Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count
From: Will Deacon
Date: Tue Sep 29 2020 - 13:50:06 EST
On Thu, Sep 24, 2020 at 07:55:19PM +0800, Hou Tao wrote:
> The following is the newest performance data:
>
> aarch64 host (4 sockets, 24 cores per socket)
>
> * v4.19.111
>
> rate of percpu_down_read/percpu_up_read per second, no writer:
>
> readers                                         |        24 |        48 |        72 |        96
> default: use __this_cpu_inc|dec()               | 166129572 | 166064100 | 165963448 | 165203565
> patched: use this_cpu_inc|dec()                 |  87727515 |  87698669 |  87675397 |  87337435
> modified: local_irq_save + __this_cpu_inc|dec() |  15470357 |  15460642 |  15439423 |  15377199
>
> * v4.19.111+ [1]
>
> modified: use this_cpu_inc|dec() + LSE atomic   |   8224023 |   8079416 |   7883046 |   7820350
>
> * 5.9-rc6
>
> rate of percpu_down_read/percpu_up_read per second, no writer:
>
> readers                                              |        24 |        48 |        72 |        96
> reverted: use __this_cpu_inc|dec() + revert 91fc957c | 169664061 | 169481176 | 168493488 | 168844423
> reverted: use __this_cpu_inc|dec()                   |  78355071 |  78294285 |  78026030 |  77860492
> modified: use this_cpu_inc|dec() + no LSE atomic     |  64291101 |  64259867 |  64223206 |  63992316
> default: use this_cpu_inc|dec() + LSE atomic         |  16231421 |  16215618 |  16188581 |  15959290
>
> It seems that enabling LSE atomics hurts performance in this test scenario (on 5.9-rc6 with 24 readers,
> 16231421 vs 64291101 per second, roughly 4x lower throughput).
>
> I was also astonished to find that, in my test scenario, the performance of v5.9-rc6 is only half that of v4.19.
> Bisection identifies the culprit as 91fc957c9b1d6 ("arm64/bpf: don't allocate BPF JIT programs in module memory").
> If I brute-force revert that patch on 5.9-rc6 [2], performance is the same as
> v4.19.111 (169664061 vs 166129572). I have attached the simplified test module [3] and .config [4];
> could you please help check what the problem is?

I have no idea how that patch can be responsible for this :/ Have you
confirmed that the bisection is not bogus?

Ard, do you have any ideas?

Will