Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count
From: peterz
Date: Wed Sep 16 2020 - 14:53:27 EST
On Wed, Sep 16, 2020 at 08:32:20PM +0800, Hou Tao wrote:
> I have run a simple test of the performance impact on both x86 and aarch64.
>
> There is no degradation on x86 (2 sockets, 18 cores per socket, 2 threads per core)
Yeah, x86 is magical here, it's the same single instruction for both ;-)
But it is, afaik, unique in this position, no other arch can pull that
off.
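To illustrate the distinction, here is a rough userspace analogue, not the kernel implementation. The names fake_percpu_counter, plain_inc() and rmw_inc() are made up for this sketch. plain_inc() stands in for __this_cpu_inc(), which relies on the caller to keep preemption and interrupts away from the slot; rmw_inc() stands in for this_cpu_inc(), which has to be safe on its own. Note the compiler builtin used here is SMP-atomic, which is stronger than what this_cpu_inc() actually needs, so this only shows the shape of the difference, not its cost.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one CPU's slot of a per-CPU counter. */
struct fake_percpu_counter {
	uint64_t count;
};

/*
 * Analogue of __this_cpu_inc(): a plain load, add and store.
 * Only correct if nothing else can touch the slot in between,
 * i.e. the caller has already excluded preemption/interrupts.
 */
void plain_inc(struct fake_percpu_counter *c)
{
	c->count = c->count + 1;
}

/*
 * Analogue of this_cpu_inc(): one indivisible read-modify-write.
 * On a load/store architecture this cannot be a single plain
 * instruction the way a memory-operand increment is on x86.
 */
void rmw_inc(struct fake_percpu_counter *c)
{
	__atomic_fetch_add(&c->count, 1, __ATOMIC_RELAXED);
}

/* Increment a fresh counter n times with the plain variant. */
uint64_t run_plain(uint64_t n)
{
	struct fake_percpu_counter c = { 0 };

	for (uint64_t i = 0; i < n; i++)
		plain_inc(&c);
	return c.count;
}

/* Increment a fresh counter n times with the RMW variant. */
uint64_t run_rmw(uint64_t n)
{
	struct fake_percpu_counter c = { 0 };

	for (uint64_t i = 0; i < n; i++)
		rmw_inc(&c);
	return c.count;
}
```

Single-threaded, both variants count the same; the difference only shows up in what the compiler has to emit and in what happens when something interrupts the plain variant between its load and its store.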
> However, the performance degradation is huge on aarch64 (4 sockets, 24 cores per socket): nearly 60% lost.
>
> v4.19.111
> no writer, reader count                            | 24        | 48        | 72        | 96
> the rate of down_read/up_read per second           | 166129572 | 166064100 | 165963448 | 165203565
> the rate of down_read/up_read per second (patched) | 63863506  | 63842132  | 63757267  | 63514920
Teh hurt :/
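
For reference, the "nearly 60%" figure can be rechecked from the quoted numbers. A small sketch; relative_loss() and print_losses() are names invented here, and the only inputs are the eight rates from the table above.

```c
#include <stdio.h>

/* Fraction of throughput lost going from baseline to patched. */
double relative_loss(double baseline, double patched)
{
	return (baseline - patched) / baseline;
}

/* Print the loss for each reader count in the quoted aarch64 table. */
void print_losses(void)
{
	static const int readers[] = { 24, 48, 72, 96 };
	static const double base[] = {
		166129572.0, 166064100.0, 165963448.0, 165203565.0
	};
	static const double patched[] = {
		63863506.0, 63842132.0, 63757267.0, 63514920.0
	};

	for (int i = 0; i < 4; i++)
		printf("%d readers: %.1f%% lost\n", readers[i],
		       100.0 * relative_loss(base[i], patched[i]));
}
```

The loss comes out a bit over 61% at every reader count, so it is essentially flat from 24 to 96 readers.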