Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

From: peterz
Date: Wed Sep 16 2020 - 14:53:27 EST


On Wed, Sep 16, 2020 at 08:32:20PM +0800, Hou Tao wrote:

> I have done a simple test of the performance impact on both x86 and aarch64.
>
> There is no degradation on x86 (2 sockets, 18 cores per socket, 2 threads per core)

Yeah, x86 is magical here, it's the same single instruction for both ;-)
But it is, afaik, unique in this position, no other arch can pull that
off.
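
A rough userspace sketch of the distinction, not kernel code. The names
model_raw_inc(), model_safe_inc() and model_run() are made up for
illustration, and the atomic RMW is only an assumed stand-in for whatever
protection an arch without a single-instruction form has to add. On x86
both flavours can be one inc with a memory operand, which is why nothing
changes there.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins, NOT the kernel percpu API. */
static unsigned int raw_count;	/* target of the __this_cpu_inc()-style op */
static atomic_uint safe_count;	/* target of the this_cpu_inc()-style op */

/*
 * Models __this_cpu_inc(): a plain load/add/store. Only correct if the
 * caller already excludes preemption and interrupts around it.
 */
static inline void model_raw_inc(void)
{
	raw_count++;
}

/*
 * Models this_cpu_inc() where no single-instruction form exists.
 * Assumption: an explicit atomic read-modify-write stands in for the
 * arch-specific protection, which is where the extra cost comes from.
 */
static inline void model_safe_inc(void)
{
	atomic_fetch_add_explicit(&safe_count, 1, memory_order_relaxed);
}

/* Run both flavours n times; single-threaded they must agree. */
static unsigned int model_run(unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		model_raw_inc();
		model_safe_inc();
	}
	assert(raw_count == atomic_load(&safe_count));
	return raw_count;
}
```

Single-threaded the two counters always match; the difference only shows
up as per-operation cost, and in what happens if an interrupt lands in
the middle of the plain version.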

> However, the performance degradation is huge on aarch64 (4 sockets, 24 cores per socket): nearly 60% lost.
>
> v4.19.111, no writer; rate of down_read/up_read per second
>
> reader cn |        24 |        48 |        72 |        96
> unpatched | 166129572 | 166064100 | 165963448 | 165203565
> patched   |  63863506 |  63842132 |  63757267 |  63514920

Teh hurt :/