However, the performance degradation is huge on aarch64 (4 sockets, 24 cores per socket): throughput drops by roughly 60%.
v4.19.111, no writer

reader cn                                  | 24        | 48        | 72        | 96
down_read/up_read per second               | 166129572 | 166064100 | 165963448 | 165203565
down_read/up_read per second (patched)     | 63863506  | 63842132  | 63757267  | 63514920
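
As a quick sanity check on the size of the regression, the figures above can be turned into a per-column loss. This is only a sketch: the helper name loss_percent is made up for illustration, and the numbers are copied from the table as-is.

```python
# Rates copied from the table above (down_read/up_read per second, no writer).
readers = [24, 48, 72, 96]
baseline = [166129572, 166064100, 165963448, 165203565]
patched = [63863506, 63842132, 63757267, 63514920]


def loss_percent(before, after):
    """Throughput lost going from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


# One loss figure per reader count, in the same order as `readers`.
losses = [loss_percent(b, p) for b, p in zip(baseline, patched)]

for n, loss in zip(readers, losses):
    print(f"{n} readers: {loss:.1f}% lost")
```

Each column comes out a little over 61%, and the loss is essentially flat across reader counts, so the cost appears to be per-operation rather than something that grows with contention.
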
My leading alternative was adding percpu_down_read_irqsafe() /
percpu_up_read_irqsafe(), which use local_irq_save() instead of
preempt_disable().