From: Ma Ling<ling.ml@xxxxxxxxxxxxxxx>
Hi all,
Wire latency (RC delay) dominates modern computer performance.
Conventional serialized work causes serious cache line ping-pong,
so the process spends a lot of time and power to complete it,
especially on multi-core platforms.
However, if the serialized work is sent to one core and executed there
when lock contention happens, much time and power can be saved,
because all shared data then stays in the private cache of that core.
We call this mechanism Acceleration from Lock Integration
(ali spinlock).
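To make the idea concrete, here is a simplified userspace sketch of lock integration. It is not the code in this patch: it uses C11 atomics and pthreads instead of kernel primitives, a take-all pending list instead of the qspinlock queue (so delegated work runs in LIFO order within a batch), and every demo_* name is hypothetical. A thread publishes its critical section as a request, and whichever thread currently owns the lock runs all pending requests on its own core.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One delegated critical section (hypothetical type, not from the patch). */
struct demo_req {
    void (*fn)(void *);       /* work to run under the lock */
    void *arg;
    struct demo_req *next;    /* link in the pending list */
    atomic_bool done;         /* set once fn(arg) has run */
};

struct demo_lock {
    _Atomic(struct demo_req *) pending;   /* requests not yet claimed */
    atomic_bool busy;                     /* true while some thread is owner */
};

static void demo_lock_init(struct demo_lock *l)
{
    atomic_init(&l->pending, (struct demo_req *)NULL);
    atomic_init(&l->busy, false);
}

/* Owner only: claim every pending request and run it on this core. */
static void demo_drain(struct demo_lock *l)
{
    struct demo_req *r = atomic_exchange(&l->pending, (struct demo_req *)NULL);
    while (r) {
        struct demo_req *next = r->next;  /* r may vanish once done is set */
        r->fn(r->arg);
        atomic_store(&r->done, true);
        r = next;
    }
}

/* Publish fn(arg), then wait until some owner (possibly us) has run it. */
static void demo_submit(struct demo_lock *l, void (*fn)(void *), void *arg)
{
    struct demo_req r = { .fn = fn, .arg = arg, .next = NULL };
    atomic_init(&r.done, false);

    struct demo_req *head = atomic_load(&l->pending);
    do {
        r.next = head;
    } while (!atomic_compare_exchange_weak(&l->pending, &head, &r));

    while (!atomic_load(&r.done)) {
        if (!atomic_exchange(&l->busy, true)) {   /* we became the owner */
            demo_drain(l);
            atomic_store(&l->busy, false);
        }
    }
}

struct demo_worker { struct demo_lock *lock; long *counter; int iters; };

static void demo_inc(void *arg) { ++*(long *)arg; }

static void *demo_thread(void *p)
{
    struct demo_worker *w = p;
    for (int i = 0; i < w->iters; i++)
        demo_submit(w->lock, demo_inc, w->counter);
    return NULL;
}

/* Start nthreads workers that each delegate iters increments. */
static long demo_run(int nthreads, int iters)
{
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    struct demo_lock lock;
    long counter = 0;
    struct demo_worker w = { &lock, &counter, iters };

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    demo_lock_init(&lock);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, demo_thread, &w);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

Calling demo_run(4, 10000) returns 40000: the shared counter is a plain long, yet no update is lost, because only the current owner ever touches it.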
Usually, when requests are queued, we have to wait for each piece of
work to be submitted one by one. To improve overall throughput further,
we introduce LOCK_FREE: once a request has been sent to the lock owner,
the requester may do other work in parallel, and the
ali_spin_is_completed function tells it whether the work has been completed.
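The post-then-poll pattern described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the patch's API: the lf_* names are hypothetical, lf_is_completed only stands in for the role that ali_spin_is_completed plays, and in this sketch the poll also helps run pending work so that a posted request is never stranded.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request type; the caller keeps it alive until completion. */
struct lf_req {
    void (*fn)(void *);
    void *arg;
    struct lf_req *next;
    atomic_bool done;
};

struct lf_lock {
    _Atomic(struct lf_req *) pending;   /* requests not yet claimed */
    atomic_bool busy;                   /* true while some thread is owner */
};

/* Run everything queued so far, unless another thread is the owner. */
static void lf_try_drain(struct lf_lock *l)
{
    if (atomic_exchange(&l->busy, true))
        return;
    struct lf_req *r = atomic_exchange(&l->pending, (struct lf_req *)NULL);
    while (r) {
        struct lf_req *next = r->next;  /* r may be reused once done is set */
        r->fn(r->arg);
        atomic_store(&r->done, true);
        r = next;
    }
    atomic_store(&l->busy, false);
}

/* Queue the request and return without waiting for it to run. */
static void lf_post(struct lf_lock *l, struct lf_req *r,
                    void (*fn)(void *), void *arg)
{
    r->fn = fn;
    r->arg = arg;
    atomic_init(&r->done, false);       /* r must not be in flight here */
    struct lf_req *head = atomic_load(&l->pending);
    do {
        r->next = head;
    } while (!atomic_compare_exchange_weak(&l->pending, &head, r));
    lf_try_drain(l);
}

/* Poll for completion; helps drain if the request is still pending. */
static bool lf_is_completed(struct lf_lock *l, struct lf_req *r)
{
    if (!atomic_load(&r->done))
        lf_try_drain(l);
    return atomic_load(&r->done);
}

enum { LF_BATCH = 8 };

struct lf_worker { struct lf_lock *lock; long *counter; int rounds; long local; };

static void lf_inc(void *arg) { ++*(long *)arg; }

static void *lf_thread(void *p)
{
    struct lf_worker *w = p;
    struct lf_req reqs[LF_BATCH];
    for (int n = 0; n < w->rounds; n++) {
        for (int i = 0; i < LF_BATCH; i++)
            lf_post(w->lock, &reqs[i], lf_inc, w->counter);
        w->local++;                     /* private work overlaps with requests */
        for (int i = 0; i < LF_BATCH; i++)
            while (!lf_is_completed(w->lock, &reqs[i]))
                ;
    }
    return NULL;
}

/* Each of nthreads workers posts rounds batches of LF_BATCH increments. */
static long lf_run(int nthreads, int rounds)
{
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    struct lf_worker w[MAX_THREADS];
    struct lf_lock lock;
    long counter = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    atomic_init(&lock.pending, (struct lf_req *)NULL);
    atomic_init(&lock.busy, false);
    for (int i = 0; i < nthreads; i++) {
        w[i] = (struct lf_worker){ &lock, &counter, rounds, 0 };
        int rc = pthread_create(&tid[i], NULL, lf_thread, &w[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

A request object may be reused only after lf_is_completed has returned true for it, which is why each round polls the whole batch before posting again.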
The new code is based on qspinlock and implements Lock Integration. It
improves performance by up to 3X on an Intel platform with 72 cores (18x2HTx2S HSW)
and by 2X on an ARM platform with 96 cores. Additional trivial changes to
Makefile/Kconfig enable compiling this feature on the x86 platform.
(We would be happy to run further experiments according to your requirements.)
Happy New Year 2016!
Ling
Signed-off-by: Ma Ling<ling.ml@xxxxxxxxxxxxxxx>
---
arch/x86/Kconfig | 1 +
include/linux/alispinlock.h | 41 ++++++++++++++++++
kernel/Kconfig.locks | 7 +++
kernel/locking/Makefile | 1 +
kernel/locking/alispinlock.c | 97 ++++++++++++++++++++++++++++++++++++++++++
5 files changed, 147 insertions(+), 0 deletions(-)
create mode 100644 include/linux/alispinlock.h
create mode 100644 kernel/locking/alispinlock.c