Re: [RFC PATCH] aliworkqueue: Adaptive lock integration on multi-core platform

From: Waiman Long
Date: Fri Apr 15 2016 - 13:37:32 EST


On 04/15/2016 12:05 AM, ling.ma.program@xxxxxxxxx wrote:
From: Ma Ling<ling.ml@xxxxxxxxxxxxxxx>

Wire latency (RC delay) dominates modern computer performance.
Conventional serialized work causes serious cache-line ping-pong,
so the process spends a lot of time and power to complete,
especially on multi-core platforms.

However, if the serialized work is sent to one core and executed there
ONLY when contention happens, much time and power can be saved,
because all shared data then resides in the private cache of that core.
We call this mechanism Adaptive Lock Integration
(ali workqueue).

The new code is based on qspinlock and implements Lock Integration.
When a user-space application is bottlenecked on a kernel spinlock,
the new mechanism can improve performance by up to 1.65x for
https://lkml.org/lkml/2016/2/4/48 or
http://lkml.iu.edu/hypermail/linux/kernel/1602.0/03745.html
and 2.79x for https://lkml.org/lkml/2016/4/4/848, respectively.

Additional changes to Makefile/Kconfig are made to enable compiling
this feature on the x86 platform.

Signed-off-by: Ma Ling<ling.ml@xxxxxxxxxxxxxxx>
---
The patch is based on https://lkml.org/lkml/2015/12/31/20,
in this version we add an init function and fix the function name.

 arch/x86/Kconfig              |  1 +
 include/linux/aliworkqueue.h  | 34 ++++++++++++++
 kernel/Kconfig.locks          |  7 +++
 kernel/locking/Makefile       |  1 +
 kernel/locking/aliworkqueue.c | 97 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 140 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/aliworkqueue.h
 create mode 100644 kernel/locking/aliworkqueue.c



As I said before, you need a use case within the kernel to demonstrate its usefulness. The Linux kernel community will not accept code that isn't used anywhere.

A major problem with converting regular locking code to use the aliworkqueue is that it requires rather significant code changes. So you really need a good use case where you can show that the performance benefit is much greater than the cost of making the conversion.
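
The kind of code change involved can be illustrated with a rough user-space sketch in C. This is not the aliworkqueue API from the patch: struct work_item, serialized_submit() and the counter helpers are hypothetical names chosen for illustration, and serialized_submit() simply takes a pthread mutex as a stand-in for handing the work to the core that currently owns the lock. The sketch only shows how an inline critical section has to be split into a request structure plus a callback.

```c
#include <pthread.h>

/* Shared state protected by a conventional lock. */
struct counter {
	pthread_mutex_t lock;
	long value;
};

/* Conventional style: the critical section is written inline. */
static long counter_add_locked(struct counter *c, long delta)
{
	long result;

	pthread_mutex_lock(&c->lock);
	c->value += delta;
	result = c->value;
	pthread_mutex_unlock(&c->lock);
	return result;
}

/* Delegation style (hypothetical interface, not taken from the patch). */
struct work_item {
	void (*fn)(void *arg);	/* the former critical section */
	void *arg;		/* its inputs and outputs */
};

/* Every local variable the critical section touched moves in here. */
struct add_request {
	struct counter *c;
	long delta;
	long result;
};

/* The critical section, rewritten as a callback. */
static void counter_add_work(void *arg)
{
	struct add_request *req = arg;

	req->c->value += req->delta;
	req->result = req->c->value;
}

/*
 * Stand-in for the submit call. ASSUMPTION: a real delegation scheme
 * would queue the item and let one core run the queued work; here the
 * work is simply executed under the mutex so the sketch stays
 * self-contained.
 */
static void serialized_submit(pthread_mutex_t *lock, struct work_item *w)
{
	pthread_mutex_lock(lock);
	w->fn(w->arg);
	pthread_mutex_unlock(lock);
}

/* Same operation as counter_add_locked(), after conversion. */
static long counter_add_delegated(struct counter *c, long delta)
{
	struct add_request req = { .c = c, .delta = delta, .result = 0 };
	struct work_item w = { .fn = counter_add_work, .arg = &req };

	serialized_submit(&c->lock, &w);
	return req.result;
}
```

Even for a one-line critical section, the converted version needs a request structure, a callback, and a wrapper. Call sites with early returns, nested locks, or many local variables would need correspondingly more restructuring.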

Cheers,
Longman