Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
From: Ling Ma
Date: Thu Feb 04 2016 - 02:07:23 EST
> I have 2 major comments here. First of all, you should break up your patch
> into smaller ones. A large patch like the one in the tarball is hard to
> review.
Ok, we will do it.
> Secondly, you are modifying over 1000 lines of code in mm/slab.c
> with some modest increase in performance. That can be hard to justify. Maybe
> you should find other use cases that involve fewer changes, but still have
> noticeable performance improvement. That will make it easier to be accepted.
To justify the change, the attachment in this letter includes 3 files:
1. user-space code (thread.c), which generates heavy contention on hot kernel
spinlocks from __kmalloc and kfree on a multi-core platform (a hypothetical
sketch of this kind of workload follows the list)
2. ali_work_queue.patch, the kernel patch against 4.3.0-rc4;
when we run the user-space code (thread.c) on a kernel with this patch,
the synchronization overhead from __kmalloc and kfree is
about 15% on Intel E5-2699V3
3. org_spin_lock.patch, which applies on top of the above ali_work_queue.patch;
when we run the user-space code (thread.c) on a kernel with this patch,
the synchronization overhead from __kmalloc and kfree is
about 25% on Intel E5-2699V3
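The original thread.c is in the attachment; a minimal hypothetical sketch of
this kind of workload is below. The pipe()/close() loop is an assumption,
chosen only because it allocates and frees kernel objects on every iteration;
the real thread.c may use a different syscall pair and thread count:

/*
 * Hypothetical sketch of a thread.c-style stress test (the real
 * thread.c is in the attachment and may differ): each thread loops
 * on pipe()+close(), which allocates and frees kernel objects and
 * so hammers __kmalloc/kfree and the slab list_lock.
 * Build: gcc -O2 -pthread thread.c -o thread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 32
#define SECONDS  10

static volatile int stop;
static unsigned long counts[NTHREADS];

static void *worker(void *arg)
{
	long id = (long)arg;
	int fds[2];

	while (!stop) {
		if (pipe(fds) == 0) {	/* kernel allocates pipe objects */
			close(fds[0]);	/* ... and frees them again */
			close(fds[1]);
		}
		counts[id]++;
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	unsigned long total = 0;
	long i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);
	sleep(SECONDS);
	stop = 1;
	for (i = 0; i < NTHREADS; i++) {
		pthread_join(tid[i], NULL);
		total += counts[i];
	}
	printf("%lu\n", total);	/* total ops, cf. the ORG/NEW numbers below */
	return 0;
}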
The main difference between ali_work_queue.patch and
org_spin_lock.patch is shown below:
diff --git a/mm/slab.h b/mm/slab.h
...
- ali_spinlock_t list_lock;
+ spinlock_t list_lock;
...
diff --git a/mm/slab.c b/mm/slab.c
...
- alispinlock(lock, &info);
+ spin_lock((spinlock_t *)lock);
+ fn(para);
+ spin_unlock((spinlock_t *)lock);
...
Because the two kernels are otherwise identical, this comparison removes all
performance noise from unrelated program modifications.
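For reference, the conversion pattern between the two patches looks roughly
like the sketch below. The ali_spinlock_info field names (fn, para) are
assumptions inferred from the diff above, since org_spin_lock.patch calls
fn(para) directly under the spinlock; the real structure in
ali_work_queue.patch may carry additional queueing/completion fields:

/*
 * Sketch of the two locking patterns, inferred from the diff above.
 * With the original spinlock the critical section runs inline; with
 * alispinlock the caller packages it as fn(para) and the current lock
 * holder may execute queued requests in a batch, keeping the data
 * touched by the critical sections hot in one CPU's cache.
 */
struct ali_spinlock_info {
	void (*fn)(void *);	/* critical-section body (assumed name) */
	void *para;		/* argument for fn (assumed name) */
	/* the real patch likely adds queueing/completion fields here */
};

/* org_spin_lock.patch style: lock, run the body inline, unlock. */
static void list_op_org(spinlock_t *lock, void (*fn)(void *), void *para)
{
	spin_lock(lock);
	fn(para);
	spin_unlock(lock);
}

/* ali_work_queue.patch style: delegate the body to the lock holder. */
static void list_op_ali(ali_spinlock_t *lock, void (*fn)(void *), void *para)
{
	struct ali_spinlock_info info = {
		.fn   = fn,
		.para = para,
	};

	alispinlock(lock, &info);
}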
We ran the user-space code thread.c with ali_work_queue.patch and
org_spin_lock.patch respectively;
the output from thread.c is as below:
ORG NEW
38923684 43380604
38100464 44163011
37769241 43354266
37908638 43554022
37900994 43457066
38495073 43421394
37340217 43146352
38083979 43506951
37713263 43775215
37749871 43487289
37843224 43366055
38173823 43270225
38303612 43214675
37886717 44083950
37736455 43060728
37529307 44607597
38862690 43541484
37992824 44749925
38013454 43572225
37783135 45240502
37745372 44712540
38721413 43584658
38097842 43235392
ORG NEW
TOTAL 874675292 1005486126
So the data tell us the new mechanism improves performance by about 15%
(1005486126 / 874675292 ≈ 1.15),
and we believe this justifies the change.
Thanks
Ling
2016-02-04 5:42 GMT+08:00 Waiman Long <waiman.long@xxxxxxx>:
> On 02/02/2016 11:40 PM, Ling Ma wrote:
>>
>> Longman,
>>
>> The attachment includes user-space code (thread.c) and a kernel
>> patch (ali_work_queue.patch) based on 4.3.0-rc4;
>> we replaced all original spinlocks (list_lock) in slab.h/c with the
>> new mechanism.
>>
>> thread.c in user space causes heavy contention on hot kernel spinlocks
>> from __kmalloc and kfree;
>> perf top -d1 shows ~25% before ali_work_queue.patch; after applying
>> this patch,
>> the synchronization overhead from __kmalloc and kfree is
>> reduced from ~25% to ~15% on Intel E5-2699V3
>> (we also observed that the output from the user-space code (thread.c)
>> is clearly improved).
>
>
> I have 2 major comments here. First of all, you should break up your patch
> into smaller ones. A large patch like the one in the tarball is hard to
> review. Secondly, you are modifying over 1000 lines of code in mm/slab.c
> with some modest increase in performance. That can be hard to justify. Maybe
> you should find other use cases that involve fewer changes, but still have
> noticeable performance improvement. That will make it easier to be accepted.
>
> Cheers,
> Longman
>
>
Attachment:
ali_work_queue.tar.bz2
Description: BZip2 compressed data