Re: [PATCH v5 7/7] locking/lockdep: Add a fast path for chain_hlocks allocation

From: Waiman Long
Date: Tue Feb 04 2020 - 10:07:25 EST


On 2/4/20 7:47 AM, Peter Zijlstra wrote:
> On Mon, Feb 03, 2020 at 11:41:47AM -0500, Waiman Long wrote:
>> When alloc_chain_hlocks() is called, the most likely scenario is
>> to allocate from the primordial chain block which holds the whole
>> chain_hlocks[] array initially. It is the primordial chain block if its
>> size is bigger than MAX_LOCK_DEPTH. As long as the number of entries left
>> after splitting is still bigger than MAX_CHAIN_BUCKETS it will remain
>> in bucket 0. By splitting out a sub-block at the end, we only need to
>> adjust the size without changing any of the existing linkage information.
>> This optimized fast path can reduce the latency of allocation requests.
>>
>> This patch does change the order by which chain_hlocks entries are
>> allocated. The original code allocates entries from the beginning of
>> the array. Now it will be allocated from the end of the array backward.
> Cute; but why do we care? Is there any measurable performance indicator?
>
I used a parallel kernel compilation test to see if there is a
performance benefit. On average, the compile time dropped by a few
seconds out of several minutes of total time, so the gain is only about
1%. I didn't mention it because it is within the margin of error.

One of the goals of this patchset is to make sure that little or no
performance regression is introduced. That was why I was hesitant to
adopt the single allocator approach as suggested. That is also why I
added this patch, to try to get some performance back.
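
For anyone following along, the fast path from the patch description
can be sketched in userspace roughly as below. This is only a simplified
model under stated assumptions, not the code in the patch: alloc_tail(),
prim_start, prim_size and NR_HLOCKS are made-up names, and the constant
values are placeholders picked for illustration.

```c
#include <assert.h>

/* Placeholder values for this model only; not taken from the patch. */
#define NR_HLOCKS          1024  /* hypothetical size of chain_hlocks[] */
#define MAX_LOCK_DEPTH     48
#define MAX_CHAIN_BUCKETS  16

/* The primordial block initially covers the whole array. */
static int prim_start = 0;
static int prim_size  = NR_HLOCKS;

/*
 * Fast path: carve 'req' entries off the END of the primordial block.
 * Only the block size changes; its start offset (and hence any linkage
 * that refers to it) stays the same.
 *
 * Returns the offset of the new sub-block, or -1 when the fast path
 * does not apply and a slow path would have to take over.
 */
static int alloc_tail(int req)
{
	assert(prim_size <= NR_HLOCKS);

	if (req <= 0)
		return -1;

	/* No longer the primordial block once it is this small. */
	if (prim_size <= MAX_LOCK_DEPTH)
		return -1;

	/* The remainder must stay large enough to remain in bucket 0. */
	if (prim_size - req <= MAX_CHAIN_BUCKETS)
		return -1;

	prim_size -= req;
	return prim_start + prim_size;
}
```

In this model, successive calls hand out offsets from the end of the
array backward, which is the ordering change mentioned in the commit
log. A request that would shrink the remainder to MAX_CHAIN_BUCKETS
entries or fewer is refused and leaves the block untouched.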

Cheers,
Longman