Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

From: Waiman Long
Date: Thu Sep 05 2013 - 13:33:43 EST


On 09/05/2013 09:31 AM, Ingo Molnar wrote:
* Waiman Long <waiman.long@xxxxxx> wrote:


The latest tty patches did work. The tty related spinlock contention
is now completely gone. The short workload can now reach over 8M JPM
which is the highest I have ever seen.

The perf profile was:

5.85% reaim reaim [.] mul_short
4.87% reaim [kernel.kallsyms] [k] ebitmap_get_bit
4.72% reaim reaim [.] mul_int
4.71% reaim reaim [.] mul_long
2.67% reaim libc-2.12.so [.] __random_r
2.64% reaim [kernel.kallsyms] [k] lockref_get_not_zero
1.58% reaim [kernel.kallsyms] [k] copy_user_generic_string
1.48% reaim [kernel.kallsyms] [k] mls_level_isvalid
1.35% reaim [kernel.kallsyms] [k] find_next_bit
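
[For reference, the lockref_get_not_zero() entry above comes from the new lockref primitive this series is about: the spinlock and the reference count are packed into a single word so the count can be bumped with one cmpxchg while the lock is free. A minimal userspace sketch of that general idea (illustrative names and bit layout, not the kernel implementation):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct my_lockref {
	_Atomic uint64_t lock_count;	/* bit 0: "locked" flag, bits 32-63: refcount */
};

#define MY_LOCKED_BIT	((uint64_t)1)
#define MY_COUNT_ONE	((uint64_t)1 << 32)

/*
 * Try to bump the count without touching the lock: this succeeds only
 * while the lock bit is clear and the count is non-zero; otherwise the
 * caller falls back to a conventional locked slow path (not shown).
 */
static bool my_lockref_get_not_zero(struct my_lockref *ref)
{
	uint64_t old = atomic_load(&ref->lock_count);

	while (!(old & MY_LOCKED_BIT) && (old >> 32) != 0) {
		/* on failure, 'old' is reloaded with the current value and we retry */
		if (atomic_compare_exchange_weak(&ref->lock_count, &old,
						 old + MY_COUNT_ONE))
			return true;
	}
	return false;
}

Only when the lock is actually held does the caller need to fall back to taking the spinlock as before, which is why the tty spinlock contention disappears from the profile.]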
6%+ spent in ebitmap_get_bit() and mls_level_isvalid() looks like
something worth optimizing.

Is that called very often, or is it perhaps cache-bouncing for some
reason?

The high cycle count is due more to an inefficient algorithm in the mls_level_isvalid() function than to cacheline contention. The attached patch should address this problem; it is in linux-next and will hopefully be merged in 3.12.
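
[To illustrate the kind of algorithmic difference involved, here is a simplified userspace sketch with made-up names and plain fixed-size bitmaps, not the actual SELinux ebitmap code: testing each category bit individually costs a lookup per bit, whereas a containment check can compare whole words at a time.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_WORDS 16			/* 1024 category bits, for illustration */

/* Slow pattern: one lookup per bit, roughly what a per-bit loop costs. */
static bool contains_bit_by_bit(const uint64_t *super, const uint64_t *sub)
{
	for (size_t bit = 0; bit < MAP_WORDS * 64; bit++) {
		uint64_t mask = (uint64_t)1 << (bit % 64);

		if ((sub[bit / 64] & mask) && !(super[bit / 64] & mask))
			return false;
	}
	return true;
}

/* Faster pattern: whole-word containment test, one compare per 64 bits. */
static bool contains_word_by_word(const uint64_t *super, const uint64_t *sub)
{
	for (size_t i = 0; i < MAP_WORDS; i++) {
		if (sub[i] & ~super[i])	/* any bit in sub missing from super? */
			return false;
	}
	return true;
}

Both loops compute the same containment result; the word-at-a-time version simply handles 64 bits per iteration instead of one, which is why the per-bit version dominates the profile when levels carry many categories.]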

-Longman