Re: [PATCH v5] lib: optimize cpumask_next_and()

From: Yury Norov
Date: Thu Nov 30 2017 - 01:22:06 EST


On Wed, Nov 29, 2017 at 10:35:55AM +0100, Clement Courbet wrote:
> > > Note that on Arm (), the new c implementation still outperforms the
> > > old one that uses c+ the asm implementation of `find_next_bit` [3].
> > What is 'c+'? Is it typo?
>
> I meant "a mix of C and asm" ~(C + asm). Rephrased.
>
> > If you find generic find_bit() on arm faster that asm one, we'd
> > definitely drop that piece of asm. I have this check it in my
> > long list.
>
> What's faster for sure is the mix (the improvement in this commit minus the
> possible hit from not using the ASM implementation). I can't tell whether the
> latter is negligible or not (I only have one ARM board to try it out), but
> that's definitly something to try.
>
> > This is old version of test based on get_cycles. New one is based on
> > ktime_get and has other minor changes. I think you'd rerun tests to
> > not confuse readers. New version is already in linux-next.
>
> So I'm not sure whether I should be submitting this against 'linux' or
> 'linux-next' ? This patch is against 'linux', so I think it should
> be consistent with the code around.

Linux-next is your choice.
https://lwn.net/Articles/289013/


> > > #ifndef find_first_bit
> > > #define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
> > > #endif
> > > #ifndef find_first_zero_bit
> > > #define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0)
> > > #endif
> > How this change related to the find_next_and_bit?
>
> The arm header defines these symbols. Now that we're including
> the generic implementation in the arm headers, we need to guard this to
> avoid the duplicate definition.

So I think it really worth to be separated patch. Really, it's
completely nontrivial why adding new function in lib/find_bit.c
requires including asm-generic/bitops/find.h in arm and uncore32
asm/bitops.h headers (bug?). And why doing that makes you guard
find_first_bit and find_first_zero_bit (another bug?).

Headers are always important. Please elaborate it and also CC arm
and uncore32 communities, linux-arch and Arnd Bergman.

> > > test_find_next_and_bit_ref
> > I don't understand the purpose of this. It's obviously clear that
> > test_find_next_and_bit cannot be slower than test_find_next_and_bit_ref
>
> Fair enough :) That was to back my claim that this commit is worth it.
> I've removed the "_ref" version.
>
> > For sparse bitmaps it will be like traversing zero-bitmaps. I doubt
> > this numbers will be representative. Do we need this test at all?
>
> It's just two lines, and gives an interesting data point. Why not
> keep it ?

---

Again. test_find_next_and_bit is trimmed, but it is still based on
get_cycles and uses tabs in printf(). Please fix it.

Yury