Re: [PATCH] lib: test module for find_*_bit() functions
From: Yury Norov
Date: Tue Nov 14 2017 - 05:58:56 EST
Hi Michael,

On Sun, Nov 12, 2017 at 10:33:55PM +1100, Michael Ellerman wrote:
> Yury Norov <ynorov@xxxxxxxxxxxxxxxxxx> writes:
>
> > find_bit functions are widely used in the kernel, including hot paths.
> > This module tests performance of that functions in 2 typical scenarios:
> > randomly filled bitmap with relatively equal distribution of set and
> > cleared bits, and sparse bitmap which has 1 set bit for 500 cleared bits.
> >
> > On ThunderX machine:
> >
> > Start testing find_bit() with random-filled bitmap
> > [1032111.632383] find_next_bit: 240043 cycles, 164062 iterations
> > [1032111.647236] find_next_zero_bit: 312848 cycles, 163619 iterations
> > [1032111.661585] find_last_bit: 193748 cycles, 164062 iterations
> > [1032113.450517] find_first_bit: 177720874 cycles, 164062 iterations
> > [1032113.462930]
> > Start testing find_bit() with sparse bitmap
> > [1032113.477229] find_next_bit: 3633 cycles, 656 iterations
> > [1032113.494281] find_next_zero_bit: 620399 cycles, 327025 iterations
> > [1032113.506723] find_last_bit: 3038 cycles, 656 iterations
> > [1032113.524485] find_first_bit: 691407 cycles, 656 iterations
>
> Have you thought about timing it rather than using get_cycles()?
>
> get_cycles() has the downside that it can't be compared across different
> architectures or even platforms within an architecture.

This test is meant to benchmark find_bit() on the same target when the
algorithm changes. Comparing different architectures looks problematic anyway:
they may have different clock rates and even their own implementations of the
functions, as ARM does. Many CPUs also support dynamic frequency scaling, which
directly affects execution time. So I don't think a direct comparison of time
across platforms would be informative without additional precautions.
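
For illustration, the timing-based alternative raised above could look roughly
like the userspace sketch below. It is not the code from the patch:
naive_find_next_bit() and time_find_next_bit() are hypothetical names, the bit
search is a naive stand-in for the kernel's find_next_bit(), and
clock_gettime(CLOCK_MONOTONIC) stands in for whatever in-kernel clock a real
test would use.

```c
#define _POSIX_C_SOURCE 199309L
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical, naive stand-in for the kernel's find_next_bit(): returns the
 * index of the first set bit at or after 'start', or 'nbits' if there is none.
 */
static size_t naive_find_next_bit(const unsigned long *map, size_t nbits,
                                  size_t start)
{
	for (size_t i = start; i < nbits; i++)
		if (map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			return i;
	return nbits;
}

/*
 * Walk every set bit in 'map', store the number of bits found in
 * '*iterations', and return the elapsed wall-clock time in nanoseconds.
 */
static uint64_t time_find_next_bit(const unsigned long *map, size_t nbits,
                                   size_t *iterations)
{
	struct timespec t0, t1;
	size_t cnt = 0;
	int64_t ns;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t i = naive_find_next_bit(map, nbits, 0); i < nbits;
	     i = naive_find_next_bit(map, nbits, i + 1))
		cnt++;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Signed arithmetic so a tv_nsec rollover between samples is handled. */
	ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
	     + (t1.tv_nsec - t0.tv_nsec);

	*iterations = cnt;
	return (uint64_t)ns;
}
```

The nanosecond figure is in the same units everywhere, but it still depends on
the clock rate and frequency scaling of the machine it was measured on, so the
caveats above apply to it as well.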

Also, other tests, like lib/interval_tree_test.c or lib/rbtree_test.c, use
get_cycles() where they need to estimate performance, so it looks like common
practice.

Do you have a real use case for it?

Yury