Re: [PATCH 1/2] lib/find: Make functions safe on changing bitmaps

From: Yury Norov
Date: Thu Oct 26 2023 - 23:51:31 EST


On Wed, Oct 25, 2023 at 10:18:00AM +0200, Rasmus Villemoes wrote:
> On 25/10/2023 09.18, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 3.7% improvement of will-it-scale.per_thread_ops on:
>
> So with that, can we please just finally say "yeah, let's make the
> generic bitmap library functions correct

They are all correct already.

> and usable in more cases"

See below.

> instead of worrying about random micro-benchmarks that just show
> you-win-some-you-lose-some.

That's I agree. I don't worry about either +2% or -3% benchmark, and
don't think that they alone can or can't justificate such a radical
change like making all find_bit functions volatile, and shutting down
a newborn KCSAN.

Keeping that in mind, my best guess is that Jan's and Misrad's test
that shows +2% was against stable bitmaps; and what robot measured
is most likely against heavily concurrent access to some bitmap in
the kernel.

I didn't look at both tests sources, but that at least makes some
sense, because if GCC optimizes code against properly described
memory correctly, this is exactly what we can expect.

> Yes, users will have to treat results from the find routines carefully
> if their bitmap may be concurrently modified. They do. Nobody wins if
> those users are forced to implement their own bitmap routines for their
> lockless algorithms.

Again, I agree with this point, and I'm trying to address exactly this.

I'm working on a series that introduces lockless find_bit functions
based on existing FIND_BIT() engine. It's not ready yet, but I hope
I'll submit it in the next merge window.

https://github.com/norov/linux/commits/find_and_bit

Now that we've got a test that presumably works faster if find_bit()
functions are all switched to be volatile, it would be great if we get
into details and understand:
- what find_bit function or functions gives that gain in performance;
- on what bitmap(s);
- is the reason in concurrent memory access (guess yes), and if so,
- can we refactor the code to use lockless find_and_bit() functions
mentioned above;
- if not, how else can we address this.

If you or someone else have an extra time slot to get deeper into
that, I'll be really thankful.

Thanks,
Yury