Re: [PATCH 1/5] ARM: findbit: document ARMv5 bit offset calculation

From: Linus Torvalds
Date: Fri Oct 28 2022 - 15:01:29 EST


On Fri, Oct 28, 2022 at 10:45 AM Russell King (Oracle)
<linux@xxxxxxxxxxxxxxx> wrote:
>
> For the _find_first_bit, there isn't much difference in the number
> of instructions or really what is going on, only the organisation
> and flow of the code is more inline - but that shouldn't make much
> of a difference. Yet, there is a definite repeatable measurable
> difference between the two:

Hmm. Interestingly, your _find_first_zero_bit_le() (which
find_next_bit ends up using except for the first byte) ends up doing
an optimization that is technically not valid.

In particular, the *generic* code does

sz = min(idx * BITS_PER_LONG + __ffs(MUNGE(val)), sz);

for the final result.

In contrast, the arm code doesn't do the "min()" at all, and if there
are bits after the bitmap (in a partial byte), it will just return
those bits.

So the arm code ends up avoiding some operations. Which works most of
the time, because

(a) usually bitmaps are byte-aligned anyway

(b) most users do "find_first_bit(.., size) >= size" as the "found no
bits" test

but it actually looks to me like your handcoded arm code is simply
wrong. At least going by our docbook comments for find_first_bit:

* Returns the bit number of the first set bit.
* If no bits are set, returns @size.

And look here: bitmap_empty() does

return find_first_bit(src, nbits) == nbits;

and now imagine that 'nbits' is not a small constant value (which we
handle separately) and is also not byte aligned.

Maybe I'm mis-reading your assembly (I "know" arm assembly, but I
can't read it fluently like x86). But I don't think so.

So I think your code is actually buggy, but probably the bug is quite
hard to trigger in practice due to (a)/(b) above.

We do have bitmaps that aren't byte-aligned. The cpumask ones are the
most common ones. But in the cpumask_first() implementation (which is
just a wrapper for find_first_bit()), our documentation actually says

* Returns >= nr_cpu_ids if no cpus set.

and I think that may have been what we historically did elsewhere too,
and may be the source of the arm situation.

Anyway, this can be fixed by either

(a) fixing the arm code

(b) changing the docs and making that ">= size" be the right thing to do

but I do think this is a very strong example of why having
architecture-specific code like this is very very dangerous. Because
as it stands now, that arm code really looks like it's actively buggy
to me.

Linus