Re: [PATCH 1/5] clump_bits: Introduce the for_each_set_clump macro
From: Arnd Bergmann
Date: Sun Dec 27 2020 - 17:07:44 EST
On Sat, Dec 26, 2020 at 7:42 AM Syed Nayyar Waris <syednwaris@xxxxxxxxx> wrote:
>
> This macro iterates for each group of bits (clump) with set bits,
> within a bitmap memory region. For each iteration, "start" is set to
> the bit offset of the found clump, while the respective clump value is
> stored to the location pointed by "clump". Additionally, the
> bitmap_get_value() and bitmap_set_value() functions are introduced to
> respectively get and set a value of n-bits in a bitmap memory region.
> The n-bits can have any size from 1 to BITS_PER_LONG. size less
> than 1 or more than BITS_PER_LONG causes undefined behaviour.
> Moreover, during setting value of n-bit in bitmap, if a situation arise
> that the width of next n-bit is exceeding the word boundary, then it
> will divide itself such that some portion of it is stored in that word,
> while the remaining portion is stored in the next higher word. Similar
> situation occurs while retrieving the value from bitmap.
>
> GCC gives warning in bitmap_set_value(): https://godbolt.org/z/rjx34r
> Add explicit check to see if the value being written into the bitmap
> does not fall outside the bitmap.
> The situation that it is falling outside would never be possible in the
> code because the boundaries are required to be correct before the
> function is called. The responsibility is on the caller for ensuring the
> boundaries are correct.
> The code change is simply to silence the GCC warning messages
> because GCC is not aware that the boundaries have already been checked.
> As such, we're better off using __builtin_unreachable() here because we
> can avoid the latency of the conditional check entirely.
Didn't the __builtin_unreachable() end up leading to an objtool
warning about incorrect stack frames for the code path that leads
into the undefined behavior? I thought I saw a message from the 0day
build bot about that and didn't expect to see it again after that.
Can you actually measure any performance difference compared
to BUG_ON() that avoids the undefined behavior? Practically
all CPUs from the past 20 years have branch predictors that should
completely hide measurable overhead from this.
Arnd