Re: [PATCH 05/32] locking/lockdep: Prepare valid_state() to handle plain masks

From: Linus Torvalds
Date: Wed Feb 13 2019 - 14:47:36 EST


On Wed, Feb 13, 2019 at 7:16 AM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
> >
> > If "vectors" only has the high hit set, you end up with "fs" having
> > the value "64".
> >
> > And then "vectors >>= fs" is undefined and won't actually do anything
> > at all on x86.
>
> Oh! ok didn't know that...

So in general, shift counts >= width of the type (or negative) are undefined.

They can sometimes happen to work (that's the "undefined" part ;), but
it's not reliable or portable.

It's why you occasionally see things like

drivers/block/sx8.c:
tmp = (blk_rq_pos(rq) >> 16) >> 16;

to get the upper 32 bits of the value. It is written with that odd
double shift, rather than being written as ">> 32". That way it works
even if the sector type happens to be 32-bit (and the compiler will
just end up turning it into a zero if it's an unsigned 32-bit type
since it's compile-time obvious).

> I see, perhaps I should use for_each_set_bit() that should take care about those
> details?

That would _work_, but don't do that. "for_each_set_bit()" works on
bitmaps in memory, and is slow for a simple word case. In addition to
being slow, it uses the Linux tradition of working on bitmaps that are
comprised of "unsigned long". So it has byte order issues too.

So for_each_set_bit() is useful when you have real arrays of bits and
are using the "set_bit()" etc interfaces.

When you're actually working on just a single variable, your "__ffs()"
model works fine, you just need to be careful to _not_ do the "+1" and
then use it for shifts.

Also, it actually turns out that if you want to be really clever, you
can play tricks if you don't care about the exact bit *number*.

For example, this expression:

v = a & (a-1);

will remove the lowest bit set from 'a' very cheaply. So what you can
do is something like this:

void for_each_bit_in_mask(u64 mask)
{
while (mask) {
u64 newmask = mask & (mask-1);
u64 onebit = mask ^ newmask;
mask = newmask;
do_something_with(onebit);
}
}

to do some operation on each bit set, and quite efficiently and
without any undefined behavior or expensive shifts.

But the above trick does require that you are happy to just see the
bit *mask*, not the bit *number*. I'm not sure any of your cases match
that.

Linus