Re: [PATCH 05/32] locking/lockdep: Prepare valid_state() to handle plain masks

From: Frederic Weisbecker
Date: Wed Feb 20 2019 - 22:54:02 EST


On Wed, Feb 13, 2019 at 11:47:13AM -0800, Linus Torvalds wrote:
> On Wed, Feb 13, 2019 at 7:16 AM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
> > >
> > > If "vectors" only has the high hit set, you end up with "fs" having
> > > the value "64".
> > >
> > > And then "vectors >>= fs" is undefined and won't actually do anything
> > > at all on x86.
> >
> > Oh! ok didn't know that...
>
> So in general, shift counts >= width of the type (or negative) are undefined.
>
> They can sometimes happen to work (that's the "undefined" part ;), but
> it's not reliable or portable.
>
> It's why you occasionally see things like
>
> drivers/block/sx8.c:
> tmp = (blk_rq_pos(rq) >> 16) >> 16;
>
> to get the upper 32 bits of the value. It is written with that odd
> double shift, rather than being written as ">> 32". That way it works
> even if the sector type happens to be 32-bit (and the compiler will
> just end up turning it into a zero if it's an unsigned 32-bit type
> since it's compile-time obvious).

Ok, I see.

>
> > I see, perhaps I should use for_each_set_bit() that should take care about those
> > details?
>
> That would _work_, but don't do that. "for_each_set_bit()" works on
> bitmaps in memory, and is slow for a simple word case. In addition to
> being slow, it uses the Linux tradition of working on bitmaps that are
> comprised of "unsigned long". So it has byte order issues too.
>
> So for_each_set_bit() is useful when you have real arrays of bits and
> are using the "set_bit()" etc interfaces.

Yeah I suspected some overhead.

>
> When you're actually working on just a single variable, your "__ffs()"
> model works fine, you just need to be careful to _not_ do the "+1" and
> then use it for shifts.
>
> Also, it actually turns out that if you want to be really clever, you
> can play tricks if you don't care about the exact bit *number*.
>
> For example, this expression:
>
> v = a & (a-1);
>
> will remove the lowest bit set from 'a' very cheaply. So what you can
> do is something like this:
>
> void for_each_bit_in_mask(u64 mask)
> {
> while (mask) {
> u64 newmask = mask & (mask-1);
> u64 onebit = mask ^ newmask;
> mask = newmask;
> do_something_with(onebit);
> }
> }
>
> to do some operation on each bit set, and quite efficiently and
> without any undefined behavior or expensive shifts.
>
> But the above trick does require that you are happy to just see the
> bit *mask*, not the bit *number*. I'm not sure any of your cases match
> that.

Nice, I couldn't resist introducing such a headache in my set ;-) unfortunately
I indeed need the bit number itself most of the time.

So following your 1st advice, I should rather do something along the lines of:

nr = 0;
while (mask) {
fs = __ffs64(mask);
mask >>= fs;
mask >>= 1;
nr += fs + 1;
process_bit_nr(nr - 1);
}

And define a for_each_lock_usage_bit(usage_mask) on top of it.

Thanks a lot!