Re: [PATCH] x86: generic versions of find_first_(zero_)bit, convert i386

From: Alexander van Heukelum
Date: Sun Apr 06 2008 - 14:52:03 EST

On Sun, 6 Apr 2008 10:03:43 -0700 (PDT), "dean gaudet" <dean@xxxxxxxxxx> wrote:
> fwiw there's a way to do ffz / ntz which can do lg(n) conditional moves in
> parallel... i'm not sure what (non-x86) architectures this might be best
> on, but it might be a good choice for the generic code... although maybe
> the large number of constants required will be a burden on RISC
> processors.

Hello Dean,

The current generic implementation of ffz is O(lg(n)) already, but
the version you suggest might indeed be a bit faster if the compiler
recognises that it can use conditional moves and the architecture
can handle large constants efficiently.
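
For reference, the O(lg(n)) approach can be sketched roughly as below.
This is only an illustration of a branchy binary search, not the code
from the kernel tree: the name ffz32_branchy and the return value of 32
for an all-ones word are assumptions made for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): index of the lowest
 * zero bit in a 32-bit word, found by halving the search range.
 * Returns 32 if the word has no zero bit (a choice made for this sketch).
 */
static unsigned ffz32_branchy(uint32_t word)
{
    uint32_t x = ~word;   /* zero bits of word become set bits of x */
    unsigned n = 0;

    if (x == 0)
        return 32;        /* word was all ones */
    if ((x & 0xFFFF) == 0) { n += 16; x >>= 16; }
    if ((x & 0xFF) == 0)   { n += 8;  x >>= 8;  }
    if ((x & 0xF) == 0)    { n += 4;  x >>= 4;  }
    if ((x & 0x3) == 0)    { n += 2;  x >>= 2;  }
    if ((x & 0x1) == 0)    { n += 1; }
    return n;
}
```

Each step depends on the shifted value from the previous one, which is
the serial dependency that the conditional-move version avoids.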

On the other hand, the bit-search functions tend to be avoided as
much as possible, because they are often not implemented as a
hardware instruction and even if they are implemented in hardware,
they might be slow. The generic version is slow anyhow. That's
why the bitmap searches first test if a word in the bitmap is
all-0-bits/all-1-bits. The single-word version of ffz might even
be better off if it was optimized for size instead of being fully

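The word-level pre-test mentioned above can be sketched as follows.
This is a simplified illustration, not the patch itself: the names
first_zero_bit, ffz_word and BITS_PER_WORD are made up for this example,
and ffz_word is deliberately a small loop (size over speed).

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical small ffz: word must contain at least one zero bit. */
static unsigned ffz_word(unsigned long word)
{
    unsigned n = 0;

    while (word & 1UL) {
        word >>= 1;
        n++;
    }
    return n;
}

/* Index of the first zero bit in addr[], or size if there is none. */
static size_t first_zero_bit(const unsigned long *addr, size_t size)
{
    size_t base = 0;

    /* Full words: skip every all-ones word with a single compare. */
    while (base + BITS_PER_WORD <= size) {
        if (*addr != ~0UL)
            return base + ffz_word(*addr);
        addr++;
        base += BITS_PER_WORD;
    }

    /* Partial last word: force the bits beyond size to 1. */
    if (base < size) {
        unsigned long tail = *addr | (~0UL << (size - base));

        if (tail != ~0UL)
            return base + ffz_word(tail);
    }
    return size;
}
```

The per-word search runs at most once per call, so its speed matters
much less than the speed of the loop that skips the full words.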
> take a look at figure 5-17 here
> int ntz(unsigned x) {
>     unsigned y, bz, b4, b3, b2, b1, b0;
>     y = x & -x;                      // Isolate rightmost 1-bit.
>     bz = y ? 0 : 1;                  // 1 if y = 0.
>     b4 = (y & 0x0000FFFF) ? 0 : 16;
>     b3 = (y & 0x00FF00FF) ? 0 : 8;
>     b2 = (y & 0x0F0F0F0F) ? 0 : 4;
>     b1 = (y & 0x33333333) ? 0 : 2;
>     b0 = (y & 0x55555555) ? 0 : 1;
>     return bz + b4 + b3 + b2 + b1 + b0;
> }

Note: mask32 = ~0ul; mask16 = mask32 ^ (mask32 << 16), mask8 = ...
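
As an illustration of that note, here is a 32-bit sketch in which each
mask is derived from the previous one by a shift and an xor. The
continuation of the pattern past mask16 (mask8, mask4, mask2, mask1) and
the names ntz32 and ffz32 are assumptions made for this example; for
32-bit words the derived values match the constants in the quoted ntz().

```c
#include <stdint.h>

/*
 * Hypothetical 32-bit variant of the quoted ntz(): number of trailing
 * zero bits, 32 for x == 0. Masks are built by shift-and-xor instead
 * of being written out as literal constants.
 */
static unsigned ntz32(uint32_t x)
{
    const uint32_t mask32 = ~(uint32_t)0;
    const uint32_t mask16 = mask32 ^ (uint32_t)(mask32 << 16); /* 0x0000FFFF */
    const uint32_t mask8  = mask16 ^ (uint32_t)(mask16 << 8);  /* 0x00FF00FF */
    const uint32_t mask4  = mask8  ^ (uint32_t)(mask8 << 4);   /* 0x0F0F0F0F */
    const uint32_t mask2  = mask4  ^ (uint32_t)(mask4 << 2);   /* 0x33333333 */
    const uint32_t mask1  = mask2  ^ (uint32_t)(mask2 << 1);   /* 0x55555555 */

    uint32_t y = x & (0u - x);          /* isolate the lowest set bit */
    unsigned bz = y ? 0 : 1;            /* 1 only when x == 0 */
    unsigned b4 = (y & mask16) ? 0 : 16;
    unsigned b3 = (y & mask8)  ? 0 : 8;
    unsigned b2 = (y & mask4)  ? 0 : 4;
    unsigned b1 = (y & mask2)  ? 0 : 2;
    unsigned b0 = (y & mask1)  ? 0 : 1;

    /* The six tests are independent of each other; only the sum joins them. */
    return bz + b4 + b3 + b2 + b1 + b0;
}

/* First zero bit of x is the number of trailing zeros of ~x. */
static unsigned ffz32(uint32_t x)
{
    return ntz32(~x);
}
```

Whether a compiler folds the derived masks back into immediate constants
or keeps the shift-and-xor chain is up to the compiler and the target.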

> -dean

Alexander van Heukelum
