Re: [RFC][PATCH] Faster generic_fls

From: Linus Torvalds (torvalds@transmeta.com)
Date: Thu May 01 2003 - 11:02:13 EST


On Thu, 1 May 2003, Willy Tarreau wrote:
>
> BTW, has someone benchmarked BSF/BSR on x86 ? It should be
> faster, it it's also possible that a poor microcode implements it with a one
> bit/cycle algo, which will result in one instruction not being as fast as your
> code.

I think the original i386 did it with a one-bit-per-cycle algorithm,
anything since should be fine. In particular, on a P4 where I just tested,
the bsf seems to be 4 cycles over the whole input set (actually, my whole
loop was 4 cycles per iteration, so 4 cycles is worst-case. I'm assuming
the rest could have been done in parallell).

                Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed May 07 2003 - 22:00:13 EST