Re: [PATCH 2/5] bitops: compile time optimization for hweight_long(CONSTANT)

From: Borislav Petkov
Date: Thu Feb 04 2010 - 10:10:47 EST


On Wed, Feb 03, 2010 at 11:49:54AM -0800, H. Peter Anvin wrote:
> On 02/03/2010 10:47 AM, Peter Zijlstra wrote:
> > On Wed, 2010-02-03 at 19:14 +0100, Borislav Petkov wrote:
> >
> >> alternative("call hweightXX", "popcnt", X86_FEATURE_POPCNT)
> >
> > Make sure to apply a 0xff bitmask to the popcnt r16 call for hweight8(),
> > and hweight64() needs a bit of magic for 32bit, but yes, something like
> > that ought to work nicely.
> >
>
> Arguably the "best" option is to have the alternative being a jump to an
> out-of-line stub which does the necessary parameter marshalling before
> calling a stub. This technique is already used in a few other places.

Ok, here's a first alpha prototype, completely untested, although the
asm output looks ok. I've added separate 32-bit and 64-bit helpers in
order to dispense with the if-else tests. The hw-popcnt versions are the
opcode bytes for "popcnt %eax, %eax" and "popcnt %rax, %rax",
respectively, so %eax/%rax has to be preloaded with the bitmask and the
computed value has to be retrieved from there afterwards. And yes, it's
not that elegant, so I'm open to suggestions.

The good thing is, this should work with any toolchain since we don't
rely on the compiler knowing about popcnt, and we're protected by the
CPUID feature flag so that the hw-popcnt version is used only on
processors which support it.

Please take a good look and let me know what you guys think.

Thanks.

--
arch/x86/include/asm/bitops.h | 4 ++
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/popcnt.c | 62 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 67 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/lib/popcnt.c

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 02b47a6..deb5013 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -434,6 +434,10 @@ static inline int fls(int x)
#endif
return r + 1;
}
+
+
+extern int arch_hweight_long(unsigned long);
+
#endif /* __KERNEL__ */

#undef ADDR
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index cffd754..c03fe2d 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -22,7 +22,7 @@ lib-y += usercopy_$(BITS).o getuser.o putuser.o
lib-y += memcpy_$(BITS).o
lib-$(CONFIG_KPROBES) += insn.o inat.o

-obj-y += msr.o msr-reg.o msr-reg-export.o
+obj-y += msr.o msr-reg.o msr-reg-export.o popcnt.o

ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
diff --git a/arch/x86/lib/popcnt.c b/arch/x86/lib/popcnt.c
new file mode 100644
index 0000000..179a6e8
--- /dev/null
+++ b/arch/x86/lib/popcnt.c
@@ -0,0 +1,62 @@
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+
+int _hweight32(void)
+{
+ unsigned long w;
+
+ asm volatile("" : "=a" (w));
+
+ return hweight32(w);
+}
+
+int _hweight64(void)
+{
+ unsigned long w;
+
+ asm volatile("" : "=a" (w));
+
+ return hweight64(w);
+}
+
+int _popcnt32(void)
+{
+
+ unsigned long w;
+
+ asm volatile(".byte 0xf3\n\t.byte 0x0f\n\t.byte 0xb8\n\t.byte 0xc0\n\t"
+ : "=a" (w));
+
+ return w;
+}
+
+int _popcnt64(void)
+{
+
+ unsigned long w;
+
+ asm volatile(".byte 0xf3\n\t.byte 0x48\n\t.byte 0x0f\n\t"
+ ".byte 0xb8\n\t.byte 0xc0\n\t"
+ : "=a" (w));
+
+ return w;
+}
+
+int arch_hweight_long(unsigned long w)
+{
+ if (sizeof(w) == 4) {
+ asm volatile("movl %[w], %%eax" :: [w] "r" (w));
+ alternative("call _hweight32",
+ "call _popcnt32",
+ X86_FEATURE_POPCNT);
+ asm volatile("" : "=a" (w));
+
+ } else {
+ asm volatile("movq %[w], %%rax" :: [w] "r" (w));
+ alternative("call _hweight64",
+ "call _popcnt64",
+ X86_FEATURE_POPCNT);
+ asm volatile("" : "=a" (w));
+ }
+ return w;
+}
--
1.6.6

--
Regards/Gruss,
Boris.

--
Advanced Micro Devices, Inc.
Operating Systems Research Center
--