Re: [patch V2] lib: GCD: add binary GCD algorithm
From: Andrew Morton
Date: Wed Apr 27 2016 - 16:09:05 EST
On Wed, 27 Apr 2016 16:05:50 +0800 zengzhaoxiu@xxxxxxx wrote:
> From: Zhaoxiu Zeng <zhaoxiu.zeng@xxxxxxxxx>
>
> Because some architectures (alpha, armv6, etc.) don't provide hardware division,
> the mod operation is slow! Binary GCD algorithm uses simple arithmetic operations,
> it replaces division with arithmetic shifts, comparisons, and subtraction.
>
> I have compiled successfully with x86_64_defconfig and i386_defconfig.
>
> I use the following code to test:
>
> ...
>
> Compiled with "-O2", on "VirtualBox 4.2.0-35-generic #40-Ubuntu x86_64" got:
>
> zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 92281
> gcd1: elapsed 55005
> gcd2: elapsed 91088
> zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 115546
> gcd1: elapsed 55928
> gcd2: elapsed 91938
> zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 91189
> gcd1: elapsed 55493
> gcd2: elapsed 90078
> zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 157364
> gcd1: elapsed 55204
> gcd2: elapsed 90058
> zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd
> gcd0: elapsed 91667
> gcd1: elapsed 54641
> gcd2: elapsed 91364
Can you please include a summary which describes the overall impact of
the patch? eg, how faster does a%b become on <some architecture>
> --- a/lib/gcd.c
> +++ b/lib/gcd.c
> @@ -2,19 +2,82 @@
> #include <linux/gcd.h>
> #include <linux/export.h>
>
> -/* Greatest common divisor */
> +/*
> + * use __ffs if the CPU has efficient __ffs
> + */
> +#if (defined(CONFIG_ALPHA) && defined(CONFIG_ALPHA_EV6) && defined(CONFIG_ALPHA_EV67)) || \
> + defined(CONFIG_ARC) || \
> + (defined(CONFIG_ARM) && __LINUX_ARM_ARCH__ >= 5) || defined(CONFIG_ARM64) || \
> + defined(CONFIG_AVR32) || \
> + defined(CONFIG_BLACKFIN) || \
> + defined(CONFIG_C6X) || \
> + defined(CONFIG_CRIS) || \
> + defined(CONFIG_FRV) || \
> + defined(CONFIG_HEXAGON) || \
> + defined(CONFIG_IA64) || \
> + (defined(CONFIG_M68K) && \
> + (!defined(CONFIG_CPU_HAS_NO_BITFIELDS) || \
> + ((defined(__mcfisaaplus__) || defined(__mcfisac__)) && \
> + !defined(CONFIG_M68000) && !defined(CONFIG_MCPU32)))) || \
> + defined(CONFIG_MN10300) || \
> + defined(CONFIG_OPENRISC) || \
> + defined(CONFIG_POWERPC) || \
> + defined(CONFIG_S390) || \
> + defined(CONFIG_TILE) || \
> + defined(CONFIG_UNICORE32) || \
> + defined(CONFIG_X86) || \
> + defined(CONFIG_XTENSA)
> +# define USE_FFS 1
> +#elif defined(CONFIG_MIPS)
> +# define USE_FFS (__builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz)
> +#else
> +# define USE_FFS 0
> +#endif
Yikes.
Normally we'd create a new Kconfig variable and do this in the
individual arch/XXX/Kconfig files.
Can we just assume CONFIG_ARCH_HAS_SLOW_FFS is false and then set it to
true for MIPS, arm, etc?
> +/*
> + * This implements the binary GCD algorithm. (Often attributed to Stein,
> + * but as Knith has noted, appears a first-century Chinese math text.)
> + */
Knith might be Knuth?
> unsigned long gcd(unsigned long a, unsigned long b)
> {
> - unsigned long r;
> + unsigned long r = a | b;
> +
> + if (!a || !b)
> + return r;