Re: [patch V2] lib: GCD: add binary GCD algorithm

From: George Spelvin
Date: Wed Apr 27 2016 - 12:20:41 EST


I replicated your results in 32- and 64-bit x86:

- If __ffs (gcc's __builtin_ctz) is available, the basic
(non-even/odd) binary GCD algorithm is faster than the
divsion-based or even/odd.
- If shifting down has to be done in a loop, the even/odd
binary algorithm is fastest.

I tried a few variants, but couldn't consistently beat your code.
The following variant, with the loop test at the end, seemed to do better
in some cases, but it's not consistent enought to be sure:

static unsigned long gcd5(unsigned long a, unsigned long b)
{
unsigned long r = a | b;

if (!a || !b)
return r;

b >>= __builtin_ctzl(b);

do {
a >>= __builtin_ctzl(a);
if (a < b)
swap(a, b);
a -= b;
} while (a);
return b << __builtin_ctzl(r);
}


The frequent "if (USE_FFS)" conditions seem to clutter the code.
Would it be easier to read if the two variants were just separated?
The function is short enough that the duplication isn't too severe.

Such as the following.

A few source cleanups (included for example; delete if you don't
like them):
- Return directly from the loop, rather than using break().
- Use "r &= -r" mostly because it's clearer.
Signed-off-by: George Spelvin <linux@xxxxxxxxxxx>

/*
* This implements the binary GCD algorithm. (Often attributed to Stein,
* but as Knuth has noted, appears a first-century Chinese math text.)
*
* This is faster than the division-based algorithm even on x86, which
* has decent hardware division.
*/
#if USE_FFS
/* If __ffs is available, the even/odd algorithm benchmarks slower. */
unsigned long gcd(unsigned long a, unsigned long b)
{
unsigned long r = a | b;

if (!a || !b)
return r;

b >>= __ffs(b);

for (;;) {
a >>= __ffs(a);
if (a == b)
return a << __ffs(r);
if (a < b)
swap(a, b);
a -= b;
}
}

#else /* !USE_FFS */

/* If normalization is done by loops, the even/odd algorithm is a win. */
unsigned long gcd(unsigned long a, unsigned long b)
{
unsigned long r = a | b;

if (!a || !b)
return r;

r &= -r; /* Isolate lsbit of r */

while (!(b & r))
b >>= 1;

for (;;) {
while (!(a & r))
a >>= 1;
if (a == b)
return a;
if (a < b)
swap(a, b);
a -= b;
a >>= 1;
if (a & r)
a += b;
a >>= 1;
}
}
#endif