Re: [PATCH] tcp_cubic: use 32 bit math

From: Willy Tarreau
Date: Sat Mar 10 2007 - 06:53:22 EST


On Wed, Mar 07, 2007 at 07:51:35PM -0800, David Miller wrote:
> From: Stephen Hemminger <shemminger@xxxxxxxxxxxxxxxxxxxx>
> Date: Wed, 07 Mar 2007 19:10:47 -0800
>
> > David Miller wrote:
> > > What about Willy Tarreau's supposedly even faster variant?
> > > Or does this incorporate that set of improvements?
> > >
> > That's what this is:
> > x = (2 * x + (uint32_t)div64_64(a, (uint64_t)x*(uint64_t)x)) / 3;
>
> Great, thanks for the clarification.

Oh BTW, I have a newer version with a first approximation of the
cbrt() before the div64_64, which allows us to reduce from 3 div64
to only 2 div64. This results in a version which is twice as fast
as the initial one (ncubic), but with slightly less accuracy (0.286%
compared to 0.247). But I see that other functions such as hcbrt()
had a 1.5% avg error, so I think this is not dramatic.

Also, I managed to remove all other divides, to be kind with CPUs
having a slow divide instruction or no divide at all. Since we compute
on limited range (22 bits), we can multiply then shift right. It shows
me even slightly better time on pentium-m and athlon, with a slightly
higher avg error (0.297% compared to 0.286%), and slightly smaller
code.

I just have to clean experiments from my code to provide a patch.
David, Stephen, are you interested ?

$ ./bictcp
fls(0)=0, fls(1)=1, fls(256)=9
Calibrating
Function clocks mean(us) max(us) std(us) Avg error
bictcp 936 0.61 24.28 1.99 0.172%
ocubic 886 0.57 23.51 3.18 0.274%
ncubic 644 0.42 16.59 2.18 0.247%
ncubic32 444 0.29 11.47 1.50 0.247%
ncubic32_1 444 0.29 11.56 1.88 0.238%
ncubic32b3 337 0.22 8.67 0.88 0.286%
ncubic_ndiv3 329 0.21 8.46 0.69 0.297%
acbrt 707 0.46 18.05 0.80 0.275%
hcbrt 644 0.42 16.44 0.51 1.580%


Regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/