r_{n+1} = r_n*r_n + a * b_n * b_n;
s_{n+1} = 2 * r_n * b_n;
x_n = r_n/s_n;
Now I only need to run the division once at the end of iteration.
But my code still runs 30% slower than osf/1 libm
Don't know how can the osf/1 libm calculat sqrt, sb. said it does
not need division, anybody know the algorithm?
>
> This converges quadratically, i.e. the number of correct digits doubles
> with every iteration.
> As a starting value x_0 one can choose M*2^{E/2} if a=M*2^E, i.e. just
> divide the exponent of a by two and leave its mantissa unchanged.
>
> I don't know how fast the alpha is at divisions, if it is slow, one
> could perhaps replace the division a/x_n by a some series expansion.
>
> Greetings,
> -Gerhard
>
-- yansong chen-- YChen@uh.edu