Re: [PATCH] blk-wbt: Speed up integer square root in rwb_arm_timer
From: I Hsin Cheng
Date: Sat Mar 30 2024 - 04:45:40 EST
My previous email was not sent as plain text, so I am resending it.
Sorry for the noise.
>> Additionally, why to add a second
>> implementation of int_sqrt() instead of replacing the int_sqrt()
>> implementation in lib/math/int_sqrt.c?
I was thinking of adding an alternative first, rather than replacing
int_sqrt() itself, since it is used in many other parts of the Linux
kernel.
>> Since int_sqrt() does not use divisions and since int_fastsqrt() uses
>> divisions, can all CPUs supported by the Linux kernel divide numbers as
>> quickly as the CPU mentioned above?
You're right about that. Thanks for pointing out the problem. I'll try to
replace the divisions, perhaps with another approximation method.
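
For reference, a division-free integer square root can be written with
shifts, additions and comparisons only, producing one result bit per
iteration. The following is a minimal userspace sketch of that
shift-and-subtract technique, not a copy of lib/math/int_sqrt.c; the
name isqrt_shift() and the fixed-width types are assumptions made for
illustration.

```c
#include <stdint.h>

/*
 * Division-free integer square root (shift-and-subtract).
 * Hypothetical userspace sketch; isqrt_shift() is not a kernel symbol.
 * Returns floor(sqrt(x)).
 */
uint32_t isqrt_shift(uint64_t x)
{
	uint64_t y = 0;           /* partial root, kept pre-shifted */
	uint64_t m = 1ULL << 62;  /* highest power of four in 64 bits */

	/* Skip powers of four that are larger than x. */
	while (m > x)
		m >>= 2;

	while (m != 0) {
		uint64_t b = y + m;   /* trial value for the current bit */

		y >>= 1;
		if (x >= b) {         /* the current bit is set in the root */
			x -= b;
			y += m;
		}
		m >>= 2;
	}

	return (uint32_t)y;
}
```

The loop runs at most 32 times for a 64-bit input and never divides, so
its cost does not depend on how fast the CPU's divider is.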
> The claim that it is floor(sqrt(val)) is not true.
> Trivial example:
>
> 1005117225
> sqrt() 31703.58
> int_sqrt() 31703
> int_fastsqrt() 30821
Thanks for pointing out the problem. I only compared my method against
int_sqrt() and plotted the results with gnuplot; the plot showed that the
two gave very close answers. I did not measure the error against the
integer part of sqrt(), which is indeed necessary. Sorry about that. I'll
check the precision of my method again.
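
One way to check the floor(sqrt(val)) claim without floating point is to
test r*r <= val < (r+1)*(r+1) directly for each candidate result r. The
helper below is only a sketch of such a check; the name is_floor_sqrt()
is hypothetical and it is written for userspace testing.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if r == floor(sqrt(x)), i.e. r*r <= x < (r+1)*(r+1).
 * Integer-only, so floating-point rounding cannot hide an error.
 * Hypothetical test helper; is_floor_sqrt() is not a kernel symbol.
 */
bool is_floor_sqrt(uint32_t x, uint32_t r)
{
	/* Widen to 64 bits so the squares of 32-bit values are computed exactly. */
	uint64_t lo = (uint64_t)r * r;
	uint64_t hi = ((uint64_t)r + 1) * ((uint64_t)r + 1);

	return lo <= x && x < hi;
}
```

For the value quoted above, is_floor_sqrt(1005117225, 31703) holds,
while is_floor_sqrt(1005117225, 30821) does not.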
Thanks for your patience and for taking the time to review my patch.
Best Regards,
I Hsin Cheng.
On Sat, Mar 30, 2024 at 4:29 PM 鄭以新 <richard120310@xxxxxxxxx> wrote:
> On Sat, Mar 30, 2024 at 3:12 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> On 3/29/24 12:15 PM, Bart Van Assche wrote:
>> > On 3/29/24 2:12 AM, I Hsin Cheng wrote:
>> >> As the results show, the original version of integer square root,
>> >> "int_sqrt", takes 35.37 msec task-clock, 121,813,348 cycles,
>> >> 160,953,665 instructions, 25,512,990 branches and causes 10,616
>> >> branch-misses.
>> >>
>> >> At the same time, the variant version of integer square root,
>> >> "int_fastsqrt", takes 33.96 msec task-clock, 116,457,487 cycles,
>> >> 56,210,086 instructions, 3,210,409 branches and causes 2,407
>> >> branch-misses. We can clearly see that "int_fastsqrt" runs faster,
>> >> so it is indeed a faster variant of integer square root.
>> >
>> > I'm not sure that a 4% performance improvement is sufficient to
>> > replace the int_sqrt() implementation. Additionally, why to add a
>> > second implementation of int_sqrt() instead of replacing the
>> > int_sqrt() implementation in lib/math/int_sqrt.c?
>>
>> That's the real question imho - if it provides the same numbers and is
>> faster, why have two?
>>
>> I ran a quick test because I was curious, and the precision is
>> definitely worse. The claim that it is floor(sqrt(val)) is not true.
>> Trivial example:
>>
>> 1005117225
>> sqrt() 31703.58
>> int_sqrt() 31703
>> int_fastsqrt() 30821
>>
>> Whether this matters, probably not, but then again it's hard to care
>> about a slow path sqrt calculation. I'd certainly err on the side of
>> precision for that.
>>
>> --
>> Jens Axboe
>>