Re: [PATCH v2 2/3] lib/string_helpers.c: don't lose precision in string_get_size()
From: Vitaly Kuznetsov
Date: Tue Oct 27 2015 - 12:16:27 EST
Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx> writes:
> On Tue, 2015-10-27 at 15:06 +0100, Vitaly Kuznetsov wrote:
>> string_get_size() loses precision when there is a remainder for
>> blk_size / divisor[units] and size is big enough. E.g
>> string_get_size(8192, 4096, STRING_UNITS_10, ...) returns "32.7 MB"
>> while it is supposed to return "33.5 MB". For some artificial inputs
>> the result can be ridiculously wrong, e.g.
>> string_get_size(3000, 1900, STRING_UNITS_10, ...) returns "3.00 MB"
>> when "5.70 MB" is expected.
>>
>> The issues comes from the fact than we through away
>> blk_size / divisor[units] remainder when size is > exp. This can be
>> fixed
>> by saving it and doing some non-trivial calculations later to fix the
>> error
>> but that would make this function even more cumbersome. Slightly re-
>> factor
>> the function to not lose the precision for all inputs.
>>
>> The overall complexity of this function comes from the fact that size
>> can
>> be huge and we don't want to do size * blk_size as it can overflow.
>> Do the
>> math in two steps:
>> 1) Reduce size to something < blk_size * divisor[units]
>> 2) Multiply the result (and the remainder) by blk_size and do final
>> ÂÂÂcalculations.
>>
>> Reported-by: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>> ---
>> Changes since v1:
>> - Check against blk_size == 0 [Rasmus Villemoes]
>> - Do not rename 'i' to 'order' [Andy Shevchenko]
>> ---
>> Âlib/string_helpers.c | 37 ++++++++++++++++++++++++-------------
>> Â1 file changed, 24 insertions(+), 13 deletions(-)
>>
>> diff --git a/lib/string_helpers.c b/lib/string_helpers.c
>> index f6c27dc..eba8e82 100644
>> --- a/lib/string_helpers.c
>> +++ b/lib/string_helpers.c
>> @@ -44,7 +44,8 @@ void string_get_size(u64 size, u32 blk_size, const
>> enum string_size_units units,
>> Â [STRING_UNITS_2] = 1024,
>> Â };
>> Â int i, j;
>> - u32 remainder = 0, sf_cap, exp;
>> + u64 remainder = 0;
>> + u32 sf_cap;
>> Â char tmp[8];
>> Â const char *unit;
>> Â
>> @@ -53,28 +54,36 @@ void string_get_size(u64 size, u32 blk_size,
>> const enum string_size_units units,
>> Â if (!size)
>> Â goto out;
>> Â
>> - while (blk_size >= divisor[units]) {
>> - remainder = do_div(blk_size, divisor[units]);
>> - i++;
>> + if (!blk_size) {
>> + WARN_ON(1);
>
> Hmm... Isn't it too strong? WARN_ONCE() might reduce a noise. Or even
> pr_warn_once/ratelimited().
Nobody is supposed to call string_get_size() with blk_size = 0, if
someone does that - it is a bug and that's what WARN_ON is supposed to
report. I'm OK with changing it to WARN_ONCE() but I don't see a big
difference - nobody's calling string_get_size() in a loop, one/two calls
per one storage device is expected.
>
>> + size = 0;
>> + goto out;
>> Â }
>
> What about doing it before if (!size) ?
>
> LikeÂ
>
> if (!blk_size) {
> Âpr_warn_once(); /* or WARN_ONCE() ? */
> Â/* Override size to follow error path */
> Âsize = 0;
> }
> Â
> if (!size)
To be honest I don't see a big difference but I'm fine with the change :-)
>
> Also, would it be separate patch?
>
I'm significantly changing the algorithm here and I didn't want to
introduce an infinite loop with blk_size = 0 but I've just checked and
previous version had division by zero here so yes, I can make it a
separate patch.
>> Â
>> - exp = divisor[units] / blk_size;
>> Â /*
>> - Â* size must be strictly greater than exp here to ensure
>> that remainder
>> - Â* is greater than divisor[units] coming out of the if
>> below.
>> + Â* size can be huge and doing size * blk_size right away can
>> overflow.
>> + Â* As a first step reduce huge size to something less than
>> + Â* blk_size * divisor[units].
>> Â Â*/
>> - if (size > exp) {
>> + while (size > (u64)blk_size * divisor[units]) {
>> Â remainder = do_div(size, divisor[units]);
>> - remainder *= blk_size;
>> Â i++;
>> - } else {
>> - remainder *= size;
>> Â }
>> Â
>> + /* Now we're OK with doing size * blk_size, it won't
>> overflow. */
>> Â size *= blk_size;
>> + remainder *= blk_size;
>> + /*
>> + Â* We were doing partial multiplication by blk_size.
>> + Â* remainder >= divisor[units] here means size should be
>> increased.
>> + Â*/
>> Â size += remainder / divisor[units];
>
>> - remainder %= divisor[units];
>> + remainder -= (remainder / divisor[units]) * divisor[units];
>
> I'm sorry I didn't get what the purpose of change here.
>
> (Yes, I was thinking about u64 on 32-bit architecture, but % and / are
> working in the similar way aren't they?)
Thanks for noticing, there is no functional change here, it just made
the code easier to understand (for me only?). I'm OK with reverting it
to '%='.
>
>> Â
>> + /*
>> + Â* Normalize. size >= divisor[units] means we still have
>> enough
>> + Â* precision and dropping remainder is fine.
>> + Â*/
>> Â while (size >= divisor[units]) {
>> Â remainder = do_div(size, divisor[units]);
>> Â i++;
>> @@ -87,7 +96,8 @@ void string_get_size(u64 size, u32 blk_size, const
>> enum string_size_units units,
>> Â if (j) {
>> Â remainder *= 1000;
>> Â remainder /= divisor[units];
>> - snprintf(tmp, sizeof(tmp), ".%03u", remainder);
>> + /* remainder is < divisor[units] here, (u32) is
>> legit */
>> + snprintf(tmp, sizeof(tmp), ".%03u", (u32)remainder);
>> Â tmp[j+1] = '\0';
>> Â }
>> Â
>> @@ -97,6 +107,7 @@ void string_get_size(u64 size, u32 blk_size, const
>> enum string_size_units units,
>> Â else
>> Â unit = units_str[units][i];
>> Â
>> + /* size is < divisor[units] here, (u32) is legit */
>> Â snprintf(buf, len, "%u%s %s", (u32)size,
>> Â Âtmp, unit);
>> Â}
--
Vitaly
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/