Re: [PATCH v2 2/3] lib/string_helpers.c: don't lose precision in string_get_size()
From: Vitaly Kuznetsov
Date: Thu Oct 29 2015 - 09:51:42 EST
Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx> writes:
> On Tue, Oct 27 2015, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>
>> string_get_size() loses precision when there is a remainder for
>> blk_size / divisor[units] and size is big enough. E.g
>> string_get_size(8192, 4096, STRING_UNITS_10, ...) returns "32.7 MB"
>> while it is supposed to return "33.5 MB". For some artificial inputs
>> the result can be ridiculously wrong, e.g.
>> string_get_size(3000, 1900, STRING_UNITS_10, ...) returns "3.00 MB"
>> when "5.70 MB" is expected.
>>
>> The issues comes from the fact than we through away
>> blk_size / divisor[units] remainder when size is > exp. This can be fixed
>> by saving it and doing some non-trivial calculations later to fix the error
>> but that would make this function even more cumbersome. Slightly re-factor
>> the function to not lose the precision for all inputs.
>>
>> The overall complexity of this function comes from the fact that size can
>> be huge and we don't want to do size * blk_size as it can overflow. Do the
>> math in two steps:
>> 1) Reduce size to something < blk_size * divisor[units]
>> 2) Multiply the result (and the remainder) by blk_size and do final
>> calculations.
>>
>> Reported-by: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>> ---
>> Changes since v1:
>> - Check against blk_size == 0 [Rasmus Villemoes]
>> - Do not rename 'i' to 'order' [Andy Shevchenko]
>> ---
>> lib/string_helpers.c | 37 ++++++++++++++++++++++++-------------
>> 1 file changed, 24 insertions(+), 13 deletions(-)
>>
>> diff --git a/lib/string_helpers.c b/lib/string_helpers.c
>> index f6c27dc..eba8e82 100644
>> --- a/lib/string_helpers.c
>> +++ b/lib/string_helpers.c
>> @@ -44,7 +44,8 @@ void string_get_size(u64 size, u32 blk_size, const enum string_size_units units,
>> [STRING_UNITS_2] = 1024,
>> };
>> int i, j;
>> - u32 remainder = 0, sf_cap, exp;
>> + u64 remainder = 0;
>> + u32 sf_cap;
>> char tmp[8];
>> const char *unit;
>>
>> @@ -53,28 +54,36 @@ void string_get_size(u64 size, u32 blk_size, const enum string_size_units units,
>> if (!size)
>> goto out;
>>
>> - while (blk_size >= divisor[units]) {
>> - remainder = do_div(blk_size, divisor[units]);
>> - i++;
>> + if (!blk_size) {
>> + WARN_ON(1);
>> + size = 0;
>> + goto out;
>> }
>
> I don't think we need to handle that, but if you want, please put the
> WARN inside the conditional (so say "if (WARN_ON(!blk_size)) {...}". And
> don't make it _ONCE; the ordinary version is slightly cheaper (since
> there's no static bool __warned and no code to manage that).
Ok, I'll do that in a separate patch and hope Andy is not going to
complain about not adding _ONCE.
>
>>
>> - exp = divisor[units] / blk_size;
>> /*
>> - * size must be strictly greater than exp here to ensure that remainder
>> - * is greater than divisor[units] coming out of the if below.
>> + * size can be huge and doing size * blk_size right away can overflow.
>> + * As a first step reduce huge size to something less than
>> + * blk_size * divisor[units].
>> */
>> - if (size > exp) {
>> + while (size > (u64)blk_size * divisor[units]) {
>
> It seems weird that we reduce _more_ for smaller block sizes - indeed,
> for blk_size==1 we end up reducing size to <= 1000 (or 1024), which is
> certainly a good way to lose some precision. Also, this relies on
> blk_size being smaller than (roughly) sqrt(U64_MAX/divisor[units]),
> which is of course true in practice, but a slightly annoying implicit
> assumption.
>
> I do think that my approach of reducing size till it's smaller than
> U64_MAX/blk_size is simpler and better. There's much less fixup
> code. It's simply
>
> while (size > div_u64(U64_MAX, blk_size) {
> do_div(size, divisor[units]);
> ++i;
> }
> size *= blk_size;
> while (size > divisor[units]) {
> remainder = do_div(size, divisor[units]);
> ++i;
> }
>
> which is as self-explaining as it gets.
>
OK, let me borrow this idea from you and change Reported-by: tag with
Suggested-by:. I'll test and send v3.
> And yes, one needs to use the include/linux/math64.h functions/macros
> for u64/u32 divisions - otherwise I'm pretty sure one will get friendly
> mails from the build bot.
>
> while (size > )
>> remainder = do_div(size, divisor[units]);
>> - remainder *= blk_size;
>> i++;
>> - } else {
>> - remainder *= size;
>> }
>>
>> + /* Now we're OK with doing size * blk_size, it won't overflow. */
>> size *= blk_size;
>> + remainder *= blk_size;
>> + /*
>> + * We were doing partial multiplication by blk_size.
>> + * remainder >= divisor[units] here means size should be increased.
>> + */
>> size += remainder / divisor[units];
>> - remainder %= divisor[units];
>> + remainder -= (remainder / divisor[units]) * divisor[units];
>>
>> + /*
>> + * Normalize. size >= divisor[units] means we still have enough
>> + * precision and dropping remainder is fine.
>> + */
>> while (size >= divisor[units]) {
>> remainder = do_div(size, divisor[units]);
>> i++;
>> @@ -87,7 +96,8 @@ void string_get_size(u64 size, u32 blk_size, const enum string_size_units units,
>> if (j) {
>> remainder *= 1000;
>> remainder /= divisor[units];
>> - snprintf(tmp, sizeof(tmp), ".%03u", remainder);
>> + /* remainder is < divisor[units] here, (u32) is legit */
>
> What is actually important is that remainder is < 1000. remainder was
> initially < divisor[units], but then the multiplication and division
> transformed that into "1/1000s" of whatever unit we're using.
>
>> + snprintf(tmp, sizeof(tmp), ".%03u", (u32)remainder);
>> tmp[j+1] = '\0';
>> }
>>
>> @@ -97,6 +107,7 @@ void string_get_size(u64 size, u32 blk_size, const enum string_size_units units,
>> else
>> unit = units_str[units][i];
>>
>> + /* size is < divisor[units] here, (u32) is legit */
>> snprintf(buf, len, "%u%s %s", (u32)size,
>> tmp, unit);
>> }
--
Vitaly
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/