Re: [PATCH v3 1/4] lib/string_helpers: change blk_size to u32 for string_get_size() interface
From: James Bottomley
Date: Fri Oct 30 2015 - 20:21:57 EST
On Fri, 2015-10-30 at 11:46 +0100, Vitaly Kuznetsov wrote:
> James Bottomley <jbottomley@xxxxxxxx> writes:
>
> > On Thu, 2015-10-29 at 17:30 +0100, Vitaly Kuznetsov wrote:
> >> string_get_size() can't really handle huge block sizes, especially
> >> blk_size > U32_MAX but string_get_size() interface states the opposite.
> >> Change blk_size from u64 to u32 to reflect the reality.
> >
> > What is the actual evidence for this? The calculation is designed to be
> > a symmetric 128 bit multiply. When I wrote and tested it, it worked
> > fine for huge block sizes.
>
> We have 'u32 remainder' and then we do:
>
> exp = divisor[units] / (u32)blk_size;
> ...
> remainder = do_div(size, divisor[units]);
> remainder *= blk_size;
>
> I'm pretty sure it will overflow for some inputs.
It shouldn't; the full code snippet does this:
while (blk_size >= divisor[units]) {
remainder = do_div(blk_size, divisor[units]);
i++;
}
exp = divisor[units] / (u32)blk_size;
So by the time it reaches the statement you complain about, blk_size is
already less than or equal to the divisor (which is 1000 or 1024) so
truncating to 32 bits is always correct.
I'm sort of getting the impression you don't quite understand the
mathematics: i is the logarithm to the base divisor[units]. We reduce
both operands to exponents of the logarithm base (adding the two bases
together in i), which means they are by definition in a range between
zero and the base and then multiply the remaining exponents correcting
the result for a base overflow (so the result is always a correct
exponent and i is the logarithm to the base). It's actually simply
Napier's algorithm.
The reason we're getting the up to 2.5% rounding errors you complain
about is because at each logarithm until the last one, we throw away the
remainder (it's legitimate because it's always 1000x smaller than the
exponent), but in the case of a large remainder it provides a small
correction to the final operation which we don't account for. If you
want to make a true correction, you save the penultimate residue in each
case, multiply each by the *other* exponent add them together, divide by
the base and increment the final result by the remainder.
However, for 2.5% the physicist in me says the above is way overkill.
James
> >
> > James
> >
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> >> ---
> >> include/linux/string_helpers.h | 2 +-
> >> lib/string_helpers.c | 4 ++--
> >> 2 files changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h
> >> index dabe643..1223e80 100644
> >> --- a/include/linux/string_helpers.h
> >> +++ b/include/linux/string_helpers.h
> >> @@ -10,7 +10,7 @@ enum string_size_units {
> >> STRING_UNITS_2, /* use binary powers of 2^10 */
> >> };
> >>
> >> -void string_get_size(u64 size, u64 blk_size, enum string_size_units units,
> >> +void string_get_size(u64 size, u32 blk_size, enum string_size_units units,
> >> char *buf, int len);
> >>
> >> #define UNESCAPE_SPACE 0x01
> >> diff --git a/lib/string_helpers.c b/lib/string_helpers.c
> >> index 5939f63..f6c27dc 100644
> >> --- a/lib/string_helpers.c
> >> +++ b/lib/string_helpers.c
> >> @@ -26,7 +26,7 @@
> >> * at least 9 bytes and will always be zero terminated.
> >> *
> >> */
> >> -void string_get_size(u64 size, u64 blk_size, const enum string_size_units units,
> >> +void string_get_size(u64 size, u32 blk_size, const enum string_size_units units,
> >> char *buf, int len)
> >> {
> >> static const char *const units_10[] = {
> >> @@ -58,7 +58,7 @@ void string_get_size(u64 size, u64 blk_size, const enum string_size_units units,
> >> i++;
> >> }
> >>
> >> - exp = divisor[units] / (u32)blk_size;
> >> + exp = divisor[units] / blk_size;
> >> /*
> >> * size must be strictly greater than exp here to ensure that remainder
> >> * is greater than divisor[units] coming out of the if below.
>