Re: [PATCH v7 08/13] iio: afe: rescale: fix precision on fractional log scale

From: Liam Beguin
Date: Sun Aug 15 2021 - 18:14:58 EST


On Mon Aug 2, 2021 at 5:17 AM EDT, Peter Rosin wrote:
> On 2021-08-01 21:39, Liam Beguin wrote:
> > From: Liam Beguin <lvb@xxxxxxxxxx>
> >
> > The IIO_VAL_FRACTIONAL_LOG2 scale type doesn't return the expected
> > scale. Update the case so that the rescaler returns a fractional type
> > and a more precise scale.
> >
> > Signed-off-by: Liam Beguin <lvb@xxxxxxxxxx>
> > ---
> > drivers/iio/afe/iio-rescale.c | 15 ++++++++++-----
> > 1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/iio/afe/iio-rescale.c b/drivers/iio/afe/iio-rescale.c
> > index abd7ad73d1ce..e37a9766080c 100644
> > --- a/drivers/iio/afe/iio-rescale.c
> > +++ b/drivers/iio/afe/iio-rescale.c
> > @@ -47,12 +47,17 @@ int rescale_process_scale(struct rescale *rescale, int scale_type,
> > *val2 = rescale->denominator;
> > return IIO_VAL_FRACTIONAL;
> > case IIO_VAL_FRACTIONAL_LOG2:
> > - tmp = *val * 1000000000LL;
> > - do_div(tmp, rescale->denominator);
> > - tmp *= rescale->numerator;
> > - do_div(tmp, 1000000000LL);
> > + if (check_mul_overflow(*val, rescale->numerator, (s32 *)&tmp) ||
> > + check_mul_overflow(rescale->denominator, (1 << *val2), (s32 *)&tmp2)) {
> > + tmp = (s64)*val * rescale->numerator;
> > + tmp2 = (s64)rescale->denominator * (1 << *val2);
> > + factor = gcd(abs(tmp), abs(tmp2));
> > + tmp = div_s64(tmp, factor);
> > + tmp2 = div_s64(tmp2, factor);

Hi Peter,

Apologies for the delay, I got caught up on some other work.

>
> The case I really worry about is when trying to get an exact result by
> using gcd() really doesn't improve the situation, and the only way to
> avoid overflow is to reduce the precision. A perhaps contrived example:
>
> scale numerator 1,220,703,125 i.e. 5 ^ 13
> scale denominator 1,162,261,467 i.e. 3 ^ 19
> *val 1,129,900,996 i.e. 7 ^ 10 * 2 ^ 2
> *val2 2 i.e. value = 7 ^ 10
>
> Then you get overflow for both the calls to check_mul_overflow(). But
> when gcd() returns 1 (or something too small) the overflow is
> "returned" as-is.

I was aware of the issue when gcd() returns 1 and thought it would be
unlikely enough not to matter, but as you pointed out, there are also
cases where it returns a factor too small to avoid the overflow. That
is unfortunately more likely to happen, which makes it impossible to
ignore.
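
For reference, the numbers in the example above can be reproduced with a
minimal user-space sketch. It is not the kernel code: gcd_u64(),
fits_s32(), truncate_s32(), reduce_log2() and struct frac are
hypothetical stand-ins, and plain 64-bit arithmetic replaces
check_mul_overflow() and div_s64(). With these inputs the common factor
is only 4, so the reduced numerator still needs far more than 32 bits.

```c
#include <stdint.h>
#include <stdlib.h>

/* Euclid's algorithm; user-space stand-in for the kernel's gcd(). */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Hypothetical helper: true if v is representable as a 32-bit int. */
static int fits_s32(int64_t v)
{
	return v >= INT32_MIN && v <= INT32_MAX;
}

/*
 * Keep only the low 32 bits, as storing a 64-bit value into an int does.
 * The unsigned-to-signed conversion is implementation-defined in C; it
 * wraps on GCC and Clang.
 */
static int32_t truncate_s32(int64_t v)
{
	return (int32_t)(uint32_t)v;
}

struct frac {
	int64_t num;
	int64_t den;
};

/* Multiply in 64 bits, then reduce by gcd(), like the patch's overflow branch. */
static struct frac reduce_log2(int32_t val, int32_t val2,
			       int32_t numerator, int32_t denominator)
{
	struct frac f;
	uint64_t factor;

	f.num = (int64_t)val * numerator;
	f.den = (int64_t)denominator * (1LL << val2);
	factor = gcd_u64((uint64_t)llabs(f.num), (uint64_t)llabs(f.den));
	f.num /= (int64_t)factor;
	f.den /= (int64_t)factor;
	return f;
}
```

Calling reduce_log2(1129900996, 2, 1220703125, 1162261467) leaves a
numerator for which fits_s32() is false, and truncate_s32() applied to
the unreduced products 1,379,273,676,757,812,500 and 4,649,045,868
gives the -281,918,188 and 354,078,572 quoted below.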

>
> With the old code you get something that is at least not completely
> wrong, just not as accurate as is perhaps possible:
> *val 1,186,715,480
> *val2 2
> Or 1,186,715,480 / 2^2 = 296,678,870.
>
> With this patch the above makes you attempt to return the fraction:
> *val 1,379,273,676,757,812,500
> *val2 4,649,045,868
> Or 296,678,870.443403528 (or something like that, not 100% sure about
> all the fractional digits, but they are not really important for my
> argument)
>
> While the latter is more correct, truncation to 32-bit clobbers the
> result so in reality this is returned:
> *val -281,918,188
> *val2 354,078,572
> Or -0.796202341
>
> So, while it might seem unlucky that gcd() will not find a big enough
> factor, it is certainly possible. And I also worry that when this
> happens it will only happen once in a while, and that the resulting bad
> values might be extremely unexpected and difficult to track down. Things
> that happen once in a blue moon are simply not fun to debug.
>
> I.e. I worry that small islands of input will cause failures. With the
> old code there are no such islands. The scale factor alone determines
> the precision, and if you get poor precision you get poor precision
> throughout the range. And any problem will therefore be "stable" and
> much easier to debug for "innocent" 3rd party users that may not even be
> aware that the rescaler is involved at all.

I agree with you that such islands are a bad thing that could cause a
lot of pain, and it's probably not worth it just to gain a few digits of
precision (which can sometimes be irrelevant).

I'll drop this change and will update the test cases to take into
account an error margin.
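
As an illustration of what that could look like, here is a user-space
sketch of the calculation the patch would have removed, together with a
tolerance check. old_log2_scale() and within_ppm() are hypothetical
names, '/' stands in for do_div() (which is an unsigned division in the
kernel, so this only matches for non-negative values), and the
parts-per-million margin is an assumption, not what the actual test
cases use.

```c
#include <stdint.h>

/*
 * User-space rendition of the existing IIO_VAL_FRACTIONAL_LOG2 path:
 * scale *val by numerator/denominator with nine digits of headroom.
 * *val2 is left untouched by this path.
 */
static int64_t old_log2_scale(int32_t val, int32_t numerator,
			      int32_t denominator)
{
	int64_t tmp = (int64_t)val * 1000000000LL;

	tmp /= denominator;
	tmp *= numerator;
	tmp /= 1000000000LL;
	return tmp;
}

/* Hypothetical test helper: |got - want| <= |want| * ppm / 1e6. */
static int within_ppm(int64_t got, int64_t want, int64_t ppm)
{
	int64_t diff = got > want ? got - want : want - got;
	int64_t mag = want < 0 ? -want : want;

	return diff * 1000000 <= mag * ppm;
}
```

With the inputs from the example above,
old_log2_scale(1129900996, 1220703125, 1162261467) returns
1,186,715,480, which within_ppm() accepts against the rounded-down exact
value 1,186,715,481 at a margin of 1 ppm.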

>
> This is also an issue I have with patch 7/13, but there the only thing
> that is sacrificed is CPU cycles. But nonetheless, I'm dubious if patch
> 7/13 is wise precisely because it might cause issues that are
> intermittent and therefore difficult to debug.

Again, I agree with you: patch 7/13 has the same limitations.
Unfortunately, I did run into an overflow while testing this on a real
setup.

>
> Also, changing the calculation so that you get more precision whenever
> that is possible feels dangerous. I fear linearity breaks and that
> bigger input causes smaller output due to rounding if the bigger value
> has to be rounded down, but that this isn't done carefully enough. I.e.
> attempting to return an exact fraction and only falling back to the old
> code when that is not possible is still not safe since the old code
> isn't careful enough about rounding. I think it is really important
> that bigger input causes bigger (or equal) output. Otherwise you might
> trigger instability in feedback loops should a rescaler be involved in
> some regulator function.

I see what you mean here, and it's a good point I hadn't considered.

To address some of these concerns, I was thinking of using consecutive
right shifts instead of gcd(), but that seems like the wrong way to go
given that we're working with signed integers.
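
One possible reading of that concern, sketched in user-space C: halving
a negative value with a right shift does not round the same way as
dividing it by two. The helper names are hypothetical, and the result
of right-shifting a negative value is implementation-defined in C (GCC
and Clang perform an arithmetic shift).

```c
#include <stdint.h>

/* Arithmetic right shift: rounds toward negative infinity on GCC/Clang. */
static int64_t halve_shift(int64_t v)
{
	return v >> 1;
}

/* Integer division: truncates toward zero (C99 and later). */
static int64_t halve_div(int64_t v)
{
	return v / 2;
}
```

For 7 both helpers return 3, but for -7 the shift gives -4 while the
division gives -3, so repeated shifts would bias negative scales in a
different direction than the divisions used elsewhere.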

For 7/13, I'll look into approximating like you did here originally.

Thanks,
Liam

>
> Cheers,
> Peter
>
> > + }
> > *val = tmp;
> > - return scale_type;
> > + *val2 = tmp2;
> > + return IIO_VAL_FRACTIONAL;
> > case IIO_VAL_INT_PLUS_NANO:
> > case IIO_VAL_INT_PLUS_MICRO:
> > if (scale_type == IIO_VAL_INT_PLUS_NANO)
> >
