Re: defects for uses of abs(u64) (was: Re: Regression: can't apply frequency offsets above 1000ppm)

From: Neil Brown
Date: Wed Sep 23 2015 - 03:30:38 EST


Joe Perches <joe@xxxxxxxxxxx> writes:

> On Fri, 2015-09-04 at 18:00 -0700, John Stultz wrote:
>> On Fri, Sep 4, 2015 at 5:57 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
>> > On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar <mlichvar@xxxxxxxxxx> wrote:
>> >> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
>> >>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves <nunojpg@xxxxxxxxx> wrote:
>> >>> > And just installing chrony from the feeds. With any kernel from 3.17
>> >>> > you'll have wrong estimates at chronyc sourcestats.
>> >>>
>> >>> Wrong estimates? Could you be more specific about what the failure
>> >>> you're seeing is here? The
>> >>>
>> >>> I installed the image above, which comes with a 4.1.6 kernel, and
>> >>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
>> >>> internet fairly quickly (at least according to chronyc tracking).
>> >>
>> >> To see the bug with chronyd the initial offset shouldn't be very close
>> >> to zero, so it's forced to correct the offset by adjusting the
>> >> frequency in a larger step.
>> >>
>> >> I'm attaching a simple C program that prints the frequency offset
>> >> as measured between the REALTIME and MONOTONIC_RAW clocks when the
>> >> adjtimex tick is set to 9000. It should show values close to -100000
>> >> ppm and I suspect on the BBB it will be much smaller.
>> >
>> > So I spent some time on this late last night and this afternoon.
>> >
>> > It was a little odd because things don't seem totally broken, but
>> > something isn't quite right.
>> >
>> > Digging around it seems the iterative logarithmic approximation done in
>> > timekeeping_freqadjust() wasn't working right. Instead of making
>> > smaller order alternating positive and negative adjustments, it was
>> > doing strange growing adjustments for the same value that weren't large
>> > enough to actually correct things very quickly. This made it much
>> > slower to adapt to specified frequency values.
>> >
>> > The odd bit, is it seems to come down to:
>> > tick_error = abs(tick_error);
>> >
>> > Haven't chased down why yet, but apparently abs() isn't doing what one
>> > would think when passed a s64 value.
>>
>> Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
>> /*
>> * abs() handles unsigned and signed longs, ints, shorts and chars. For all
>> * input types abs() returns a signed long.
>> * abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()
>> * for those.
>> */
>>
>> Ouch.
>
> Here's a little cocci script that finds more of these in:

Thanks.

Maybe we should also:

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5582410727cb..aa7d69afdcac 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -208,6 +208,7 @@ extern int _cond_resched(void);
  */
 #define abs(x) ({ \
 		long ret; \
+		BUILD_BUG_ON(sizeof(x) > sizeof(long)); \
 		if (sizeof(x) == sizeof(long)) { \
 			long __x = (x); \
 			ret = (__x < 0) ? -__x : __x; \


so that people won't make the same mistake again.
That finds bugs in
drivers/md/raid10.c
drivers/gpu/drm/radeon/radeon_display.c
kernel/time/clocksource.c
kernel/time/timekeeping.c
fs/ext4/mballoc.c

that your cocci script missed. All "abs(x - y)".
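
For anyone following along, the failure mode is easy to reproduce in user
space. The sketch below is not the kernel macro: abs_via_int32() is a
made-up stand-in that models what the old abs() does on a 32-bit build,
where a 64-bit argument falls through to the "int" branch and loses its
upper half, and abs_s64() stands in for abs64(). The narrowing conversion
is implementation-defined in C; the comments assume the usual
two's-complement wraparound that gcc and clang give.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the old abs() on a 32-bit build: the argument is
 * narrowed to 32 bits first, so the upper 32 bits are dropped. */
static int64_t abs_via_int32(int64_t x)
{
	int32_t narrowed = (int32_t)x;	/* implementation-defined narrowing */
	int64_t v = narrowed;		/* widen again so negation cannot overflow */

	return v < 0 ? -v : v;
}

/* Full-width absolute value, standing in for abs64(). */
static int64_t abs_s64(int64_t x)
{
	return x < 0 ? -x : x;
}

/* Checks written in the "abs(x - y)" shape that the build check flagged. */
static void self_check(void)
{
	int64_t x = 7;
	int64_t y = 7 + (INT64_C(1) << 32);	/* differs from x by exactly 2^32 */

	assert(abs_s64(x - y) == (INT64_C(1) << 32));
	assert(abs_via_int32(x - y) == 0);	/* low 32 bits of -2^32 are all zero */
	assert(abs_via_int32(5) == 5);		/* small values are unaffected */
}
```

Small differences come out right either way, which would fit with things
looking only partly broken rather than failing outright.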

As sector_t can be 32bit and can be 64bit, I wonder if abs_sector()
would be a good idea ... probably not.
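
To make the abs_sector() idea concrete, here is a rough user-space sketch
of one shape it could take. Everything in it is an assumption for
illustration: SECTOR_64BIT is a made-up switch standing in for whatever
selects the width of sector_t, and sector_sketch_t / sector_distance() are
invented names. It avoids abs() entirely by comparing before subtracting,
so the unsigned difference never passes through a signed type.

```c
#include <stdint.h>

/* Assumption: SECTOR_64BIT is a hypothetical switch, not a real config
 * option; it only picks the width used by this sketch. */
#ifdef SECTOR_64BIT
typedef uint64_t sector_sketch_t;
#else
typedef uint32_t sector_sketch_t;
#endif

/* Distance between two sector numbers, whichever one is larger. */
static sector_sketch_t sector_distance(sector_sketch_t a, sector_sketch_t b)
{
	return a > b ? a - b : b - a;
}
```

Callers would write sector_distance(x, y) where they now write abs(x - y),
and the same source would work for both widths.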

Thoughts?

NeilBrown
