Re: [PATCH] ntp: remove accidental integer wrap-around

From: Justin Stitt
Date: Thu May 16 2024 - 19:40:24 EST


Hi,

On Tue, May 14, 2024 at 3:38 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Tue, May 07 2024 at 04:34, Justin Stitt wrote:
> > Using syzkaller alongside the newly reintroduced signed integer overflow
> > sanitizer spits out this report:
> >
> > [ 138.454979] ------------[ cut here ]------------
> > [ 138.458089] UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:461:16
> > [ 138.462134] 9223372036854775807 + 500 cannot be represented in type 'long'
> > [ 138.466234] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-00038-gc0a509640e93-dirty #10
> > [ 138.471498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > [ 138.477110] Call Trace:
> > [ 138.478657] <IRQ>
> > [ 138.479964] dump_stack_lvl+0x93/0xd0
> > [ 138.482276] handle_overflow+0x171/0x1b0
> > [ 138.484699] second_overflow+0x2d6/0x500
> > [ 138.487133] accumulate_nsecs_to_secs+0x60/0x160
> > [ 138.489931] timekeeping_advance+0x1fe/0x890
> > [ 138.492535] update_wall_time+0x10/0x30
>
> Same comment vs. trimming.

Gotcha, in the next version this will be trimmed.

>
> > Historically, the signed integer overflow sanitizer did not work in the
> > kernel due to its interaction with `-fwrapv` but this has since been
> > changed [1] in the newest version of Clang. It was re-enabled in the
> > kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
> > sanitizer").
>
> Again. Irrelevant to the problem.

Right, I'll move it below the fold.

>
> > Let's introduce a new macro and use that against NTP_PHASE_LIMIT to
> > properly limit the max size of time_maxerror without overflowing during
> > the check itself.
>
> This fails to tell what is causing the issue and just talks about what
> the patch is doing. The latter can be seen from the patch itself, no?
>
> Something like this:
>
> On second overflow time_maxerror is unconditionally incremented and
> the result is checked against NTP_PHASE_LIMIT, but the increment can
> overflow into negative space.
>
> Prevent this by checking the overflow condition before incrementing.
>
> Hmm?

Sounds better :thumbs_up: I'll use this!

>
> But that obviously begs the question why this can happen at all.
>
> #define MAXPHASE 500000000L
> #define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5)
>
> ==> NTP_PHASE_LIMIT = 1.6e+07 = 0xf42400
>
> #define MAXFREQ 500000
>
> So how can 0xf42400 + 500000/1000 overflow in the first place?
>
> It can't unless time_maxerror is somehow initialized to a bogus
> value and indeed it is:
>
> process_adjtimex_modes()
> ....
> if (txc->modes & ADJ_MAXERROR)
> time_maxerror = txc->maxerror;
>
> So that wants to be fixed and not the symptom.

Isn't this usually supplied from the user and can be some pretty
random stuff? Are you suggesting we update
timekeeping_validate_timex() to include a check to limit the maxerror
field to (NTP_PHASE_LIMIT-(MAXFREQ / NSEC_PER_USEC))? It seems like we
should handle the overflow case where it happens: in
second_overflow().

The clear intent of the existing code was to saturate at
NTP_PHASE_LIMIT, they just did it in a way where the check itself
triggers overflow sanitizers.

>
> Thanks,
>
> tglx

Thanks
Justin