Re: [PATCH] ntp: remove accidental integer wrap-around

From: Thomas Gleixner
Date: Fri May 17 2024 - 04:49:50 EST


On Thu, May 16 2024 at 16:40, Justin Stitt wrote:
> On Tue, May 14, 2024 at 3:38 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> So how can 0xf42400 + 500000/1000 overflow in the first place?
>>
>> It can't unless time_maxerror is somehow initialized to a bogus
>> value and indeed it is:
>>
>> process_adjtimex_modes()
>> ....
>> if (txc->modes & ADJ_MAXERROR)
>> time_maxerror = txc->maxerror;
>>
>> So that wants to be fixed and not the symptom.
>
> Isn't this usually supplied from the user and can be some pretty
> random stuff?

Sure it comes from user space and can contain random nonsense as
syzkaller demonstrated.

> Are you suggesting we update timekeeping_validate_timex() to include a
> check to limit the maxerror field to (NTP_PHASE_LIMIT-(MAXFREQ /
> NSEC_PER_USEC))? It seems like we should handle the overflow case
> where it happens: in second_overflow().
>
> The clear intent of the existing code was to saturate at
> NTP_PHASE_LIMIT, they just did it in a way where the check itself
> triggers overflow sanitizers.

The clear intent of the code is to do saturation of a bound value.

Clearly the user space interface fails to validate the input to be in a
sane range and that makes you go and prevent the resulting overflow at
the usage site. Seriously?

Obviously the sanitizer detects the stupid in second_overflow(), but
that does not mean that the proper solution is to add overflow
protection to that code.

Tools are good to pin-point symptoms, but they are by definition
patently bad in root cause analysis. Otherwise we could just let the
tool write the "fix".

The obvious first question in such a case is to ask _WHY_ does
time_maxerror have a bogus value, which clearly cannot be achieved from
regular operation.

Once you figured out that the only way to set time_maxerror to a bogus
value is the user space interface the obvious follow up question is
whether such a value has to be considered as valid or not.

As it is obviously invalid the logical consequence is to add a sanity
check and reject that nonsense at that boundary, no?

Thanks,

tglx