Re: [PATCH v3] smp: Fix a potential usage of stale nr_cpus

From: Ingo Molnar
Date: Mon Jul 27 2020 - 17:34:51 EST

* Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

> Ingo Molnar <mingo@xxxxxxxxxx> writes:
> >> - get_option(&str, &nr_cpus);
> >> + if (get_option(&str, &nr_cpus) != 1)
> >> + return -EINVAL;
> >> +
> >> if (nr_cpus > 0 && nr_cpus < nr_cpu_ids)
> >> nr_cpu_ids = nr_cpus;
> >> + else
> >> + return -EINVAL;
> >
> > Exactly what does 'not valid' mean, and why doesn't get_option()
> > return -EINVAL in that case?
> What's unclear about invalid? If you specify nr_cpus=-1 or
> nr_cpus=2000000 then it's obviously invalid.

So this was the old (buggy) code:

> {
> 	int nr_cpus;
>
> 	get_option(&str, &nr_cpus);
> 	if (nr_cpus > 0 && nr_cpus < nr_cpu_ids)
> 		nr_cpu_ids = nr_cpus;

And this was the explanation given in the changelog:

>> When the cmdline of "nr_cpus" is not valid, the @nr_cpu_ids is
>> assigned a stale value. The nr_cpus is only valid when get_option()
>> return 1. So check the return value to prevent this.

The answer to my question is that the bug is that the return value of
get_option() wasn't checked properly, and if get_option() returns an
error then the nr_cpus local variable is not set - but we used it in
the old code, which can result in essentially a random value for
nr_cpu_ids.
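
To make the failure mode concrete, here is a minimal userspace sketch, not
the kernel code. Both parse_int_option() and set_cpu_limit() are
hypothetical stand-ins for get_option() and the nr_cpus handler, and the
sketch only assumes the convention discussed in this thread: the parser
returns 1 when it stored an integer and 0 when it found none, in which
case the output variable is left untouched.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for get_option(): returns 1 and stores the
 * parsed value if str starts with an integer, returns 0 and leaves
 * *val untouched otherwise.
 */
static int parse_int_option(const char *str, int *val)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(str, &end, 0);
	if (end == str || errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return 0;

	*val = (int)v;
	return 1;
}

/*
 * Hypothetical stand-in for the nr_cpus handler. The parser only
 * validates syntax; the range check belongs to the caller.
 */
static int set_cpu_limit(const char *str, int *cpu_ids)
{
	int n;

	/* On a parse failure 'n' was never written, so do not read it. */
	if (parse_int_option(str, &n) != 1)
		return -EINVAL;

	if (n > 0 && n < *cpu_ids) {
		*cpu_ids = n;
		return 0;
	}

	return -EINVAL;
}
```

With a limit of 8, an input of "4" lowers the limit to 4, while "abc",
"-1" and "2000000" all return -EINVAL and leave the limit alone. Only the
"abc" case is the uninitialized-variable case; the other two parse fine
and are rejected by the range check.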

> How should get_option() know that this is invalid? get_option() is a
> number parser and does not know about any restrictions on the parsed
> value obviously.

But that's apparently not the bug here, 'invalid' here was meant as
per the parser's syntax. If nr_cpus is out of range (like the 2000000
example you gave), then nr_cpu_ids might not be set at all, and
remains at the 0 initialized value. Which isn't good, but it's not
'stale' either.

This is why I was puzzled where a 'stale' value might come from: at
first sight I was assuming that some large value was written, like
your 2000000 example. The "stale value" happens if it's invalid syntax
and get_option() returns an error, in which case 'nr_cpus' remains
uninitialized.

And this is the explanation I didn't find on first reading, and which
future changelogs should perhaps include.

The new code does this:

	int nr_cpus;

	if (get_option(&str, &nr_cpus) != 1)
		return -EINVAL;

	if (nr_cpus > 0 && nr_cpus < nr_cpu_ids)
		nr_cpu_ids = nr_cpus;
	else
		return -EINVAL;

Which does all the proper error handling and fixes the uninitialized
'nr_cpus' local variable usage. So I agree with the fix:

Reviewed-by: Ingo Molnar <mingo@xxxxxxxxxx>