Re: [syzbot] [kernel?] WARNING in c_start (2)

From: Tetsuo Handa
Date: Sun Mar 12 2023 - 05:50:39 EST


On 2023/03/12 16:43, Linus Torvalds wrote:
> On Sat, Mar 11, 2023 at 6:57 PM Tetsuo Handa
> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> syzbot is unable to test kernels due to hitting WARN_ON_ONCE(cpu >= bits) upon
>> "cat /proc/cpuinfo" request.
>>
>> Since commit 596ff4a09b898179 ("cpumask: re-introduce constant-sized cpumask optimizations")
>> changed to pass "small_cpumask_bits" instead of "nr_cpumask_bits" to find_next_bit(),
>> find_next_bit() returning small_cpumask_bits causes c_next() to go beyond nr_cpumask_bits.
>> I think that we need to make sure that cpumask_next() and friends would not return cpu id
>> beyond nr_cpumask_bits.
>
> Ahh. yes.
>
> It's the same old "cpumask scanning should be testing >= nr_cpu_ids"
> thing, but c_start() does
>
> *pos = cpumask_next(*pos - 1, cpu_online_mask);
>
> and basically assumes that it is "== nr_cpu_ids" for the end
> condition, and uses the value next time around.
>
> And if it is *exactly* nr_cpu_ids, then the next time it gets called,
> the "*pos - 1" means that it's all ok.
>
> But if it's > nr_cpu_ids, then next time the "-1" doesn't do anything
> useful and the input is still larger than the number of CPU ids.
>
> The core *works* correctly, but it triggers that warning because it is
> not doing that test properly.

Right. The fix that works for "cat /procc/cpuinfo" case is

Subject: cpumask: adjust valid cpu range check in cpumask_next()

Since commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations") changed to pass "small_cpumask_bits" (which is a build-time
constant if NR_CPUS <= BITS_PER_LONG) instead of "nr_cpumask_bits" (which
is not a build-time constant) when cpumask_next() calls find_next_bit(),
the caller of cpumask_next() started observing core id which is beyond
number of available cores.

If all callers of cpumask_next() are prepared for observing core id which
is beyond nr_cpumask_bits, we can preserve optimization introduced by that
commit.

But we need to treat small_cpumask_bits - 1 (which happens when
cpumask_next() is called after cpumask_next() returned small_cpumask_bits)
as well as -1 (which happens when cpumask_next() is called for the first
time while reading /proc/cpuinfo).

Reported-by: syzbot <syzbot+96cae094d90877641f32@xxxxxxxxxxxxxxxxxxxxxxxxx>
Link: https://syzkaller.appspot.com/bug?extid=96cae094d90877641f32
Fixes: 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask optimizations")
---
include/linux/cpumask.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 63d637d18e79..f96f46326e32 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -207,8 +207,8 @@ static inline unsigned int cpumask_last(const struct cpumask *srcp)
static inline
unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
- /* -1 is a legal arg here. */
- if (n != -1)
+ /* -1 and small_cpumask_bits-1 are legal here. */
+ if (n != -1 && n != small_cpumask_bits - 1)
cpumask_check(n);
return find_next_bit(cpumask_bits(srcp), small_cpumask_bits, n + 1);
}
--
2.34.1

but we need to audit all cpumask_next*() users. That would take for a while
waiting for responses.

>
> That c_start() function is ugly, but the simplest patch is probably
> this one-liner (whitespace-damaged but hopefully really obvious):
>
> --- a/arch/x86/kernel/cpu/proc.c
> +++ b/arch/x86/kernel/cpu/proc.c
> @@ -156,6 +156,7 @@
> *pos = cpumask_next(*pos - 1, cpu_online_mask);
> if ((*pos) < nr_cpu_ids)
> return &cpu_data(*pos);
> + *pos = nr_cpu_ids;
> return NULL;
> }
>
>
> which just caps that ">= nr_cpu_ids" case down to nr_cpu_ids.
>
> Does that fix your test-case for you?

Yes. But other architectures will need the same fix.

>
> I'm not entirely convinced we shouldn't clean stuff up with a slightly
> bigger patch, though. Instead of capping the 'pos', just testing it
> seems the kind of more obvious thing. This code had similar problems
> before. So an alternative patch (still whitespace-damaged) would be
> something like
>
> --- a/arch/x86/kernel/cpu/proc.c
> +++ b/arch/x86/kernel/cpu/proc.c
> @@ -153,8 +153,12 @@
>
> static void *c_start(struct seq_file *m, loff_t *pos)
> {
> - *pos = cpumask_next(*pos - 1, cpu_online_mask);
> - if ((*pos) < nr_cpu_ids)
> + loff_t prev = *pos;
> +
> + if (prev >= nr_cpu_ids)
> + return NULL;
> + *pos = cpumask_next(prev - 1, cpu_online_mask);
> + if (*pos < nr_cpu_ids)
> return &cpu_data(*pos);
> return NULL;
> }
>
> which is a few lines more of patch, but stops depending on that "pos
> has to end up exactly at nr_cpu_ids" thing.

More worrisome thing for me is subtle changes in other locations.
For example, arch/ia64/kernel/mca.c is comparing the result of cpumask_next()
in different ways; one with nr_cpu_ids and the other with NR_CPUS. Maybe simply
commit 5dd3c9949a3e ("cpumask: prepare for iterators to only go to
nr_cpu_ids/nr_cpumask_bits.: ia64") forgot to update ia64_mca_cpe_int_caller().
Until all callers are checked, I afraid suddenly changing cpumask_next() to
return from nr_cpu_ids to NR_CPUS might break something.

static irqreturn_t
ia64_mca_cmc_int_caller(int cmc_irq, void *arg)
{
static int start_count = -1;
unsigned int cpuid;

cpuid = smp_processor_id();

/* If first cpu, update count */
if (start_count == -1)
start_count = IA64_LOG_COUNT(SAL_INFO_TYPE_CMC);

ia64_mca_cmc_int_handler(cmc_irq, arg);

cpuid = cpumask_next(cpuid+1, cpu_online_mask);

if (cpuid < nr_cpu_ids) {
ia64_send_ipi(cpuid, IA64_CMCP_VECTOR, IA64_IPI_DM_INT, 0);
} else {
/* If no log record, switch out of polling mode */
(...snipped...)
}

static irqreturn_t
ia64_mca_cpe_int_caller(int cpe_irq, void *arg)
{
static int start_count = -1;
static int poll_time = MIN_CPE_POLL_INTERVAL;
unsigned int cpuid;

cpuid = smp_processor_id();

/* If first cpu, update count */
if (start_count == -1)
start_count = IA64_LOG_COUNT(SAL_INFO_TYPE_CPE);

ia64_mca_cpe_int_handler(cpe_irq, arg);

cpuid = cpumask_next(cpuid+1, cpu_online_mask);

if (cpuid < NR_CPUS) {
ia64_send_ipi(cpuid, IA64_CPEP_VECTOR, IA64_IPI_DM_INT, 0);
} else {
/*
* If a log was recorded, increase our polling frequency,
(...snipped...)
}

>
> Either patch should result in the same thing and hopefully fix your
> warning, so I guess it's just a matter of taste.
>
> Linus
>

Since currently 99% of syzbot crashes are this bug, I want to fix as soon as
possible. Thus, regarding this cycle, restoring old behavior seems the safer.