Re: Oops from calibrate_delay_is_known on qemu machine with Linux v4.5-1523-g271ecc5253e2

From: Thomas Gleixner
Date: Thu Mar 17 2016 - 17:03:13 EST


Josh,

On Thu, 17 Mar 2016, Josh Boyer wrote:
> We've had a report [1] of the mainline kernel crashing on a single-cpu
> QEMU machine (not kvm) in Fedora. It looks as if the emulated machine
> is failing to provide a TSC and the calibrate_delay_is_known function
> is passing NULL to cpumask_any_but for the mask parameter. At least
> that's all I've been able to discern thus far.
>
> I was wondering if you had any insight into this issue, given your
> recent commit to change calibrate_delay_is_known to use
> topology_core_cpumask. The backtrace is below.

> at (null)
> [ 0.010000] IP: [<ffffffff814698b5>] _find_next_bit.part.0+0x15/0x70
> [ 0.010000] PGD 0
>
> [ 0.010000] RSP: 0000:ffffffff81e03e40 EFLAGS: 00000246
> [ 0.010000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 0.010000] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
> [ 0.010000] RBP: ffffffff81e03e50 R08: ffffffffffffffff R09: 0000000000000000
> [ 0.010000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 0.010000] R13: ffffffff82248960 R14: ffffffff822562e0 R15: 0000000000000000
> [ 0.010000] FS: 0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000
> [ 0.010000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.010000] CR2: 0000000000000000 CR3: 0000000001e06000 CR4: 00000000000006b0
> [ 0.010000] Stack:
> [ 0.010000] ffffffff81e03e50 ffffffff81469928 ffffffff81e03e70 ffffffff81453d56
> [ 0.010000] 0000000000000000 ffff88001f3fa780 ffffffff81e03e80 ffffffff81040495
> [ 0.010000] ffffffff81e03f40 ffffffff8100285a ffffffff810eefb3 ffffffff00000000
> [ 0.010000] Call Trace:
> [ 0.010000] [<ffffffff81469928>] ? find_next_bit+0x18/0x20
> [ 0.010000] [<ffffffff81453d56>] cpumask_any_but+0x26/0x50

Yuck. That requires that topology_core_cpumask(cpu) is NULL.

#define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))

...

DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);

So that can only result in a NULL pointer if you have CONFIG_CPUMASK_OFFSTACK
enabled and the allocation fails, which is not checked !?@!

I tried to reproduce with Richard's script, but so far no dice. Can you please
provide your kernel config?

Thanks,

tglx