Re: WARNING: CPU: 1 PID: 0 at kernel/time/tick-broadcast.c:668 tick_broadcast_oneshot_control+0x17d/0x190()
From: poma
Date: Tue Feb 11 2014 - 15:45:18 EST
On 11.02.2014 15:25, Thomas Gleixner wrote:
> On Mon, 10 Feb 2014, Thomas Gleixner wrote:
>> On Mon, 10 Feb 2014, poma wrote:
>>
>>> [ 83.558551] [<ffffffff81025b17>] amd_e400_idle+0x87/0x130
>>
>> So this seems to happen only on AMD machines which use that e400 idle
>> mode. I have no idea at the moment what's wrong there. I'll find one of
>> those machines and try to reproduce.
>
> Found it. Patch below.
>
> Thanks,
>
> tglx
> ----
> Subject: tick: Clear broadcast pending bit when switching to oneshot
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Date: Tue, 11 Feb 2014 14:35:40 +0100
>
> AMD systems which use the C1E workaround in the amd_e400_idle routine
> trigger the WARN_ON_ONCE in the broadcast code when onlining a CPU.
>
> The reason is that the idle routine of those AMD systems switches the
> cpu into forced broadcast mode early on before the newly brought up
> CPU can switch over to high resolution / NOHZ mode. The timer related
> CPU1 bringup looks like this:
>
>   clockevent_register_device(local_apic);
>   tick_setup(local_apic);
>   ...
>   idle()
>     tick_broadcast_on_off(FORCE);
>     tick_broadcast_oneshot_control(ENTER)
>       cpumask_set(cpu, broadcast_oneshot_mask);
>     halt();
>
> Now the broadcast interrupt on CPU0 sets CPU1 in the
> broadcast_pending_mask and wakes CPU1. So CPU1 continues:
>
>   local_apic_timer_interrupt()
>     tick_handle_periodic();
>     softirq()
>       tick_init_highres();
>         cpumask_clr(cpu, broadcast_oneshot_mask);
>
>   tick_broadcast_oneshot_control(ENTER)
>     WARN_ON(cpumask_test(cpu, broadcast_pending_mask));
>
> So while we remove CPU1 from the broadcast_oneshot_mask when we switch
> over to highres mode, we do not clear the pending bit, which then
> triggers the warning when we go back to idle.
>
> The reason why this is only visible on C1E affected AMD systems is
> that the other machines enter the deep sleep states via
> acpi_idle/intel_idle and exit the broadcast mode before executing the
> remote triggered local_apic_timer_interrupt. So the pending bit is
> already cleared when the switch over to highres mode is clearing the
> oneshot mask.
>
> The solution is simple: Clear the pending bit together with the mask
> bit when we switch over to highres mode.
>
> Reported-by: poma <pomidorabelisima@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # 3.10+
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---
> kernel/time/tick-broadcast.c | 1 +
> 1 file changed, 1 insertion(+)
>
> Index: linux-2.6/kernel/time/tick-broadcast.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/tick-broadcast.c
> +++ linux-2.6/kernel/time/tick-broadcast.c
> @@ -756,6 +756,7 @@ out:
>  static void tick_broadcast_clear_oneshot(int cpu)
>  {
>  	cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);
> +	cpumask_clear_cpu(cpu, tick_broadcast_pending_mask);
>  }
>
> static void tick_broadcast_init_next_event(struct cpumask *mask,
>
>
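For anyone following the thread, here is a rough user-space sketch of the
sequence described in the changelog above. It is not kernel code: the two
unsigned long variables stand in for tick_broadcast_oneshot_mask and
tick_broadcast_pending_mask, and every function name in it is made up for
illustration. It also assumes, for simplicity, that the broadcast interrupt
marks every CPU in the oneshot mask as pending.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two kernel cpumasks (one bit per CPU). */
static unsigned long oneshot_mask;
static unsigned long pending_mask;

/*
 * Models tick_broadcast_oneshot_control(ENTER): reports whether the
 * WARN_ON condition (pending bit already set) holds, then sets the
 * CPU's bit in the oneshot mask.
 */
static bool broadcast_enter(int cpu)
{
	unsigned long bit = 1UL << cpu;
	bool would_warn = (pending_mask & bit) != 0;

	oneshot_mask |= bit;
	return would_warn;
}

/*
 * Models the broadcast interrupt on CPU0. Simplifying assumption:
 * every CPU currently in the oneshot mask is marked pending.
 */
static void broadcast_interrupt(void)
{
	pending_mask |= oneshot_mask;
}

/*
 * Models tick_broadcast_clear_oneshot(). With with_fix set, the
 * pending bit is cleared together with the oneshot bit, as in the patch.
 */
static void clear_oneshot(int cpu, bool with_fix)
{
	unsigned long bit = 1UL << cpu;

	oneshot_mask &= ~bit;
	if (with_fix)
		pending_mask &= ~bit;
}

/* Replays the CPU1 bringup; returns true if the warning would trigger. */
bool simulate_bringup(bool with_fix)
{
	oneshot_mask = 0;
	pending_mask = 0;

	broadcast_enter(1);          /* first idle entry, forced broadcast */
	broadcast_interrupt();       /* CPU0 wakes CPU1, sets pending bit */
	clear_oneshot(1, with_fix);  /* switch over to highres mode */
	return broadcast_enter(1);   /* CPU1 goes back to idle */
}
```

In this model simulate_bringup(false) returns true (the stale pending bit
survives the switch to highres mode) and simulate_bringup(true) returns
false, which matches the reasoning in the changelog.
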
Thanks!
poma