Re: [PATCH v2 1/2] x86/apic/kexec: Enable legacy irq mode before jump to kexec/kdump kernel
From: Eric W. Biederman
Date: Wed Feb 07 2018 - 14:49:14 EST
ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
>
>> Baoquan He <bhe@xxxxxxxxxx> writes:
>>
>>> On a kvm guest, the kernel always prints the warning below when the
>>> kdump kernel boots.
>>>
>>> [ 0.001000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1467 setup_local_APIC+0x228/0x330
>>> [ 0.001000] Modules linked in:
>>> [ 0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #3
>>> [ 0.001000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
>>> [ 0.001000] RIP: 0010:setup_local_APIC+0x228/0x330
>>> [ 0.001000] RSP: 0000:ffffffffb6e03eb8 EFLAGS: 00010286
>>> [ 0.001000] RAX: 0000009edb4c4d84 RBX: 0000000000000000 RCX: 00000000b099d800
>>> [ 0.001000] RDX: 0000009e00000000 RSI: 0000000000000000 RDI: 0000000000000810
>>> [ 0.001000] RBP: 0000000000000000 R08: ffffffffffffffff R09: 0000000000000001
>>> [ 0.001000] R10: ffff98ce6a801c00 R11: 0761076d072f0776 R12: 0000000000000001
>>> [ 0.001000] R13: 00000000000000f0 R14: 0000000000004000 R15: ffffffffffffc6ff
>>> [ 0.001000] FS: 0000000000000000(0000) GS:ffff98ce6bc00000(0000) knlGS:0000000000000000
>>> [ 0.001000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 0.001000] CR2: 00000000ffffffff CR3: 0000000022209000 CR4: 00000000000406b0
>>> [ 0.001000] Call Trace:
>>> [ 0.001000] apic_bsp_setup+0x56/0x74
>>> [ 0.001000] x86_late_time_init+0x11/0x16
>>> [ 0.001000] start_kernel+0x3c9/0x486
>>> [ 0.001000] secondary_startup_64+0xa5/0xb0
>>> [ 0.001000] Code: 00 85 c9 74 2d 0f 31 c1 e1 0a 48 c1 e2 20 41 89 cf 4c 03 7c 24 08 48 09 d0 49 29 c7 4c 89 3c 24 48 83 3c 24 00 0f 8f 8f fe ff ff <0f> ff e9 10 ff ff ff 48 83 2c 24 01 eb e7 48 83 c4 18 5b 5d 41
>>> [ 0.001000] ---[ end trace b88e71b9a6ebebdd ]---
>>> [ 0.001000] masked ExtINT on CPU#0
>>>
>>> The root cause is that legacy irq mode is disabled before jumping to the
>>> kexec/kdump kernel since commit 522e66464467 ("x86/apic: Disable I/O APIC
>>> before shutdown of the local APIC"). That commit moved the call to
>>> lapic_shutdown() after disable_IO_APIC(). However, disable_IO_APIC() not
>>> only calls clear_IO_APIC() to disable the IO-APIC, it also sets up the
>>> LAPIC and IO-APIC so that the system is in PIC or virtual wire mode. Hence
>>> the local APIC is disabled completely by the later call to lapic_shutdown().
>>
>> The actions of lapic_shutdown do not depend on the actions of
>> disable_IO_APIC so this description and justification are nonsense.
>>
>> Further we don't hardware disable the local APIC except when we hardware
>> enable it. And only on 32bit at that.
>>
>> I keep wondering if the above oops is due to an emulation bug in kvm.
>> If that is the case it might be better to fix kvm.
>
> Sigh. Reading a little deeper I see where the local apic is affected.
> It is the work of disconnect_bsp_APIC called from disable_IO_APIC.
>
> Calling lapic_shutdown (which clears the local apic) after the local
> apic has been placed into virtual wire mode would indeed be a problem.
>
> Now that I see that I agree in essence with this patch series.
> I don't agree with the implementation details.
>
> Can you please split disable_IO_APIC and switch_to_legacy_irq_mode
> in a single patch.

Now that I think about it, can you name the function restore_boot_irq_mode
rather than switch_to_legacy_irq_mode, and name the corresponding
x86_io_apic_ops member .restore rather than .disable?

That should make the purpose of the code clearer, and help avoid
mistakes like the one that led to this regression.
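
To make the ordering issue concrete, here is an untested userspace toy, not
kernel code. The function names match the ones discussed in this thread, but
every body is a stand-in invented for illustration, and the two flags are a
hypothetical model of the state the next kernel cares about. It only shows
why restoring the boot irq mode has to come after lapic_shutdown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; the real code programs hardware registers. */
static bool ioapic_entries_masked;
static bool legacy_irq_mode;	/* PIC or virtual wire mode is in effect */

/* Stand-in for clear_IO_APIC(): only masks the IO-APIC entries. */
static void clear_IO_APIC(void)
{
	ioapic_entries_masked = true;
}

/*
 * Stand-in for lapic_shutdown(): clearing the local apic also wipes
 * any virtual wire setup that was done before it.
 */
static void lapic_shutdown(void)
{
	legacy_irq_mode = false;
}

/* Proposed name: put the apics back into the mode the boot code expects. */
static void restore_boot_irq_mode(void)
{
	legacy_irq_mode = true;
}

/* Ordering behind the warning: restore first, then clear the local apic. */
bool broken_shutdown(void)
{
	clear_IO_APIC();
	restore_boot_irq_mode();
	lapic_shutdown();
	return legacy_irq_mode;
}

/* Ordering that a split makes possible: restore as the very last step. */
bool fixed_shutdown(void)
{
	clear_IO_APIC();
	lapic_shutdown();
	restore_boot_irq_mode();
	return legacy_irq_mode;
}

/* Quick self-check of the two orderings in this toy model. */
void check_orderings(void)
{
	assert(!broken_shutdown());
	assert(fixed_shutdown());
}
```

With a name like restore_boot_irq_mode it is much harder to put the call
anywhere but last in the shutdown path.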
Eric