Re: [LKP] [lkp] [x86 tsc] 19fa5e7364: WARNING: CPU: 0 PID: 0 at arch/x86/mm/extable.c:50 ex_handler_rdmsr_unsafe+0x72/0x80

From: Wanpeng Li
Date: Tue Jun 21 2016 - 19:32:27 EST


2016-06-21 21:59 GMT+08:00 Wanpeng Li <kernellwp@xxxxxxxxx>:
> Hi Paolo,
> 2016-06-21 18:24 GMT+08:00 Wanpeng Li <kernellwp@xxxxxxxxx>:
>> 2016-06-21 18:10 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>:
>>>
>>>
>>> On 21/06/2016 08:08, Wanpeng Li wrote:
>>>> Cc KVM ML, Paolo, Radim,
>>>>>> FYI, raw QEMU command line is:
>>>>>>
>>>>>> qemu-system-x86_64 -enable-kvm -cpu SandyBridge -kernel /pkg/linux/x86_64-randconfig-w0-06180628/gcc-6/19fa5e73647fde1e6a7038a8f05cddf4c43f08d3/vmlinuz-4.7.0-rc3-00009-g19fa5e7 -append 'root=/dev/ram0 user=lkp job=/lkp/scheduled/vm-kbuild-yocto-x86_64-32/bisect_boot-1-yocto-minimal-x86_64.cgz-x86_64-randconfig-w0-06180628-19fa5e73647fde1e6a7038a8f05cddf4c43f08d3-20160618-25535-h82bax-0.yaml~ ARCH=x86_64 kconfig=x86_64-randconfig-w0-06180628 branch=internal-eywa/master commit=19fa5e73647fde1e6a7038a8f05cddf4c43f08d3 BOOT_IMAGE=/pkg/linux/x86_64-randconfig-w0-06180628/gcc-6/19fa5e73647fde1e6a7038a8f05cddf4c43f08d3/vmlinuz-4.7.0-rc3-00009-g19fa5e7 max_uptime=600 RESULT_ROOT=/result/boot/1/vm-kbuild-yocto-x86_64/yocto-minimal-x86_64.cgz/x86_64-randconfig-w0-06180628/gcc-6/19fa5e73647fde1e6a7038a8f05cddf4c43f08d3/0 LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw ip=::::vm-kbuild-yocto-x86_64-32::dhcp drbd.minor_count=8' -initrd /fs/sdh1/initrd-vm-kbuild-yocto-x86_64-32 -m 320 -smp 1 -device e1000,netdev=net0 -netdev user,id=net0 -boot order=nc -no-reboot -watchdog i6300esb -rtc base=localtime -drive file=/fs/sdh1/disk0-vm-kbuild-yocto-x86_64-32,media=disk,if=virtio -pidfile /dev/shm/kboot/pid-vm-kbuild-yocto-x86_64-32 -serial file:/dev/shm/kboot/serial-vm-kbuild-yocto-x86_64-32 -daemonize -display none -monitor null
>>>>>>
>>>>> According to Wanpeng's feedback, this problem occurs because KVM does
>>>>> not support MSR_PLATFORM_INFO (0xce).
>>>>>
>>>>> Hi Wanpeng, is it possible for KVM to emulate this MSR? Otherwise we
>>>>> might have to use rdmsr_safe instead.
>>>>
>>>> This was discussed before in
>>>> https://patchwork.kernel.org/patch/8833021/; MSR_PLATFORM_INFO can't
>>>> simply be emulated.
>>>>
>>>> Ping Paolo, Radim. :)
>>>
>>> rdmsr_safe must be used instead. I'll prepare a patch.
>>
>> Actually I have such a patch on hand under testing, I will send out soon. :)
>
> I have a temporary patch, below. The guest TSC (~300MHz) still seems
> incorrect, and the guest kernel panics during boot with the message
> "MP-BIOS bug: 8254 timer not connect to IO-APIC, kernel-panic - not
> syncing: IOAPIC + timer doesn't work" etc. Any proposal to improve my
> patch is greatly appreciated. :) The patch is against the x86 branch of
> Len Brown's tree and tries to fix this commit:
> https://git.kernel.org/cgit/linux/kernel/git/lenb/linux.git/commit/?h=x86&id=fc141535ad8a67fd58623289c04e35465e2a07f2
>
> --------------------
>
> From 8033ae4c7e44d6bfe26642b151de03c613125066 Mon Sep 17 00:00:00 2001
> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> Date: Tue, 21 Jun 2016 19:41:12 +0800
> Subject: [PATCH] x86: fix rdmsr MSR_PLATFORM_INFO unsafe warning in kvm guest
>
> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at arch/x86/mm/extable.c:50
> ex_handler_rdmsr_unsafe+0x6a/0x70
> unchecked MSR access error: RDMSR from 0xce
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc3+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> 0000000000000000 ffffffff81c03ce0 ffffffff813b3eae ffffffff81c03d30
> 0000000000000000 ffffffff81c03d20 ffffffff81067181 0000003200000001
> ffffffff81c03df8 ffffffff8179676c 0000000000000000 ffffffff81fcd2c0
> Call Trace:
> dump_stack+0x67/0x99
> __warn+0xd1/0xf0
> warn_slowpath_fmt+0x4f/0x60
> ex_handler_rdmsr_unsafe+0x6a/0x70
> fixup_exception+0x39/0x50
> do_general_protection+0x93/0x1b0
> general_protection+0x22/0x30
> ? cpu_khz_from_msr+0xd8/0x1c0
> native_calibrate_cpu+0x30/0x5b0
> tsc_init+0x2b/0x297
> x86_late_time_init+0xf/0x11
> start_kernel+0x398/0x451
> ? set_init_arg+0x55/0x55
> x86_64_start_reservations+0x2f/0x31
> x86_64_start_kernel+0xea/0xed
>
> After commit fc141535ad8 ("x86 tsc_msr: Extend to include Intel Core
> Architecture"), rdmsr of MSR_PLATFORM_INFO is used to get the maximum
> non-turbo ratio for recent Intel Core Architecture, which results in an
> unsafe rdmsr warning in KVM guests.
>
> As Radim pointed out before:
>
> | MSR_PLATFORM_INFO: Intel changes it from family to family and there is
> | no obvious overlap or default. If we picked 0 (any other fixed value),
> | then the guest would have to know that 0 doesn't mean that
> | MSR_PLATFORM_INFO returned 0, but that KVM doesn't emulate this MSR and
> | the value cannot be used. This is very similar to handling a #GP in the
> | guest, but also has a disadvantage, because KVM cannot say that
> | MSR_PLATFORM_INFO is 0. Simple emulation is not possible.
>
> This patch fixes it by using rdmsr_safe to read MSR_PLATFORM_INFO in a KVM
> guest so that the #GP can be fixed up.
>
> Reported-by: kernel test robot <xiaolong.ye@xxxxxxxxx>
> Cc: Len Brown <len.brown@xxxxxxxxx>
> Cc: "Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx>
> Cc: Zhang Rui <rui.zhang@xxxxxxxxx>
> Cc: Chen Yu <y.c.chen@xxxxxxxxx>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> Cc: jacob.jun.pan@xxxxxxxxx
> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> ---
> arch/x86/kernel/tsc_msr.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc_msr.c b/arch/x86/kernel/tsc_msr.c
> index e0c2b30..15e06e1 100644
> --- a/arch/x86/kernel/tsc_msr.c
> +++ b/arch/x86/kernel/tsc_msr.c
> @@ -123,8 +123,11 @@ unsigned long cpu_khz_from_msr(void)
> }
>
> get_ratio:
> - rdmsr(MSR_PLATFORM_INFO, lo, hi);
> - ratio = (lo >> 8) & 0xff;
> + if (rdmsr_safe(MSR_PLATFORM_INFO, &lo, &hi)) {
> + rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
> + ratio = (hi >> 8) & 0x1f;

I think this should fall back to PIT calibration instead of
MSR_IA32_PERF_STATUS. In addition, I remember Radim mentioned that
"PERF_CTL the target value for PERF_STATUS, but OS shouldn't put much
trust in those values ... especially under KVM, where those MSRs make
little sense." I will try it today.
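
A rough userspace sketch of what I mean, not the actual kernel change.
Assumptions: read_msr is a hypothetical stand-in for rdmsr_safe() (0 on
success, non-zero when the read would #GP), bus_khz is an illustrative
parameter rather than the real frequency table lookup, and the caller is
assumed to treat a return value of 0 as "no MSR result, calibrate some
other way (e.g. PIT)":

```c
#include <stdint.h>

#define MSR_PLATFORM_INFO 0xce

/* Hypothetical stand-in for rdmsr_safe(): 0 on success, non-zero on fault. */
typedef int (*msr_read_fn)(uint32_t msr, uint32_t *lo, uint32_t *hi);

/*
 * Return the frequency in kHz derived from MSR_PLATFORM_INFO, or 0 if the
 * MSR cannot be read. Returning 0 lets the caller fall back to another
 * calibration method instead of guessing from MSR_IA32_PERF_STATUS.
 */
unsigned long khz_from_platform_info(msr_read_fn read_msr, unsigned long bus_khz)
{
	uint32_t lo = 0, hi = 0;
	uint32_t ratio;

	if (read_msr(MSR_PLATFORM_INFO, &lo, &hi))
		return 0;	/* read faulted, e.g. MSR not emulated */

	ratio = (lo >> 8) & 0xff;	/* same field extraction as the patch */
	return bus_khz * ratio;
}

/* Test stub: behaves like a guest where the MSR read faults. */
int msr_read_faults(uint32_t msr, uint32_t *lo, uint32_t *hi)
{
	(void)msr; (void)lo; (void)hi;
	return -5;
}

/* Test stub: behaves like hardware reporting a ratio of 24. */
int msr_read_ok(uint32_t msr, uint32_t *lo, uint32_t *hi)
{
	(void)msr;
	*lo = 24u << 8;
	*hi = 0;
	return 0;
}
```

With the faulting stub the function returns 0, so nothing derived from
PERF_STATUS is ever used; with the working stub and bus_khz = 100000 it
returns 24 * 100000 kHz.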

Regards,
Wanpeng Li