Re: PROBLEM: lk 4.5 oops on boot with Xeon D-1520

From: Thomas Gleixner
Date: Wed Feb 24 2016 - 04:13:50 EST


Added Liang to CC; preserved the full mail for reference.

On Wed, 17 Feb 2016, Tony Battersby wrote:

> The following commit in 4.5 is causing a general protection fault during
> early boot:
>
> d6980ef32570 ("perf/x86/intel/uncore: Add Broadwell-EP uncore support")
>
> With the commit reverted, the system boots fine.
>
> CPU: Intel(R) Xeon(R) CPU D-1520 @ 2.20GHz
> Motherboard: Supermicro X10SDV-4C-TLN2F
>
> The general protection fault occurs when
> hswep_uncore_sbox_msr_init_box() calls wrmsrl(). I added a printk to
> get the following values just before the oops:
>
> box->pmu->type->box_ctl = 1824
> box->pmu->pmu_idx = 0
> box->pmu->type->msr_offset = 10
> box->pmu->type->msr_offsets = NULL
> msr = 1824
> (all values are decimal)
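
For reference, a minimal sketch of how the printed values presumably combine into the MSR address that is passed to wrmsrl(). The struct and function names below are hypothetical, simplified stand-ins and not the kernel's own definitions; the assumption is that the box control MSR is box_ctl plus a per-box offset, and that the offset is msr_offset * pmu_idx whenever no msr_offsets table is present.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the uncore type description. */
struct sketch_uncore_type {
	unsigned int box_ctl;            /* base MSR of the box control register */
	unsigned int msr_offset;         /* stride between boxes of this type */
	const unsigned int *msr_offsets; /* optional per-box table, may be NULL */
};

/* Hypothetical, simplified stand-in for one PMU instance of a type. */
struct sketch_uncore_pmu {
	const struct sketch_uncore_type *type;
	int pmu_idx;                     /* index of this box within its type */
};

/* Assumed rule: use the table if present, else stride * index. */
static unsigned int sketch_box_offset(const struct sketch_uncore_pmu *pmu)
{
	if (pmu->type->msr_offsets)
		return pmu->type->msr_offsets[pmu->pmu_idx];
	return pmu->type->msr_offset * pmu->pmu_idx;
}

/* Assumed rule: control MSR = base + offset; 0 means "no control MSR". */
static unsigned int sketch_box_ctl_msr(const struct sketch_uncore_pmu *pmu)
{
	if (!pmu->type->box_ctl)
		return 0;
	return pmu->type->box_ctl + sketch_box_offset(pmu);
}
```

With box_ctl = 1824, msr_offset = 10, msr_offsets = NULL and pmu_idx = 0, this gives 1824 + 10 * 0 = 1824 (0x720), which matches the reported msr value. So the address arithmetic itself looks consistent, and the fault would then come from the write to that MSR on this CPU.
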
>
> Here is the call trace:
> hswep_uncore_sbox_msr_init_box+0x7c/0xc0 (RIP)
> uncore_cpu_starting+0x8a/0x1c0
> ? uncore_change_context+0xe5/0x150
> ? uncore_types_init+0x1d6/0x1d6
> uncore_cpu_setup+0x10/0x12
> on_each_cpu+0x32/0x50
> intel_uncore_init+0x2e8/0x36d
> ? cstate_pmu_init+0x14f/0x195
> ? uncore_cpu_setup+0x12/0x12
>
> I have a jpg image of the monitor displaying the full oops; let me know
> if anyone wants that.
>
> ----------
>
> /proc/cpuinfo:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 86
> model name : Intel(R) Xeon(R) CPU D-1520 @ 2.20GHz
> stepping : 2
> microcode : 0xa
> cpu MHz : 2200.000
> tsc MHz : 2199.998
> cache size : 6144 KB
> physical id : 0
> siblings : 8
> core id : 0
> cpu cores : 4
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 20
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
> nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
> xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept
> vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> rdseed adx smap xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts
> bugs :
> bogomips : 4399.57
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
> processor : 1
> vendor_id : GenuineIntel
> cpu family : 6
> model : 86
> model name : Intel(R) Xeon(R) CPU D-1520 @ 2.20GHz
> stepping : 2
> microcode : 0xa
> cpu MHz : 2200.000
> tsc MHz : 2199.998
> cache size : 6144 KB
> physical id : 0
> siblings : 8
> core id : 1
> cpu cores : 4
> apicid : 2
> initial apicid : 2
> fpu : yes
> fpu_exception : yes
> cpuid level : 20
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
> nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
> xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept
> vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> rdseed adx smap xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts
> bugs :
> bogomips : 4399.57
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
> processor : 2
> vendor_id : GenuineIntel
> cpu family : 6
> model : 86
> model name : Intel(R) Xeon(R) CPU D-1520 @ 2.20GHz
> stepping : 2
> microcode : 0xa
> cpu MHz : 2200.000
> tsc MHz : 2199.998
> cache size : 6144 KB
> physical id : 0
> siblings : 8
> core id : 2
> cpu cores : 4
> apicid : 4
> initial apicid : 4
> fpu : yes
> fpu_exception : yes
> cpuid level : 20
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
> nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
> xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept
> vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> rdseed adx smap xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts
> bugs :
> bogomips : 4399.57
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
> processor : 3
> vendor_id : GenuineIntel
> cpu family : 6
> model : 86
> model name : Intel(R) Xeon(R) CPU D-1520 @ 2.20GHz
> stepping : 2
> microcode : 0xa
> cpu MHz : 2200.000
> tsc MHz : 2199.998
> cache size : 6144 KB
> physical id : 0
> siblings : 8
> core id : 3
> cpu cores : 4
> apicid : 6
> initial apicid : 6
> fpu : yes
> fpu_exception : yes
> cpuid level : 20
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
> nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
> xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept
> vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> rdseed adx smap xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts
> bugs :
> bogomips : 4399.57
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
> processor : 4
> vendor_id : GenuineIntel
> cpu family : 6
> model : 86
> model name : Intel(R) Xeon(R) CPU D-1520 @ 2.20GHz
> stepping : 2
> microcode : 0xa
> cpu MHz : 2200.000
> tsc MHz : 2199.998
> cache size : 6144 KB
> physical id : 0
> siblings : 8
> core id : 0
> cpu cores : 4
> apicid : 1
> initial apicid : 1
> fpu : yes
> fpu_exception : yes
> cpuid level : 20
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
> nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
> xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept
> vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> rdseed adx smap xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts
> bugs :
> bogomips : 4399.57
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
> processor : 5
> vendor_id : GenuineIntel
> cpu family : 6
> model : 86
> model name : Intel(R) Xeon(R) CPU D-1520 @ 2.20GHz
> stepping : 2
> microcode : 0xa
> cpu MHz : 2200.000
> tsc MHz : 2199.998
> cache size : 6144 KB
> physical id : 0
> siblings : 8
> core id : 1
> cpu cores : 4
> apicid : 3
> initial apicid : 3
> fpu : yes
> fpu_exception : yes
> cpuid level : 20
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
> nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
> xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept
> vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> rdseed adx smap xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts
> bugs :
> bogomips : 4399.57
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
> processor : 6
> vendor_id : GenuineIntel
> cpu family : 6
> model : 86
> model name : Intel(R) Xeon(R) CPU D-1520 @ 2.20GHz
> stepping : 2
> microcode : 0xa
> cpu MHz : 2200.000
> tsc MHz : 2199.998
> cache size : 6144 KB
> physical id : 0
> siblings : 8
> core id : 2
> cpu cores : 4
> apicid : 5
> initial apicid : 5
> fpu : yes
> fpu_exception : yes
> cpuid level : 20
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
> nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
> xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept
> vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> rdseed adx smap xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts
> bugs :
> bogomips : 4399.57
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
> processor : 7
> vendor_id : GenuineIntel
> cpu family : 6
> model : 86
> model name : Intel(R) Xeon(R) CPU D-1520 @ 2.20GHz
> stepping : 2
> microcode : 0xa
> cpu MHz : 2200.000
> tsc MHz : 2199.998
> cache size : 6144 KB
> physical id : 0
> siblings : 8
> core id : 3
> cpu cores : 4
> apicid : 7
> initial apicid : 7
> fpu : yes
> fpu_exception : yes
> cpuid level : 20
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
> nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
> xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept
> vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> rdseed adx smap xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts
> bugs :
> bogomips : 4399.57
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
> ----------
>
>