Re: [bisected] x86 boot still broken on -rc2
From: Prarit Bhargava
Date: Mon Dec 04 2017 - 07:28:12 EST
On 12/03/2017 08:28 PM, Jakub Kicinski wrote:
> Same thing on rc2, bisected down to:
>
> commit b4c0a7326f5dc0ef7a64128b0ae7d081f4b2cbd1 (refs/bisect/bad)
> Author: Prarit Bhargava <prarit@xxxxxxxxxx>
> Date: Tue Nov 14 07:42:57 2017 -0500
>
> x86/smpboot: Fix __max_logical_packages estimate
>
> A system booted with a small number of cores enabled per package
> panics because the estimate of __max_logical_packages is too low.
>
> This occurs when the total number of active cores across all packages is
> less than the maximum core count for a single package. e.g.:
>
> On a 4 package system with 20 cores/package where only 4 cores are
> enabled on each package, the value of __max_logical_packages is
> calculated as DIV_ROUND_UP(16, 20) = 1 and not 4.
>
> Calculate __max_logical_packages after the cpu enumeration has completed.
> Use the boot cpu's data to extrapolate the number of packages.
>
> Signed-off-by: Prarit Bhargava <prarit@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Tom Lendacky <thomas.lendacky@xxxxxxx>
> Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> Cc: Christian Borntraeger <borntraeger@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Kan Liang <kan.liang@xxxxxxxxx>
> Cc: He Chen <he.chen@xxxxxxxxxxxxxxx>
> Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
> Cc: Piotr Luc <piotr.luc@xxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Arvind Yadav <arvind.yadav.cs@xxxxxxxxx>
> Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxx>
> Cc: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> Cc: Mathias Krause <minipli@xxxxxxxxxxxxxx>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@xxxxxxxxxx
>
>
> On Fri, 1 Dec 2017 16:39:54 -0800, Jakub Kicinski wrote:
>> Hi!
>>
>> I'm hitting these after DaveM pulled rc1 into net-next on my Xeon
>> E5-2630 v4 box. It also happens on linux-next. Did anyone else
>> experience it? (.config attached)
>>
>> [ 5.003771] WARNING: CPU: 14 PID: 1 at ../arch/x86/events/intel/uncore.c:936 uncore_pci_probe+0x285/0x2b0
>> [ 5.007544] Modules linked in:
>> [ 5.007544] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
>> [ 5.007544] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
I have a Dell R730 available for use. Out of curiosity, are you booting with
the default BIOS options?
P.
>> [ 5.007544] task: 000000009e842725 task.stack: 000000008a63fd2d
>> [ 5.007544] RIP: 0010:uncore_pci_probe+0x285/0x2b0
>> [ 5.007544] RSP: 0000:ffffad8580163d10 EFLAGS: 00010286
>> [ 5.007544] RAX: ffff98576cc3df30 RBX: ffffffffb08037e0 RCX: ffffffffb0c1a120
>> [ 5.007544] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb0c1a960
>> [ 5.007544] RBP: ffff985b6c00ac00 R08: fffffffffffffffe R09: 00000000000fffff
>> [ 5.007544] R10: ffff98576f1b6018 R11: 0000000000000022 R12: ffff985b6c641000
>> [ 5.007544] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
>> [ 5.007544] FS: 0000000000000000(0000) GS:ffff98576fb80000(0000) knlGS:0000000000000000
>> [ 5.007544] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 5.007544] CR2: 0000000000000000 CR3: 0000000185c09001 CR4: 00000000003606e0
>> [ 5.007544] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 5.007544] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 5.007544] Call Trace:
>> [ 5.007544] local_pci_probe+0x3d/0x90
>> [ 5.007544] ? pci_match_device+0xd9/0x100
>> [ 5.007544] pci_device_probe+0x122/0x180
>> [ 5.007544] driver_probe_device+0x246/0x330
>> [ 5.007544] ? set_debug_rodata+0x11/0x11
>> [ 5.007544] __driver_attach+0x8a/0x90
>> [ 5.007544] ? driver_probe_device+0x330/0x330
>> [ 5.007544] bus_for_each_dev+0x5c/0x90
>> [ 5.007544] bus_add_driver+0x196/0x220
>> [ 5.007544] driver_register+0x57/0xc0
>> [ 5.007544] intel_uncore_init+0x1e3/0x249
>> [ 5.007544] ? uncore_type_init+0x193/0x193
>> [ 5.007544] ? set_debug_rodata+0x11/0x11
>> [ 5.007544] do_one_initcall+0x4b/0x190
>> [ 5.007544] kernel_init_freeable+0x16e/0x1f5
>> [ 5.007544] ? rest_init+0xd0/0xd0
>> [ 5.007544] kernel_init+0xa/0x100
>> [ 5.007544] ret_from_fork+0x1f/0x30
>> [ 5.007544] Code: 48 8b 52 08 48 85 d2 74 0d 89 44 24 04 48 89 df ff d2 8b 44 24 04 48 89 df 89 44 24 04 e8 54 0a 1c 00 8b 44 24 0
>> [ 5.007544] ---[ end trace 4dc4c3d5f5afcd2f ]---
>> [ 5.244504] bdx_uncore: probe of 0000:ff:08.2 failed with error -22
>> [ 5.251604] bdx_uncore: probe of 0000:ff:0b.1 failed with error -22
>> [ 5.258711] bdx_uncore: probe of 0000:ff:10.1 failed with error -22
>> [ 5.265819] bdx_uncore: probe of 0000:ff:14.0 failed with error -22
>> [ 5.272919] bdx_uncore: probe of 0000:ff:14.1 failed with error -22
>> [ 5.280019] bdx_uncore: probe of 0000:ff:15.0 failed with error -22
>> [ 5.287112] bdx_uncore: probe of 0000:ff:15.1 failed with error -22
>> [ 5.294376] WARNING: CPU: 1 PID: 15 at ../arch/x86/events/intel/uncore.c:1065 uncore_change_type_ctx.isra.5+0xe6/0xf0
>> [ 5.298362] Modules linked in:
>> [ 5.298362] CPU: 1 PID: 15 Comm: cpuhp/1 Tainted: G W 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
>> [ 5.298362] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>> [ 5.298362] task: 00000000ae78bc8f task.stack: 00000000f79660c1
>> [ 5.298362] RIP: 0010:uncore_change_type_ctx.isra.5+0xe6/0xf0
>> [ 5.298362] RSP: 0000:ffffad85833b3db8 EFLAGS: 00010213
>> [ 5.298362] RAX: 0000000000000000 RBX: ffff9857669b0200 RCX: 0000000000000001
>> [ 5.298362] RDX: ffff985b6f000000 RSI: ffff985b66580400 RDI: ffffffffb0c1ae8c
>> [ 5.298362] RBP: ffff985b66580400 R08: ffffffffb0c1ae8c R09: 0000000000000001
>> [ 5.298362] R10: 0000000000000000 R11: 00000000003d0900 R12: 0000000000000000
>> [ 5.298362] R13: ffffffffffffffff R14: 0000000000000001 R15: 0000000000000008
>> [ 5.298362] FS: 0000000000000000(0000) GS:ffff985b6f000000(0000) knlGS:0000000000000000
>> [ 5.298362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 5.298362] CR2: 0000000000000000 CR3: 0000000185c09001 CR4: 00000000003606e0
>> [ 5.298362] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 5.298362] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 5.298362] Call Trace:
>> [ 5.298362] uncore_event_cpu_online+0x283/0x340
>> [ 5.298362] ? uncore_event_cpu_offline+0x180/0x180
>> [ 5.298362] cpuhp_invoke_callback+0x8c/0x620
>> [ 5.298362] ? __schedule+0x1ad/0x6c0
>> [ 5.298362] ? sort_range+0x20/0x20
>> [ 5.298362] cpuhp_thread_fun+0xbc/0x140
>> [ 5.298362] smpboot_thread_fn+0x114/0x1d0
>> [ 5.298362] kthread+0x111/0x130
>> [ 5.298362] ? kthread_create_on_node+0x40/0x40
>> [ 5.298362] ret_from_fork+0x1f/0x30
>> [ 5.298362] Code: 2a 44 89 73 10 41 83 c4 01 48 81 c5 40 01 00 00 45 3b 20 7c cf 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f f
>> [ 5.298362] ---[ end trace 4dc4c3d5f5afcd30 ]---
>> [ 5.504808] Scanning for low memory corruption every 60 seconds
>> [ 5.512347] Initialise system trusted keyrings
>> [ 5.517470] workingset: timestamp_bits=40 max_order=23 bucket_order=0
>> [ 5.524840] BUG: unable to handle kernel paging request at 0000000023314bf4
>> [ 5.528761] IP: __kmalloc_track_caller+0xa8/0x210
>> [ 5.528761] PGD 185c0a067 P4D 185c0a067 PUD 185c0c067 PMD 0
>> [ 5.528761] Oops: 0000 [#1] PREEMPT SMP
>> [ 5.528761] Modules linked in:
>> [ 5.528761] CPU: 14 PID: 1 Comm: swapper/0 Tainted: G W 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
>> [ 5.528761] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>> [ 5.528761] task: 000000009e842725 task.stack: 000000008a63fd2d
>> [ 5.528761] RIP: 0010:__kmalloc_track_caller+0xa8/0x210
>> [ 5.528761] RSP: 0000:ffffad8580163d58 EFLAGS: 00010286
>> [ 5.528761] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 000000000012ce0e
>> [ 5.528761] RDX: 000000000012cd0e RSI: 000000000012cd0e RDI: 000000000001dde0
>> [ 5.528761] RBP: ffff985700000001 R08: ffff98576f407c00 R09: ffffffffb071edbf
>> [ 5.528761] R10: ffffd54de1995600 R11: ffff985b6655915f R12: 0000000000000004
>> [ 5.528761] R13: 00000000014000c0 R14: ffffffffb026c239 R15: ffff98576f407c00
>> [ 5.528761] FS: 0000000000000000(0000) GS:ffff98576fb80000(0000) knlGS:0000000000000000
>> [ 5.528761] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 5.528761] CR2: ffffffffffffffff CR3: 0000000185c09001 CR4: 00000000003606e0
>> [ 5.528761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 5.528761] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 5.528761] Call Trace:
>> [ 5.528761] kstrdup+0x2d/0x60
>> [ 5.528761] __kernfs_new_node+0x29/0x130
>> [ 5.528761] kernfs_new_node+0x24/0x50
>> [ 5.528761] kernfs_create_link+0x29/0x90
>> [ 5.528761] sysfs_do_create_link_sd.isra.0+0x5d/0xc0
>> [ 5.528761] sysfs_slab_add+0x1f5/0x270
>> [ 5.528761] ? set_debug_rodata+0x11/0x11
>> [ 5.528761] slab_sysfs_init+0x8b/0xfa
>> [ 5.528761] ? kmem_cache_init+0xf9/0xf9
>> [ 5.528761] do_one_initcall+0x4b/0x190
>> [ 5.528761] kernel_init_freeable+0x16e/0x1f5
>> [ 5.528761] ? rest_init+0xd0/0xd0
>> [ 5.528761] kernel_init+0xa/0x100
>> [ 5.528761] ret_from_fork+0x1f/0x30
>> [ 5.528761] Code: 49 63 47 20 49 8b 3f 48 8d 8a 00 01 00 00 48 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 ab 48 85 db 7
>> [ 5.528761] RIP: __kmalloc_track_caller+0xa8/0x210 RSP: ffffad8580163d58
>> [ 5.528761] CR2: ffffffffffffffff
>> [ 5.528761] ---[ end trace 4dc4c3d5f5afcd31 ]---
>> [ 5.773089] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>> [ 5.773089]
>> [ 5.777076] Kernel Offset: 0x2f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [ 5.777076] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>