Re: [PATCH] perf/x86/intel/uncore: Initialize with correct logical package ID

From: Prarit Bhargava
Date: Tue Jan 03 2017 - 18:52:43 EST




On 01/03/2017 02:24 PM, Prarit Bhargava wrote:
> On multi-socket Intel v3 processor systems (aka Haswell) kdump can fail with:
>
> BUG: unable to handle kernel paging request at 00000000006563a1
> IP: [<ffffffff8101b582>] hswep_uncore_cpu_init+0x52/0xa0
> PGD 0 [ 2.313897]
> Oops: 0000 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0 #1
> Hardware name: NEC Express5800/T120f [N8100-2285Y]/GA-7WESV-NJ, BIOS 5.0.4009 08/01/2016
> task: ffff88002bdb8000 task.stack: ffffc90000014000
> RIP: 0010:[<ffffffff8101b582>] [<ffffffff8101b582>] hswep_uncore_cpu_init+0x52/0xa0
> RSP: 0000:ffffc90000017db8 EFLAGS: 00010206
> RAX: 0000000000656369 RBX: 0000000000000000 RCX: 0000000000001e03
> RDX: ffff88002b224780 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffc90000017dc8 R08: 000000000001c880 R09: ffffffff813667e1
> R10: ffff880030c1c880 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffffff81c1c090 R14: afafafafafafafaf R15: afafafafafafafaf
> FS: 0000000000000000(0000) GS:ffff880030c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000006563a1 CR3: 000000002fc07000 CR4: 00000000001406b0
> Stack:
> ffffc90000017dc8 00000000352bd002 ffffc90000017e00 ffffffff81da17f8
> 0000000000000000 ffffffff81da16f9 00000000000000f0 afafafafafafafaf
> afafafafafafafaf ffffc90000017e78 ffffffff81002190 ffffc90000017e00
> Call Trace:
> [<ffffffff81da17f8>] intel_uncore_init+0xff/0x2e6
> [<ffffffff81da16f9>] ? uncore_type_init+0x158/0x158
> [<ffffffff81002190>] do_one_initcall+0x50/0x190
> [<ffffffff810af27b>] ? parse_args+0x27b/0x460
> [<ffffffff81d9c357>] kernel_init_freeable+0x1a5/0x249
> [<ffffffff81d9ba27>] ? set_debug_rodata+0x12/0x12
> [<ffffffff81702010>] ? rest_init+0x80/0x80
> [<ffffffff8170201e>] kernel_init+0xe/0x110
> [<ffffffff8170f715>] ret_from_fork+0x25/0x30
> Code: 1a d5 00 39 15 cc 1c c0 00 7e 06 89 15 c4 1c c0 00 48 98 48 8b 15 d7 c3 f7 00 48 8d 04 40 48 8d 04 c2 48 8b 40 10 48 85 c0 74 1b <8b> 70 38 48 8b 78 10 48 8d 4d f4 ba 94 00 00 00 e8 b9 db 38 00
> RIP [<ffffffff8101b582>] hswep_uncore_cpu_init+0x52/0xa0
>
> This is now occurring because 9d85eb9119f4 ("x86/smpboot: Make logical package
> management more robust") corrected the physical ID to logical ID mapping of the
> threads. hswep_uncore_cpu_init() is hard coded for physical socket 0 and if
> the system is kdump'ing on any other socket the logical package value will be
> incorrect. The code should not use 0 as the physical ID, and should use
> the boot cpu's physical package ID in this calculation.
>
> Signed-off-by: Prarit Bhargava <prarit@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Kan Liang <kan.liang@xxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxx>
> Cc: Harish Chegondi <harish.chegondi@xxxxxxxxx>
> ---
> arch/x86/events/intel/uncore_snbep.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
> index e6832be714bc..b5fbb59fdc64 100644
> --- a/arch/x86/events/intel/uncore_snbep.c
> +++ b/arch/x86/events/intel/uncore_snbep.c
> @@ -2686,7 +2686,7 @@ static int hswep_pcu_hw_config(struct intel_uncore_box *box, struct perf_event *
>
> void hswep_uncore_cpu_init(void)
> {
> - int pkg = topology_phys_to_logical_pkg(0);
> + int pkg = topology_phys_to_logical_pkg(boot_cpu_data.phys_proc_id);

One thing that just occurred to me as I was looking at other code.
boot_cpu_data has logical_proc_id, so it may be better to use that instead of
the lookup function.

I'm not sure how physical_to_logical_pkg[] and logical_proc_id are meant to be
used relative to each other. Unless tglx or someone else already knows of a
reason not to use logical_proc_id, I can certainly change the patch.
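
To make the question concrete, here is a rough user-space sketch of how I
understand the two options. It is not kernel code: every model_* name and
MAX_PHYS_PKGS are made up for illustration, and it assumes logical IDs are
handed out in the order packages are first seen, with -1 meaning "not mapped".

```c
#include <assert.h>

#define MAX_PHYS_PKGS 4	/* hypothetical table size for this model */

/* Simplified stand-in for the kernel's physical_to_logical_pkg[] table. */
static int phys_to_logical[MAX_PHYS_PKGS];
static int next_logical;

/* Simplified stand-in for the two boot_cpu_data fields discussed above. */
struct model_cpuinfo {
	int phys_proc_id;
	int logical_proc_id;
};

/* Mark every physical package as "no logical ID assigned yet" (-1). */
static void model_reset(void)
{
	for (int i = 0; i < MAX_PHYS_PKGS; i++)
		phys_to_logical[i] = -1;
	next_logical = 0;
}

/* Assumption: logical IDs are assigned in the order packages are seen. */
static int model_update_logical_pkg(int phys)
{
	assert(phys >= 0 && phys < MAX_PHYS_PKGS);
	if (phys_to_logical[phys] < 0)
		phys_to_logical[phys] = next_logical++;
	return phys_to_logical[phys];
}

/* Lookup by physical ID; returns -1 for unknown or unmapped packages. */
static int model_phys_to_logical_pkg(int phys)
{
	if (phys < 0 || phys >= MAX_PHYS_PKGS)
		return -1;
	return phys_to_logical[phys];
}

/* Model a kdump kernel coming up on a CPU in physical package boot_phys. */
static struct model_cpuinfo model_boot(int boot_phys)
{
	struct model_cpuinfo c;

	model_reset();
	c.phys_proc_id = boot_phys;
	c.logical_proc_id = model_update_logical_pkg(boot_phys);
	return c;
}
```

In this model, after model_boot(1) a lookup of physical package 0 returns -1,
while both model_phys_to_logical_pkg(c.phys_proc_id) and c.logical_proc_id give
the same valid index. If the real kernel behaves the same way, the two
approaches would be equivalent for the boot cpu and logical_proc_id just avoids
the lookup.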

P.

>
> if (hswep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
> hswep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
>