[rmk-arm:aarch64/hotplug-vcpu/v6.6-rc1] [ACPI] 9d0b332731: Kernel panic - not syncing: Fatal exception

From: kernel test robot
Date: Wed Oct 11 2023 - 04:09:19 EST




Hello,

kernel test robot noticed "Kernel panic - not syncing: Fatal exception" on:

commit: 9d0b33273119d6c0d9112a28c2cc2eb8c671fbeb ("ACPI: processor: Register all CPUs from acpi_processor_get_info()")
git://git.armlinux.org.uk/~rmk/linux-arm aarch64/hotplug-vcpu/v6.6-rc1

in testcase: boot

compiler: gcc-12
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202310111516.ba5ea8dc-oliver.sang@xxxxxxxxx



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231011/202310111516.ba5ea8dc-oliver.sang@xxxxxxxxx



We are sorry that the dmesg at the above link is incomplete and misses the final crash.
This is due to an issue with our service, which is now under investigation.

Below is further information captured from the serial console:

[ 4.434655][ T1] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23
[ 4.442711][ T1] .... node #1, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47
[ 0.676292][ T0] masked ExtINT on CPU#1
[ 0.676292][ T0] masked ExtINT on CPU#2
[ 0.676292][ T0] masked ExtINT on CPU#3
[ 0.676292][ T0] masked ExtINT on CPU#4
[ 0.676292][ T0] masked ExtINT on CPU#5
[ 0.676292][ T0] masked ExtINT on CPU#6
[ 0.676292][ T0] masked ExtINT on CPU#7
[ 0.676292][ T0] masked ExtINT on CPU#8
[ 0.676292][ T0] masked ExtINT on CPU#9
[ 0.676292][ T0] masked ExtINT on CPU#10
[ 0.676292][ T0] masked ExtINT on CPU#11
[ 0.676292][ T0] masked ExtINT on CPU#12
[ 0.676292][ T0] masked ExtINT on CPU#13
[ 0.676292][ T0] masked ExtINT on CPU#14
[ 0.676292][ T0] masked ExtINT on CPU#15
[ 0.676292][ T0] masked ExtINT on CPU#16
[ 0.676292][ T0] masked ExtINT on CPU#17
[ 0.676292][ T0] masked ExtINT on CPU#18
[ 0.676292][ T0] masked ExtINT on CPU#19
[ 0.676292][ T0] masked ExtINT on CPU#20
[ 0.676292][ T0] masked ExtINT on CPU#21
[ 0.676292][ T0] masked ExtINT on CPU#22
[ 0.676292][ T0] masked ExtINT on CPU#23
[ 0.676292][ T0] masked ExtINT on CPU#24
[ 0.676292][ T0] smpboot: CPU 24 Converting physical 0 to logical die 1
[ 0.676292][ T0] masked ExtINT on CPU#25
[ 0.676292][ T0] masked ExtINT on CPU#26
[ 0.676292][ T0] masked ExtINT on CPU#27
[ 0.676292][ T0] masked ExtINT on CPU#28
[ 0.676292][ T0] masked ExtINT on CPU#29
[ 0.676292][ T0] masked ExtINT on CPU#30
[ 0.676292][ T0] masked ExtINT on CPU#31
[ 0.676292][ T0] masked ExtINT on CPU#32
[ 0.676292][ T0] masked ExtINT on CPU#33
[ 0.676292][ T0] masked ExtINT on CPU#34
[ 0.676292][ T0] masked ExtINT on CPU#35
[ 0.676292][ T0] masked ExtINT on CPU#36
[ 0.676292][ T0] masked ExtINT on CPU#37
[ 0.676292][ T0] masked ExtINT on CPU#38
[ 0.676292][ T0] masked ExtINT on CPU#39
[ 0.676292][ T0] masked ExtINT on CPU#40
[ 0.676292][ T0] masked ExtINT on CPU#41
[ 0.676292][ T0] masked ExtINT on CPU#42
[ 0.676292][ T0] masked ExtINT on CPU#43
[ 0.676292][ T0] masked ExtINT on CPU#44
[ 0.676292][ T0] masked ExtINT on CPU#45
[ 0.676292][ T0] masked ExtINT on CPU#46
[ 0.676292][ T0] masked ExtINT on CPU#47
[ 4.669736][ T1]
[ 4.677645][ T1] .... node #0, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71
[ 4.689709][ T1] .... node #1, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95
[ 0.676292][ T0] masked ExtINT on CPU#48
[ 4.712712][ T1] TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
[ 4.714070][ T1] MMIO Stale Data CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html for more details.
[ 0.676292][ T0] masked ExtINT on CPU#49
[ 0.676292][ T0] masked ExtINT on CPU#50
[ 0.676292][ T0] masked ExtINT on CPU#51
[ 0.676292][ T0] masked ExtINT on CPU#52
[ 0.676292][ T0] masked ExtINT on CPU#53
[ 0.676292][ T0] masked ExtINT on CPU#54
[ 0.676292][ T0] masked ExtINT on CPU#55
[ 0.676292][ T0] masked ExtINT on CPU#56
[ 0.676292][ T0] masked ExtINT on CPU#57
[ 0.676292][ T0] masked ExtINT on CPU#58
[ 0.676292][ T0] masked ExtINT on CPU#59
[ 0.676292][ T0] masked ExtINT on CPU#60
[ 0.676292][ T0] masked ExtINT on CPU#61
[ 0.676292][ T0] masked ExtINT on CPU#62
[ 0.676292][ T0] masked ExtINT on CPU#63
[ 0.676292][ T0] masked ExtINT on CPU#64
[ 0.676292][ T0] masked ExtINT on CPU#65
[ 0.676292][ T0] masked ExtINT on CPU#66
[ 0.676292][ T0] masked ExtINT on CPU#67
[ 0.676292][ T0] masked ExtINT on CPU#68
[ 0.676292][ T0] masked ExtINT on CPU#69
[ 0.676292][ T0] masked ExtINT on CPU#70
[ 0.676292][ T0] masked ExtINT on CPU#71
[ 0.676292][ T0] masked ExtINT on CPU#72
[ 0.676292][ T0] masked ExtINT on CPU#73
[ 0.676292][ T0] masked ExtINT on CPU#74
[ 0.676292][ T0] masked ExtINT on CPU#75
[ 0.676292][ T0] masked ExtINT on CPU#76
[ 0.676292][ T0] masked ExtINT on CPU#77
[ 0.676292][ T0] masked ExtINT on CPU#78
[ 0.676292][ T0] masked ExtINT on CPU#79
[ 0.676292][ T0] masked ExtINT on CPU#80
[ 0.676292][ T0] masked ExtINT on CPU#81
[ 0.676292][ T0] masked ExtINT on CPU#82
[ 0.676292][ T0] masked ExtINT on CPU#83
[ 0.676292][ T0] masked ExtINT on CPU#84
[ 0.676292][ T0] masked ExtINT on CPU#85
[ 0.676292][ T0] masked ExtINT on CPU#86
[ 0.676292][ T0] masked ExtINT on CPU#87
[ 0.676292][ T0] masked ExtINT on CPU#88
[ 0.676292][ T0] masked ExtINT on CPU#89
[ 0.676292][ T0] masked ExtINT on CPU#90
[ 0.676292][ T0] masked ExtINT on CPU#91
[ 0.676292][ T0] masked ExtINT on CPU#92
[ 0.676292][ T0] masked ExtINT on CPU#93
[ 0.676292][ T0] masked ExtINT on CPU#94
[ 0.676292][ T0] masked ExtINT on CPU#95
[ 4.917591][ T1] smp: Brought up 2 nodes, 96 CPUs
[ 4.924513][ T1] smpboot: Max logical packages: 2
[ 4.925656][ T1] smpboot: Total of 96 processors activated (403200.00 BogoMIPS)
[ 5.186590][ T593] node 1 deferred pages initialised in 257ms
[ 5.186860][ T592] node 0 deferred pages initialised in 257ms
[ 5.212648][ T1] devtmpfs: initialized
[ 5.216686][ T1] x86/mm: Memory block size: 2048MB
[ 5.224648][ T1] ACPI: PM: Registering ACPI NVS region [mem 0x67a30000-0x6845ffff] (10682368 bytes)
[ 5.234938][ T1] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[ 5.244753][ T1] futex hash table entries: 32768 (order: 9, 2097152 bytes, vmalloc)
[ 5.253275][ T1] pinctrl core: initialized pinctrl subsystem
[ 5.260790][ T1] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 5.267923][ T1] audit: initializing netlink subsys (disabled)
[ 5.274549][ T690] audit: type=2000 audit(1696931170.998:1): state=initialized audit_enabled=0 res=1
[ 5.274803][ T1] thermal_sys: Registered thermal governor 'fair_share'
[ 5.283695][ T1] thermal_sys: Registered thermal governor 'bang_bang'
[ 5.290657][ T1] thermal_sys: Registered thermal governor 'step_wise'
[ 5.297657][ T1] thermal_sys: Registered thermal governor 'user_space'
[ 5.304554][ T1] cpuidle: using governor menu
[ 5.315654][ T1] Detected 1 PCC Subspaces
[ 5.320661][ T1] Registering PCC driver as Mailbox controller
[ 5.327622][ T1] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[ 5.335700][ T1] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 5.343778][ T1] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[ 5.353697][ T1] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved as E820 entry
[ 5.361700][ T1] PCI: Using configuration type 1 for base access
[ 5.368813][ T19] BUG: kernel NULL pointer dereference, address: 0000000000000030
[ 5.369510][ T19] #PF: supervisor read access in kernel mode
[ 5.369510][ T19] #PF: error_code(0x0000) - not-present page
[ 5.369510][ T19] PGD 0 P4D 0
[ 5.369510][ T19] Oops: 0000 [#1] SMP NOPTI
[ 5.369510][ T19] CPU: 0 PID: 19 Comm: cpuhp/0 Not tainted 6.6.0-rc1-00015-g9d0b33273119 #1
[ 5.369510][ T19] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 5.369510][ T19] RIP: 0010:sysfs_merge_group+0x1e/0x130
[ 5.369510][ T19] Code: 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 55 31 d2 41 54 55 48 89 fd 53 48 89 f3 48 83 ec 10 48 8b 36 <48> 8b 7f 30 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 c7 04
[ 5.369510][ T19] RSP: 0000:ffffc90008887e10 EFLAGS: 00010282
[ 5.369510][ T19] RAX: 0000000000000007 RBX: ffffffff82207680 RCX: 00000000000001b0
[ 5.369510][ T19] RDX: 0000000000000000 RSI: ffffffff822b9088 RDI: 0000000000000000
[ 5.369510][ T19] RBP: 0000000000000000 R08: ffff88bf7f61c148 R09: ffff888107728090
[ 5.369510][ T19] R10: ffff88bf7f61c120 R11: 00000000ffffffff R12: 0000000000000000
[ 5.369510][ T19] R13: ffff88bf7f61c120 R14: ffffffff81046350 R15: ffff88bf7f61c148
[ 5.369510][ T19] FS: 0000000000000000(0000) GS:ffff88bf7f600000(0000) knlGS:0000000000000000
[ 5.369510][ T19] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.369510][ T19] CR2: 0000000000000030 CR3: 000000807ea18001 CR4: 00000000007706f0
[ 5.369510][ T19] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5.369510][ T19] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5.369510][ T19] PKRU: 55555554
[ 5.369510][ T19] Call Trace:
[ 5.369510][ T19] <TASK>
[ 5.369510][ T19] ? __die+0x23/0x70
[ 5.369510][ T19] ? page_fault_oops+0xa4/0x170
[ 5.369510][ T19] ? exc_page_fault+0x67/0x130
[ 5.369510][ T19] ? asm_exc_page_fault+0x26/0x30
[ 5.369510][ T19] ? __pfx_intel_epb_online+0x10/0x10
[ 5.369510][ T19] ? sysfs_merge_group+0x1e/0x130
[ 5.369510][ T19] ? __switch_to_asm+0x38/0x70
[ 5.369510][ T19] intel_epb_online+0x37/0x70
[ 5.369510][ T19] cpuhp_invoke_callback+0xf1/0x3b0
[ 5.369510][ T19] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 5.369510][ T19] cpuhp_thread_fun+0xde/0x170
[ 5.369510][ T19] smpboot_thread_fn+0xb0/0x170
[ 5.369510][ T19] kthread+0xcd/0x130
[ 5.369510][ T19] ? __pfx_kthread+0x10/0x10
[ 5.369510][ T19] ret_from_fork+0x31/0x70
[ 5.369510][ T19] ? __pfx_kthread+0x10/0x10
[ 5.369510][ T19] ret_from_fork_asm+0x1b/0x30
[ 5.369510][ T19] </TASK>
[ 5.369510][ T19] Modules linked in:
[ 5.369510][ T19] CR2: 0000000000000030
[ 5.369510][ T19] ---[ end trace 0000000000000000 ]---
[ 5.369510][ T19] RIP: 0010:sysfs_merge_group+0x1e/0x130
[ 5.369510][ T19] Code: 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 55 31 d2 41 54 55 48 89 fd 53 48 89 f3 48 83 ec 10 48 8b 36 <48> 8b 7f 30 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 c7 04
[ 5.369510][ T19] RSP: 0000:ffffc90008887e10 EFLAGS: 00010282
[ 5.369510][ T19] RAX: 0000000000000007 RBX: ffffffff82207680 RCX: 00000000000001b0
[ 5.369510][ T19] RDX: 0000000000000000 RSI: ffffffff822b9088 RDI: 0000000000000000
[ 5.369510][ T19] RBP: 0000000000000000 R08: ffff88bf7f61c148 R09: ffff888107728090
[ 5.369510][ T19] R10: ffff88bf7f61c120 R11: 00000000ffffffff R12: 0000000000000000
[ 5.369510][ T19] R13: ffff88bf7f61c120 R14: ffffffff81046350 R15: ffff88bf7f61c148
[ 5.369510][ T19] FS: 0000000000000000(0000) GS:ffff88bf7f600000(0000) knlGS:0000000000000000
[ 5.369510][ T19] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.369510][ T19] CR2: 0000000000000030 CR3: 000000807ea18001 CR4: 00000000007706f0
[ 5.369510][ T19] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5.369510][ T19] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5.369510][ T19] PKRU: 55555554
[ 5.369510][ T19] Kernel panic - not syncing: Fatal exception


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki