Re: [PATCH v3 16/16] objtool,x86: Rewrite retpoline thunk calls

From: Josh Poimboeuf
Date: Wed Jun 02 2021 - 16:44:05 EST


On Wed, Jun 02, 2021 at 05:51:01PM +0200, Lukasz Majczak wrote:
> Hi Peter,
>
> This patch seems to crash on Tigerlake platform (Chromebook delbin), I
> got the following error:
>
> [ 2.103054] pcieport 0000:00:1c.0: PME: Signaling with IRQ 122
> [ 2.110148] pcieport 0000:00:1c.0: pciehp: Slot #7 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
> [ 2.126754] pcieport 0000:00:1d.0: PME: Signaling with IRQ 123
> [ 2.133946] ACPI: \_SB_.CP00: Found 3 idle states
> [ 2.139708] BUG: kernel NULL pointer dereference, address: 000000000000012b
> [ 2.140704] #PF: supervisor read access in kernel mode
> [ 2.140704] #PF: error_code(0x0000) - not-present page
> [ 2.140704] PGD 0 P4D 0
> [ 2.140704] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 2.140704] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G U 5.13.0-rc1 #31
> [ 2.140704] Hardware name: Google Delbin/Delbin, BIOS Google_Delbin.13672.156.3 05/14/2021
> [ 2.140704] RIP: 0010:cpuidle_poll_time+0x9/0x6a
> [ 2.140704] Code: 44 00 00 85 f6 78 19 55 48 89 e5 48 8b 05 16 44 44 01 4c 8b 58 40 4d 85 db 5d 41 ff d3 66 90 00 c3 0f 1f 44 00 00 55 48 89 e5 <48> 8b 46 20 48 85 c0 75 56 4c 63 87 28 04 00 00 b8 24 f49
> [ 2.140704] RSP: 0000:ffffffff9cc03ea8 EFLAGS: 00010282
> [ 2.140704] RAX: 0000000000008e7d RBX: ffffffff9cc1c5fd RCX: 000000007f894e5a
> [ 2.140704] RDX: 000000007f894d4f RSI: 000000000000010b RDI: 0000000002fa1cf6
> [ 2.140704] RBP: ffffffff9cc03ea8 R08: 0000000000000000 R09: 00000000ca948246
> [ 2.140704] R10: 0000000000000000 R11: ffffffff9bf132cb R12: 0000000000000003
> [ 2.140704] R13: ffffbbfdffc21960 R14: 0000000000000000 R15: ffffffff9cdba638
> [ 2.140704] FS: 0000000000000000(0000) GS:ffff928280000000(0000) knlGS:0000000000000000
> [ 2.140704] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2.140704] CR2: 000000000000012b CR3: 000000027e414001 CR4: 0000000000770ef0
> [ 2.140704] PKRU: 55555554
> [ 2.140704] Call Trace:
> [ 2.140704] do_idle+0x175/0x1f6
> [ 2.140704] cpu_startup_entry+0x1d/0x1f
> [ 2.140704] start_kernel+0x3be/0x420
> [ 2.140704] secondary_startup_64_no_verify+0xb0/0xbb

Assuming I'm looking at the right code, this is weird.

cpuidle_poll_time()'s only caller is poll_idle(), which isn't even
listed in the stack trace. Maybe the function before
cpuidle_poll_time() fell through into it somehow. Or execution got
otherwise hosed. That would also explain the bogus argument in RSI
(0x10b): the faulting load from 0x20(%rsi) is what hit CR2 = 0x12b.
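The link between RSI and the fault address can be checked by decoding the faulting bytes, marked `<48> 8b 46 20` in the Code: line. Below is a minimal sketch, assuming standard x86-64 ModRM encoding and handling only the "base + disp8" form with no SIB byte or REX.R/REX.B extension. `struct mem_operand`, `decode_mov_load()` and `effective_address()` are hypothetical helpers for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type (not kernel code): a decoded MOV r64, r/m64 load. */
struct mem_operand {
    unsigned dst_reg;   /* ModRM.reg: destination register number */
    unsigned base_reg;  /* ModRM.rm: base register number */
    int64_t disp;       /* sign-extended 8-bit displacement */
};

/* Numbers of the low GPRs used below (no REX extension bits set). */
enum { REG_RAX = 0, REG_RSI = 6 };

/*
 * Hypothetical decoder for the 4-byte form REX.W, 0x8b, ModRM, disp8.
 * Only handles ModRM.mod == 01 ([base + disp8]) without a SIB byte.
 */
static struct mem_operand decode_mov_load(const uint8_t insn[4])
{
    struct mem_operand op;
    uint8_t modrm = insn[2];

    assert(insn[0] == 0x48);      /* REX.W: 64-bit operand, no R/X/B bits */
    assert(insn[1] == 0x8b);      /* MOV r64, r/m64 */
    assert((modrm >> 6) == 1);    /* mod == 01: base register + disp8 */
    assert((modrm & 7) != 4);     /* rm == 100 would mean a SIB byte follows */

    op.dst_reg = (modrm >> 3) & 7;
    op.base_reg = modrm & 7;
    op.disp = (int8_t)insn[3];    /* displacement is sign-extended */
    return op;
}

/* Address the load reads from, given the value of the base register. */
static uint64_t effective_address(struct mem_operand op, uint64_t base_value)
{
    return base_value + (uint64_t)op.disp;
}
```

Fed the bytes `48 8b 46 20`, the decoder yields a load into RAX from 0x20(%rsi); with RSI = 0x10b from the register dump, the effective address is 0x12b, which matches CR2.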

In addition to the data Peter requested, it would be interesting to see
the disassembly of do_idle() (objdump -dr), to find out which function
got called before it went off the rails.

--
Josh