Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)
From: Chris Packham
Date: Mon Jun 05 2023 - 21:41:17 EST
On 2/06/23 16:19, Chris Packham wrote:
>
> On 2/06/23 16:10, Bagas Sanjaya wrote:
>> On 5/29/23 09:37, Chris Packham wrote:
>>> On 29/05/23 14:04, Bagas Sanjaya wrote:
>>>> On Sun, May 28, 2023 at 11:42:50PM +0000, Chris Packham wrote:
>>>>> Hi,
>>>>>
>>>>> We have an embedded product with an Infineon SLM9670 TPM. After
>>>>> updating
>>>>> to a newer LTS kernel version we started seeing the following
>>>>> warning at
>>>>> boot.
>>>>>
>>>>> [ 4.741025] ------------[ cut here ]------------
>>>>> [ 4.749894] irq 38 handler tis_int_handler+0x0/0x154 enabled
>>>>> interrupts
>>>>> [ 4.756555] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:159
>>>>> __handle_irq_event_percpu+0xf4/0x180
>>>>> [ 4.765557] Modules linked in:
>>>>> [ 4.768626] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.113 #1
>>>>> [ 4.774747] Hardware name: Allied Telesis x250-18XS (DT)
>>>>> [ 4.780080] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS
>>>>> BTYPE=--)
>>>>> [ 4.787072] pc : __handle_irq_event_percpu+0xf4/0x180
>>>>> [ 4.792146] lr : __handle_irq_event_percpu+0xf4/0x180
>>>>> [ 4.797220] sp : ffff800008003e40
>>>>> [ 4.800547] x29: ffff800008003e40 x28: ffff8000093951c0 x27:
>>>>> ffff80000902a9b8
>>>>> [ 4.807716] x26: ffff800008fe8d28 x25: ffff8000094a62bd x24:
>>>>> ffff000001b92400
>>>>> [ 4.814885] x23: 0000000000000026 x22: ffff800008003ec4 x21:
>>>>> 0000000000000000
>>>>> [ 4.822053] x20: 0000000000000001 x19: ffff000002381200 x18:
>>>>> ffffffffffffffff
>>>>> [ 4.829222] x17: ffff800076962000 x16: ffff800008000000 x15:
>>>>> ffff800088003b57
>>>>> [ 4.836390] x14: 0000000000000000 x13: ffff8000093a5078 x12:
>>>>> 000000000000035d
>>>>> [ 4.843558] x11: 000000000000011f x10: ffff8000093a5078 x9 :
>>>>> ffff8000093a5078
>>>>> [ 4.850727] x8 : 00000000ffffefff x7 : ffff8000093fd078 x6 :
>>>>> ffff8000093fd078
>>>>> [ 4.857895] x5 : 000000000000bff4 x4 : 0000000000000000 x3 :
>>>>> 0000000000000000
>>>>> [ 4.865062] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
>>>>> ffff8000093951c0
>>>>> [ 4.872230] Call trace:
>>>>> [ 4.874686] __handle_irq_event_percpu+0xf4/0x180
>>>>> [ 4.879411] handle_irq_event+0x64/0xec
>>>>> [ 4.883264] handle_level_irq+0xc0/0x1b0
>>>>> [ 4.887202] generic_handle_irq+0x30/0x50
>>>>> [ 4.891229] mvebu_gpio_irq_handler+0x11c/0x2a0
>>>>> [ 4.895780] handle_domain_irq+0x60/0x90
>>>>> [ 4.899720] gic_handle_irq+0x4c/0xd0
>>>>> [ 4.903398] call_on_irq_stack+0x20/0x4c
>>>>> [ 4.907338] do_interrupt_handler+0x54/0x60
>>>>> [ 4.911538] el1_interrupt+0x30/0x80
>>>>> [ 4.915130] el1h_64_irq_handler+0x18/0x24
>>>>> [ 4.919244] el1h_64_irq+0x78/0x7c
>>>>> [ 4.922659] arch_cpu_idle+0x18/0x2c
>>>>> [ 4.926249] do_idle+0xc4/0x150
>>>>> [ 4.929404] cpu_startup_entry+0x28/0x60
>>>>> [ 4.933343] rest_init+0xe4/0xf4
>>>>> [ 4.936584] arch_call_rest_init+0x10/0x1c
>>>>> [ 4.940699] start_kernel+0x600/0x640
>>>>> [ 4.944375] __primary_switched+0xbc/0xc4
>>>>> [ 4.948402] ---[ end trace 940193047b35b311 ]---
>>>>>
>>>>> Initially I dismissed this as a warning that would probably be
>>>>> cleaned
>>>>> up when we did more work on the TPM support for our product but we
>>>>> also
>>>>> seem to be getting some new i2c issues and possibly a kernel stack
>>>>> corruption that we've conflated with this TPM warning.
>>>> Can you reproduce this issue on mainline? Can you also bisect to find
>>>> the culprit?
>>> No the error doesn't appear on a recent mainline kernel. I do still get
>>>
>>> tpm_tis_spi spi1.1: 2.0 TPM (device-id 0x1B, rev-id 22)
>>> tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
>>> tpm tpm0: A TPM error (256) occurred attempting the self test
>>>
>>> but I think I was getting that on v5.15.110
>>>
>> I repeat: Can you bisect between v5.15 and v5.15.112?
>
> It's definitely between v5.15.110 and v5.15.112.
>
> I'll do a proper bisect next week but I'm pretty sure it's related to
> the "tpm, tpm_tis:" series. The problem can be worked around by
> removing the TPM interrupt from the device tree for the board.
Bisecting between v5.15.110 and v5.15.112 points to
51162b05a44cb5d98fb0ae2519a860910a47fd4b is the first bad commit
commit 51162b05a44cb5d98fb0ae2519a860910a47fd4b
Author: Lino Sanfilippo <l.sanfilippo@xxxxxxxxxx>
Date: Thu Nov 24 14:55:29 2022 +0100
tpm, tpm_tis: Claim locality before writing interrupt registers
[ Upstream commit 15d7aa4e46eba87242a320f39773aa16faddadee ]
In tpm_tis_probe_single_irq() interrupt registers TPM_INT_VECTOR,
TPM_INT_STATUS and TPM_INT_ENABLE are modified to setup the interrupts.
Currently these modifications are done without holding a locality
thus they
have no effect. Fix this by claiming the (default) locality before the
registers are written.
Since now tpm_tis_gen_interrupt() is called with the locality already
claimed remove locality request and release from this function.
Signed-off-by: Lino Sanfilippo <l.sanfilippo@xxxxxxxxxx>
Tested-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
Reviewed-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
Signed-off-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
Stable-dep-of: 955df4f87760 ("tpm, tpm_tis: Claim locality when
interrupts are reenabled on resume")
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
drivers/char/tpm/tpm_tis_core.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(