Re: [PATCH] tpm_tis: Disable interrupts on ThinkPad T490s

From: Hans de Goede
Date: Mon Nov 23 2020 - 07:20:44 EST


Hi,

On 11/19/20 6:05 PM, Jerry Snitselaar wrote:
>
> Hans de Goede @ 2020-11-19 07:42 MST:
>
>> Hi,
>>
>> On 11/19/20 7:36 AM, Jerry Snitselaar wrote:
>>>
>>> Matthew Garrett @ 2020-10-15 15:39 MST:
>>>
>>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@xxxxxxxxxx> wrote:
>>>>>
>>>>> There is a misconfiguration in the bios of the gpio pin used for the
>>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis
>>>>> driver code this results in an interrupt storm. This was initially
>>>>> reported when we attempted to enable the interrupt code in the tpm_tis
>>>>> driver, which previously wasn't setting a flag to enable it. Due to
>>>>> the reports of the interrupt storm that code was reverted and we went back
>>>>> to polling instead of using interrupts. Now that we know the T490s problem
>>>>> is a firmware issue, add code to check if the system is a T490s and
>>>>> disable interrupts if that is the case. This will allow us to enable
>>>>> interrupts for everyone else. If the user has a fixed bios they can
>>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the
>>>>> kernel command line.
>>>>
>>>> I think an implication of this is that systems haven't been
>>>> well-tested with interrupts enabled. In general when we've found a
>>>> firmware issue in one place it ends up happening elsewhere as well, so
>>>> it wouldn't surprise me if there are other machines that will also be
>>>> unhappy with interrupts enabled. Would it be possible to automatically
>>>> detect this case (eg, if we get more than a certain number of
>>>> interrupts in a certain timeframe immediately after enabling the
>>>> interrupt) and automatically fall back to polling in that case? It
>>>> would also mean that users with fixed firmware wouldn't need to pass a
>>>> parameter.
>>>
>>> I believe Matthew is correct here. I found another system today
>>> with completely different vendor for both the system and the tpm chip.
>>> In addition another Lenovo model, the L490, has the issue.
>>>
>>> This initial attempt at a solution like Matthew suggested works on
>>> the system I found today, but I imagine it is all sorts of wrong.
>>> In the 2 systems where I've seen it, there are about 100000 interrupts
>>> in around 1.5 seconds, and then the irq code shuts down the interrupt
>>> because they aren't being handled.
>>
>> Is that with your patch? The IRQ should be silenced as soon as
>> devm_free_irq(chip->dev.parent, priv->irq, chip); is called.
>>
>
> No that is just with James' patchset that enables interrupts for
> tpm_tis. It looks like the irq is firing, but the tpm's int_status
> register is clear, so the handler immediately returns IRQ_NONE. After
> that happens 100000 times the core irq code shuts down the irq, but it
> isn't released so I could still see the stats in /proc/interrupts.

I see, yes I have seen this behavior on the X1C8 with a pre-production BIOS.

> With
> my attempt below on top of that patchset it releases the irq. I had to
> stick the check prior to it checking the int_status register because it
> is cleared and the handler returns,

Ack.

> and I couldn't do the devm_free_irq
> directly in tis_int_handler, so I tried sticking it in tpm_tis_send
> where the other odd irq testing code was already located. I'm not sure
> if there is another place that would work better for calling the
> devm_free_irq.

Adding it together with the other IRQ testing related code seems fine
to me.

>> Depending on if we can get your storm-detection to work or not,
>> we might also choose to just never try to use the IRQ (at least on
>> x86 systems). AFAIK the TPM is never used for high-throughput stuff
>> so the polling overhead should not be a big deal (and I'm getting the feeling
>> that Windows always polls).
>>
>
> I was wondering about Windows as well. In addition to the Lenovo systems
> which the T490s had Nuvoton tpm, the system I found yesterday was a development
> system we have from a partner with an Infineon tpm. Dan Williams has
> seen it internally at Intel as well on some system.

Sounds to me like Windows never uses the IRQ, so we get the fun of finding
all the untested firmware bugs.

So if we cannot get this detection code to work reliable, maybe we should
just not use the IRQ at all, at least on X86 + ACPI systems?

Anyways lets try this storm-detection thing first, but I have the feeling
we should not spend too much time on this. Just outright disabling IRQ
support might be better.

REgards,

Hans