Re: [PATCH 1/2] tpm, tpm_tis: Handle interrupt storm

From: Lino Sanfilippo
Date: Mon May 29 2023 - 09:17:07 EST


Hi Péter,

On 29.05.23 at 08:46, Péter Ujfalusi wrote:

>> My hope was that it could also avoid some of (existing or future) DMI entries. But even if it does not
>> (e.g. the problem Péter Ujfalusi reported with the UPX-i11 cannot be fixed by this patch set and thus
>> needs the DMI quirk) we may at least avoid more bug reports due to interrupt storms once
>> 6.4 is released.
>
> I'm surprised that there is a need for a storm detection in the first
> place... Do we have something else on the same IRQ line on the affected
> devices which might have a bug or no driver at all?
> It is hard to believe that a TPM (Trusted Platform Module) is integrated
> so poorly ;)
>
> But put that aside: I think the storm detection is good given that there
> is no other way to know which machine have sloppy TPM integration.
> There are machines where this happens, so it is a know integration
> issue, right?
>> My only 'nitpick' is with the printk level to be used.
> The ERR level is not correct as we know the issue and we handle it, so
> all is under control.
> If we want to add these machines to the quirk list then WARN is a good
> level to gain attention but I'm not sure if a user will know how to get
> the machine in the quirk (where to file a bug).
> If we only want the quirk to be used for machines like UPX-i11 which
> simply just have broken (likely floating) IRQ line then the WARN is too
> high level, INFO or even DBG would be appropriate as you are not going
> to update the quirk, it is just handled under the hood (which is a great
> thing, but on the other hand you will have the storm never the less and
> that is not a nice thing).
>
> It is a matter on how this is going to be handled in a long term. Add
> quirk for all the known machines with either stormy or plain broken IRQ
> line or handle the stormy ones and quirk the broken ones only.
>

Even in the long run I would always prefer a generic solution whenever it
is possible. Quirks are fine for issues that cant be solved in another way
or really require individual treatment.
While I agree that ERR is not a good choice for the "falling back to polling"
message I do not have a strong opinion on whether WARN, NOTICE or INFO is more
appropriate. Jarko seems to prefer WARN.


>>>> Detect an interrupt storm by counting the number of unhandled interrupts
>>>> within a 10 ms time interval. In case that more than 1000 were unhandled
>>>> deactivate interrupts, deregister the handler and fall back to polling.
>>>
>>> I know it can be sometimes hard to evaluate but can you try to explain
>>> how you came up to the 10 ms sampling period and 1000 interrupt
>>> threshold? I just don't like abritrary numbers.
>>
>> At least the 100 ms is not plucked out of thin air but its the same time period
>> that the generic code in note_interrupt() uses - I assume for a good reason.
>> Not only this number but the whole irq storm detection logic is taken from
>> there:
>>
>>>
>>>> This equals the implementation that handles interrupt storms in
>>>> note_interrupt() by means of timestamps and counters in struct irq_desc.
>>> The number of 1000 unhandled interrupts is still far below the 99900
> used in
>> note_interrupt() but IMHO enough to indicate that there is something seriously
>> wrong with interrupt processing and it is probably saver to fall back to polling.
>
> Except that if the line got the spurious designation in core, the
> interrupt line will be disabled while the TPM driver will think that it
> is still using IRQ mode and will not switch to polling.

In the case that an interrupt storm cant be detected (since there might not even
be one) I am fine with adding a quirk.

>
> A storm of 1000 is better than a storm of 99900 for sure but quirking
> these would be the desired final solution. imho
>
As I wrote above I prefer a generic solution whenever possible.

> There are many buts around this ;)
>


Regards,
Lino