Re: [PATCH RFC v2 02/18] irq/dev-msi: Add support for a new DEV_MSI irq domain

From: Thomas Gleixner
Date: Thu Jul 23 2020 - 20:36:32 EST


Jason Gunthorpe <jgg@xxxxxxxxxxxx> writes:
> On Thu, Jul 23, 2020 at 09:51:52AM +0100, Marc Zyngier wrote:
>> > IIRC on Intel/AMD at least once a MSI is launched it is not maskable.
>>
>> Really? So you can't shut a device with a screaming interrupt,
>> for example, should it become otherwise unresponsive?
>
> Well, it used to be like that in the APICv1 days. I suppose modern
> interrupt remapping probably changes things.

The MSI side of affairs has nothing to do with Intel, nor with
APICv1. It's a trainwreck on the PCI side.

MSI does not mandate masking. For devices which do not implement it
(and that's still the case for devices designed today, especially
CPU-internal peripherals), there are only a few options to shut them
up:

1) Disable MSI, which has the problem that the interrupt gets
redirected to the legacy PCI INTA#-INTD# interrupt unless the hardware
supports disabling that redirection, which is another optional
feature and a hopeless case.

2) Disable it at the IRQ remapping level, which fortunately allows
this by design.

3) Disable it at the device level, which is feasible for a device
driver but impossible for the irq side.

>> > So the model for MSI is always "mask at source". The closest mapping
>> > to the Linux IRQ model is to say the end device has a irqchip that
>> > encapsulates the ability of the device to generate the MSI in the
>> > first place.
>>
>> This is an x86'ism, I'm afraid. Systems I deal with can mask any
>> interrupt at the interrupt controller level, MSI or not.

Yes, it's a pain, but it's reality.

> Sure. However it feels like a bad practice to leave the source
> unmasked and potentially continuing to generate messages if the
> intention was to disable the IRQ that was assigned to it - even if the
> messages do not result in CPU interrupts they will still consume
> system resources.

See above. You cannot reach out to the device driver to disable the
underlying interrupt source, which is the last resort (ultima ratio) if
#1 or #2 do not work or are not available. That would be squaring the
circle and violating all rules of layering and locking at once.

The bad news is that we can't change the hardware. We have to deal with
it. And yes, for more than a decade I have told HW people, publicly and
in private conversations, that unmaskable interrupts are broken by
definition. They still get designed that way ...

>> If masking at the source is the only way to shut the device up,
>> and assuming that the device provides the expected semantics
>> (a MSI raised by the device while the interrupt is masked
>> isn't lost and gets sent when unmasked), that's fair enough.
>> It's just ugly.
>
> It makes sense that the masking should follow the same semantics for
> PCI MSI masking.

Which semantics? The horrors of MSI or the halfway reasonable MSI-X
variant?

Thanks,

tglx