Re: [PATCH] genirq/msi: Shutdown managed interrupts with unsatisfiable affinities

From: Marc Zyngier
Date: Tue Mar 15 2022 - 05:47:36 EST


On Mon, 14 Mar 2022 19:03:49 +0000,
Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Mon, Mar 14 2022 at 16:00, Marc Zyngier wrote:
> > On Mon, 14 Mar 2022 15:27:10 +0000,
> > Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >>
> >> On Mon, Mar 07 2022 at 19:06, Marc Zyngier wrote:
> >> > When booting with maxcpus=<small number>, interrupt controllers
> >> > such as the GICv3 ITS may not be able to satisfy the affinity of
> >> > some managed interrupts, as some of the HW resources are simply
> >> > not available.
> >>
> >> This is also true if you have offlined lots of CPUs, right?
> >
> > Not quite. If you offline the CPUs, the interrupts will be placed in
> > the shutdown state as expected, having initially transitioned via an
> > activation state with an online CPU. The issue here is with the
> > initial activation of the interrupt, which currently happens even if
> > no matching CPU is present.
>
> Yes. But if you load the driver _after_ offlining lots of CPUs first
> then the same thing should happen, right?

Ah! yes, that's the exact same problem (modular drivers? that's an
idea that will never catch on...).
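
To spell out why the two cases collapse into one, here is a toy
user-space model of the check under discussion. It is only a sketch:
the toy_* names, the 64-bit mask standing in for a cpumask and the
struct layout are all made up for illustration and are not the kernel's
types or API. The point is that the decision depends only on the online
mask at the time of the initial activation, not on how that mask came
to be small (maxcpus= at boot, or CPUs offlined before the driver was
loaded).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t toy_cpumask_t;

/* Hypothetical model of the per-interrupt state relevant to the check. */
struct toy_irq {
	bool managed;           /* affinity is managed by the core code */
	toy_cpumask_t affinity; /* CPUs allowed to service the interrupt */
	bool managed_shutdown;  /* parked until a matching CPU shows up */
	bool activated;         /* programmed into the (toy) hardware */
};

static bool toy_mask_intersects(toy_cpumask_t a, toy_cpumask_t b)
{
	return (a & b) != 0;
}

/*
 * Model of the initial activation: a managed interrupt whose affinity
 * mask contains no online CPU is marked as shut down instead of being
 * activated. Everything else is activated as usual.
 */
static void toy_irq_activate(struct toy_irq *irq, toy_cpumask_t online)
{
	if (irq->managed && !toy_mask_intersects(irq->affinity, online)) {
		irq->managed_shutdown = true;
		irq->activated = false;
		return;
	}

	irq->managed_shutdown = false;
	irq->activated = true;
}
```

With an affinity of CPUs 4-7 (0xF0) and only CPUs 0-1 online (0x03),
the toy interrupt ends up in the shutdown state whether the small
online mask comes from the command line or from hotplug; once CPU 4 is
online (0x13), activation proceeds.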

>
> >> > +	/*
> >> > +	 * If the interrupt is managed but no CPU is available
> >> > +	 * to service it, shut it down until better times.
> >> > +	 */
> >> > +	if ((vflags & VIRQ_ACTIVATE) &&
> >> > +	    irqd_affinity_is_managed(irqd) &&
> >> > +	    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
> >> > +				cpu_online_mask)) {
> >> > +		irqd_set_managed_shutdown(irqd);
> >>
> >> Hrm. Why is this in the !CAN_RESERVE path and not before the actual
> >> activation call?
> >
> > VIRQ_CAN_RESERVE can only happen as a consequence of
> > GENERIC_IRQ_RESERVATION_MODE, which only exists on x86. Given that x86
> > is already super careful not to activate an interrupt that is not
> > immediately required, I thought we could avoid putting this check on
> > that path.
> >
> > But if I got the above wrong (which is, let's face it, extremely
> > likely), I'm happy to kick it down the road next to the activation
> > call.
>
> I just rechecked. Yes, we could push it there, but actually on x86 the
> reservation mode activation sets the entry to a spurious catch-all on an
> online CPU, which is intentional.
>
> So yes, we can keep it where it is now, but that needs a comment.

Yup, I'll add that.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.