Re: MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug

From: Raj, Ashok
Date: Mon May 11 2020 - 15:03:54 EST


Hi Thomas,

On Fri, May 08, 2020 at 06:49:15PM +0200, Thomas Gleixner wrote:
> Ashok,
>
> "Raj, Ashok" <ashok.raj@xxxxxxxxx> writes:
> > With legacy MSI we can have these races and kernel is trying to do the
> > song and dance, but we see this happening even when IR is turned on.
> > Which is perplexing. I think when we have IR, once we do the change vector
> > and flush the interrupt entry cache, if there was an outstanding one in
> > flight it should be in IRR. Possibly should be cleaned up by the
> > send_cleanup_vector() I suppose.
>
> Ouch. With IR this really should never happen and yes the old vector
> will catch one which was raised just before the migration disabled the
> IR entry. During the change nothing can go wrong because the entry is
> disabled and only reenabled after it's flushed which will send a pending
> one to the new vector.

With IR, I'm not sure we actually mask the interrupt, except when
it's a Posted Interrupt.

We do an atomic update of the IRTE with cmpxchg_double:

	ret = cmpxchg_double(&irte->low, &irte->high,
			     irte->low, irte->high,
			     irte_modified->low, irte_modified->high);

followed by flushing the interrupt entry cache. After that, any
old interrupts that were in flight before the flush should be sitting
in IRR on the outgoing CPU.

send_cleanup_vector() sends an IPI to apicd->prev_cpu, which
would be the CPU we are running on, correct? So this is effectively
a self-IPI to IRQ_MOVE_CLEANUP_VECTOR.

smp_irq_move_cleanup_interrupt() seems to check IRR for
apicd->prev_vector:

	irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
	if (irr & (1U << (vector % 32))) {
		apic->send_IPI_self(IRQ_MOVE_CLEANUP_VECTOR);
		continue;
	}

This would allow any vectors still pending in IRR on the outgoing
CPU to run their ISRs before all vectors on that CPU are drained.
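
To make sure I'm reading the IRR check correctly, here is a rough
user-space sketch of the index arithmetic and the retry decision. Only
the offset/bit arithmetic and the APIC_IRR base mirror the snippet
above; fake_irr[], fake_apic_read(), raise_vector(), service_vector()
and cleanup_pass() are made-up names for illustration and are not
kernel functions.

```c
#include <stdint.h>

#define APIC_IRR	0x200	/* base offset of the IRR bank in the local APIC */
#define NR_IRR_REGS	8	/* 8 registers x 32 bits = 256 vectors */

/* Hypothetical stand-in for the local APIC's IRR registers. */
static uint32_t fake_irr[NR_IRR_REGS];

/* Offset of the IRR register holding 'vector'; registers are 0x10 apart. */
static unsigned int irr_reg_offset(unsigned int vector)
{
	return APIC_IRR + (vector / 32 * 0x10);
}

/* Bit within that register that corresponds to 'vector'. */
static uint32_t irr_bit_mask(unsigned int vector)
{
	return 1U << (vector % 32);
}

/* Hypothetical replacement for apic_read() that indexes the fake bank. */
static uint32_t fake_apic_read(unsigned int offset)
{
	return fake_irr[(offset - APIC_IRR) / 0x10];
}

/* Model a device interrupt arriving on 'vector': its IRR bit gets set. */
static void raise_vector(unsigned int vector)
{
	fake_irr[vector / 32] |= irr_bit_mask(vector);
}

/* Model the CPU accepting the interrupt, which clears its IRR bit. */
static void service_vector(unsigned int vector)
{
	fake_irr[vector / 32] &= ~irr_bit_mask(vector);
}

/*
 * One pass of the cleanup check for a single old vector.
 * Returns 1 if the old vector could be released, 0 if it is still
 * pending in IRR and the cleanup has to be retried (this is where the
 * snippet above re-sends IRQ_MOVE_CLEANUP_VECTOR to itself).
 */
static int cleanup_pass(unsigned int old_vector)
{
	uint32_t irr = fake_apic_read(irr_reg_offset(old_vector));

	if (irr & irr_bit_mask(old_vector))
		return 0;	/* still pending: retry later */
	return 1;		/* nothing pending: safe to release */
}
```

In this model, calling raise_vector(0x41) makes cleanup_pass(0x41)
return 0 until service_vector(0x41) has run, which is the ordering I
would expect on the outgoing CPU if my understanding is right.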

Does it sound right?

I couldn't quite pin down how the device ISRs get invoked through
send_cleanup_vector() and what follows it.