Re: [PATCH 09/15] x86/irq: Install posted MSI notification handler

From: Jacob Pan
Date: Tue Apr 02 2024 - 22:39:39 EST


Hi Zeng,

On Fri, 29 Mar 2024 15:32:00 +0800, Zeng Guang <guang.zeng@xxxxxxxxx> wrote:

> On 1/27/2024 7:42 AM, Jacob Pan wrote:
> > @@ -353,6 +360,111 @@ void intel_posted_msi_init(void)
> > pid->nv = POSTED_MSI_NOTIFICATION_VECTOR;
> > pid->ndst = this_cpu_read(x86_cpu_to_apicid);
> > }
> > +
> > +/*
> > + * De-multiplexing posted interrupts is on the performance path, the code
> > + * below is written to optimize the cache performance based on the following
> > + * considerations:
> > + * 1.Posted interrupt descriptor (PID) fits in a cache line that is frequently
> > + *   accessed by both CPU and IOMMU.
> > + * 2.During posted MSI processing, the CPU needs to do 64-bit read and xchg
> > + *   for checking and clearing posted interrupt request (PIR), a 256 bit field
> > + *   within the PID.
> > + * 3.On the other side, the IOMMU does atomic swaps of the entire PID cache
> > + *   line when posting interrupts and setting control bits.
> > + * 4.The CPU can access the cache line a magnitude faster than the IOMMU.
> > + * 5.Each time the IOMMU does interrupt posting to the PIR will evict the PID
> > + *   cache line. The cache line states after each operation are as follows:
> > + *   CPU              IOMMU                   PID Cache line state
> > + *   ---------------------------------------------------------------
> > + *...read64                                   exclusive
> > + *...lock xchg64                              modified
> > + *...                 post/atomic swap        invalid
> > + *...-------------------------------------------------------------
> > + *
> > + * To reduce L1 data cache miss, it is important to avoid contention with
> > + * IOMMU's interrupt posting/atomic swap. Therefore, a copy of PIR is used
> > + * to dispatch interrupt handlers.
> > + *
> > + * In addition, the code is trying to keep the cache line state consistent
> > + * as much as possible. e.g. when making a copy and clearing the PIR
> > + * (assuming non-zero PIR bits are present in the entire PIR), it does:
> > + * read, read, read, read, xchg, xchg, xchg, xchg
> > + * instead of:
> > + * read, xchg, read, xchg, read, xchg, read, xchg
> > + */
> > +static __always_inline inline bool handle_pending_pir(u64 *pir, struct pt_regs *regs)
> > +{
> > + int i, vec = FIRST_EXTERNAL_VECTOR;
> > + unsigned long pir_copy[4];
> > + bool handled = false;
> > +
> > + for (i = 0; i < 4; i++)
> > + pir_copy[i] = pir[i];
> > +
> > + for (i = 0; i < 4; i++) {
> > + if (!pir_copy[i])
> > + continue;
> > +
> > + pir_copy[i] = arch_xchg(pir, 0);
>
> There is a problem here: pir_copy[i] will always be written with pir[0].
> This leads to spurious posted MSIs being handled later.

Yes, you are right. It should be:

	pir_copy[i] = arch_xchg(&pir[i], 0);

Will fix in v2. Much appreciated.
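
For illustration only, here is a minimal user-space sketch of the corrected
copy-and-clear loop. It is not the kernel code: C11 atomic_exchange() stands
in for arch_xchg(), and pir_harvest() and PIR_WORDS are hypothetical names
made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PIR_WORDS 4	/* hypothetical name: 256-bit PIR as four 64-bit words */

/*
 * Snapshot the PIR with plain reads first, then clear each non-zero word
 * with one atomic exchange. Returns true if any bit was pending.
 * Assumption: atomic_exchange() models the kernel's arch_xchg() here.
 */
static bool pir_harvest(_Atomic uint64_t *pir, uint64_t *pir_copy)
{
	bool handled = false;
	int i;

	assert(pir && pir_copy);

	/* read, read, read, read */
	for (i = 0; i < PIR_WORDS; i++)
		pir_copy[i] = atomic_load_explicit(&pir[i], memory_order_relaxed);

	/* xchg only the words that had bits set in the snapshot */
	for (i = 0; i < PIR_WORDS; i++) {
		if (!pir_copy[i])
			continue;

		/* Index the word being cleared: &pir[i], not pir. */
		pir_copy[i] = atomic_exchange(&pir[i], 0);
		handled = true;
	}

	return handled;
}
```

With the indexed exchange, each pir_copy[i] holds the bits taken from the
matching PIR word and that word is left cleared, so a second pass over an
otherwise idle PIR finds nothing pending.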

> > + handled = true;
> > + }
> > +
> > + if (handled) {
> > + for_each_set_bit_from(vec, pir_copy, FIRST_SYSTEM_VECTOR)
> > + call_irq_handler(vec, regs);
> > + }
> > +
> > + return handled;
> > +}
> > +
> > +/*
> > + * Performance data shows that 3 is good enough to harvest 90+% of the benefit
> > + * on high IRQ rate workload.
> > + */
> > +#define MAX_POSTED_MSI_COALESCING_LOOP 3
> > +
> > +/*
> > + * For MSIs that are delivered as posted interrupts, the CPU notifications
> > + * can be coalesced if the MSIs arrive in high frequency bursts.
> > + */
> > +DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
> > +{
> > + struct pt_regs *old_regs = set_irq_regs(regs);
> > + struct pi_desc *pid;
> > + int i = 0;
> > +
> > + pid = this_cpu_ptr(&posted_interrupt_desc);
> > +
> > + inc_irq_stat(posted_msi_notification_count);
> > + irq_enter();
> > +
> > + /*
> > + * Max coalescing count includes the extra round of handle_pending_pir
> > + * after clearing the outstanding notification bit. Hence, at most
> > + * MAX_POSTED_MSI_COALESCING_LOOP - 1 loops are executed here.
> > + */
> > + while (++i < MAX_POSTED_MSI_COALESCING_LOOP) {
> > + if (!handle_pending_pir(pid->pir64, regs))
> > + break;
> > + }
> > +
> > + /*
> > + * Clear outstanding notification bit to allow new IRQ notifications,
> > + * do this last to maximize the window of interrupt coalescing.
> > + */
> > + pi_clear_on(pid);
> > +
> > + /*
> > + * There could be a race of PI notification and the clearing of ON bit,
> > + * process PIR bits one last time such that handling the new interrupts
> > + * are not delayed until the next IRQ.
> > + */
> > + handle_pending_pir(pid->pir64, regs);
> > +
> > + apic_eoi();
> > + irq_exit();
> > + set_irq_regs(old_regs);
> > }
> > #endif /* X86_POSTED_MSI */
> >


Thanks,

Jacob