Re: [PATCH 00/15] Coalesced Interrupt Delivery with posted MSI

From: Jacob Pan
Date: Thu Apr 04 2024 - 13:33:25 EST


Hi Robert,

On Thu, 4 Apr 2024 21:45:05 +0800, Robert Hoo <robert.hoo.linux@xxxxxxxxx>
wrote:

> On 1/27/2024 7:42 AM, Jacob Pan wrote:
> > Hi Thomas and all,
> >
> > This patch set is aimed to improve IRQ throughput on Intel Xeon by
> > making use of posted interrupts.
> >
> > There is a session at LPC2023 IOMMU/VFIO/PCI MC where I have presented
> > this topic.
> >
> > https://lpc.events/event/17/sessions/172/#20231115
> >
> > Background
> > ==========
> > On modern x86 server SoCs, interrupt remapping (IR) is required and
> > turned on by default to support X2APIC. Two interrupt remapping modes
> > can be supported by IOMMU/VT-d:
> >
> > - Remappable (host)
> > - Posted (guest only so far)
> >
> > With remappable mode, the device MSI to CPU process is a HW flow
> > without system software touch points, it roughly goes as follows:
> >
> > 1. Devices issue interrupt requests with writes to 0xFEEx_xxxx
> > 2. The system agent accepts and remaps/translates the IRQ
> > 3. Upon receiving the translation response, the system agent
> > notifies the destination CPU with the translated MSI
> > 4. CPU's local APIC accepts interrupts into its IRR/ISR registers
> > 5. Interrupt delivered through IDT (MSI vector)
> >
> > The above process can be inefficient under high IRQ rates. The
> > notifications in step #3 are often unnecessary when the destination CPU
> > is already overwhelmed with handling bursts of IRQs. On some
> > architectures, such as Intel Xeon, step #3 is also expensive and
> > requires strong ordering w.r.t DMA.
>
> Can you tell more on this "step #3 requires strong ordering w.r.t. DMA"?
>
I am not sure how much microarchitectural detail I can disclose, but the
point is that there are ordering rules between DMA reads/writes and
posted MSI writes. I am not a hardware expert.

From the PCIe point of view, my understanding is that the upstream writes
in this test (NVMe drives returning data for 4K random reads) are relaxed
ordered. lspci shows RlxdOrd+ on my Samsung drives:

	DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
		RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
		MaxPayload 512 bytes, MaxReadReq 4096 bytes

But MSIs are strictly ordered, AFAIK.
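
For anyone who wants to check the same bits without reading lspci output,
here is a rough sketch that decodes a raw Device Control value. The masks
follow the PCI_EXP_DEVCTL_* definitions in include/uapi/linux/pci_regs.h;
the function name and the example value 0x5957 are my own, the latter
picked only so that it is consistent with the flags shown above (it was
not read from the drive).

```python
# Fields of the PCIe Device Control register (PCIe capability offset 0x08),
# matching PCI_EXP_DEVCTL_* in include/uapi/linux/pci_regs.h.
DEVCTL_RELAX_EN = 0x0010    # Enable Relaxed Ordering (lspci: RlxdOrd)
DEVCTL_PAYLOAD = 0x00E0     # Max_Payload_Size, bits 7:5
DEVCTL_NOSNOOP_EN = 0x0800  # Enable No Snoop (lspci: NoSnoop)
DEVCTL_READRQ = 0x7000      # Max_Read_Request_Size, bits 14:12


def decode_devctl(devctl: int) -> dict:
    """Decode the DevCtl fields discussed above from a raw 16-bit value."""
    return {
        "relaxed_ordering": bool(devctl & DEVCTL_RELAX_EN),
        "no_snoop": bool(devctl & DEVCTL_NOSNOOP_EN),
        # Both size fields encode 128 << n bytes.
        "max_payload": 128 << ((devctl & DEVCTL_PAYLOAD) >> 5),
        "max_read_req": 128 << ((devctl & DEVCTL_READRQ) >> 12),
    }


# Hypothetical raw value, constructed to match the lspci flags above.
EXAMPLE_DEVCTL = 0x5957
print(decode_devctl(EXAMPLE_DEVCTL))
```

Note that RlxdOrd+ only says the device is permitted to set the Relaxed
Ordering attribute on the transactions it initiates. Whether a given write
actually carries the attribute is up to the device.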

> > As a result, slower
> > IRQ rates can become a limiting factor for DMA I/O performance.
> >
>
>


Thanks,

Jacob