Re: [RFC PATCH 01/15] irqchip/riscv-imsic: Use hierarchy to reach irq_set_affinity

From: Anup Patel
Date: Tue Dec 03 2024 - 22:43:50 EST


On Wed, Dec 4, 2024 at 4:29 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Tue, Dec 03 2024 at 21:55, Thomas Gleixner wrote:
> > On Tue, Dec 03 2024 at 22:07, Anup Patel wrote:
> >> On Tue, Dec 3, 2024 at 7:23 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >>> Sorry, I missed that when reviewing the original IMSIC MSI support.
> >>>
> >>> The whole IMSIC MSI support can be moved over to MSI LIB which makes all
> >>> of this indirection go away and your intermediate domain will just fit
> >>> in.
> >>>
> >>> Uncompiled patch below. If that works, it needs to be split up properly.
> >>>
> >>> Note, this removes the setup of the irq_retrigger callback, but that's
> >>> fine because on hierarchical domains irq_chip_retrigger_hierarchy() is
> >>> invoked anyway. See try_retrigger().
> >>
> >> The IMSIC driver was merged one kernel release before common
> >> MSI LIB was merged.
> >
> > Ah indeed.
> >
> >> We should definitely update the IMSIC driver to use MSI LIB, I will
> >> try your suggested changes (below) and post a separate series.
> >
> > Pick up the delta patch I gave Andrew...
>
> As I was looking at something else MSI related I had a look at
> imsic_irq_set_affinity() again.
>
> It's actually required to have the message write in that function and
> not afterwards as you invoke imsic_vector_move() from that function.
>
> That's obviously not true for the remap case as that will not change the
> message address/data pair because the remap table entry is immutable -
> at least I assume so for my mental sanity sake :)
>
> But that brings me to a related question. How is this supposed to work
> with non-atomic message updates? PCI/MSI does not necessarily provide
> masking, and the write of the address/data pair is done in bits and
> pieces. So you can end up with an intermediate state seen by the device
> which ends up somewhere in interrupt nirvana space.
>
> See the dance in msi_set_affinity() and commit 6f1a4891a592
> ("x86/apic/msi: Plug non-maskable MSI affinity race") for further
> explanation.
>
> The way how the IMSIC driver works seems to be pretty much the same as
> the x86 APIC mess:
>
> @address is the physical address of the per CPU MSI target
> address and @data is the vector ID on that CPU.
>
> So the non-atomic update in case of non-maskable MSI suffers from the
> same problem. It works most of the time, but if it doesn't you might
> stare at the occasionally lost interrupt and the stale device in
> disbelief for quite a while :)

Yes, we have the same challenges as x86 APIC when changing
MSI affinity.

>
> I might be missing something which magically prevent that though :)
>

Your understanding is correct. In fact, the IMSIC msi_set_affinity()
handling is inspired from x86 APIC approach due to similarity in
the overall MSI controller.

The high-level idea of imsic_irq_set_affinity() is as follows:

1) Allocate new_vector (new CPU IMSIC address + new ID on that CPU)

2) Update the MSI address and data programmed in the device
based on new_vector (see imsic_msi_update_msg())

3) At this point the device points to the new_vector but old_vector
(old CPU IMSIC address + old ID on that CPU) is still enabled and
we might have received MSI on old_vector while we were busy
setting up a new_vector for the device. To address this, we call
imsic_vector_move().

4) The imsic_vector_move() marks the old_vector as being
moved and schedules a lazy timer on the old CPU.

5) The lazy timer expires on the old CPU and results in
__imsic_local_sync() being called on the old CPU.

6) If there was a pending MSI on the old vector then the
__imsic_local_sync() function injects an MSI to the
new_vector using an MMIO write.

It is very unlikely that an MSI from device will be dropped
(unless I am missing something) but the unsolved issue
is that handling of in-flight MSI received on the old_vector
during the MSI re-programming is delayed which may have
side effects on the device driver side.

I believe in the future RISC-V AIA v2.0 (whenever that
happens) will address the gaps in AIA v1.0 (like this one).

Regards,
Anup