Re: [PATCH V3 04/10] virtio_pci: harden MSI-X interrupts

From: Michael S. Tsirkin
Date: Wed Mar 09 2022 - 07:13:35 EST


On Wed, Mar 09, 2022 at 11:08:41AM +0000, Marc Zyngier wrote:
> On Tue, 08 Mar 2022 16:35:52 +0000,
> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> >
> > On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote:
> > > On Tue, 19 Oct 2021 08:01:46 +0100,
> > > Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > >
> > > > We used to synchronize pending MSI-X irq handlers via
> > > > synchronize_irq(), this may not work for the untrusted device which
> > > > may keep sending interrupts after reset which may lead to unexpected
> > > > results. Similarly, we should not enable MSI-X interrupt until the
> > > > device is ready. So this patch fixes those two issues by:
> > > >
> > > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > > > handlers to be called after the device is reset.
> > > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > > >
> > > > This can make sure the virtio interrupt handler won't be called before
> > > > virtio_device_ready() and after reset.
> > > >
> > > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > > > Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>
> > > > ---
> > > > drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > > > drivers/virtio/virtio_pci_common.h | 6 ++++--
> > > > drivers/virtio/virtio_pci_legacy.c | 5 +++--
> > > > drivers/virtio/virtio_pci_modern.c | 6 ++++--
> > > > 4 files changed, 32 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > index b35bb2d57f62..8d8f83aca721 100644
> > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > > > "Force legacy mode for transitional virtio 1 devices");
> > > > #endif
> > > >
> > > > -/* wait for pending irq handlers */
> > > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > +/* disable irq handlers */
> > > > +void vp_disable_cbs(struct virtio_device *vdev)
> > > > {
> > > > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > int i;
> > > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> > > > synchronize_irq(vp_dev->pci_dev->irq);
> > > >
> > > > for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > - synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > + disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > > +}
> > > > +
> > > > +/* enable irq handlers */
> > > > +void vp_enable_cbs(struct virtio_device *vdev)
> > > > +{
> > > > + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > + int i;
> > > > +
> > > > + if (vp_dev->intx_enabled)
> > > > + return;
> > > > +
> > > > + for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > > + enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > >
> > > This results in a splat at boot time if you set maxcpus=<whatever>,
> > > see below. Enabling interrupts that are affinity managed is *bad*. You
> > > don't even know whether the CPU which is supposed to handle this is
> > > online or not.
> > >
> > > The core kernel notices it, shouts and keeps the interrupt disabled,
> > > but this should be fixed. The whole point of managed interrupts is to
> > > let them be dealt with outside of the drivers, and tied into the CPUs
> > > being brought up and down. If virtio needs (for one reason or another)
> > > to manage interrupts on its own, so be it. But this patch isn't the
> > > way to do it, I'm afraid.
> > >
> > > M.
> >
> > Thanks for reporting this!
> >
> > What virtio is doing here isn't unique however.
>
> Then it is even worse than I thought. Can you point me to the other
> drivers doing such thing?


What are you asking? Whether any drivers that set PCI_IRQ_AFFINITY also
set IRQF_NO_AUTOEN? No, I could not find any other drivers doing that.
When I said "isn't unique" I meant rather that other drivers need
something like this, and they likely do it in a driver-specific, complex
fashion.

Just poking around at random, I found
drivers/scsi/mpi3mr/mpi3mr_fw.c

which does this as the last step of initialization:

void mpi3mr_ioc_enable_intr(struct mpi3mr_ioc *mrioc)
{
	mrioc->intr_enabled = 1;
}

and the interrupt handler does:

static irqreturn_t mpi3mr_isr_primary(int irq, void *privdata)
{
	struct mpi3mr_intr_info *intr_info = privdata;
	struct mpi3mr_ioc *mrioc;
	u16 midx;
	u32 num_admin_replies = 0, num_op_reply = 0;

	if (!intr_info)
		return IRQ_NONE;

	mrioc = intr_info->mrioc;

	if (!mrioc->intr_enabled)
		return IRQ_NONE;


which seems to be trying to accomplish exactly the same thing and might
or might not actually need WRITE_ONCE or some barriers if it were to be
made 100% foolproof.
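
To make the WRITE_ONCE/barrier point concrete, here is a minimal
userspace sketch of the same kind of gate. It is only an illustration
under stated assumptions: C11 atomics stand in for the kernel's
release/acquire primitives, and every name in it (toy_dev, toy_init,
toy_enable_intr, toy_disable_intr, toy_isr) is made up and comes from
neither mpi3mr nor virtio.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device state; not taken from any real driver. */
struct toy_dev {
	int ring_size;            /* state set up during initialization */
	atomic_bool intr_enabled; /* gate checked by the handler */
};

enum toy_irqreturn { TOY_IRQ_NONE = 0, TOY_IRQ_HANDLED = 1 };

/* Start with the gate closed so early interrupts are ignored. */
static void toy_init(struct toy_dev *dev)
{
	dev->ring_size = 0;
	atomic_init(&dev->intr_enabled, false);
}

/*
 * Last step of initialization: the release store orders all earlier
 * writes to *dev before the flag becomes visible as true.
 */
static void toy_enable_intr(struct toy_dev *dev, int ring_size)
{
	dev->ring_size = ring_size;
	atomic_store_explicit(&dev->intr_enabled, true, memory_order_release);
}

/* Close the gate again before reset or cleanup. */
static void toy_disable_intr(struct toy_dev *dev)
{
	atomic_store_explicit(&dev->intr_enabled, false, memory_order_release);
}

/*
 * Handler: the acquire load pairs with the release store above, so a
 * handler that sees the flag set also sees the initialized state.
 */
static enum toy_irqreturn toy_isr(struct toy_dev *dev)
{
	if (!atomic_load_explicit(&dev->intr_enabled, memory_order_acquire))
		return TOY_IRQ_NONE;

	return dev->ring_size > 0 ? TOY_IRQ_HANDLED : TOY_IRQ_NONE;
}
```

Note that closing the gate this way does not wait for a handler that
is already running, so a real driver would presumably still need
something like synchronize_irq() on the cleanup path.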



> > If device driver is to be hardened against device sending interrupts
> > while driver is initializing/cleaning up, it needs kernel to provide
> > capability to disable these.
> >
> > If we then declare that that is impossible for managed interrupts
> > then that will mean most devices can't use managed interrupts
> > because ideally we'd have all drivers hardened.
>
> What I find odd is that you want to do the interrupt hardening in the
> individual endpoint drivers. This makes everything complicated, and
> just doesn't scale.
> The natural place for this sort of checks would be in the interrupt
> controller driver, which has all the state at its disposal, and is
> guaranteed to be able to take the right course of action if it sees
> something that contradicts its internal state tracking (affinity,
> masking, interrupt life cycle in general).

Exactly. So here's what we are trying to do: the driver is initializing
both itself and the device. As part of that, it assigns some IRQs. Once
that happens, the device can trigger the IRQ callback by sending an
interrupt. If this happens too soon, the driver is not yet fully
initialized and might access uninitialized memory (and generally get
confused because the state is inconsistent).

Getting the IRQ in a disabled state and only enabling it when we are
ready sounds like a very reasonable way to go about it, just from an
API perspective.





>
> Because even if you were allowed to mess with the enable state, this
> doesn't give you any guarantee that the interrupt is delivered on the
> correct CPU either.

For virtio, affinity is mostly an optimization; I don't think this
affects correctness.

> > Thomas I think you were the one who suggested enabling/disabling
> > interrupts originally - thoughts?
> >
> > Feedback appreciated.
>
> Feedback given.
>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.