Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

From: Alex Williamson
Date: Fri May 06 2016 - 12:55:28 EST


On Fri, 6 May 2016 16:35:38 +1000
Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:

> On 05/06/2016 01:05 AM, Alex Williamson wrote:
> > On Thu, 5 May 2016 12:15:46 +0000
> > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> >
> >>> From: Yongji Xie [mailto:xyjxie@xxxxxxxxxxxxxxxxxx]
> >>> Sent: Thursday, May 05, 2016 7:43 PM
> >>>
> >>> Hi David and Kevin,
> >>>
> >>> On 2016/5/5 17:54, David Laight wrote:
> >>>
> >>>> From: Tian, Kevin
> >>>>> Sent: 05 May 2016 10:37
> >>>> ...
> >>>>>> Actually, we are not aiming to access the MSI-X table from
> >>>>>> the guest. So I think it's safe to pass through the MSI-X table
> >>>>>> if we can make sure the guest kernel would not touch it in the
> >>>>>> normal code path, such as a para-virtualized guest kernel on PPC64.
> >>>>>>
> >>>>> Then how do you prevent malicious guest kernel accessing it?
> >>>> Or a malicious guest driver for an ethernet card setting up
> >>>> the receive buffer ring to contain a single word entry that
> >>>> contains the address associated with an MSI-X interrupt and
> >>>> then using a loopback mode to cause a specific packet to be
> >>>> received that writes the required word through that address.
> >>>>
> >>>> Remember the PCIe cycle for an interrupt is a normal memory write
> >>>> cycle.
> >>>>
> >>>> David
> >>>>
> >>>
> >>> If we have enough permission to load a malicious driver or
> >>> kernel, we can easily break the guest without an exposed
> >>> MSI-X table.
> >>>
> >>> I think it should be safe to expose MSI-X table if we can
> >>> make sure that malicious guest driver/kernel can't use
> >>> the MSI-X table to break other guest or host. The
> >>> capability of IRQ remapping could provide this
> >>> kind of protection.
> >>>
> >>
> >> With IRQ remapping it doesn't mean you can pass through MSI-X
> >> structure to guest. I know actual IRQ remapping might be platform
> >> specific, but at least for Intel VT-d specification, MSI-X entry must
> >> be configured with a remappable format by host kernel which
> >> contains an index into the IRQ remapping table. The index selects an
> >> IRQ remapping entry which controls interrupt routing for a specific
> >> device. If you allow a malicious program to write a random index into
> >> the MSI-X entry of an assigned device, the hole is obvious...
> >>
> >> Above might make sense only for an IRQ remapping implementation
> >> which doesn't rely on extended MSI-X format (e.g. simply based on
> >> BDF). If that's the case for PPC, then you should build MSI-X
> >> passthrough based on this fact instead of general IRQ remapping
> >> enabled or not.
> >
> > I don't think anyone is expecting that we can expose the MSI-X vector
> > table to the guest and the guest can make direct use of it. The end
> > goal here is that the guest on a power system is already
> > paravirtualized to not program the device MSI-X by directly writing to
> > the MSI-X vector table. They have hypercalls for this since they
> > always run virtualized. Therefore a) they never intend to touch the
> > MSI-X vector table and b) they have sufficient isolation that a guest
> > can only hurt itself by doing so.
> >
> > On x86 we don't have a), our method of programming the MSI-X vector
> > table is to directly write to it. Therefore we will always require QEMU
> > to place a MemoryRegion over the vector table to intercept those
> > accesses. However with interrupt remapping, we do have b) on x86, which
> > means that we don't need to be so strict in disallowing user accesses
> > to the MSI-X vector table. It's not useful for configuring MSI-X on
> > the device, but the user should only be able to hurt themselves by
> > writing it directly. x86 doesn't really get anything out of this
> > change, but it helps this special case on power pretty significantly
> > aiui. Thanks,
>
> Excellent short overview, saved :)
>
> How do we proceed with these patches? Nobody seems to object to them, but
> nobody seems to be taking them either...

Well, this series is still based on some non-upstream patches, so...
Once that dependency is resolved this series should probably be split
into functional areas for acceptance by the appropriate subsystem
maintainers.
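
For illustration only (not part of the patch series): the Intel VT-d
remappable format mentioned in the quoted discussion puts the index into
the IRQ remapping table in the MSI/MSI-X message address itself. The
sketch below shows that encoding so it is clear why a write to a vector
table entry selects a different remapping entry. The macro and function
names are hypothetical, and the bit positions are written from my reading
of the VT-d specification; check them against the spec before relying on
them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed VT-d remappable-format MSI address layout (verify against the spec):
 *   bits 31:20  0xFEE
 *   bits 19:5   interrupt index [14:0]
 *   bit  4      interrupt format (1 = remappable)
 *   bit  3      SHV (subhandle valid), left clear here
 *   bit  2      interrupt index [15]
 */
#define MSI_ADDR_BASE      0xFEE00000u
#define MSI_ADDR_FMT_REMAP (1u << 4)

/* Hypothetical helper: address the host would program for a given index. */
static uint32_t remappable_msi_addr(uint16_t irte_index)
{
    uint32_t addr = MSI_ADDR_BASE | MSI_ADDR_FMT_REMAP;

    addr |= ((uint32_t)irte_index & 0x7FFFu) << 5; /* index[14:0] -> bits 19:5 */
    addr |= ((uint32_t)irte_index >> 15) << 2;     /* index[15]   -> bit 2     */
    return addr;
}

/* Hypothetical helper: is this address in remappable format at all? */
static bool msi_addr_is_remappable(uint32_t addr)
{
    return (addr & MSI_ADDR_FMT_REMAP) != 0;
}

/* Hypothetical helper: recover the index from a remappable-format address. */
static uint16_t remappable_msi_index(uint32_t addr)
{
    assert(msi_addr_is_remappable(addr));
    return (uint16_t)(((addr >> 5) & 0x7FFFu) | (((addr >> 2) & 1u) << 15));
}
```

Two different index values give two different addresses, so whoever can
write a vector table entry chooses which remapping entry the device's
interrupt goes through. Whether that is harmful depends on what the
platform does with an entry that does not belong to the device, which is
the isolation question discussed above.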