RE: [PATCH] kvm: Enable device LTR/OBFF capability before doing guest device assignment

From: Alex Williamson
Date: Tue May 08 2012 - 22:34:03 EST


On Wed, 2012-05-09 at 01:18 +0000, Hao, Xudong wrote:
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > Sent: Tuesday, May 08, 2012 11:18 PM
> > To: Hao, Xudong
> > Cc: Avi Kivity; Xudong Hao; mtosatti@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> > linux-kernel@xxxxxxxxxxxxxxx; Zhang, Xiantao
> > Subject: RE: [PATCH] kvm: Enable device LTR/OBFF capability before doing guest
> > device assignment
> >
> > On Tue, 2012-05-08 at 09:16 +0000, Hao, Xudong wrote:
> > > > -----Original Message-----
> > > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > > > Sent: Tuesday, May 08, 2012 12:16 AM
> > > > To: Hao, Xudong
> > > > Cc: Avi Kivity; Xudong Hao; mtosatti@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> > > > linux-kernel@xxxxxxxxxxxxxxx; Zhang, Xiantao
> > > > Subject: RE: [PATCH] kvm: Enable device LTR/OBFF capability before doing
> > > > guest device assignment
> > > >
> > > > On Mon, 2012-05-07 at 07:58 +0000, Hao, Xudong wrote:
> > > > > > -----Original Message-----
> > > > > > From: Avi Kivity [mailto:avi@xxxxxxxxxx]
> > > > > > Sent: Sunday, May 06, 2012 11:34 PM
> > > > > > To: Xudong Hao
> > > > > > Cc: mtosatti@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> > > > > > linux-kernel@xxxxxxxxxxxxxxx;
> > > > > > Zhang, Xiantao; Hao, Xudong; Alex Williamson
> > > > > > Subject: Re: [PATCH] kvm: Enable device LTR/OBFF capability before doing
> > > > > > guest device assignment
> > > > > >
> > > > > > On 05/06/2012 06:24 PM, Xudong Hao wrote:
> > > > > > > Enable device LTR/OBFF capability before doing device assignment, so that
> > > > > > > the guest can benefit from them.
> > > > > >
> > > > > > cc += Alex
> > > > > >
> > > > > > > @@ -166,6 +166,10 @@ int kvm_assign_device(struct kvm *kvm,
> > > > > > > if (pdev == NULL)
> > > > > > > return -ENODEV;
> > > > > > >
> > > > > > > + /* Enable some device capabilities before doing device assignment,
> > > > > > > + * so that the guest can benefit from them.
> > > > > > > + */
> > > > > > > + kvm_iommu_enable_dev_caps(pdev);
> > > > > > > r = iommu_attach_device(domain, &pdev->dev);
> > > > > >
> > > > > > Suppose we fail here. Do we need to disable_dev_caps()?
> > > > > >
> > > >
> > > > If kvm_assign_device() fails we'll try to restore the state we saved in
> > > > kvm_vm_ioctl_assign_device(), so ltr/obff should be brought back to
> > > > initial state.
> > > >
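For reference, the save/restore flow being referred to looks roughly like this; a simplified sketch of the assign path (field and label names are illustrative, locking and the other error paths are omitted):

	pci_reset_function(dev);
	pci_save_state(dev);
	/* keep a copy of the clean config space state from before assignment */
	match->pci_saved_state = pci_store_saved_state(dev);

	r = kvm_assign_device(kvm, match);
	if (r)
		goto out_restore;
	...

out_restore:
	/* whatever enable_dev_caps() changed should be brought back here */
	if (pci_load_and_free_saved_state(dev, &match->pci_saved_state))
		printk(KERN_INFO "%s: couldn't reload saved state\n", __func__);
	else
		pci_restore_state(dev);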
> > > Right, that's clearer.
> > >
> > > > > I don't think so. When a device is about to be assigned to a guest, it is
> > > > > owned by the pci-stub driver, so an attach_device failure here does not
> > > > > affect anything. If the host wants to use the device, the host device
> > > > > driver has its own enable/disable dev_caps.
> > > >
> > > > Why is device assignment unique here? If there's a default value that's
> > > > known to be safe, why doesn't pci_enable_device set it for everyone?
> > > > Host drivers can fine tune the value later if they want.
> > > >
>
> If the host does not have a driver for this device, or does not load the
> driver, who will enable them? The guest? But in the guest, that really needs
> qemu PCIe support.

The kvm driver does pci_enable_device(), just like any other PCI driver.
If there's a safe default value, why isn't it set there?
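
For reference, the helper being proposed presumably wraps the LTR/OBFF interfaces the PCI core exported at the time (pci_enable_obff(), pci_set_ltr(), pci_enable_ltr() and their disable counterparts); a hypothetical sketch of what it might look like, not the actual patch body:

static void kvm_iommu_enable_dev_caps(struct pci_dev *pdev)
{
	/* Hypothetical sketch: enable OBFF with a default signalling mode
	 * and program a default LTR latency, then turn LTR on.  Return
	 * values are ignored because not all devices support these caps.
	 */
	pci_enable_obff(pdev, PCI_EXP_OBFF_SIGNAL_ALWAYS);

	if (!pci_set_ltr(pdev, 1024, 1024))
		pci_enable_ltr(pdev);
}

static void kvm_iommu_disable_dev_caps(struct pci_dev *pdev)
{
	pci_disable_ltr(pdev);
	pci_disable_obff(pdev);
}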

> > > > > > > if (r) {
> > > > > > > printk(KERN_ERR "assign device %x:%x:%x.%x failed",
> > > > > > > @@ -228,6 +232,7 @@ int kvm_deassign_device(struct kvm *kvm,
> > > > > > > PCI_SLOT(assigned_dev->host_devfn),
> > > > > > > PCI_FUNC(assigned_dev->host_devfn));
> > > > > > >
> > > > > > > + kvm_iommu_disable_dev_caps(pdev);
> > > > > > > return 0;
> > > > > > > }
> > > > > > >
> > > > > > > @@ -351,3 +356,30 @@ int kvm_iommu_unmap_guest(struct kvm *kvm)
> > > > > > > iommu_domain_free(domain);
> > > > > > > return 0;
> > > > > > > }
> > > > > > > +
> > > > > > > +static void kvm_iommu_enable_dev_caps(struct pci_dev *pdev)
> > > > > > > +{
> > > > > > > + /* set default value */
> > > > > > > + unsigned long type = PCI_EXP_OBFF_SIGNAL_ALWAYS;
> > > > > > > + int snoop_lat_ns = 1024, nosnoop_lat_ns = 1024;
> > > > > >
> > > > > > Where does this magic number come from?
> > > > > >
> > > > > The number is the max value that the register supports. We set it as the
> > > > > default here because we do not have any device at this point and do not
> > > > > know the proper value, so a default value is set first.
> > > >
> > > > The register is composed of latency scale and latency value fields.
> > > > 1024 is simply the largest value the latency value can hold (+1). The
> > > > scale field allows latencies up to 34,326,183,936ns to be specified, so
> > > > please explain how 1024 is a universal default.
> > > >
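Concretely, the LTR max latency registers hold a 10-bit value (0-1023) and a 3-bit scale that multiplies it by 2^(5 * scale) ns, i.e. 1, 32, 1024, ... 33554432 ns; 1023 * 33554432 ns gives the 34,326,183,936 ns ceiling mentioned above, and 1024 is just one past the largest value field. A rough sketch of the encoding (field layout per the PCIe spec, 64-bit division kept simple for illustration):

static u16 ltr_encode_ns(u64 ns)
{
	int scale;
	u64 val;

	/* LTR max latency format: bits 9:0 = value (0-1023),
	 * bits 12:10 = scale, multiplier = 2^(5 * scale) ns.
	 * Pick the smallest scale whose value field can hold 'ns',
	 * rounding up so we never advertise a tighter tolerance.
	 */
	for (scale = 0; scale <= 5; scale++) {
		val = DIV_ROUND_UP(ns, 1ULL << (5 * scale));
		if (val <= 0x3ff)
			return (scale << 10) | val;
	}
	return (5 << 10) | 0x3ff;	/* clamp to 1023 * 2^25 ns */
}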
> > >
> > > Since each platform will have its own max supported latency, I think
> > > the best way is to set the value to 0 because we have such a device
> > > now.
> >
> > What's the benefit to that device vs the risk to other devices?
>
> A default value of 0 does not affect any device, right?

I don't know, but if it doesn't affect any device, why bother? On the
risk side we have to question whether the device's LTR/OBFF support works,
whether the values we use are appropriate, whether the upstream
interconnects support the same features (and whether they work), and whether
the guest driver will behave appropriately with these enabled. All of that
against a gain of what? So far it looks like we're turning it on simply
because it's there.

> > Again,
> > if there's a safe default value for both LTR and OBFF, why isn't PCI
> > core setting it for everyone? I'm inclined to wait for qemu express
> > support and expose LTR/OBFF control to the guest if and only if we can
> > enable it on the root complex and intermediate switches. Thanks,
> >
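The "if and only if" condition above essentially means walking from the device up through every bridge to the root port and checking that each component advertises LTR and OBFF in Device Capabilities 2. A rough sketch using today's helper names (pcie_capability_read_dword() and pci_upstream_bridge() did not exist in this form in 2012):

static bool hierarchy_supports_ltr_obff(struct pci_dev *dev)
{
	u32 cap;

	/* every component from the endpoint up to the root port must
	 * advertise LTR and OBFF support for enabling them to be useful
	 */
	for (; dev; dev = pci_upstream_bridge(dev)) {
		if (!pci_is_pcie(dev))
			return false;
		pcie_capability_read_dword(dev, PCI_EXP_DEVCAP2, &cap);
		if (!(cap & PCI_EXP_DEVCAP2_LTR) ||
		    !(cap & PCI_EXP_DEVCAP2_OBFF_MASK))
			return false;
	}
	return true;
}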
>
> Alex, do you mean you're working on the qemu express support and
> LTR/OBFF exposure? If so, when will this support be finished?

Qemu express support is being worked on by the community; once it is
available, it may be simple to expose these registers if we determine
it's safe for the guest to manipulate them. I think there's a goal of
incorporating express support by qemu 1.2, but I'm not driving it.
Thanks,

Alex
