Re: [PATCH] xen: do not re-use pirq number cached in pci device msi msg data
From: Boris Ostrovsky
Date: Mon Jan 09 2017 - 10:02:28 EST
On 01/06/2017 08:06 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 05, 2017 at 02:28:56PM -0500, Dan Streetman wrote:
>> Do not read a pci device's msi message data to see if a pirq was
>> previously configured for the device's msi/msix, as the old pirq was
>> unmapped and may now be in use by another pci device. The previous
>> pirq should never be re-used; instead a new pirq should always be
>> allocated from the hypervisor.
> Won't this cause a starvation problem? That is we will run out of PIRQs
> as we are not reusing them?
Don't we free the pirq when we unmap it?
-boris
>> The xen_hvm_setup_msi_irqs() function currently checks the pci device's
>> msi descriptor message data for each msi/msix vector it sets up, and if
>> it finds the vector was previously configured with a pirq, and that pirq
>> is mapped to an irq, it re-uses the pirq instead of requesting a new pirq
>> from the hypervisor. However, that pirq was unmapped when the pci device
>> disabled its msi/msix, and it cannot be re-used; it may have been given
>> to a different pci device.
> Hm, but this implies that we do keep track of it.
>
>
> while (true)
> do
> rmmod nvme
> modprobe nvme
> done
>
> Things go boom without this patch. But with this patch does this
> still work? As in we don't run out of PIRQs?
>
> Thanks.
>> This exact situation is happening in a Xen guest where multiple NVMe
>> controllers (pci devices) are present. The NVMe driver configures each
>> pci device's msi/msix twice; first to configure a single vector (to
>> talk to the controller for its configuration info), and then it disables
>> that msi/msix and re-configures with all the msi/msix it needs. When
>> multiple NVMe controllers are present, this happens concurrently on all
>> of them, and in the time between controller A calling pci_disable_msix()
>> and then calling pci_enable_msix_range(), controller B enables its msix
>> and gets controller A's pirq allocated from the hypervisor. Then when
>> controller A re-configures its msix, its first vector tries to re-use
>> the same pirq that it had before; but that pirq was allocated to
>> controller B, and thus the Xen event channel for controller A's re-used
>> pirq fails to map its irq to that pirq; the hypervisor already has the
>> pirq mapped elsewhere.
>>
>> Signed-off-by: Dan Streetman <dan.streetman@xxxxxxxxxxxxx>
>> ---
>> arch/x86/pci/xen.c | 23 +++++++----------------
>> 1 file changed, 7 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>> index bedfab9..a00a6c0 100644
>> --- a/arch/x86/pci/xen.c
>> +++ b/arch/x86/pci/xen.c
>> @@ -234,23 +234,14 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>> return 1;
>>
>> for_each_pci_msi_entry(msidesc, dev) {
>> - __pci_read_msi_msg(msidesc, &msg);
>> - pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>> - ((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>> - if (msg.data != XEN_PIRQ_MSI_DATA ||
>> - xen_irq_from_pirq(pirq) < 0) {
>> - pirq = xen_allocate_pirq_msi(dev, msidesc);
>> - if (pirq < 0) {
>> - irq = -ENODEV;
>> - goto error;
>> - }
>> - xen_msi_compose_msg(dev, pirq, &msg);
>> - __pci_write_msi_msg(msidesc, &msg);
>> - dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>> - } else {
>> - dev_dbg(&dev->dev,
>> - "xen: msi already bound to pirq=%d\n", pirq);
>> + pirq = xen_allocate_pirq_msi(dev, msidesc);
>> + if (pirq < 0) {
>> + irq = -ENODEV;
>> + goto error;
>> }
>> + xen_msi_compose_msg(dev, pirq, &msg);
>> + __pci_write_msi_msg(msidesc, &msg);
>> + dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>> irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
>> (type == PCI_CAP_ID_MSI) ? nvec : 1,
>> (type == PCI_CAP_ID_MSIX) ?
>> --
>> 2.9.3
>>