Re: [PATCH 1/7] PCI: Make sriov work with hotplug remove

From: Yinghai Lu
Date: Mon Jan 23 2012 - 14:36:21 EST


On Mon, Jan 23, 2012 at 10:45 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Jan 23, 2012 at 10:30 AM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>> in list_for_each_safe()
>>
>> #define list_for_each_safe(pos, n, head) \
>>        for (pos = (head)->next, n = pos->next; pos != (head); \
>>                pos = n, n = pos->next)
>>
>> n is saved before, and safe only mean pos could be freed from the
>> list, but n still can be used for next loop.
>>
>> in our case, the list have PF and several VFs, when
>> pci_stop_bus_device() is called for PF, pos are still valid, but
>> VFs are removed from the list. so n will not be valid.
>
> That still doesn't explain anything.
>
> The whole point of the safe version is that if the entry that is being
> worked on is removed, then we can still use the next one.
>
> Why isn't this magically true in this case? If some *other* random
> entry than the one that is being iterated over can magically be
> removed, then the whole thing is just pure and utter crap, and no
> amount of list maintenance can ever fix it.
>
> So explain.

Now current design with SRIOV is:
when driver for PF is loaded, it will enable sriov on PF, and VFs are
created and added to
bus->devices list.

when driver for PF is removed in pci_stop_bus_device(), it will
disable sriov on PF, and VFs are removed for the list.

>
>> https://lkml.org/lkml/2011/10/15/141
>> or from attached one.
>
> This still doesn't make sense. Why do that stupid allocation? Why not
> just move the entry? Why doesn't just the sane approach work?
>
> Exactly why does not the obvious
>
>    /* stop bus device for phys_fn at first */
>    list_for_each_safe(l, n, &bus->devices) {
>        struct pci_dev *dev = pci_dev_b(l);
>        if (!dev->is_virtfn)
>            pci_stop_bus_device(dev);
>    }
>
> work? What the f*^& does that pci_stop_bus_device() function do to
> that bus->devices list that isn't just a "list_del()" of that single
> entry? And if it does anything else, it should damn well stop doing
> that.

it does not work. because pci_stop_bus_device for PF will remove VF,
and n is not
valid.

>
> The *exact* same loop is then apparently working for the virtual
> device case, so what the hell is going on that it wouldn't work for
> the physical one?
>
> What's the confusion? Why all the crazy idiotic code that makes no sense?

Maybe we can put VF and PF in bus->devices like:
VF come first than PF?

something like:

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 0321fa3..3f63b55 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -116,11 +116,11 @@ static int virtfn_add(struct pci_dev *dev, int
id, int reset)
if (reset)
__pci_reset_function(virtfn);

+ virtfn->is_virtfn = 1;
pci_device_add(virtfn, virtfn->bus);
mutex_unlock(&iov->dev->sriov->lock);

virtfn->physfn = pci_dev_get(dev);
- virtfn->is_virtfn = 1;

rc = pci_bus_add_device(virtfn);
if (rc)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 7cc9e2f..e43d81c 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1218,7 +1218,10 @@ void pci_device_add(struct pci_dev *dev, struct
pci_bus *bus)
* and the bus list for fixup functions, etc.
*/
down_write(&pci_bus_sem);
- list_add_tail(&dev->bus_list, &bus->devices);
+ if (dev->is_virtfn)
+ list_add(&dev->bus_list, &bus->devices);
+ else
+ list_add_tail(&dev->bus_list, &bus->devices);
up_write(&pci_bus_sem);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/