Re: [PATCH] sysfs: add per pci device msi[x] irq listing (v3)

From: Konrad Rzeszutek Wilk
Date: Thu Sep 22 2011 - 06:49:23 EST


On Mon, Sep 19, 2011 at 11:47:15AM -0400, Neil Horman wrote:
> So a while back, I wanted to provide a way for irqbalance (and other apps) to
> definitively map irqs to devices, which, for msi[x] irqs is currently not really
> possible in user space. My first attempt wen't not so well:
> https://lkml.org/lkml/2011/4/21/308
>
> It was plauged by the same issues that prior attempts were, namely that it
> violated the one-file-one-value sysfs rule. I wandered off but have recently
> come back to this. I've got a new implementation here that exports a new
> subdirectory for every pci device, called msi_irqs. This subdirectory contanis
> a variable number of numbered subdirectories, in which the number represents an
> msi irq. Each numbered subdirectory contains attributes for that irq, which
> currently is only the mode it is operating in (msi vs. msix). I think fits
> within the constraints sysfs requires, and will allow irqbalance to properly map
> msi irqs to devices without having to rely on rickety, best guess methods like
> interface name matching.

Are there irqbalance patches that correspond to this? Where would they be available?

>
> Change Notes:
>
> (v2)
> Fixed up Documentation to put new sysfs interface descriptions in the right
> place, as per request by Greg K-H
>
> Fixed up oops that resulted from removing pci device. Not 100% sure I did this
> exactly right, but looking at the crash (triggered by echo 1 >
> /sys/class/net/eth0/device/remove), it looked as though we were freeing the
> pci_dev struct prior to all sysfs objects releasing their use of the device. AS
> such it seemed most appropriate to hold references on the pci_dev for each msi
> irq sysfs object that we create, and release them on free accordingly. With
> this change in place, I can remove, and add (via rescan) msi enabled devices
> ad-nauseum without a panic. Again thanks to Greg K-H
>
> (v3)
> As per Gregs suggestion, I looked further and noted that in fact, yes, it wasn't
> producing any errors on remove, but only because I had a refcounting problem,
> and my new sysfs objects were left orphaned with a dangling refcount. I've
> fixed that, added a release method to the new ktype, which now drops the
> reference I hold on the pci_dev for us and I've validated that all objects I've
> created, along with the parent directory and pci device are cleaned up and freed
> by enabling the kobject dyanic_debug set and observing the appropriate release
> calls. I can provide the logs if anyone wants to review them specifically.
>
> Signed-off-by: Neil Horman <nhorman@xxxxxxxxxxxxx>
> CC: Greg Kroah-Hartman <gregkh@xxxxxxx>
> CC: Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx>
> CC: linux-pci@xxxxxxxxxxxxxxx
> ---
> Documentation/ABI/testing/sysfs-bus-pci | 18 +++++
> drivers/pci/msi.c | 111 +++++++++++++++++++++++++++++++
> include/linux/msi.h | 3 +
> include/linux/pci.h | 1 +
> 4 files changed, 133 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index 349ecf2..699da99 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -66,6 +66,24 @@ Description:
> re-discover previously removed devices.
> Depends on CONFIG_HOTPLUG.
>
> +What: /sys/bus/pci/devices/.../msi_irqs/
> +Date: September, 2011
> +Contact: Neil Horman <nhorman@xxxxxxxxxxxxx>
> +Description:
> + The /sys/devices/.../msi_irqs directory contains a variable set
> + subdirectories, with each subdirectory being named after a
> + corresponding msi irq vector allocated to that device. Each
> + numbered subdirectory N contains attributes of that irq.
> + Note that this directory is not created for device drivers which
> + do not support msi irqs
> +
> +What: /sys/bus/pci/devices/.../msi_irqs/<N>/mode
> +Date: September 2011
> +Contact: Neil Horman <nhorman@xxxxxxxxxxxxx>
> +Description:
> + This attribute indicates the mode that the irq vecotor named by

vector
> + the parent directory is in (msi vs. msix)
> +
> What: /sys/bus/pci/devices/.../remove
> Date: January 2009
> Contact: Linux PCI developers <linux-pci@xxxxxxxxxxxxxxx>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 2f10328..73613e2 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -322,6 +322,8 @@ static void free_msi_irqs(struct pci_dev *dev)
> if (list_is_last(&entry->list, &dev->msi_list))
> iounmap(entry->mask_base);
> }
> + kobject_del(&entry->kobj);
> + kobject_put(&entry->kobj);
> list_del(&entry->list);
> kfree(entry);
> }
> @@ -402,6 +404,98 @@ void pci_restore_msi_state(struct pci_dev *dev)
> }
> EXPORT_SYMBOL_GPL(pci_restore_msi_state);
>
> +
> +#define to_msi_attr(obj) container_of(obj, struct msi_attribute, attr)
> +#define to_msi_desc(obj) container_of(obj, struct msi_desc, kobj)
> +
> +struct msi_attribute {
> + struct attribute attr;
> + ssize_t (*show)(struct msi_desc *entry, struct msi_attribute *attr,
> + char *buf);
> + ssize_t (*store)(struct msi_desc *entry, struct msi_attribute *attr,
> + const char *buf, size_t count);
> +};
> +
> +static ssize_t show_msi_mode(struct msi_desc *entry, struct msi_attribute *atr,
> + char *buf)
> +{
> + return sprintf(buf, "%s\n", entry->msi_attrib.is_msix ? "msix" : "msi");
> +}
> +
> +static ssize_t msi_irq_attr_show(struct kobject *kobj,
> + struct attribute *attr, char *buf)
> +{
> + struct msi_attribute *attribute = to_msi_attr(attr);
> + struct msi_desc *entry = to_msi_desc(kobj);
> +
> + if (!attribute->show)
> + return -EIO;
> +
> + return attribute->show(entry, attribute, buf);
> +}
> +
> +static const struct sysfs_ops msi_irq_sysfs_ops = {
> + .show = msi_irq_attr_show,
> +};
> +
> +static struct msi_attribute mode_attribute =
> + __ATTR(mode, S_IRUGO, show_msi_mode, NULL);
> +
> +
> +struct attribute *msi_irq_default_attrs[] = {
> + &mode_attribute.attr,
> + NULL
> +};
> +
> +void msi_kobj_release(struct kobject *kobj)
> +{
> + struct msi_desc *entry = to_msi_desc(kobj);
> +
> + pci_dev_put(entry->dev);
> +}
> +
> +static struct kobj_type msi_irq_ktype = {
> + .release = msi_kobj_release,
> + .sysfs_ops = &msi_irq_sysfs_ops,
> + .default_attrs = msi_irq_default_attrs,
> +};
> +
> +static int populate_msi_sysfs(struct pci_dev *pdev)

So, are there any cases where CONFIG_SYSFS is turned off and
CONFIG_MSI is set? Should there be some #ifdef CONFIG_SYSFS
magic tricks?

> +{
> + struct msi_desc *entry;
> + struct kobject *kobj;
> + int ret;
> + int count = 0;
> +
> + pdev->msi_kset = kset_create_and_add("msi_irqs", NULL, &pdev->dev.kobj);
> + if (!pdev->msi_kset)
> + return -ENOMEM;
> +
> + list_for_each_entry(entry, &pdev->msi_list, list) {
> + kobj = &entry->kobj;
> + kobj->kset = pdev->msi_kset;
> + pci_dev_get(pdev);
> + ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
> + "%u", entry->irq);
> + if (ret)
> + goto out_unroll;
> +
> + count++;
> + }
> +
> + return 0;
> +
> +out_unroll:
> + list_for_each_entry(entry, &pdev->msi_list, list) {
> + if (!count)
> + break;
> + kobject_del(&entry->kobj);
> + kobject_put(&entry->kobj);
> + count--;
> + }
> + return ret;
> +}
> +
> /**
> * msi_capability_init - configure device's MSI capability structure
> * @dev: pointer to the pci_dev data structure of MSI device function
> @@ -453,6 +547,13 @@ static int msi_capability_init(struct pci_dev *dev, int nvec)
> return ret;
> }
>
> + ret = populate_msi_sysfs(dev);
> + if (ret) {
> + msi_mask_irq(entry, mask, ~mask);
> + free_msi_irqs(dev);
> + return ret;
> + }
> +

That is rather draconian way of doing it. I mean if the SysFS entries
can't be created, then abonden the whole thing? Why not just WARN
and continue on without creating the SysFS entries?

> /* Set MSI enabled bits */
> pci_intx_for_msi(dev, 0);
> msi_set_enable(dev, pos, 1);
> @@ -573,6 +674,12 @@ static int msix_capability_init(struct pci_dev *dev,
>
> msix_program_entries(dev, entries);
>
> + ret = populate_msi_sysfs(dev);
> + if (ret) {
> + ret = 0;

Why the reset to zero?

> + goto error;
> + }
> +
> /* Set MSI-X enabled bits and unmask the function */
> pci_intx_for_msi(dev, 0);
> dev->msix_enabled = 1;
> @@ -731,6 +838,8 @@ void pci_disable_msi(struct pci_dev *dev)
>
> pci_msi_shutdown(dev);
> free_msi_irqs(dev);
> + kset_unregister(dev->msi_kset);
> + dev->msi_kset = NULL;
> }
> EXPORT_SYMBOL(pci_disable_msi);
>
> @@ -829,6 +938,8 @@ void pci_disable_msix(struct pci_dev *dev)
>
> pci_msix_shutdown(dev);
> free_msi_irqs(dev);
> + kset_unregister(dev->msi_kset);
> + dev->msi_kset = NULL;
> }
> EXPORT_SYMBOL(pci_disable_msix);
>
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 05acced..ce93a34 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -1,6 +1,7 @@
> #ifndef LINUX_MSI_H
> #define LINUX_MSI_H
>
> +#include <linux/kobject.h>
> #include <linux/list.h>
>
> struct msi_msg {
> @@ -44,6 +45,8 @@ struct msi_desc {
>
> /* Last set MSI message */
> struct msi_msg msg;
> +
> + struct kobject kobj;
> };
>
> /*
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index f27893b..fff3961 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -332,6 +332,7 @@ struct pci_dev {
> struct bin_attribute *res_attr_wc[DEVICE_COUNT_RESOURCE]; /* sysfs file for WC mapping of resources */
> #ifdef CONFIG_PCI_MSI
> struct list_head msi_list;
> + struct kset *msi_kset;

Probably should be guarded by CONFIG_SYSFS

> #endif
> struct pci_vpd *vpd;
> #ifdef CONFIG_PCI_IOV
> --
> 1.7.6.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/