Re: [PATCH 1/4] iommu: Add iommu_device_group callback and iommu_group sysfs entry

From: Chris Wright
Date: Wed Nov 30 2011 - 21:05:23 EST


* Benjamin Herrenschmidt (benh@xxxxxxxxxxxxxxxxxxx) wrote:
> On Wed, 2011-11-30 at 17:04 -0800, Chris Wright wrote:
> > Heh. Put it another way. Generating the group ID is left up to the
> > IOMMU. This will break down when there's a system with multiple IOMMUs
> > on the same bus_type that don't have any awareness of one another. This
> > is not the case for the existing series and x86 hw.
> >
> > I'm not opposed to doing the allocation and ptr as id (taking care for
> > possibility that PCI hotplug/unplug/replug could reuse the same memory
> > for group id, however). Just pointing out that the current system works
> > as is, and there's some value in its simplicity (overloading ID ==
> > group structure + pretty printing ID in sysfs, for example).
>
> Well, ID can work even with multiple domains since we have domain
> numbers. bdfn is 16-bit, which leaves 16 bits for the domain number,
> which is sufficient.
>
> So by encoding (domain << 16) | bdfn, we can get away with a 32-bit
> number... it just sucks.

Yup, that's just what Alex did for VT-d ;)

+	union {
+		struct {
+			u8 devfn;
+			u8 bus;
+			u16 segment;
+		} pci;
+		u32 group;
+	} id;

The one difference is that the alias table the AMD IOMMU code uses to map
bdf -> requestor ID is not multi-segment aware, so there the id is only
the bdf of the bridge.
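
To make the equivalence with (domain << 16) | bdfn concrete, here is a
rough user-space sketch. It is not from the patch: group_id, encode_group
and encode_group_union are made-up names, and uint8_t/uint16_t/uint32_t
stand in for the kernel's u8/u16/u32. The union overlay only produces the
shifted value on a little-endian machine, which is an assumption here; the
shift version is the portable form.

```c
#include <stdint.h>

/* Hypothetical user-space mirror of the union quoted above. */
union group_id {
	struct {
		uint8_t  devfn;
		uint8_t  bus;
		uint16_t segment;
	} pci;
	uint32_t group;
};

/* Portable encoding: (segment << 16) | (bus << 8) | devfn. */
static inline uint32_t encode_group(uint16_t segment, uint8_t bus,
				    uint8_t devfn)
{
	return ((uint32_t)segment << 16) | ((uint32_t)bus << 8) | devfn;
}

/*
 * Same idea via the union overlay. The result equals encode_group()
 * only on little-endian hosts, where devfn is the lowest byte.
 */
static inline uint32_t encode_group_union(uint16_t segment, uint8_t bus,
					  uint8_t devfn)
{
	union group_id id;

	id.group = 0;
	id.pci.devfn = devfn;
	id.pci.bus = bus;
	id.pci.segment = segment;
	return id.group;
}
```

So segment 0x0001, bus 0x02, devfn 0x18 encodes to 0x00010218 with the
shift form.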

> Note that on pseries, I wouldn't use bdfn anyway, I would use my
> internal "PE#" which is also a number that I can constrain to 16 bits.
>
> So I can work with a number as long as it's at least an unsigned int
> (32-bit), but I think it somewhat sucks, and will impose gratuitous
> number <-> structure conversions all over, but if we keep the whole
> group thing an iommu specific data structure, then let's stick to the
> number and move on with life.
>
> We might get better results if we kept the number as
>
> struct iommu_group_id {
> 	u16 domain;
> 	u16 group;
> };
>
> (Or a union of that with an unsigned int)
>
> That way the domain information is available generically (can be matched
> with pci_domain_nr() for example), and sysfs can then be laid out as
>
> /sys/bus/pci/groups/<domain>/<id>
>
> Which is nicer than having enormous IDs

Seems fine to me (although I missed the introduction of
/sys/bus/pci/groups/), except that I think the Freescale folks aren't
interested in PCI, which is one reason why the thing is just an opaque id.
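
For what it's worth, the layout Ben describes could be sketched in user
space roughly as below. This is only an illustration: group_sysfs_path is
a made-up helper, uint16_t stands in for u16, and printing the domain as
four hex digits is my assumption (borrowed from how PCI domains are
usually printed), not something the proposal specifies.

```c
#include <stdint.h>
#include <stdio.h>

/* The split id proposed above, with user-space integer types. */
struct iommu_group_id {
	uint16_t domain;
	uint16_t group;
};

/*
 * Hypothetical helper: build /sys/bus/pci/groups/<domain>/<id>.
 * Returns what snprintf() returns (the length that would be written).
 */
static inline int group_sysfs_path(char *buf, size_t len,
				   struct iommu_group_id id)
{
	return snprintf(buf, len, "/sys/bus/pci/groups/%04x/%u",
			(unsigned int)id.domain, (unsigned int)id.group);
}
```

With domain 1 and group 42 that gives /sys/bus/pci/groups/0001/42.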

thanks,
-chris