Re: [RFC] Device isolation infrastructure v2

From: Alex Williamson
Date: Thu Dec 15 2011 - 23:49:24 EST

Previous message: Rik van Riel: "Re: [PATCH 11/11] mm: Isolate pages for immediate reclaim on theirown LRU"
In reply to: David Gibson: "Re: [RFC] Device isolation infrastructure v2"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, 2011-12-16 at 12:40 +1100, David Gibson wrote:
> On Thu, Dec 15, 2011 at 11:05:07AM -0700, Alex Williamson wrote:
> > On Thu, 2011-12-15 at 17:25 +1100, David Gibson wrote:
> > > Here's the second spin of my preferred approach to handling grouping
> > > of devices for safe assignment to guests.
> > >
> > > Changes since v1:
> > > * Many name changes and file moves for improved consistency
> > > * Bugfixes and cleanups
> > > * The interface to the next layer up is considerably fleshed out,
> > > although it still needs work.
> > > * Example initialization of groups for p5ioc2 and p7ioc.
> > >
> > > TODO:
> > > * Need sample initialization of groups for intel and/or amd iommus
> >
> > I think this very well might imposed significant bloat for those
> > implementations. On POWER you typically don't have singleton groups,
> > while it's the norm on x86. I don't know that either intel or amd iommu
> > code have existing structures that they can simply tack the group
> > pointer to.
>
> Actually, I think they can probably just use the group pointer in the
> struct device. Each PCI function will typically allocate a new group
> and put the pointer in the struct device and no-where else. Devices
> hidden under bridges copy the pointer from the bridge parent instead.
> I will have to check the unplug path to ensure we can manage the group
> lifetime properly, of course.
>
> > Again, this is one of the reasons that I think the current
> > vfio implementation is the right starting point. We keep groups within
> > vfio, imposing zero overhead for systems not making use of it and only
> > require iommu drivers to implement a trivial function to opt-in. As we
> > start to make groups more pervasive in the dma layer, independent of
> > userspace driver exposure, we can offload pieces to the core. Starting
> > with it in the core and hand waving some future use that we don't plan
> > to implement right now seems like the wrong direction.
>
> Well, I think we must agree to disagree here; I think treating groups
> as identifiable objects is worthwhile. That said, I am looking for
> ways to whittle down the overhead when they're not in use.
>
> > > * Use of sysfs attributes to control group permission is probably a
> > > mistake. Although it seems a bit odd, registering a chardev for
> > > each group is probably better, because perms can be set from udev
> > > rules, just like everything else.
> >
> > I agree, this is a horrible mistake. Reinventing file permissions via
> > sysfs is bound to be broken and doesn't account for selinux permissions.
> > Again, I know you don't like aspects of the vfio group management, but
> > it gets this right imho.
>
> Yeah. I came up with this because I was trying to avoid registering a
> device whose only purpose was to act as a permissioned "handle" on the
> group. But it is a better approach, despite that. I just wanted to
> send out the new patches for comment without waiting to do that
> rework.

So we end up with a chardev created by the core, whose only purpose is
setting the group access permissions for userspace usage, which only
becomes useful with something like vfio? It's an odd conflict that
isolation groups would get involved with userspace permissions to access
the group, but leave enforcement of isolation via iommu groups to the
"binder" driver (where it seems like vfio is still going to need some
kind of merge interface to share a domain between isolation groups).

Is this same chardev going to be a generic conduit for
read/write/mmap/ioctl to the "binder" driver or does vfio need to create
it's own chardev for that? In the former case, are we ok with a chardev
that has an entirely modular API behind it, or maybe you're planning to
define some basic API infrastructure, in which case this starts smelling
like implementing vfio in the core. In the latter case (isolation
chardev + vfio chardev) coordinating permissions sounds like a mess.

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Previous message: Rik van Riel: "Re: [PATCH 11/11] mm: Isolate pages for immediate reclaim on theirown LRU"
In reply to: David Gibson: "Re: [RFC] Device isolation infrastructure v2"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]