Re: Plan for /dev/ioasid RFC v2

From: Alex Williamson
Date: Tue Jun 15 2021 - 12:12:24 EST


On Tue, 15 Jun 2021 02:31:39 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:

> > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Sent: Tuesday, June 15, 2021 12:28 AM
> >
> [...]
> > > IOASID. Today the group fd requires an IOASID before it hands out a
> > > device_fd. With iommu_fd the device_fd will not allow IOCTLs until it
> > > has a blocked DMA IOASID and is successfully joined to an iommu_fd.
> >
> > Which is the root of my concern. Who owns ioctls to the device fd?
> > It's my understanding this is a vfio provided file descriptor and it's
> > therefore vfio's responsibility. A device-level IOASID interface
> > therefore requires that vfio manage the group aspect of device access.
> > AFAICT, that means that device access can therefore only begin when all
> > devices for a given group are attached to the IOASID and must halt for
> > all devices in the group if any device is ever detached from an IOASID,
> > even temporarily. That suggests a lot more oversight of the IOASIDs by
> > vfio than I'd prefer.
> >
>
> This is possibly the point that is worthy of more clarification and
> alignment, as it sounds like the root of controversy here.
>
> I feel the goal of vfio group management is more about ownership, i.e.
> all devices within a group must be assigned to a single user. Following
> the three rules defined by Jason, what we really care about is whether a group
> of devices can be isolated from the rest of the world, i.e. no access to
> memory/device outside of its security context and no access to its
> security context from devices outside of this group. This can be achieved
> as long as every device in the group is either in block-DMA state when
> it's not attached to any security context or attached to an IOASID context
> in IOMMU fd.
>
> As long as group-level isolation is satisfied, how devices within a group
> are further managed is decided by the user (unattached, all attached to
> same IOASID, attached to different IOASIDs) as long as the user
> understands the implication of lacking isolation within the group. This
> is where a device-centric model comes into play. Misconfiguration just hurts
> the user itself.
>
> If this rationale can be agreed on, then I don't see the point of having
> VFIO mandate that all devices in the group must be attached/detached in
> lockstep.

In theory this sounds great, but there are still too many assumptions
and too much hand waving about where isolation occurs for me to feel
like I really have the complete picture. So let's walk through some
examples. Please fill in and correct where I'm wrong.

1) A dual-function PCIe e1000e NIC where the functions are grouped
together due to ACS isolation issues.

a) Initial state: functions 0 & 1 are both bound to e1000e driver.

b) Admin uses driverctl to bind function 1 to vfio-pci, creating
vfio device file, which is chmod'd to grant access to a user.

c) User opens vfio function 1 device file and an iommu_fd, binds
device_fd to iommu_fd.

Does this succeed?
- if no, specifically where does it fail?
- if yes, vfio can now allow access to the device?

d) Repeat b) for function 0.

e) Repeat c), still using function 1, is it different? Where? Why?
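
As a minimal sketch of what 1.c) versus 1.e) is asking, assume the
bind path applied a group-wide ownership check loosely modelled on the
viability rule in today's vfio group code: a group is only usable by
userspace when no function in it is still owned by a host driver.
Whether iommu_fd bind performs such a check, and in which layer, is
exactly the open question. The names Function, USER_SAFE_DRIVERS and
group_viable are hypothetical, and this is a toy Python model, not
kernel code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Function:
    name: str
    driver: Optional[str]  # None means no driver is bound


# Assumption for this sketch: only vfio-pci counts as a user-safe driver.
USER_SAFE_DRIVERS = {"vfio-pci"}


def group_viable(group):
    """True when no function in the group is owned by a host driver."""
    return all(f.driver is None or f.driver in USER_SAFE_DRIVERS for f in group)


# Step 1.c: function 1 is on vfio-pci, function 0 is still on e1000e.
step_c = [Function("fn0", "e1000e"), Function("fn1", "vfio-pci")]
# Step 1.e: after 1.d both functions are on vfio-pci.
step_e = [Function("fn0", "vfio-pci"), Function("fn1", "vfio-pci")]

print("1.c viable:", group_viable(step_c))
print("1.e viable:", group_viable(step_e))
```

Under this assumed rule the only difference between 1.c) and 1.e) is
the driver bound to function 0, which is why it matters which entity
is expected to look at the rest of the group.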

2) The same NIC as 1)

a) Initial state: functions 0 & 1 bound to vfio-pci, vfio device
files granted to user, user has bound both device_fds to the same
iommu_fd.

AIUI, even though not bound to an IOASID, vfio can now enable access
through the device_fds, right? What specific entity has placed these
devices into a block DMA state, when, and how?

b) Both devices are attached to the same IOASID.

Are we assuming that each device was atomically moved to the new
IOMMU context by the IOASID code? What if the IOMMU cannot change
the domain atomically?

c) The device_fd for function 1 is detached from the IOASID.

Are we assuming the reverse of b) performed by the IOASID code?

d) The device_fd for function 1 is unbound from the iommu_fd.

Does this succeed?
- if yes, what is the resulting IOMMU context of the device and
who owns it?
- if no, well, that results in numerous tear-down issues.

e) Function 1 is unbound from vfio-pci.

Does this work or is it blocked? If blocked, by what entity
specifically?

f) Function 1 is bound to e1000e driver.

We clearly have a violation here, specifically where and by whom in
this path should have prevented us from getting here or who pushes
the BUG_ON to abort this?
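
To make the sequence in 2) concrete, here is a rough sketch that
replays steps a) through f) against the isolation rule quoted above:
a group must never mix a function under a host driver with functions
that are user-owned (block-DMA or attached to an IOASID). It only
shows at which step that rule stops holding; it says nothing about
which entity should enforce it. The Ctx states and group_isolated are
hypothetical names, and the state assumed for function 1 after 2.d)
and 2.e) is a guess, since that is one of the questions being asked.

```python
from enum import Enum


class Ctx(Enum):
    HOST_DMA = "host driver, kernel DMA domain"
    BLOCKED = "block-DMA"
    IOASID = "attached to a user IOASID"


def group_isolated(group):
    """False when host-owned and user-owned functions share a group."""
    host = any(c is Ctx.HOST_DMA for c in group.values())
    user = any(c is not Ctx.HOST_DMA for c in group.values())
    return not (host and user)


STEPS = [
    ("2.a", {"fn0": Ctx.BLOCKED, "fn1": Ctx.BLOCKED}),
    ("2.b", {"fn0": Ctx.IOASID, "fn1": Ctx.IOASID}),
    ("2.c", {"fn0": Ctx.IOASID, "fn1": Ctx.BLOCKED}),
    # Assumption: function 1 stays in block-DMA after unbind from the
    # iommu_fd (2.d) and after unbind from vfio-pci (2.e).
    ("2.d", {"fn0": Ctx.IOASID, "fn1": Ctx.BLOCKED}),
    ("2.e", {"fn0": Ctx.IOASID, "fn1": Ctx.BLOCKED}),
    ("2.f", {"fn0": Ctx.IOASID, "fn1": Ctx.HOST_DMA}),
]

# First step at which the group-level rule no longer holds.
first_violation = next(
    (label for label, group in STEPS if not group_isolated(group)), None
)
print("first violation at step", first_violation)
```

In this model the rule only breaks at f), so something on the path
from c) to f) has to be watching the whole group rather than the
single device_fd.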

3) A dual-function conventional PCI e1000 NIC where the functions are
grouped together due to shared RID.

a) Repeat 2.a) and 2.b) such that we have valid, user-accessible
devices in the same IOMMU context.

b) Function 1 is detached from the IOASID.

I think function 1 cannot be placed into a different IOMMU context
here. Does the detach work? What's the IOMMU context now?

c) A new IOASID is alloc'd within the existing iommu_fd and function
1 is attached to the new IOASID.

Where, how, by whom does this fail?
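
The shared-RID case in 3) can be sketched as below, assuming the
IOMMU keys its context purely on the requester ID so that functions
sharing a RID cannot be told apart. ToyIommuFd, SharedRidError and
effective_context are hypothetical names for a toy Python model; they
do not correspond to any proposed uAPI, and the RID value is made up.

```python
class SharedRidError(Exception):
    """Raised when one requester ID would need two IOMMU contexts."""


class ToyIommuFd:
    def __init__(self, rid_of):
        self.rid_of = dict(rid_of)  # device name -> requester ID
        self.attached = {}          # device name -> IOASID number

    def _siblings(self, dev):
        rid = self.rid_of[dev]
        return [d for d in self.rid_of if d != dev and self.rid_of[d] == rid]

    def attach(self, dev, ioasid):
        # Refuse if a function with the same RID sits in another IOASID.
        for sib in self._siblings(dev):
            other = self.attached.get(sib)
            if other is not None and other != ioasid:
                raise SharedRidError(
                    f"{dev} shares a RID with {sib}, attached to IOASID {other}"
                )
        self.attached[dev] = ioasid

    def detach(self, dev):
        self.attached.pop(dev, None)

    def effective_context(self, dev):
        # What the hardware applies to DMA tagged with this device's RID.
        if dev in self.attached:
            return self.attached[dev]
        for sib in self._siblings(dev):
            if sib in self.attached:
                return self.attached[sib]
        return None


fd = ToyIommuFd({"fn0": 0x0100, "fn1": 0x0100})  # made-up shared RID
fd.attach("fn0", 1)
fd.attach("fn1", 1)                              # 3.a
fd.detach("fn1")                                 # 3.b
after_detach = fd.effective_context("fn1")
try:
    fd.attach("fn1", 2)                          # 3.c
    step_c_error = None
except SharedRidError as exc:
    step_c_error = str(exc)
print(after_detach, step_c_error)
```

In this model the detach in 3.b) is bookkeeping only, since DMA from
function 1 still lands in function 0's context, and 3.c) is rejected
at attach time. Which real layer would own that rejection is left
open here, as it is in the question.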

If vfio gets to offload all of its group management to IOASID code,
that's great, but I'm afraid that IOASID is so focused on a
device-level API that we're instead just ignoring the group dynamics
and vfio will be forced to provide oversight to maintain secure
userspace access. Thanks,

Alex