Re: Plan for /dev/ioasid RFC v2

From: Jason Gunthorpe
Date: Mon Jun 14 2021 - 15:40:56 EST


On Mon, Jun 14, 2021 at 10:28:14AM -0600, Alex Williamson wrote:
> > To my mind it is these three things:
> >
> > 1. The device can only do DMA to memory put into its security context
>
> System memory or device memory, yes.
>
> Corollary: The IOMMU group defines the minimum set of devices where the
> IOMMU can control inter-device DMA.

Inter-device DMA is #2:

> > 2. No other security context can control this device
> > 3. No other security context can do DMA to my userspace memory
>
> Rule #1 is essentially the golden rule, the rest falls out from it.

But can we agree to use these three principles to evaluate
any design? It is useful to split out the 'derived' ideas to make them
easier to understand.

> > Today in VFIO the security context is the group fd. I would like the
> > security context to be the iommu fd.
>
> The vfio group is simply a representation of the IOMMU group, which is
> the minimum isolation granularity. The group is therefore the minimum
> security context, but itself is not a security context. The overall
> security context for vfio is the set of containers (IOMMU contexts)
> owned by a user, where each container defines the IOMMU context for a
> set of groups. The user can map process and device memory between
> containers within the same security context.

A "security context" needs to be a concrete thing. It should be
something a process has some control over. In the models discussed
here, all security contexts are FDs.

If it isn't the group FD then all that is left is the container FD?

> As you know, we have various issues with invalidation of device
> mappings between containers, so simplifying the security context to the
> ioasidfd seems like a good plan. The vfio notion of a container is
> already encompassed in the IOASID of the ioasidfd.

Yes

> The significant difference is therefore the device level IOASID versus
> vfio's group level container granularity. This means the IOASID model
> needs to incorporate the group model not only in terms of isolation,
> but also address-ability.

Yes, we need a definition for what groups mean in this world. Groups
no longer mean a single IOASID for every device in the group.

> > 1 is achieved by ensuring the device is always connected to an
>
> s/device/group/

We have been talking about an iommu world where each device in a group
can have its own IOASIDs, so it no longer makes sense to talk about
groups as an assignable unit. Yes there is that degenerate case where
all devices in a group must have the same IOASID, but in general we
should be talking about devices, not groups, being assigned IOASIDs.

> As you note in reply to Kevin, in a multi-device group rule #1 can be
> violated if only one device is connected to an IOASID.

I assume that all devices under VFIO control are connected to safe
IOASIDs. A safe IOASID is one that blocks all DMA, or one that is
from the same iommu_fd. A device under VFIO control should not be
pointed at some kernel-owned IOASID that is DMA capable. It is a
change from today.
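
As an illustration only, the "safe IOASID" definition above can be
written as a small predicate. This is a userspace model, not kernel
code: the model_ioasid type, its kind enum and the integer iommu_fd
ids are all hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of an IOASID; not a real kernel type. */
enum model_ioasid_kind {
	MODEL_IOASID_BLOCK_DMA,	/* blocks all DMA */
	MODEL_IOASID_USER,	/* created through some iommu_fd */
	MODEL_IOASID_KERNEL,	/* kernel-owned and DMA capable */
};

struct model_ioasid {
	enum model_ioasid_kind kind;
	int owner_fd;	/* id of the owning iommu_fd, used only for USER */
};

/*
 * Safe means: blocks all DMA, or belongs to the same iommu_fd that
 * operates the device. A DMA capable kernel-owned IOASID is not safe.
 */
bool model_ioasid_is_safe(const struct model_ioasid *ioasid, int iommu_fd)
{
	assert(ioasid != NULL);

	switch (ioasid->kind) {
	case MODEL_IOASID_BLOCK_DMA:
		return true;
	case MODEL_IOASID_USER:
		return ioasid->owner_fd == iommu_fd;
	case MODEL_IOASID_KERNEL:
	default:
		return false;
	}
}
```

In this model a device under VFIO control would only ever be pointed
at an IOASID for which the predicate returns true.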

Connecting two devices in the same group to IOASIDs in different
iommu_fds would be blocked by the kernel, preventing a violation of #1.

> > IOASID. Today the group fd requires an IOASID before it hands out a
> > device_fd. With iommu_fd the device_fd will not allow IOCTLs until it
> > has a blocked DMA IOASID and is successfully joined to an iommu_fd.
>
> Which is the root of my concern. Who owns ioctls to the device fd?
> It's my understanding this is a vfio provided file descriptor and it's
> therefore vfio's responsibility.

Yes, VFIO

> A device-level IOASID interface therefore requires that vfio manage
> the group aspect of device access.

I envision it as some kernel call that vfio will do as part of the
bind ioctl:

iommu_fd_bind_device(vfio_dev->dev, iommu_fd, ...);

If everything is secure it succeeds and VFIO can allow this FD to
operate and process the rest of the ioctls. The new iommu_fd would
make the security decision. The security decision would look at groups
internally.
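
To make the gating concrete, here is a minimal userspace sketch of a
device_fd that refuses every ioctl except bind until the bind has
succeeded. Everything in it is an assumption made for illustration:
the model_* names, the two-command enum and the error codes are
invented, and the security decision is reduced to a bind_ok flag
supplied by the caller rather than a real group check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical commands: bind, and "anything else". */
enum model_cmd { MODEL_CMD_BIND, MODEL_CMD_OTHER };

/* Hypothetical per-device_fd state. */
struct model_device_fd {
	bool bound;	/* true once the bind succeeded */
};

/*
 * bind_ok stands in for the verdict the iommu_fd side would return;
 * it is only consulted for MODEL_CMD_BIND.
 */
int model_device_ioctl(struct model_device_fd *dfd, enum model_cmd cmd,
		       bool bind_ok)
{
	assert(dfd != NULL);

	if (cmd == MODEL_CMD_BIND) {
		if (dfd->bound)
			return -EBUSY;	/* already bound */
		if (!bind_ok)
			return -EPERM;	/* security decision said no */
		dfd->bound = true;
		return 0;
	}

	/* Every other ioctl is refused until the bind has succeeded. */
	if (!dfd->bound)
		return -EACCES;
	return 0;
}
```

A failed bind leaves the FD in the same unusable state as before the
attempt, which matches the "no working device_fd" case discussed in
this thread.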

Three emails ago I outlined what I thought the logic of this function
should look like.

> AFAICT, that means that device access can therefore only begin when
> all devices for a given group are attached to the IOASID and must
> halt for all devices in the group if any device is ever detached
> from an IOASID, even temporarily.

Which rule is broken if one device is attached and the other device is
left with no working device_fd?

No working device_fd means no mmap, no MMIO access, no DMA control of
the device. It is very similar to not doing the group_fd IOCTL to get
a device_fd in the first place.

Remember the IOASID for the unused devices will be pointing at
something safe.

> > 2 is achieved by ensuring that two security contexts can't open
> > devices in the same group. Today the group fd deals with this by being
> > single open. With iommu_fd the kernel would not permit splitting
> > groups between iommu_fds.
>
> "Who" within the kernel? Is it the IOASID code itself or is this
> another responsibility of vfio?

ioasid code. The iommu_fd_bind_device() would keep track of the single
iommu_fd that is allowed to use devices in this group.
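
A rough userspace model of that tracking is sketched below. It is not
the proposed kernel implementation: the model_* types, the error
codes, and the choice to release group ownership when the last device
is unbound are all assumptions made for the sake of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel structures. */
struct model_iommu_fd {
	int id;
};

struct model_group {
	struct model_iommu_fd *owner;	/* single iommu_fd using the group */
	unsigned int bound;		/* devices in the group bound now */
};

struct model_device {
	struct model_group *group;
	struct model_iommu_fd *ctx;	/* NULL while unbound (parked) */
};

/* Bind succeeds only if the group is unowned or owned by this ctx. */
int model_bind_device(struct model_device *dev, struct model_iommu_fd *ctx)
{
	assert(dev && dev->group && ctx);

	if (dev->ctx)
		return -EBUSY;	/* device is already bound */
	if (dev->group->owner && dev->group->owner != ctx)
		return -EPERM;	/* group belongs to another iommu_fd */

	dev->group->owner = ctx;
	dev->group->bound++;
	dev->ctx = ctx;
	return 0;
}

/* Assumption: ownership is dropped once no device remains bound. */
void model_unbind_device(struct model_device *dev)
{
	assert(dev && dev->group);

	if (!dev->ctx)
		return;
	dev->ctx = NULL;
	if (--dev->group->bound == 0)
		dev->group->owner = NULL;
}
```

With two devices in one group, binding the first to one iommu_fd makes
a bind of the second to a different iommu_fd fail, while a bind to the
same iommu_fd succeeds.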

> If IOASID knows about groups for this, it's not clear to me why we
> have a device-level bind interface. A group-level bind interface
> clearly makes this more explicit.

It does make it more explicit, but at the cost of introducing another
userspace object to manage, and we still have to make the whole API
work on a per-device basis. Basically, adding the group introduces
complexity. I would like us all to agree that we need the group, and
on what exactly it is doing, before we do that.

> > 3 is achieved today by the group_fd enforcing a single IOASID on all
> > devices. Under iommu_fd all devices in the group can use any IOASID in
> > their iommu_fd security domain.
>
> As above, while the group is the minimum "security context" for vfio,
> the overall security context is much more broad.

I don't understand this comment. Can you describe what scenario would
cause a problem?

> > It is a slightly different model than VFIO uses, but I don't think it
> > provides less isolation.
>
> It can be done correctly, but if IOASID isn't willing to take on
> responsibility of managing isolation of the group, then it implies a
> non-trivial degree of management by users like vfio to make sure
> userspace access is and remains secure.

I think it is important that the ioasid side do this - otherwise it
feels incomplete to me. VFIO handling it means that logic won't work
for other non-VFIO users, which feels wrong - even if those cases
probably have a 1:1 device/group ratio.

> > > For example, is it VFIO's job to BIND every device in the group?
> >
> > I'm thinking no
>
> Then who? Userspace? IOASID?

Userspace would bind the device it wants

> > > Does binding the device represent the point at which the IOASID
> > > takes responsibility for the isolation of the device?
> >
> > Following Kevin's language BIND is when the device_fd and iommu_fd are
> > connected. That is when I see the device as becoming usable. Whatever
> > security/isolation requirements we decide should be met here
>
> If device access is usable after a BIND, then that suggests the IOASID
> must be managing the group. So why then do we have a device interface
> for BIND rather than a group interface?

This would be the only place the group would be used in the iommu_fd
API - and it is conveying redundant information - so do we need it?

> For example, given a group with devices A and B, the user performs a
> BIND of deviceA_fd through vfio and now has access to device A. The
> user then performs BIND of deviceB_fd through vfio and has access to
> device B.

Bind B would fail: iommu_fd_bind_device() fails because a group
can only have devices in a single iommu_fd.

> So continuing the above example, releasing deviceA_fd does what at the
> group level? What if device A and B are DMA aliases of each other?
> How does the group remain secure relative to userspace access via
> device B?

It is very similar to today: if you close deviceA_fd the only way to
re-open it is via the existing group_fd. It remains parked while
closed.

With iommu_fd, if you close deviceA_fd then it cannot be operated
until it is re-bound to the same iommu_fd that the other group members
are in. It remains parked, with an IOASID that is either block DMA or
an IOASID from the iommu_fd that is operating the other devices - same
as today.

> > Once the device is back to blocked DMA there is no further need for
> > the iommu_fd to touch it.
>
> Blocked by whom? An IOMMU group assumes we cannot block DMA between
> devices within the same group.

You asked if the iommu_fd needs to change things about the device -
the answer is no. Once the device's IOASID is set to 'block dma' there
are no further actions that can be done to it.

I'm still not seeing your objection concretely, sorry

Jason