RE: Plan for /dev/ioasid RFC v2
From: Tian, Kevin
Date: Mon Jun 14 2021 - 23:14:06 EST
> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Monday, June 14, 2021 11:23 AM
>
[...]
> > In the meantime, I'm thinking about another way whether group
> > security can be enforced in the iommu layer to relax the uAPI design.
> > If a device can be always blocked from accessing memory in the
> > IOMMU before it's bound to a driver or more specifically before
> > the driver moves it to a new security context, then there is no need
> > for VFIO to track whether IOASIDfd has taken over ownership of
> > the DMA context for all devices within a group.
>
> But we know we don't have IOMMU level isolation between devices in the
> same group, so I don't see how this helps us.
>
> > But as you said this cannot be achieved via existing default domain
> > approach. So far a device is always attached to a domain:
> >
> > - DOMAIN_IDENTITY: a default domain without DMA protection
> > - DOMAIN_DMA: a default domain with DMA protection via DMA
> > API and iommu core
> > - DOMAIN_UNMANAGED: a driver-created domain which is not
> > managed by iommu core.
> >
> > The special sequence in current vfio group design is to mitigate
> > the 1st case, i.e. if a device is left in passthrough mode before
> > bound to VFIO it's definitely insecure to allow user to access it.
> > Then the sequence ensures that the user access is granted on it
> > only after all devices within a group switch to a security context.
> >
> > Now if the new proposed scheme can be supported, a device
> > is always in a security context (block-dma) before it's switched
> > to a new security context and existing domain types should be
> > applied only in the new context when the device starts to do
> > DMAs. For VFIO case this switch happens explicitly when attaching
> > the device to an IOASID. For kernel driver it's implicit e.g. could
> > happen when the 1st DMA API call is received.
> >
> > If this works I didn't see the need for vfio to keep the sequence.
> > VFIO still keeps group fd to claim ownership of all devices in a
> > group. Once it's done, vfio doesn't need to track the device attach
> > status and user access can be always granted regardless of
> > how the attach status changes. Moving a device from IOASID1
> > to IOASID2 involves detaching from IOASID1 (back to blocked
> > dma context) and then reattaching to IOASID2 (switch to a
> > new security context).
> >
> > Following this direction even IOASIDfd doesn't need to verify
> > the group attach upon such guarantee from the iommu layer.
> > The devices within a group can be in different security contexts,
> > e.g. with some devices attached to GPA IOASID while others not
> > attached. In this way vfio userspace could choose to not attach
> > every device of a group to sustain the current semantics.
>
> It seems like this entirely misses the point of groups with multiple
> devices. If we had IOMMU level isolation between all devices, we'd
> never have multi-device groups. Thanks,
>
If multiple devices in a group are all in a block-DMA state when the
group is attached to vfio, why does vfio need to know whether they
have all been switched to a new security context via IOASIDfd before
it grants user access to a device in the group? Yes, there is no isolation
between devices within a group, but from the iommu p.o.v. they are all
blocked from touching the rest of the system, so letting the user access
them won't cause any security problem. Then it's just the user's call
how to tolerate the lack of isolation within that group:
1) User could attach all devices in the group to a single IOASID;
2) User could attach some devices in the group to a single IOASID,
leaving other devices still in block-DMA state;
3) User could attach some devices in the group to IOASID1 and others
to IOASID2, e.g. when the group is created due to !ACS and the two
address spaces are carefully tweaked to not cause undesired p2p
traffic.
At any point in the above use cases, the devices within the group are
always in a security context which isolates them from the rest of the
system (though with no isolation between them).
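
To make the transitions concrete, here is a toy model in C. It is only
a sketch, assuming the iommu layer can really guarantee the block-DMA
state. Every name in it (model_group, model_attach, CTX_BLOCK_DMA, ...)
is made up for illustration and is not an existing or proposed kernel
or uAPI symbol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these names exist in the kernel. */
enum model_ctx {
	CTX_KERNEL_DEFAULT,	/* e.g. identity/DMA default domain */
	CTX_BLOCK_DMA,		/* all DMA blocked in the IOMMU */
	CTX_IOASID,		/* attached to a user-managed IOASID */
};

struct model_device {
	enum model_ctx ctx;
	int ioasid;		/* meaningful only when ctx == CTX_IOASID */
};

struct model_group {
	struct model_device *devs;
	size_t ndevs;
	bool owned;		/* group fd has claimed every device */
};

/* Claiming the group moves every device into block-DMA. */
static void model_group_claim(struct model_group *g)
{
	for (size_t i = 0; i < g->ndevs; i++)
		g->devs[i].ctx = CTX_BLOCK_DMA;
	g->owned = true;
}

/* User access depends on ownership only, never on attach status. */
static bool model_user_access_ok(const struct model_group *g)
{
	return g->owned;
}

/* Attach: explicit switch from block-DMA to a new security context. */
static int model_attach(struct model_group *g, size_t idx, int ioasid)
{
	if (!g->owned || idx >= g->ndevs)
		return -1;
	if (g->devs[idx].ctx != CTX_BLOCK_DMA)
		return -1;	/* must be detached first */
	g->devs[idx].ctx = CTX_IOASID;
	g->devs[idx].ioasid = ioasid;
	return 0;
}

/* Detach: the device always lands back in block-DMA. */
static int model_detach(struct model_group *g, size_t idx)
{
	if (!g->owned || idx >= g->ndevs)
		return -1;
	g->devs[idx].ctx = CTX_BLOCK_DMA;
	return 0;
}

/* True if the group is owned and no device is in a kernel domain. */
static bool model_group_isolated(const struct model_group *g)
{
	for (size_t i = 0; i < g->ndevs; i++)
		if (g->devs[i].ctx == CTX_KERNEL_DEFAULT)
			return false;
	return g->owned;
}

/* Walks through case 3) and then moves device 0 between IOASIDs. */
static void model_example(void)
{
	struct model_device devs[3] = {
		{ CTX_KERNEL_DEFAULT, 0 },
		{ CTX_KERNEL_DEFAULT, 0 },
		{ CTX_KERNEL_DEFAULT, 0 },
	};
	struct model_group g = { devs, 3, false };

	model_group_claim(&g);
	assert(model_user_access_ok(&g));	/* nothing attached yet */

	assert(model_attach(&g, 0, 1) == 0);	/* dev0 -> IOASID1 */
	assert(model_attach(&g, 1, 2) == 0);	/* dev1 -> IOASID2 */
	assert(devs[2].ctx == CTX_BLOCK_DMA);	/* dev2 stays blocked */

	assert(model_detach(&g, 0) == 0);	/* back to block-DMA */
	assert(model_attach(&g, 0, 2) == 0);	/* dev0 -> IOASID2 */
	assert(model_group_isolated(&g));
}
```

The point of the model is that model_user_access_ok() never looks at
the per-device attach state, and model_group_isolated() holds after
every attach or detach once the group has been claimed.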
Thanks
Kevin