RE: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
From: Michael Kelley
Date: Wed Apr 09 2025 - 21:16:43 EST
From: Dan Williams <dan.j.williams@xxxxxxxxx> Sent: Wednesday, April 9, 2025 4:30 PM
>
> [ add linux-coco ]
>
> Roman Kisel wrote:
> >
> >
> > On 4/9/2025 9:03 AM, Robin Murphy wrote:
> > > On 2025-04-09 1:08 am, Roman Kisel wrote:
> > >> Bounce-buffering makes the system spend more time copying
> > >> I/O data. When the I/O transaction takes place between
> > >> a confidential and a non-confidential endpoint, there is
> > >> no way around it.
> > >>
> > >> Introduce a device bitfield to indicate that the device
> > >> doesn't need to perform bounce buffering. The capable
> > >> device may employ it to save on copying data around.
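[ Illustration only: a minimal sketch of the idea, with hypothetical
  field/helper names that need not match what the patch actually adds --
  just a per-device bit that the mapping path could consult before
  bouncing:

	struct device {
		/* ... existing fields ... */
		unsigned int	no_bounce_buffering:1;	/* hypothetical name */
	};

	/* hypothetical helper a DMA path or driver could check */
	static inline bool dev_skip_bounce(struct device *dev)
	{
		return dev->no_bounce_buffering;
	}

  The discussion below is about how and when such a bit gets set, not
  about its mechanics. ]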
> > >
> > > It's not so much about bounce buffering, it's more fundamentally about
> > > whether the device is trusted and able to access private memory at all
> > > or not. And performance is hardly the biggest concern either - if you do
> > > trust a device to operate on confidential data in private memory, then
> > > surely it is crucial to actively *prevent* that data ever getting into
> > > shared SWIOTLB pages where anyone else could also get at it. At worst
> > > that means CoCo VMs might need an *additional* non-shared SWIOTLB to
> > > support trusted devices with addressing limitations (and/or
> > > "swiotlb=force" debugging, potentially).
> >
> > Thanks, I should've highlighted that facet most certainly!
>
> One would hope that no one is building a modern device with trusted I/O
> capability, *and* with a swiotlb addressing dependency. However, I agree
> that a non-shared swiotlb would be needed in such a scenario.
>
> Otherwise the policy around "a device should not even be allowed to
> bounce buffer any private page" is a userspace responsibility to either
> not load the driver, not release secrets to this CVM, or otherwise make
> sure the device is only ever bounce buffering private memory that does
> not contain secrets.
>
> > > Also whatever we do for this really wants to tie in with the nascent
> > > TDISP stuff as well, since we definitely don't want to end up with more
> > > than one notion of whether a device is in a trusted/locked/private/etc.
> > > vs. unlocked/shared/etc. state with respect to DMA (or indeed anything
> > > else if we can avoid it).
> >
> > Wouldn't TDISP be per-device as well? In which case, a flag would be
> > needed just like the one being added in this patch.
> >
> > Although, there must be a difference between a device with TDISP where
> > the flag would be the indication of the feature, and this code where the
> > driver may flip that back and forth...
> >
> > Do you feel this is shoehorned into `struct device`? I couldn't find an
> > appropriate private (== opaque pointer) part of the structure to store
> > that bit (`struct device_private` wouldn't fit the bill), and it looked
> > like adding it to the struct itself would do no harm. However, my read of the
> > room is that folks see that as dubious :)
> >
> > What would be your opinion on where to store that flag, so as to tie
> > its usage in the Hyper-V SCSI driver to the decision not to bounce-buffer?
>
> The name and location of a flag bit is not the issue, it is the common
> expectation of how and when that flag is set.
>
> tl;dr Linux likely needs a "private_accepted" flag for devices
>
> Like Christoph said, a driver really has no business opting itself into
> different DMA addressing domains. For TDISP we are also being careful to
> make sure that flipping a device from shared to private is a suitably
> violent event. This is because the Linux DMA layer does not have a
> concept of allowing a device to have mappings from two different
> addressing domains simultaneously.
>
> In the current TDISP proposal, a device starts in shared mode and only
> after validating all of the launch state of the CVM, device
> measurements, and a device interface report is it granted access to
> private memory. Without dumping a bunch of golden measurement data into
> the kernel that validation can really only be performed by userspace.
>
> Enter this vmbus proposal that wants to emulate devices with a paravisor
> that is presumably within the TCB at launch, but the kernel can not
> really trust that until a "launch state of the CVM + paravisor"
> attestation event.
>
> Like PCIe TDISP the capability of this device to access private memory
> is a property of the bus and the iommu. However, acceptance of the
> device into private operation is a willful policy action. It needs to
> validate not only the device provenance and state, but also the Linux
> DMA layer requirements of not holding shared or swiotlb mappings over
> the "entry into private mode operation" event.

To flesh out the swiotlb aspect a bit: once a TDISP device has
gone private, how is the DMA layer prevented from ever doing
bounce buffering through the swiotlb? My understanding is that
the DMA layer doesn't make any promises to not do bounce buffering.
Given the vagaries of memory alignment, perhaps a virtual IOMMU in
the mix, etc., it seems like a device driver can't necessarily
predict which DMA operations might result in bounce buffering. Does
TDISP anticipate needing a formal way to tell the DMA layer "don't
bounce buffer" (and return an error instead)? Or would there be a
separate swiotlb memory pool backed by private memory so that bounce
buffering could be done when necessary while still maintaining
confidentiality?
Just wondering if there's any thinking on this topic ...
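
For reference, my rough mental model of the dma-direct mapping path
(paraphrased from memory, not verbatim, and the details vary by kernel
version) is that the bounce decision is made per-mapping, inside the
DMA layer, which is exactly why a driver can't predict it:

	/* roughly the body of the direct-map helper (paraphrased) */
	phys_addr_t phys = page_to_phys(page) + offset;
	dma_addr_t dma_addr = phys_to_dma(dev, phys);

	if (is_swiotlb_force_bounce(dev))	/* e.g. CoCo guests force-bounce today */
		return swiotlb_map(dev, phys, size, dir, attrs);

	if (unlikely(!dma_capable(dev, dma_addr, size, true)))	/* addressing limits */
		return swiotlb_map(dev, phys, size, dir, attrs);

	return dma_addr;

So "don't bounce" would presumably mean either a new per-device bit
checked ahead of those swiotlb_map() calls (failing the mapping
instead), or a per-device bounce pool carved out of private memory so
that any bouncing that does happen stays confidential.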
Thanks,
Michael