Re: [PATCH v3 1/1] Documentation/core-api: Add swiotlb documentation

From: Petr Tesařík
Date: Tue Apr 30 2024 - 16:09:37 EST


On Tue, 30 Apr 2024 15:48:42 +0000
Michael Kelley <mhklinux@xxxxxxxxxxx> wrote:

> From: Petr Tesařík <petr@xxxxxxxxxxx> Sent: Tuesday, April 30, 2024 4:24 AM
> > >
> > > +Usage Scenarios
> > > +---------------
> > > +swiotlb was originally created to handle DMA for devices with addressing
> > > +limitations. As physical memory sizes grew beyond 4 GiB, some devices could
> > > +only provide 32-bit DMA addresses. By allocating bounce buffer memory below
> > > +the 4 GiB line, these devices with addressing limitations could still work and
> > > +do DMA.
> >
> > IIRC the origins are even older and bounce buffers were used to
> > overcome the design flaws inherited all the way from the original IBM
> > PC. These computers used an Intel 8237 for DMA. This chip has a 16-bit
> > address register, but even the early 8088 CPUs had a 20-bit bus. So IBM
> > added a separate 74LS670 4-by-4 register file chip to provide the high 4
> > bits for each of the 4 DMA channels. As a side note, these bits were not
> > updated when the 8237 address register was incrementing from 0xffff, so
> > DMA would overflow at every 64K address boundary. PC AT then replaced
> > these 4 bits with an 8-bit DMA "page" register to match the 24-bit
> > address bus of an 80286. This design was not changed for 32-bit CPUs
> > (i.e. 80386).
> >
> > In short, bounce buffers were not introduced because of 64-bit CPUs.
> > They were already needed on 386 systems.
> >
> > OTOH this part of the history need not be mentioned in the
> > documentation (unless you WANT it).
>
> I knew there was some even earlier history, but I didn't know the
> details. :-( I'll add some qualifying wording about there being multiple
> DMA addressing limitations during the history of the x86 PCs, with
> the 32-bit addressing as a more recent example. But I won't try to
> cover the details of what you describe.

Yes, this sounds like a good level of detail.
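
For anyone reading the archive later, the 64K wrap described above can be
modelled in a few lines of C. This is only a sketch: the struct and the
pc_dma_*() names are made up for illustration and do not correspond to any
kernel code. It assumes the original PC layout, with a 16-bit address
register and a separate 4-bit page latch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of one IBM PC DMA channel: the 16-bit 8237 address
 * register plus a separate 4-bit latch supplying address bits 16-19.
 */
struct pc_dma_channel {
	uint16_t addr;	/* 8237 current-address register */
	uint8_t page;	/* external high bits, never touched by the 8237 */
};

/* 20-bit physical address currently driven onto the bus. */
uint32_t pc_dma_phys(const struct pc_dma_channel *ch)
{
	return ((uint32_t)(ch->page & 0x0f) << 16) | ch->addr;
}

/* One byte transferred: only the 16-bit register increments. */
void pc_dma_step(struct pc_dma_channel *ch)
{
	ch->addr++;	/* wraps from 0xffff to 0x0000, page stays as it was */
}

/*
 * Nonzero if [phys, phys + len) stays inside one 64K page, i.e. the
 * buffer can be handed to the channel without bouncing.
 */
int pc_dma_fits(uint32_t phys, uint32_t len)
{
	assert(len > 0);
	return (phys >> 16) == ((phys + len - 1) >> 16);
}
```

A buffer for which pc_dma_fits() returns zero is exactly the case where a
bounce buffer placed inside a single 64K page was needed.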

> > > +
> > > +More recently, Confidential Computing (CoCo) VMs have the guest VM's memory
> > > +encrypted by default, and the memory is not accessible by the host hypervisor
> > > +and VMM. For the host to do I/O on behalf of the guest, the I/O must be
> > > +directed to guest memory that is unencrypted. CoCo VMs set a kernel-wide option
> > > +to force all DMA I/O to use bounce buffers, and the bounce buffer memory is set
> > > +up as unencrypted. The host does DMA I/O to/from the bounce buffer memory, and
> > > +the Linux kernel DMA layer does "sync" operations to cause the CPU to copy the
> > > +data to/from the original target memory buffer. The CPU copying bridges between
> > > +the unencrypted and the encrypted memory. This use of bounce buffers allows
> > > +existing device drivers to "just work" in a CoCo VM, with no modifications
> > > +needed to handle the memory encryption complexity.
> >
> > This part might be misleading. It sounds as if SWIOTLB would not be
> > needed if drivers were smarter.
>
> I'm not sure I understand the point you are making. It is possible for a
> driver to do its own manual bounce buffering to handle encrypted memory.
> For example, in adding support for CoCo VMs, we encountered such a
> driver/device with complex DMA and memory requirements that already
> did some manual bounce buffering. When used in a CoCo VM, driver
> modifications were needed to handle encrypted memory, but that was
> the preferred approach because of the pre-existing manual bounce
> buffering. In that case, indeed swiotlb was not needed by that driver/device.
> But in the general case, we don't want to require driver modifications for
> CoCo VMs. swiotlb bounce buffering makes it all work in exactly the
> situation you describe where the buffer memory could have originated
> in a variety of places.
>
> Could you clarify your point? Or perhaps suggest alternate wording
> that would help avoid any confusion?

Ah, I wasn't aware that some drivers implement their own bounce
buffers. I can't say I'm really happy about it, because such drivers
must inevitably deal with all the complexities like IOMMU, decrypting,
etc. But you're right - bounce buffers can be implemented by individual
drivers.
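
To make the copying step concrete, here is a minimal user-space model of
what any bounce buffer has to do, whether it lives in swiotlb or in a
driver. It is a sketch under loud assumptions: struct bounce_slot and the
bounce_*() and fake_device_write() names are hypothetical, plain memcpy()
stands in for the CPU copy between encrypted and unencrypted memory, and
none of the IOMMU or decryption handling is modelled. In the kernel the
equivalent steps are driven through the DMA API (dma_map_single(),
dma_sync_single_for_cpu() and friends).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 256

/* Stand-in for one slot of shared (unencrypted) memory the host can see. */
struct bounce_slot {
	unsigned char shared[BOUNCE_SIZE];
	void *orig;	/* private (encrypted) buffer owned by the driver */
	size_t len;
};

/* "Map": remember the original buffer and copy outbound data to the slot. */
void bounce_map(struct bounce_slot *slot, void *orig, size_t len)
{
	assert(len <= BOUNCE_SIZE);
	slot->orig = orig;
	slot->len = len;
	memcpy(slot->shared, orig, len);
}

/* "Sync for CPU": copy what the device wrote back to the original buffer. */
void bounce_sync_for_cpu(struct bounce_slot *slot)
{
	memcpy(slot->orig, slot->shared, slot->len);
}

/* Pretend device: it can only ever touch the shared slot. */
void fake_device_write(struct bounce_slot *slot, const void *data, size_t len)
{
	assert(len <= slot->len);
	memcpy(slot->shared, data, len);
}
```

The original buffer does not change until the sync step runs, which is why
a driver never has to know that its data took a detour.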

I have now read the sentence again, and the problematic wording is
"existing device drivers". I know it's not the same as "legacy device
drivers", but it still leaves some doubt whether "future device drivers"
should make use of SWIOTLB. So either remove "existing" or be more
explicit and say "existing and future device drivers". Then it sounds
fine to me.

HTH
Petr T