Re: [PATCH v6 00/11] DDW + Indirect Mapping

From: David Christensen
Date: Tue Aug 31 2021 - 16:39:54 EST

On 8/31/21 1:18 PM, Leonardo Brás wrote:
Hello David,

Sorry for the delay, I did not get your mail because I was not CC'd
in your reply (you sent the mail just to the mailing list).

Replies below:

On Mon, 2021-08-30 at 10:48 -0700, David Christensen wrote:
On 8/16/21 11:39 PM, Leonardo Bras wrote:
So far it's assumed possible to map the guest RAM 1:1 to the bus,
which works with a small number of devices. SRIOV changes this, as the
user can configure hundreds of VFs, and since phyp preallocates TCEs
and does not allow IOMMU pages bigger than 64K, it has to limit the
number of TCEs per PE to limit the waste of physical pages.

As of today, if the assumed direct mapping is not possible, DDW
creation is skipped and the default DMA window "ibm,dma-window" is used
instead.

Using the DDW instead of the default DMA window may allow expanding
the amount of memory that can be DMA-mapped, given that the number of
pages (TCEs) may stay the same (or increase), while the default DMA
window offers only 4k pages and DDW may offer larger pages (4k, 64k,
16M ...).

So if I'm reading this correctly, VFIO applications requiring hugepage
DMA mappings (e.g. 16M or 2GB) can be supported on an LPAR or DLPAR
after this change, is that correct?

Different DDW IOMMU page sizes (4k, 64k, 16M) have been supported in
Linux for a while now, and the remaining page sizes in LoPAR were
enabled in the following patch:
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210408201915.174217-1-leobras.c@xxxxxxxxx/
(commit 472724111f0f72042deb6a9dcee9578e5398a1a1)

The thing is there are two ways of using DMA:
- Direct DMA, mapping the whole memory space of the host, which
requires a lot of DMA space if the guest memory is huge. This already
supports DDW and allows using the bigger page sizes.
This happens on device/bus probe.

- Indirect DMA with IOMMU, mapping memory regions on demand and
unmapping them after use. This requires much less DMA space, but adds
overhead because an hcall is necessary for each mapping and unmapping.
Before this series, Indirect DMA was only possible with the 'default
DMA window', which allows using only 4k pages.

This series allows Indirect DMA using DDW when available, which usually
means bigger page sizes and more TCEs, and therefore more DMA space.

How is the mapping method selected? LPAR creation via the HMC, Linux kernel load parameter, or some other method?

The hcall overhead doesn't seem too worrisome when mapping 1GB pages so the Indirect DMA method might be best in my situation (DPDK).

Dave