Re: [PATCH v3 0/5] enhance DMA CMA on x86

From: Konrad Wilk
Date: Fri Oct 03 2014 - 12:35:14 EST


On 10/3/2014 12:06 PM, Akinobu Mita wrote:
2014-10-03 23:27 GMT+09:00 Peter Hurley <peter@xxxxxxxxxxxxxxxxxx>:
On 10/02/2014 07:08 PM, Akinobu Mita wrote:
2014-10-03 7:03 GMT+09:00 Peter Hurley <peter@xxxxxxxxxxxxxxxxxx>:
On 10/02/2014 12:41 PM, Konrad Rzeszutek Wilk wrote:
On Tue, Sep 30, 2014 at 09:49:54PM -0400, Peter Hurley wrote:
On 09/30/2014 07:45 PM, Thomas Gleixner wrote:

Which is a different situation than if the plan is to ship production
units for x86; in that case, a general-purpose solution will be required.

As to the good design of a general purpose solution for allocating and
mapping huge order pages, you are certainly more qualified to help Akinobu
than I am.

What Akinobu's patches intend to support is:

vaddr = dma_alloc_coherent(dev, 64 * 1024 * 1024, &bus_addr, GFP_KERNEL);

which raises three issues:

1. Where do coherent blocks of this size come from?
2. How to prevent fragmentation of these reserved blocks over time by
existing DMA users?
3. Is this support generically required across all iommu implementations on x86?

Questions 1 and 2 are non-trivial in the general case; otherwise the page
allocator would already do this. Simply dropping in the contiguous memory
allocator doesn't work because CMA does not have the same policy and performance
as the page allocator, and is already causing performance regressions even
in the absence of huge page allocations.

Could you take a look at the patches I sent? Do they fix these issues?
https://lkml.org/lkml/2014/9/28/110

With these patches, normal alloc_pages() is used for allocation first
and dma_alloc_from_contiguous() is used as a fallback.
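
A minimal sketch of that allocation order, with illustrative names
(this is not the actual patch):

#include <linux/device.h>
#include <linux/gfp.h>
#include <linux/dma-contiguous.h>

/*
 * Sketch only: try the normal page allocator first, and fall back to
 * the reserved CMA area only when the buddy allocator fails.
 */
static struct page *alloc_coherent_pages(struct device *dev,
                                         unsigned int order, gfp_t gfp)
{
        struct page *page;

        /* Preferred path: the buddy allocator, with its usual policy. */
        page = alloc_pages(gfp, order);
        if (page)
                return page;

        /* Fallback: carve the block out of the reserved CMA region. */
        return dma_alloc_from_contiguous(dev, 1 << order, order);
}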

Sure, I can test these patches this weekend.
Where are the unit tests?

Thanks a lot. I would like to know whether the performance regression
you are seeing disappears with these patches, just as it does when
CONFIG_DMA_CMA is disabled.

So that's why I raised question 3: is making the necessary compromises
to support 64MB coherent DMA allocations across all x86 iommu
implementations actually required?

Prior to Akinobu's patches, the use of CMA by the x86 iommu
configurations was intended to be limited to testing, as the
introductory commit states:

commit 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6
Author: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Date: Thu Dec 29 13:09:51 2011 +0100

X86: integrate CMA with DMA-mapping subsystem

This patch adds support for CMA to dma-mapping subsystem for x86
architecture that uses common pci-dma/pci-nommu implementation. This
allows to test CMA on KVM/QEMU and a lot of common x86 boxes.

Signed-off-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Signed-off-by: Kyungmin Park <kyungmin.park@xxxxxxxxxxx>
CC: Michal Nazarewicz <mina86@xxxxxxxxxx>
Acked-by: Arnd Bergmann <arnd@xxxxxxxx>


Which brings me to my suggestion: if support for huge coherent DMA is
required only for a special test platform, then couldn't this support
be made specific to a new iommu configuration, namely iommu=cma, which
would get initialized much the same way that iommu=calgary is now?

The code for such an iommu configuration would mostly duplicate
arch/x86/kernel/pci-swiotlb.c and the CMA support would get removed from
the other x86 iommu implementations.
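
As a rough sketch, such a configuration might install a dma_map_ops
table modeled on swiotlb_dma_ops; the cma_alloc_coherent() and
cma_free_coherent() helpers below are hypothetical, presumably backed
by dma_alloc_from_contiguous() and dma_release_from_contiguous():

/* Sketch only, modeled on arch/x86/kernel/pci-swiotlb.c. */
static struct dma_map_ops cma_dma_ops = {
        .alloc                  = cma_alloc_coherent,   /* hypothetical */
        .free                   = cma_free_coherent,    /* hypothetical */
        .map_page               = swiotlb_map_page,     /* reuse existing paths */
        .unmap_page             = swiotlb_unmap_page,
        .map_sg                 = swiotlb_map_sg_attrs,
        .unmap_sg               = swiotlb_unmap_sg_attrs,
        .sync_single_for_cpu    = swiotlb_sync_single_for_cpu,
        .sync_single_for_device = swiotlb_sync_single_for_device,
        .dma_supported          = NULL,
};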

I'm not sure I'm reading this correctly, though. Can the boot option
'cma=0' also help keep the IOMMU implementations from using CMA?

Maybe, but that's not an appropriate solution for distro kernels.

Nor does this address configurations that want a really large CMA
region so that 1GB huge pages can be allocated (not for DMA, though).

Now I see the point of the iommu=cma configuration you suggested. But
what should we do when CONFIG_SWIOTLB is disabled, especially on
x86_32? Should we just introduce yet another flag that says not to use
DMA_CMA, instead of adding a new swiotlb-like iommu implementation?


If you implement a DMA API producer - aka dma_ops (which is what Peter
is thinking of, I believe) - it won't matter which IOMMUs / DMA
producers are selected, right?

Or are you saying that CMA needs SWIOTLB to handle certain types of
pages as a fallback mechanism - and hence there needs to be a tight
relationship?

In which case I would look at making SWIOTLB more library-like - the
Xen-SWIOTLB already does that by using certain parts of the SWIOTLB
code that are exposed to the rest of the kernel.
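
For example, simplified and not the actual Xen code, a map_page
implementation can be built directly on the exported swiotlb_tbl_*
primitives, much as drivers/xen/swiotlb-xen.c does:

#include <linux/swiotlb.h>
#include <linux/dma-mapping.h>

/* Simplified illustration of using SWIOTLB as a library. */
static dma_addr_t example_map_page(struct device *dev, phys_addr_t phys,
                                   dma_addr_t start_dma_addr, size_t size,
                                   enum dma_data_direction dir)
{
        /* Bounce the buffer through the swiotlb aperture. */
        phys_addr_t map = swiotlb_tbl_map_single(dev, start_dma_addr,
                                                 phys, size, dir);
        if (map == SWIOTLB_MAP_ERROR)
                return DMA_ERROR_CODE;

        /* Hand the device a bus address for the bounce slot. */
        return phys_to_dma(dev, map);
}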
