Re: Regression after commit 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly")

From: Matthew Wilcox
Date: Sun Feb 11 2018 - 07:17:45 EST


On Sun, Feb 11, 2018 at 03:28:08AM -0800, Matthew Wilcox wrote:
> Now, longer-term, perhaps we should do the following:
>
> #ifdef CONFIG_ZONE_DMA32
> #define OPT_ZONE_DMA32 ZONE_DMA32
> #elif defined(CONFIG_64BIT)
> #define OPT_ZONE_DMA32 OPT_ZONE_DMA
> #else
> #define OPT_ZONE_DMA32 ZONE_NORMAL
> #endif
>
> Then we wouldn't need the ifdef here and could always use GFP_DMA32
> | GFP_KERNEL. Would need to audit current users and make sure they
> wouldn't be broken by such a change.

Argh, I forgot to say the most important thing. (For those newly invited
to the party, we're talking about drivers/media, in particular
drivers/media/common/saa7146/saa7146_core.c, functions
saa7146_vmalloc_build_pgtable() and vmalloc_to_sg().)

I think we're missing a function in our DMA API. These drivers don't
actually need physical memory below the 4GB mark. They need DMA addresses
which are below the 4GB mark. For machines with IOMMUs, this can mean
no restrictions on physical memory. If we don't have an IOMMU, then a
bounce buffer could be used (but would be slow) -- like the swiotlb.
So we should endeavour to allocate memory below the 4GB boundary on
systems with no IOMMU, but can allocate memory anywhere on systems with
an IOMMU.
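
To make that policy concrete, here is a minimal userspace sketch. It is
not kernel code: struct model_dev, model_pick_placement() and
model_needs_bounce() are hypothetical names invented for illustration,
and the model assumes a device is fully described by "has an IOMMU" plus
a DMA mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_placement {
	MODEL_ANYWHERE,		/* any physical page will do */
	MODEL_WITHIN_DMA_MASK,	/* prefer pages the device can address
				 * directly (below 4GB for a 32-bit mask) */
};

struct model_dev {
	bool has_iommu;		/* IOMMU can remap pages for the device */
	uint64_t dma_mask;	/* highest DMA address the device can emit */
};

/* Where should streaming buffers for this device be allocated? */
static enum model_placement model_pick_placement(const struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->has_iommu)
		return MODEL_ANYWHERE;	/* the IOMMU hides physical addresses */
	if (dev->dma_mask == UINT64_MAX)
		return MODEL_ANYWHERE;	/* device can reach everything itself */
	return MODEL_WITHIN_DMA_MASK;	/* otherwise we would have to bounce */
}

/*
 * Would a buffer at [phys, phys + len) need a bounce buffer?
 * Assumes phys + len does not wrap around.
 */
static bool model_needs_bounce(const struct model_dev *dev,
			       uint64_t phys, uint64_t len)
{
	assert(dev != NULL && len > 0);
	if (dev->has_iommu)
		return false;
	return phys + len - 1 > dev->dma_mask;
}
```

With a 32-bit mask and no IOMMU, a page at physical address 0x100000000
needs bouncing in this model, while the same page behind an IOMMU does
not.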

For consistent / coherent memory, we have an allocation function.
But we don't have an allocation function for streaming memory, which is
what these drivers want. They also flush the DMA memory and then access
the memory through a different virtual mapping, which I'm not sure is
going to work well on architectures with virtually-indexed caches like
SPARC and PA-RISC (maybe not MIPS either?).

I think we want something like

struct scatterlist *dma_alloc_sg(struct device *dev, int *nents);
void dma_free_sg(struct device *dev, struct scatterlist *sg, int nents);

That lets individual architectures decide where to allocate, and handle
the tradeoff between allocating below 4GB and using bounce buffers.

I don't have a good answer to synchronising between device-view of
memory and CPU-view-through-vmalloc though. They're already calling
dma_sync_*_for_cpu(); do they need to also call a new vflush(void *p,
unsigned long len) function which can be a no-op on x86 and flushes the
range on SPARC/PA-RISC/... ?
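
As a rough illustration of the two behaviours such a vflush() might
have, here is a userspace sketch. Everything in it is an assumption:
the 32-byte line size, the model_* names, and the choice to flush one
cache line at a time. A real implementation would be per-architecture
and would issue actual cache maintenance instructions rather than
bumping a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_LINE_SIZE 32u	/* assumed cache line size, in bytes */

/* Stand-in for a real cache flush: just count the lines touched. */
static unsigned long model_lines_flushed;

static void model_flush_line(uintptr_t addr)
{
	(void)addr;
	model_lines_flushed++;
}

/* Variant for aliasing caches: flush every line in [p, p + len). */
static void vflush_aliasing(void *p, unsigned long len)
{
	uintptr_t start, end, addr;

	assert(p != NULL || len == 0);
	if (len == 0)
		return;
	/* Round down to the start of the first line covered. */
	start = (uintptr_t)p & ~(uintptr_t)(MODEL_LINE_SIZE - 1);
	end = (uintptr_t)p + len;
	for (addr = start; addr < end; addr += MODEL_LINE_SIZE)
		model_flush_line(addr);
}

/* Variant for coherent, physically-tagged caches (x86-like): no-op. */
static void vflush_coherent(void *p, unsigned long len)
{
	(void)p;
	(void)len;
}
```

A driver would call whichever variant its architecture provides on the
vmalloc alias, in addition to the dma_sync_*_for_cpu() call it already
makes.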