Re: [BUG 5.14] arm64/mm: dma memory mapping fails (in some cases)

From: David Hildenbrand
Date: Tue Aug 24 2021 - 14:59:31 EST


On 24.08.21 20:46, Robin Murphy wrote:
On 2021-08-24 19:28, Mike Rapoport wrote:
On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
Hi Alex,

Thanks for the report.

On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
it seems there is a regression in arm64 memory mapping in 5.14, since it
fails on Rockchip RK3328 when the pl330 dmac tries to map with:

[    8.921909] ------------[ cut here ]------------
[    8.921940] WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
[    8.921973] Modules linked in: spi_rockchip(+) fuse
[    8.921996] CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
[    8.922004] Hardware name: Pine64 Rock64 (DT)
[    8.922011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[    8.922018] pc : dma_map_resource+0x68/0xc0
[    8.922026] lr : pl330_prep_slave_fifo+0x78/0xd0
[    8.922040] sp : ffff800012102ae0
[    8.922043] x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
[    8.922056] x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
[    8.922067] x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
[    8.922078] x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
[    8.922089] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[    8.922100] x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
[    8.922111] x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
[    8.922123] x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
[    8.922134] x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
[    8.922145] x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
[    8.922158] Call trace:
[    8.922163]  dma_map_resource+0x68/0xc0
[    8.922173]  pl330_prep_slave_sg+0x58/0x220
[    8.922181]  rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
[    8.922208]  rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
[...]
Note: This does not relate to the spi driver - when disabling this device in
the device tree it fails for any other (i2s, for instance) which uses dma.
Commenting out the failing check at [1], however, helps and the mapping
works again.

Do you know which address dma_map_resource() is trying to map (maybe
add some printk())? It's not supposed to map RAM, hence the warning.
Random guess, the address is 0xff190800 (based on the x1 above but the
regs might as well be mangled).

0xff190800 will cause this warning for sure. It has a memory map, but it is
not RAM, so the old version of pfn_valid() would return 0 and the new one
returns 1.

How does that happen, though? It's not a memory address, and it's not
even within the bounds of anywhere there should or could be memory. This
SoC has a simple memory map - everything from 0 to 0xfeffffff goes to
the DRAM controller (which may not all be populated, and may have pieces
carved out by secure firmware), while 0xff000000-0xffffffff is MMIO. Why
do we have pages (or at least the assumption of pages) for somewhere
which by all rights should not have them?

Simple: we allocate the vmemmap for whole sections (e.g., 128 MiB) to
avoid any such hacks. If there is a memory hole, it gets a memmap as well.
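
To make the section rounding concrete, here is a rough user-space sketch
(not kernel code). The 128 MiB section size is the example figure above;
the helper names are hypothetical, and the RAM range passed in is an
assumption about what is populated on a given board.

```c
#include <stdbool.h>
#include <stdint.h>

/* 128 MiB sections, the example size mentioned above (assumption). */
#define MODEL_SECTION_SHIFT 27
#define MODEL_SECTION_SIZE  (UINT64_C(1) << MODEL_SECTION_SHIFT)

/* First physical address of the section containing phys. */
static uint64_t model_section_start(uint64_t phys)
{
	return phys & ~(MODEL_SECTION_SIZE - 1);
}

/* One past the last physical address of that section. */
static uint64_t model_section_end(uint64_t phys)
{
	return model_section_start(phys) + MODEL_SECTION_SIZE;
}

/*
 * True if the RAM range [ram_start, ram_end) overlaps the section
 * containing phys. In this model, such a section gets a memmap for
 * all of its pages, including any hole or MMIO range inside it.
 */
static bool model_section_has_ram(uint64_t phys, uint64_t ram_start,
				  uint64_t ram_end)
{
	return ram_start < model_section_end(phys) &&
	       ram_end > model_section_start(phys);
}
```

With these helpers, 0xff190800 falls into the section starting at
0xf8000000, which also covers the DRAM range just below 0xff000000. If
any RAM is populated there, the MMIO address shares a section with it.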

Tricking pfn_valid() into returning "false" where we actually have a
memmap only makes it look like there is no memmap; but there is one, and
it's PG_reserved.

[...]

Either pfn_valid() gets confused in 5.14 or something is wrong with the
DT. I have a suspicion it's the former since reverting the above commit
makes it disappear.

I think pfn_valid() actually behaves as expected but the caller is wrong
because pfn_valid != RAM (this applies btw to !arm64 as well).

	/* Don't allow RAM to be mapped */
	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
		return DMA_MAPPING_ERROR;

Alex, can you please try this patch:

That will certainly paper over the issue, but it's avoiding the question
of what went wrong with the memory map in the first place. The comment
is indeed a bit inaccurate, but ultimately dma_map_resource() exists for
addresses that would be wrong to pass to dma_map_page(), so I believe
pfn_valid() is still the correct check.

If we want to check for RAM, pfn_valid() would be wrong. If we want to
check for "is there a memmap, for whatever lives or does not live there",
pfn_valid() is the right check.
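
The two questions can be told apart in a toy user-space model (a sketch,
not kernel code). Everything here is an assumption for illustration: the
single RAM bank ending at 0xff000000, the 4 KiB page size, the 128 MiB
section size, and the model_*() names, which do not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT    12	/* 4 KiB pages (assumption) */
#define MODEL_SECTION_SHIFT 27	/* 128 MiB sections (assumption) */

struct model_ram_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* Assumed layout: one RAM bank directly below the MMIO window. */
static const struct model_ram_range model_ram[] = {
	{ UINT64_C(0x00000000), UINT64_C(0xff000000) },
};

#define MODEL_NR_RAM (sizeof(model_ram) / sizeof(model_ram[0]))

/* "Is this RAM?": the pfn lies inside a populated RAM range. */
static bool model_is_ram(uint64_t pfn)
{
	uint64_t phys = pfn << MODEL_PAGE_SHIFT;

	for (size_t i = 0; i < MODEL_NR_RAM; i++)
		if (phys >= model_ram[i].start && phys < model_ram[i].end)
			return true;
	return false;
}

/* "Is there a memmap?": the pfn's whole section overlaps some RAM. */
static bool model_has_memmap(uint64_t pfn)
{
	uint64_t size = UINT64_C(1) << MODEL_SECTION_SHIFT;
	uint64_t start = (pfn << MODEL_PAGE_SHIFT) & ~(size - 1);
	uint64_t end = start + size;

	for (size_t i = 0; i < MODEL_NR_RAM; i++)
		if (model_ram[i].start < end && model_ram[i].end > start)
			return true;
	return false;
}
```

For the pfn of 0xff190800 (0xff190), model_has_memmap() is true while
model_is_ram() is false, which is the case the warning trips over.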


--
Thanks,

David / dhildenb