[PATCH v2] resource: limit request_free_mem_region based on arch_get_mappable_range

From: D Scott Phillips
Date: Mon Jul 08 2024 - 20:28:17 EST


On arm64 prior to commit 32697ff38287 ("arm64: vmemmap: Avoid base2 order
of struct page size to dimension region"), the amdgpu driver could trip
over the warning of:

`WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));`

in vmemmap_populate()[1]. After that commit, it becomes a translation fault
and panic[2].

The cause is that the amdgpu driver allocates some unused space from
iomem_resource and claims it as MEMORY_DEVICE_PRIVATE and
devm_memremap_pages() it. An address above those backed by the arm64
vmemmap is picked.

Limit request_free_mem_region() so that only addresses within the
arch_get_mappable_range() can be chosen as device private addresses.

[1]: Call trace:
vmemmap_populate+0x30/0x48
__populate_section_memmap+0x40/0x90
sparse_add_section+0xfc/0x3e8
__add_pages+0xb4/0x168
pagemap_range+0x300/0x410
memremap_pages+0x184/0x2d8
devm_memremap_pages+0x30/0x90
kgd2kfd_init_zone_device+0xe0/0x1f0 [amdgpu]
amdgpu_device_ip_init+0x674/0x888 [amdgpu]
amdgpu_device_init+0x7bc/0xed8 [amdgpu]
amdgpu_driver_load_kms+0x28/0x1c0 [amdgpu]
amdgpu_pci_probe+0x194/0x580 [amdgpu]
local_pci_probe+0x48/0xb8
work_for_cpu_fn+0x24/0x40
process_one_work+0x170/0x3e0
worker_thread+0x2ac/0x3e0
kthread+0xf4/0x108
ret_from_fork+0x10/0x20

[2]: Unable to handle kernel paging request at virtual address
000001ffa6000034
Mem abort info:
ESR = 0x0000000096000044
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
CM = 0, WnR = 1, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000008000287c000
[000001ffa6000034] pgd=0000000000000000, p4d=0000000000000000
Call trace:
__init_zone_device_page.constprop.0+0x2c/0xa8
memmap_init_zone_device+0xf0/0x210
pagemap_range+0x1e0/0x410
memremap_pages+0x18c/0x2e0
devm_memremap_pages+0x30/0x90
kgd2kfd_init_zone_device+0xf0/0x200 [amdgpu]
amdgpu_device_ip_init+0x674/0x888 [amdgpu]
amdgpu_device_init+0x7a4/0xea0 [amdgpu]
amdgpu_driver_load_kms+0x28/0x1c0 [amdgpu]
amdgpu_pci_probe+0x1a0/0x560 [amdgpu]
local_pci_probe+0x48/0xb8
work_for_cpu_fn+0x24/0x40
process_one_work+0x170/0x3e0
worker_thread+0x2ac/0x3e0
kthread+0xf4/0x108
ret_from_fork+0x10/0x20

Signed-off-by: D Scott Phillips <scott@xxxxxxxxxxxxxxxxxxxxxx>
---
Link to v1: https://lore.kernel.org/all/20240703210707.1986816-1-scott@xxxxxxxxxxxxxxxxxxxxxx/
Changes since v1:
- Change from fiddling the architecture's MAX_PHYSMEM_BITS to checking
arch_get_mappable_range().

kernel/resource.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index fcbca39dbc450..6f256aa0191b4 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1832,25 +1832,28 @@ static resource_size_t gfr_start(struct resource *base, resource_size_t size,
if (flags & GFR_DESCENDING) {
resource_size_t end;

- end = min_t(resource_size_t, base->end,
+ end = min3(base->end, arch_get_mappable_range().end,
(1ULL << MAX_PHYSMEM_BITS) - 1);
return end - size + 1;
}

- return ALIGN(base->start, align);
+ return ALIGN(max_t(resource_size_t, base->start,
+ arch_get_mappable_range().start), align);
}

static bool gfr_continue(struct resource *base, resource_size_t addr,
resource_size_t size, unsigned long flags)
{
+
if (flags & GFR_DESCENDING)
- return addr > size && addr >= base->start;
+ return addr > size && addr >= base->start &&
+ addr >= arch_get_mappable_range().start;
/*
* In the ascend case be careful that the last increment by
* @size did not wrap 0.
*/
return addr > addr - size &&
- addr <= min_t(resource_size_t, base->end,
+ addr <= min3(base->end, arch_get_mappable_range().end,
(1ULL << MAX_PHYSMEM_BITS) - 1);
}

--
2.45.2