[PATCHv2] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()

From: Kirill A. Shutemov
Date: Mon Jun 24 2019 - 08:31:51 EST


Kyle has reported that kernel crashes sometimes when it boots in
5-level paging mode with KASLR enabled:

WARNING: CPU: 0 PID: 0 at arch/x86/mm/init_64.c:87 phys_p4d_init+0x1d4/0x1ea
RIP: 0010:phys_p4d_init+0x1d4/0x1ea
Call Trace:
__kernel_physical_mapping_init+0x10a/0x35c
kernel_physical_mapping_init+0xe/0x10
init_memory_mapping+0x1aa/0x3b0
init_range_memory_mapping+0xc8/0x116
init_mem_mapping+0x225/0x2eb
setup_arch+0x6ff/0xcf5
start_kernel+0x64/0x53b
? copy_bootdata+0x1f/0xce
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x8a/0x8d
secondary_startup_64+0xb6/0xc0

which causes later:

BUG: unable to handle page fault for address: ff484d019580eff8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
BAD
Oops: 0000 [#1] SMP NOPTI
RIP: 0010:fill_pud+0x13/0x130
Call Trace:
set_pte_vaddr_p4d+0x2e/0x50
set_pte_vaddr+0x6f/0xb0
__native_set_fixmap+0x28/0x40
native_set_fixmap+0x39/0x70
register_lapic_address+0x49/0xb6
early_acpi_boot_init+0xa5/0xde
setup_arch+0x944/0xcf5
start_kernel+0x64/0x53b

Kyle bisected the issue to commit b569c1843498 ("x86/mm/KASLR: Reduce
randomization granularity for 5-level paging to 1GB")

Before the commit PAGE_OFFSET is always aligned to P4D_SIZE if we boot in
5-level paging mode. But now only PUD_SIZE alignment is guaranteed.

For phys_p4d_init() it means that 'paddr_next' after the first iteration
can belong to the same p4d entry.

In the case I was able to reproduce the 'vaddr' on the first iteration
is 0xff4228027fe00000 and 'paddr' is 0x33fe00000. On the second
iteration 'vaddr' becomes 0xff42287f40000000 and 'paddr' is
0x8000000000. The 'vaddr' in both cases belong to the same p4d entry.

It screws up 'i' count: we miss the last entry in the page table
completely. And it confuses the branch under 'paddr >= paddr_end'
condition: the p4d entry can be cleared where must not to.

The patch makes phys_p4d_init() walk by virtual address space, like
__kernel_physical_mapping_init() does. It makes it work correctly with
phys-virt mismatch.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Reported-and-tested-by: Kyle Pelton <kyle.d.pelton@xxxxxxxxx>
Fixes: b569c1843498 ("x86/mm/KASLR: Reduce randomization granularity for 5-level paging to 1GB")
Cc: Baoquan He <bhe@xxxxxxxxxx>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
---
arch/x86/mm/init_64.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 693aaf28d5fe..0f01c7b1d217 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -671,23 +671,25 @@ static unsigned long __meminit
phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
unsigned long page_size_mask, bool init)
{
- unsigned long paddr_next, paddr_last = paddr_end;
- unsigned long vaddr = (unsigned long)__va(paddr);
- int i = p4d_index(vaddr);
+ unsigned long vaddr, vaddr_end, vaddr_next, paddr_next, paddr_last;
+
+ paddr_last = paddr_end;
+ vaddr = (unsigned long)__va(paddr);
+ vaddr_end = (unsigned long)__va(paddr_end);

if (!pgtable_l5_enabled())
return phys_pud_init((pud_t *) p4d_page, paddr, paddr_end,
page_size_mask, init);

- for (; i < PTRS_PER_P4D; i++, paddr = paddr_next) {
- p4d_t *p4d;
+ for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+ p4d_t *p4d = p4d_page + p4d_index(vaddr);
pud_t *pud;

- vaddr = (unsigned long)__va(paddr);
- p4d = p4d_page + p4d_index(vaddr);
- paddr_next = (paddr & P4D_MASK) + P4D_SIZE;
+ vaddr_next = (vaddr & P4D_MASK) + P4D_SIZE;
+ paddr = __pa(vaddr);

if (paddr >= paddr_end) {
+ paddr_next = __pa(vaddr_next);
if (!after_bootmem &&
!e820__mapped_any(paddr & P4D_MASK, paddr_next,
E820_TYPE_RAM) &&
@@ -699,13 +701,13 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,

if (!p4d_none(*p4d)) {
pud = pud_offset(p4d, 0);
- paddr_last = phys_pud_init(pud, paddr, paddr_end,
- page_size_mask, init);
+ paddr_last = phys_pud_init(pud, paddr, __pa(vaddr_end),
+ page_size_mask, init);
continue;
}

pud = alloc_low_page();
- paddr_last = phys_pud_init(pud, paddr, paddr_end,
+ paddr_last = phys_pud_init(pud, paddr, __pa(vaddr_end),
page_size_mask, init);

spin_lock(&init_mm.page_table_lock);
--
2.21.0